
Adding Object Detection to the New SkyWay Quickstart

Posted at 2023-07-11, last updated at 2023-07-16

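Introduction

I happened to become interested in WebRTC, and since SkyWay is said to make it easy to use, let's try it out.
We will extend the sample application built in the new SkyWay quickstart so that it runs object detection on the video of a remote participant we have subscribed to.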

Environment

OS: macOS Ventura 13.4.1
Node.js: v18.16.1
npm: 9.5.1
ml5.js: 0.12.2

Working through the new SkyWay tutorial

First, build the sample application by following the quickstart for the new SkyWay JavaScript SDK.
For environment setup, follow the "Using NPM" option.

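Implementing object detection

Once you have finished the quickstart, implement object detection with ml5.js.

What is ml5.js?

ml5.js is a library that aims to make machine learning approachable for a broad audience such as artists, developers, and students. It is built on top of TensorFlow.js.
With it, tasks such as object detection and image classification can run in the browser.

Implementation

Install ml5.js.
npm prints dependency warnings (WARN), but you can proceed as is.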

npm i ml5

The implementation is shown below.
It edits the main.js written in the new SkyWay quickstart.
Set the application ID and secret key to your own values.

main.js

import { nowInSec, SkyWayAuthToken, SkyWayContext, SkyWayRoom, SkyWayStreamFactory, uuidV4 } from '@skyway-sdk/room';
import { objectDetector } from 'ml5';

const token = new SkyWayAuthToken({
  jti: uuidV4(),
  iat: nowInSec(),
  exp: nowInSec() + 60 * 60 * 24,
  scope: {
    app: {
      id: 'Paste your application ID here',
      turn: true,
      actions: ['read'],
      channels: [
        {
          id: '*',
          name: '*',
          actions: ['write'],
          members: [
            {
              id: '*',
              name: '*',
              actions: ['write'],
              publication: {
                actions: ['write'],
              },
              subscription: {
                actions: ['write'],
              },
            },
          ],
          sfuBots: [
            {
              actions: ['write'],
              forwardings: [
                {
                  actions: ['write'],
                },
              ],
            },
          ],
        },
      ],
    },
  },
}).encode('Paste your secret key here');

// Run object detection on the video and draw the results on the canvas
const runObjectDetection = async (video, canvas) => {
  const detector = await objectDetector('cocossd');
  const ctx = canvas.getContext('2d');

  const detectObjects = () => {
    detector.detect(video, (err, results) => {
      if (err) {
        console.error(err);
        return;
      }

      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;

      // Clear the drawing area
      ctx.clearRect(0, 0, canvas.width, canvas.height);

      results.forEach(result => {
        // Draw the bounding box of the detected object
        ctx.beginPath();
        ctx.rect(result.x, result.y, result.width, result.height);
        ctx.lineWidth = 2;
        ctx.strokeStyle = 'red';
        ctx.fillStyle = 'red';
        ctx.stroke();
        ctx.fillText(
          `${result.label} (${Math.round(result.confidence * 100)}%)`,
          result.x,
          result.y > 20 ? result.y - 5 : 20
        );
      });

      // Run detection again now that this result has been drawn
      detectObjects();
    });
  }

  detectObjects();
}


(async () => {
  const localVideo = document.getElementById('local-video');
  const buttonArea = document.getElementById('button-area');
  const remoteMediaArea = document.getElementById('remote-media-area');
  const roomNameInput = document.getElementById('room-name');

  const myId = document.getElementById('my-id');
  const joinButton = document.getElementById('join');

  const { audio, video } =
    await SkyWayStreamFactory.createMicrophoneAudioAndCameraStream();
  video.attach(localVideo);
  await localVideo.play();

  joinButton.onclick = async () => {
    if (roomNameInput.value === '') return;

    const context = await SkyWayContext.Create(token);
    const room = await SkyWayRoom.FindOrCreate(context, {
      type: 'p2p',
      name: roomNameInput.value,
    });
    const me = await room.join();

    myId.textContent = me.id;

    await me.publish(audio);
    await me.publish(video);

    const subscribeAndAttach = (publication) => {
      if (publication.publisher.id === me.id) return;

      const subscribeButton = document.createElement('button');
      subscribeButton.textContent = `${publication.publisher.id}: ${publication.contentType}`;
      buttonArea.appendChild(subscribeButton);

      subscribeButton.onclick = async () => {
        const { stream } = await me.subscribe(publication.id);

        let newMedia;
        let newCanvas;
        switch (stream.track.kind) {
          case 'video':
            newMedia = document.createElement('video');
            newCanvas = document.createElement('canvas');
            newMedia.playsInline = true;
            newMedia.autoplay = true;

            // Start object detection once the video element for the stream is ready
            newMedia.onloadedmetadata = () => {
              // Match the canvas size to the video size
              newCanvas.width = newMedia.videoWidth;
              newCanvas.height = newMedia.videoHeight;
              runObjectDetection(newMedia, newCanvas);
            };

            // Wrap the video and canvas elements in the same div
            const mediaDiv = document.createElement('div');
            mediaDiv.style.position = 'relative';
            mediaDiv.style.width = '100%';
            mediaDiv.style.height = '100%';
            mediaDiv.appendChild(newMedia);

            newCanvas.style.position = 'absolute';
            newCanvas.style.top = '0';
            newCanvas.style.left = '0';
            mediaDiv.appendChild(newCanvas);

            remoteMediaArea.appendChild(mediaDiv);
            break;
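          case 'audio':
            newMedia = document.createElement('audio');
            newMedia.controls = true;
            newMedia.autoplay = true;
            // Add the audio element to the page so its controls are visible
            remoteMediaArea.appendChild(newMedia);
            break;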
          default:
            return;
        }
        stream.attach(newMedia);
      };
    };

    room.publications.forEach(subscribeAndAttach);
    room.onStreamPublished.add((e) => subscribeAndAttach(e.publication));
  };
})();
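
The drawing code in runObjectDetection reads the fields label, confidence, x, y, width, and height from each detection result. As a minimal sketch, the label text and label position could be pulled out into small pure functions so they can be checked without a browser. The names formatLabel and labelPosition are hypothetical and are not part of ml5.js or SkyWay; the logic simply mirrors what main.js does above.

```javascript
// Hypothetical helpers (not part of ml5.js): they mirror the label logic in main.js.

// Builds the text drawn next to a bounding box, e.g. "cat (88%)".
const formatLabel = (result) =>
  `${result.label} (${Math.round(result.confidence * 100)}%)`;

// Places the label just above the box, or at y = 20 when the box is
// too close to the top edge of the canvas for the text to fit.
const labelPosition = (result) => ({
  x: result.x,
  y: result.y > 20 ? result.y - 5 : 20,
});

// Sample result with the same fields that main.js reads (values are made up).
const sampleResult = { label: 'cat', confidence: 0.876, x: 40, y: 100, width: 120, height: 80 };
console.log(formatLabel(sampleResult), labelPosition(sampleResult));
```

Inside the forEach loop, ctx.fillText could then be called with formatLabel(result) and the coordinates returned by labelPosition(result).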

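Checking that it works

Let's check that it works.
Start the local server with the following command, then open the local server's address in your browser to launch the application.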

npm run dev

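Display the other participant's camera video and wait a little while; you should see that objects are being detected.
Note that detection runs continuously in real time, so it may affect performance somewhat.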
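
One possible way to reduce the load, not something this article measured, is to limit how often detection starts. The sketch below assumes a hypothetical helper named delayUntilNextDetection and an arbitrary interval of 200 ms; it only computes how long to wait before starting the next detection.

```javascript
// Hypothetical helper: how long to wait before starting the next detection so
// that detections start at most once every `intervalMs` milliseconds.
const delayUntilNextDetection = (startedAtMs, finishedAtMs, intervalMs) =>
  Math.max(0, intervalMs - (finishedAtMs - startedAtMs));

// Usage sketch (browser only, not executed here). It would replace the
// immediate detectObjects() call at the end of the detect callback:
//
//   const startedAt = performance.now();
//   detector.detect(video, (err, results) => {
//     /* ...draw results as before... */
//     const delay = delayUntilNextDetection(startedAt, performance.now(), 200);
//     setTimeout(detectObjects, delay);
//   });

// A detection that took 50 ms waits the remaining 150 ms.
console.log(delayUntilNextDetection(0, 50, 200));
```

A longer interval lowers CPU usage at the cost of bounding boxes that lag further behind the video, so the value is a trade-off to tune.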

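Conclusion

The new SkyWay quickstart was easy to follow, so I got through it with less trouble than I expected.
It might also be interesting to try things like triggering some action when a particular detected object has a confidence of 90% or higher.

Articles I referred to

Trying real-time object detection in the browser with p5.js and ml5.js (using COCO-SSD)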
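
As a rough sketch of that idea, the results array passed to the detect callback could be filtered by label and confidence before deciding whether to act. The function name findConfidentDetections, the sample data, and the 'person' label are assumptions for illustration; the confidence field is the 0 to 1 value that main.js multiplies by 100 for display.

```javascript
// Hypothetical helper: keep only detections of `label` whose confidence
// is at least `threshold` (0.9 corresponds to 90%).
const findConfidentDetections = (results, label, threshold = 0.9) =>
  results.filter((r) => r.label === label && r.confidence >= threshold);

// Made-up sample data in the same shape that main.js reads.
const sampleResults = [
  { label: 'person', confidence: 0.95, x: 10, y: 20, width: 100, height: 200 },
  { label: 'person', confidence: 0.6, x: 150, y: 30, width: 90, height: 180 },
  { label: 'cup', confidence: 0.92, x: 300, y: 120, width: 40, height: 50 },
];

const hits = findConfidentDetections(sampleResults, 'person');
if (hits.length > 0) {
  // Replace this with whatever action should be triggered.
  console.log(`person detected with high confidence (${hits.length})`);
}
```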
