
I tried running PoseNet over WebRTC, and it worked.

Posted at 2019-05-16, last updated at 2019-05-17

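If PoseNet can estimate poses from a smartphone camera feed, I figured it should also work over WebRTC, so I tried it.
The idea is to draw pose-estimation lines on top of video that arrives as a live stream. Rather than first wondering whether this has any practical use, I decided to just try it.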

TensorFlow.js PoseNet
https://github.com/tensorflow/tfjs-models/tree/master/posenet

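WebRTC platform

I used "SkyWay" from NTT Communications. The Community Edition is free.
Register an account and log in.
Under "Create application", enter the "application description" and the "allowed domain name" (without https://), leave everything else at its default, click "Create", and obtain the API key.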

Getting WebRTC running first

A sample is available, so once you set the API key and upload the files to your own server, it works.

Sample
https://github.com/skyway/skyway-js-sdk/tree/master/examples/p2p-media
Documentation
https://webrtc.ecl.ntt.com/skyway-js-sdk-doc/ja/

・Setting the API key

key.js

window.__SKYWAY_KEY__ = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx';

Put this in a separate JS file and load it from the HTML. In the sample, the path to key.js is "../_shared", so change it to whatever path you prefer.

<script type="text/javascript" src="./key.js"></script>
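
Because the sample reads the key from window.__SKYWAY_KEY__, a small guard can catch the case where key.js failed to load or still contains the placeholder. The following is only a minimal sketch: readSkywayKey is a hypothetical helper name, not part of the SkyWay SDK, and the placeholder check assumes the "xxxxxxxx-..." string shown above.

```javascript
// Hypothetical helper (not part of the SkyWay SDK): reads the API key that
// key.js stores on the global object and fails early if it is unusable.
function readSkywayKey(globalObject) {
  const key = globalObject.__SKYWAY_KEY__;
  if (typeof key !== 'string' || key.length === 0) {
    throw new Error('__SKYWAY_KEY__ is not set; check the path to key.js');
  }
  // The placeholder shown in the article consists only of "x" and "-".
  if (/^[x-]+$/.test(key)) {
    throw new Error('__SKYWAY_KEY__ still holds the placeholder value');
  }
  return key;
}
```

In script.js this could be used as key: readSkywayKey(window) when creating the Peer.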

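The peer ID is assigned randomly by default, but you can also choose your own. For example, to fix the ID to "sender", take the part that creates the instance, which looks like this: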

script.js

const peer = new Peer({
    key: window.__SKYWAY_KEY__,
    debug: 3,
});

and change it as follows.

script.js

const peer = new Peer('sender', {
    key: window.__SKYWAY_KEY__,
    debug: 3,
});

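Peer IDs must be unique as a rule, so to video chat with other people you need either to create separate HTML and JS files, each with its own peer ID, or to use another language to assign unique peer IDs. Here, for the time being, I prepared two sets: sp.html and sp.js (peerID=sender) for the smartphone, and pc.html and pc.js (peerID=receiver) for the PC.
With this, entering the other side's peer ID on either device and pressing CALL shows the smartphone's video and audio on the PC, and the PC's video and audio on the smartphone. As you will see when you try it, it runs comfortably even over a 4G connection.
(Note that you will get audio feedback if the two devices are close to each other.)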
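
As an alternative to keeping one HTML/JS pair per peer ID, the ID could be taken from the page URL. This is only a sketch under my own assumptions: peerIdFromQuery and the "id" query parameter are hypothetical, and the character check is a conservative choice for this example rather than the SDK's documented rule.

```javascript
// Hypothetical helper: choose the peer ID from the page URL, for example
// pc.html?id=receiver, so one HTML/JS pair can serve several devices.
function peerIdFromQuery(search, fallbackId) {
  const id = new URLSearchParams(search).get('id');
  // Conservative character check chosen for this sketch (an assumption,
  // not the SDK's documented rule).
  if (id && /^[A-Za-z0-9_-]+$/.test(id)) {
    return id;
  }
  return fallbackId;
}
```

It could then be passed as the first argument, as in new Peer(peerIdFromQuery(location.search, 'sender'), {...}). The IDs still have to be unique among the peers that are connected at the same time.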

Supporting iOS Safari

With the sample as-is, no video appears, so add the "autoplay playsinline" attributes to the two video tags.

sp.html

<video id="js-remote-stream" autoplay playsinline></video>
・・・
<video id="js-local-stream" autoplay playsinline></video>
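
If editing the HTML is inconvenient, the same two attributes could also be set from script before the stream is attached. A minimal sketch, where enableInlinePlayback is a hypothetical helper name of my own:

```javascript
// Sets the attributes iOS Safari needs for inline, automatic playback.
// Works on any object exposing setAttribute, such as a <video> element.
function enableInlinePlayback(video) {
  video.setAttribute('autoplay', '');
  video.setAttribute('playsinline', '');
  return video;
}
```

For example: enableInlinePlayback(document.getElementById('js-remote-stream')).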

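Integrating TensorFlow.js

Now that both videos are showing, let's insert pose estimation at the point where the PC plays the video stream coming from the smartphone.
First, add the TensorFlow.js CDN scripts near the bottom of the page.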

pc.html

<script src="//cdn.webrtc.ecl.ntt.com/skyway-latest.js"></script>
<!-- Added from here -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@0.11.7"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/posenet@0.1.2"></script>
<!-- to here -->
<script src="./key.js"></script>
<script src="./pc.js"></script>

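Add a canvas for drawing the pose estimation. For now, any convenient place such as the bottom of the page is fine. The size is also provisionally set to 320px × 240px.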

pc.html

<div class="container">
・・・
  <canvas id="output" width="320" height="240"></canvas>
</div>

Get the canvas element and its context.

pc.js

const remoteId = document.getElementById('js-remote-id');
// Added below
const canvas = document.getElementById('output');
const ctx = canvas.getContext('2d');

Load the PoseNet model.

pc.js

// PoseNet model
let net = await posenet.load(0.75);
console.log(net);

const localStream = await navigator.mediaDevices
//・・・

Next, feed the video from remoteVideo, the video element that shows the sender's stream, straight into the canvas. Append the following to the bottom of pc.js.

pc.js

  peer.on('error', console.error);
  // Added below
  detectPoseInRealTime(remoteVideo, net);

  function detectPoseInRealTime(video, net) {
      console.log('detectPoseInRealTime', video);

      async function poseDetectionFrame() {
          ctx.clearRect(0, 0, 320, 240);
          ctx.save();
          ctx.drawImage(video, 0, 0, 320, 240);
          ctx.restore();
          // Redraw the current video frame onto the canvas on every animation frame
          requestAnimationFrame(poseDetectionFrame);
      }
      poseDetectionFrame();
  }

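When I opened sp.html in the smartphone browser and connected (call), the video appeared on the canvas!
Next, let's add TensorFlow.js PoseNet inside the animation loop.

Note: the video tag that shows the remote video (id="js-remote-stream") had no width or height, which made TensorFlow.js raise an error. For now I set width="490" height="653", and changed the canvas to 490 × 653 to match.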

pc.html

<video id="js-remote-stream" autoplay playsinline width="490" height="653"></video>
<canvas id="output" width="490" height="653"></canvas>

pc.js

ctx.clearRect(0, 0, 490, 653);
ctx.drawImage(video, 0, 0,490, 653);
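
Instead of hard-coding 490 × 653 in several places, the sizes could be derived once from the stream. The following is a rough sketch, not what I actually ran: syncSizes is a hypothetical helper, and it assumes the video element's videoWidth and videoHeight are already known (for example inside a loadedmetadata listener), falling back to fixed values otherwise.

```javascript
// Hypothetical helper: give the <video> and <canvas> the same explicit size.
// Uses the stream's intrinsic size when known, otherwise the fallback values.
function syncSizes(video, canvas, fallbackWidth, fallbackHeight) {
  const width = video.videoWidth > 0 ? video.videoWidth : fallbackWidth;
  const height = video.videoHeight > 0 ? video.videoHeight : fallbackHeight;
  video.width = width;
  video.height = height;
  canvas.width = width;
  canvas.height = height;
  return { width, height };
}
```

A possible call site would be remoteVideo.addEventListener('loadedmetadata', () => syncSizes(remoteVideo, canvas, 490, 653)); the returned width and height can then replace the literal numbers in clearRect() and drawImage().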

Then modify detectPoseInRealTime() as follows.

pc.js

  function detectPoseInRealTime(video, net) {
      console.log('detectPoseInRealTime', video);

      async function poseDetectionFrame() {
          let poses = [];

          const pose = await net.estimateSinglePose(video, 0.5, false, 16);
          poses.push(pose);

          ctx.clearRect(0, 0, 490, 653);
          ctx.save();
          ctx.drawImage(video, 0, 0, 490, 653);
          ctx.restore();

          poses.forEach(({score, keypoints}) => {
              if (score >= 0.1) {
                  console.log('score', score);
                  console.log('keypoints', keypoints);
                  //drawKeypoints(keypoints, 0.1, ctx);
              }
          });
          requestAnimationFrame(poseDetectionFrame);
      }
      poseDetectionFrame();
  }

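As shown below, the keypoints came pouring out in console.log!
The video becomes a little choppy, but it seems tolerable.
All that remains is to use the commented-out drawKeypoints() to draw the x,y coordinates, and the lines connecting them, on the canvas.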

console.log

[
  {
    "score": 0.9973854422569275,
    "part": "nose",
    "position": {
      "x": 143.04330914934107,
      "y": 569.8501232352988
    }
  },
  {
    "score": 0.999018669128418,
    "part": "leftEye",
    "position": {
      "x": 187.84028739572688,
      "y": 518.0897730910432
    }
  },
  {
    "score": 0.9989553689956665,
    "part": "rightEye",
    "position": {
      "x": 113.40685096336675,
      "y": 542.4379784041915
    }
  },
  {
    "score": 0.9520723819732666,
    "part": "leftEar",
    "position": {
      "x": 270.75939542110837,
      "y": 562.6329262128014
    }
  },
  {
    "score": 0.8491347432136536,
    "part": "rightEar",
    "position": {
      "x": 94.14217614384827,
      "y": 603.2268362718004
    }
  },
  {
    "score": 0.029188159853219986,
    "part": "leftShoulder",
    "position": {
      "x": 301.30582035813376,
      "y": 673.038927766792
    }
  },
  {
    "score": 0.030340874567627907,
    "part": "rightShoulder",
    "position": {
      "x": 100.20919022604684,
      "y": 648.6561616129894
    }
  },
  {
    "score": 0.008406699635088444,
    "part": "leftElbow",
    "position": {
      "x": 252.48149698322808,
      "y": 616.9771490453189
    }
  },
  {
    "score": 0.007448229473084211,
    "part": "rightElbow",
    "position": {
      "x": 102.54797286036602,
      "y": 633.0987574153916
    }
  },
  {
    "score": 0.0062928153201937675,
    "part": "leftWrist",
    "position": {
      "x": 262.2808165743344,
      "y": 625.0031441969495
    }
  },
  {
    "score": 0.01439402624964714,
    "part": "rightWrist",
    "position": {
      "x": 117.55185497438426,
      "y": 639.453360529856
    }
  },
  {
    "score": 0.0038259527646005154,
    "part": "leftHip",
    "position": {
      "x": 226.76096417673654,
      "y": 649.8077139320214
    }
  },
  {
    "score": 0.002054766518995166,
    "part": "rightHip",
    "position": {
      "x": 256.3256257672176,
      "y": 625.3370875441682
    }
  },
  {
    "score": 0.0035592580679804087,
    "part": "leftKnee",
    "position": {
      "x": -3.04821803896598,
      "y": 28.053253854458756
    }
  },
  {
    "score": 0.0052134147845208645,
    "part": "rightKnee",
    "position": {
      "x": 70.6190211334704,
      "y": 521.3860210244586
    }
  },
  {
    "score": 0.0030926992185413837,
    "part": "leftAnkle",
    "position": {
      "x": 492.85733678704855,
      "y": 185.7304230369473
    }
  },
  {
    "score": 0.002993899630382657,
    "part": "rightAnkle",
    "position": {
      "x": 494.1194420930381,
      "y": 642.5686759473872
    }
  }
]
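
In this log only the face keypoints (nose, eyes, ears) have high scores. Everything from the shoulders down is below 0.05, and some of those positions even fall outside the 490 × 653 canvas. Filtering by score before drawing therefore matters. A small sketch of such a filter follows; confidentParts is a hypothetical helper name, and the sample data is three entries from the log above with the numbers shortened.

```javascript
// Returns the names of the parts whose score reaches minScore.
function confidentParts(keypoints, minScore) {
  return keypoints
    .filter((keypoint) => keypoint.score >= minScore)
    .map((keypoint) => keypoint.part);
}

// Three entries taken from the log above, with the numbers shortened.
const sample = [
  { score: 0.997, part: 'nose', position: { x: 143.0, y: 569.9 } },
  { score: 0.849, part: 'rightEar', position: { x: 94.1, y: 603.2 } },
  { score: 0.029, part: 'leftShoulder', position: { x: 301.3, y: 673.0 } },
];

// Logs an array containing 'nose' and 'rightEar'.
console.log(confidentParts(sample, 0.5));
```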

For drawing the keypoints, see the following. You can also distinguish the body parts and color each one differently, as OpenPose does.
https://github.com/tensorflow/tfjs-models/blob/master/posenet/demos/demo_util.js
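
For reference, here is a simplified sketch of what such drawing functions might look like. It is not a copy of demo_util.js: the color, dot radius, line width, and the returned count are my own arbitrary choices, and only standard canvas 2D context calls are used. The drawKeypoints signature matches the commented-out call in pc.js.

```javascript
// Draws a dot for every keypoint whose score reaches minConfidence.
// ctx is a CanvasRenderingContext2D; color and radius are arbitrary choices.
// Returns the number of dots drawn (added for this sketch only).
function drawKeypoints(keypoints, minConfidence, ctx) {
  let drawn = 0;
  for (const { score, position } of keypoints) {
    if (score < minConfidence) continue;
    ctx.beginPath();
    ctx.arc(position.x, position.y, 4, 0, 2 * Math.PI);
    ctx.fillStyle = 'aqua';
    ctx.fill();
    drawn += 1;
  }
  return drawn;
}

// Draws one line between two keypoints, e.g. leftShoulder and leftElbow.
function drawSegment(from, to, ctx) {
  ctx.beginPath();
  ctx.moveTo(from.position.x, from.position.y);
  ctx.lineTo(to.position.x, to.position.y);
  ctx.lineWidth = 2;
  ctx.strokeStyle = 'aqua';
  ctx.stroke();
}
```

These should be called after ctx.drawImage() in the animation loop so that the dots and lines end up on top of the video frame.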

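Is it good enough for practical use?

It feels like it is just barely working.
Whatever the use case, it would be better if the choppiness were reduced a little.
A Core i7 or better might handle it fine.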

Test environment

PC: Windows 10, Core i5, 8 GB memory, Chrome version 74.0.3729.157
Smartphone: iPhone 8, iOS 11.4, Safari
