Motion Tracking with Unity and AI: MoveNet + Sentis, Part 2

Posted at 2025-05-24

Part 1 is here.

Part 1 covered Sentis and MoveNet at length, so now it is finally time to set up the Unity project. From here on, the work is mostly hands-on.

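Preparing the Unity project

Create a new project from Unity Hub
Open Unity Hub and select "New project". Since this will be a 3D application, I chose the "3D Core" template.
Import the Sentis package
Once the project opens and the Unity Editor appears, import the Sentis package.
From the menu at the top of the Unity Editor, select Window > Package Manager.
When the Package Manager window opens, change the dropdown at the top left to Unity Registry.
Type "Sentis" in the search bar and find "Unity Sentis".
Click the "Install" button to add the package to the project.
Place the model in the Assets folder
Drag and drop the MoveNet ONNX file you created into any location under the project's Assets folder.
Unity recognizes the ONNX file automatically and imports it. Once the import finishes, the model file is available as a Unity asset.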

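The package ships with samples, but some of them produce a large number of warnings and some do not run properly.
It may be a problem specific to my environment, but I suspect several of the samples simply do not work.

The Sentis package appears to use the unsafe modifier, so it probably will not work unless Allow unsafe code is set to true. It is off by default, so enable it. I happened to be reusing a project in which I had planned to use OpenCV, so it was already enabled.
For the steps, see the very bottom of this page.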

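Webcam setup and processing

Using webcam data in Unity
In Unity, the WebCamTexture class makes it easy to capture webcam video. Create a C# script like the following to prepare for capturing and displaying the camera feed.

Webcam initialization code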

PoseEstimater.cs

[SerializeField] string targetWebcamName = "";
[SerializeField] int webcamWidth = 1280;
[SerializeField] int webcamHeight = 720;
[SerializeField] int webcamFPS = 30;
IEnumerator InitializeWebcamAndSentis()
{
    // Ask for camera permission (required on some platforms).
    yield return Application.RequestUserAuthorization(UserAuthorization.WebCam);

    WebCamDevice[] devices = WebCamTexture.devices;
    if (devices.Length == 0)
    {
        // No camera is connected, so there is nothing to initialize.
        Debug.LogError("No webcam found.");
        enabled = false;
        yield break;
    }

    // Use the camera whose name matches targetWebcamName; fall back to the first device.
    WebCamDevice selectedDevice = devices[0];
    foreach (WebCamDevice device in devices)
    {
        if (device.name == targetWebcamName)
        {
            selectedDevice = device;
            break;
        }
    }
    webcamTexture = new WebCamTexture(selectedDevice.name, webcamWidth, webcamHeight, webcamFPS);
    uiRawImage.texture = webcamTexture;
    webcamTexture.Play();

    // Wait for the first real frame; until then the texture reports a small placeholder size.
    yield return new WaitUntil(() => webcamTexture.didUpdateThisFrame && webcamTexture.width > 16);
    isCameraInitialized = true;

    // Buffers used to resize each camera frame to the MoveNet input size.
    preprocessedRenderTexture = new RenderTexture(movenetInputSize.x, movenetInputSize.y, 0, RenderTextureFormat.ARGB32);
    preprocessedRenderTexture.Create();
    preprocessedTexture2D = new Texture2D(movenetInputSize.x, movenetInputSize.y, TextureFormat.RGB24, false);
}

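Initializing the Sentis inference engine
To run a model with Sentis, you first load the model and then initialize the inference engine (Worker). In practice, it is best to do this right after the webcam initialization.
At this point you can choose the backend type used to run MoveNet. My PC has a GPU, so I used GPUCompute.

Sentis and MoveNet initialization code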

[SerializeField] ModelAsset movenetModelAsset;
[SerializeField] BackendType backendType = BackendType.GPUCompute;
Model runtimeModel;
Worker worker;
IEnumerator InitializeWebcamAndSentis()
{
    // ==============
    // Webcam initialization
    // ==============
    // Continues from the webcam initialization above

    runtimeModel = ModelLoader.Load(movenetModelAsset);
    if (runtimeModel == null)
    {
        enabled = false;
        yield break;
    }
    
    worker = new Worker(runtimeModel, backendType);
    if (worker == null)
    {
        enabled = false;
        yield break;
    }
}
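
The Worker, the WebCamTexture, and the preprocessing textures all hold native resources, so they should be released when the component goes away. The following is a minimal sketch, assuming the Sentis 2.x API used above (`Unity.Sentis` namespace) and field names matching the snippets in this article; the class name `PoseEstimatorCleanupSketch` is hypothetical, and in a real project the `OnDestroy` method would simply live in the same script.

```csharp
using UnityEngine;
using Unity.Sentis;

// Hypothetical class: only illustrates which resources to release.
public class PoseEstimatorCleanupSketch : MonoBehaviour
{
    // Stand-ins for the fields created in InitializeWebcamAndSentis().
    WebCamTexture webcamTexture;
    RenderTexture preprocessedRenderTexture;
    Texture2D preprocessedTexture2D;
    Worker worker;

    void OnDestroy()
    {
        // Stop the camera so the device is freed for other applications.
        if (webcamTexture != null)
        {
            webcamTexture.Stop();
        }

        // Free the GPU memory behind the resize target.
        if (preprocessedRenderTexture != null)
        {
            preprocessedRenderTexture.Release();
        }

        // Textures created with "new" are not collected automatically.
        if (preprocessedTexture2D != null)
        {
            Destroy(preprocessedTexture2D);
        }

        // The Worker owns backend buffers and must be disposed explicitly.
        worker?.Dispose();
        worker = null;
    }
}
```

Any tensors you create yourself, such as the per-frame input tensor, need the same treatment: call `Dispose()` on them once they are no longer needed.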

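Converting the camera data
Video from a RenderTexture or WebCamTexture cannot be used directly as a Sentis input tensor. MoveNet Thunder takes a 256x256 input, so each frame must be resized and then converted to the Tensor<T> input format that Sentis expects.
Also, if you click the ONNX asset and look at it in the Inspector, you should see something like Input(1, 256, 256, 3).
The tensor must be created with its dimensions in that same order.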
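
To see why the dimension order matters, here is a small sketch of how an element position maps to a flat memory offset in the (1, 256, 256, 3) NHWC layout versus the NCHW layout. It is plain C# with no Unity dependency, and the class and method names (`TensorLayoutIndex`, `OffsetNhwc`, `OffsetNchw`) are hypothetical, introduced only for illustration.

```csharp
using System;

// Hypothetical helper: flat offsets for row-major 4D tensors.
public static class TensorLayoutIndex
{
    // Shape (N, H, W, C): the channels of one pixel sit next to each other.
    public static int OffsetNhwc(int n, int h, int w, int c, int height, int width, int channels)
    {
        return ((n * height + h) * width + w) * channels + c;
    }

    // Shape (N, C, H, W): each channel is stored as a separate full plane.
    public static int OffsetNchw(int n, int c, int h, int w, int channels, int height, int width)
    {
        return ((n * channels + c) * height + h) * width + w;
    }
}
```

With a 256x256 RGB image, the green value of pixel (row 0, column 1) is at offset 4 in NHWC but at offset 65537 in NCHW, so feeding data in the wrong layout scrambles the image the model sees.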

Camera data conversion code

void Update()
{
    // Prepare the input image for Sentis: resize the camera frame to the model's input size
    Graphics.Blit(webcamTexture, preprocessedRenderTexture);
    RenderTexture.active = preprocessedRenderTexture;
    preprocessedTexture2D.ReadPixels(new Rect(0, 0, movenetInputSize.x, movenetInputSize.y), 0, 0);
    preprocessedTexture2D.Apply();
    RenderTexture.active = null;
    
    // Create a tensor that matches the model's input shape, here (1, 256, 256, 3).
    // Note: this allocates a new tensor every frame, so dispose it after use (or allocate it once and reuse it).
    Tensor<float> inputTensor = new Tensor<float>(new TensorShape(1, movenetInputSize.y, movenetInputSize.x, 3));
    var transform = new TextureTransform();
    transform.SetTensorLayout(TensorLayout.NHWC);
    transform.SetDimensions(preprocessedTexture2D.width, preprocessedTexture2D.height, 3);
    TextureConverter.ToTensor(preprocessedTexture2D, inputTensor, transform);
}

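Running the MoveNet model with Sentis

Input to MoveNet and inference
Pass the image data that was converted to Tensor<T> in the previous section to the Sentis Worker and run inference.

MoveNet input and inference code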

void Update()
{
    worker.Schedule(inputTensor);
}

Parsing the MoveNet result and getting the keypoints

Once inference finishes, retrieve the result tensor from the Worker.
This outputTensor holds the coordinates of the body keypoints that MoveNet estimated, along with a confidence score for each keypoint.

Keypoint retrieval code

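// "movenet_inference_layer" is the output name specified in the conversion code
Tensor<float> outputTensor = worker.PeekOutput("movenet_inference_layer") as Tensor<float>;
// Copy the result to the CPU so it can be indexed (assumes Sentis 2.x; blocks until inference is done; dispose the copy after use)
Tensor<float> outputTensorCPU = outputTensor.ReadbackAndClone();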
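
MoveNet's single-pose output has shape (1, 1, 17, 3): 17 keypoints, each stored as (y, x, confidence) with coordinates normalized to 0-1. The sketch below decodes a flat float array in that order into named keypoints. It is plain C# with no Unity dependency; the `Keypoint` struct and `MoveNetOutput` class are hypothetical names, and how you copy the tensor into a float array depends on your Sentis version.

```csharp
using System;

// Hypothetical container for one decoded keypoint.
public readonly struct Keypoint
{
    public readonly string Name;
    public readonly float X;
    public readonly float Y;
    public readonly float Confidence;

    public Keypoint(string name, float x, float y, float confidence)
    {
        Name = name;
        X = x;
        Y = y;
        Confidence = confidence;
    }
}

public static class MoveNetOutput
{
    // The 17 keypoints in MoveNet's output order (COCO keypoint order).
    public static readonly string[] KeypointNames =
    {
        "nose", "left_eye", "right_eye", "left_ear", "right_ear",
        "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
        "left_wrist", "right_wrist", "left_hip", "right_hip",
        "left_knee", "right_knee", "left_ankle", "right_ankle"
    };

    // Decodes a flat (1, 1, 17, 3) buffer; each triple is (y, x, confidence).
    public static Keypoint[] Decode(float[] flat)
    {
        int count = KeypointNames.Length;
        if (flat == null || flat.Length != count * 3)
        {
            throw new ArgumentException("Expected 17 * 3 = 51 values.", nameof(flat));
        }

        var result = new Keypoint[count];
        for (int i = 0; i < count; i++)
        {
            float y = flat[i * 3];
            float x = flat[i * 3 + 1];
            float confidence = flat[i * 3 + 2];
            result[i] = new Keypoint(KeypointNames[i], x, y, confidence);
        }
        return result;
    }
}
```

Note that y comes before x in each triple, which is why the visualization code reads index 0 as y and index 1 as x.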

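Visualizing the detected keypoints

We now have the keypoint coordinates and confidence scores. The next step is to draw this data on screen in Unity so that the motion tracking result becomes visible.
Here the keypoints are visualized in two main ways.

Drawing keypoints
Each detected keypoint (nose, eyes, shoulders, and so on) is shown as a small dot or Sphere.
Drawing bones
Keypoints are connected with lines to represent the human skeleton, which makes body movement easier to follow intuitively.

Keypoint visualization code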

private List<GameObject> keypointObjects = new List<GameObject>();
int numKeypoints = outputTensorCPU.shape[2];
Vector2[] currentKeypointPositions = new Vector2[numKeypoints];
bool[] keypointVisible = new bool[numKeypoints];
Vector2 rawImageSize = uiRawImage.rectTransform.rect.size;

for (int i = 0; i < numKeypoints; i++)
{
    float y = outputTensorCPU[0, 0, i, 0]; // normalized y (0-1)
    float x = outputTensorCPU[0, 0, i, 1]; // normalized x (0-1)
    float confidence = outputTensorCPU[0, 0, i, 2]; // confidence (0-1)
    if (confidence >= confidenceThreshold)
    {
        keypointObjects[i].SetActive(true);
        keypointVisible[i] = true;

        // Apply the setting that optionally mirrors the keypoint's X coordinate
        float displayX = flipKeypointsX ? (1f - x) * rawImageSize.x : x * rawImageSize.x;

        // Apply the setting that optionally mirrors the keypoint's Y coordinate
        float displayY = flipKeypointsY ? (1f - y) * rawImageSize.y : y * rawImageSize.y;

        Vector2 localPos = new Vector2(
            displayX - (rawImageSize.x / 2f),
            displayY - (rawImageSize.y / 2f)
        );

        keypointObjects[i].GetComponent<RectTransform>().anchoredPosition = localPos;
        currentKeypointPositions[i] = localPos;
    }
    else
    {
        keypointObjects[i].SetActive(false);
        keypointVisible[i] = false;
    }
}

for (int i = 0; i < bones.GetLength(0); i++)
{
    int kp1Index = bones[i, 0];
    int kp2Index = bones[i, 1];

    if (keypointVisible[kp1Index] && keypointVisible[kp2Index])
    {
        GameObject boneObj = boneObjects[i];
        RectTransform rt = boneObj.GetComponent<RectTransform>();

        Vector2 start = currentKeypointPositions[kp1Index];
        Vector2 end = currentKeypointPositions[kp2Index];
        Vector2 diff = end - start;
        float length = diff.magnitude;
        float angle = Mathf.Atan2(diff.y, diff.x) * Mathf.Rad2Deg;

        rt.anchoredPosition = start + diff * 0.5f;
        rt.sizeDelta = new Vector2(length, boneWidth);
        rt.localRotation = Quaternion.Euler(0, 0, angle);

        boneObj.SetActive(true);
    }
    else
    {
        boneObjects[i].SetActive(false);
    }
}
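
The loop above relies on a `bones` table that is not shown. The following sketch gives one plausible definition, pairing keypoint indices in MoveNet's output order (0 = nose, 5/6 = shoulders, 11/12 = hips, and so on), together with the midpoint, length, and angle math used to place each bone. It is plain C# with no Unity dependency; the class name `SkeletonSketch`, the exact set of connections, and the method `BoneTransform` are assumptions for illustration, not the author's actual table.

```csharp
using System;

// Hypothetical skeleton definition and bone placement math.
public static class SkeletonSketch
{
    // Each row connects two keypoint indices (MoveNet / COCO order).
    public static readonly int[,] Bones =
    {
        { 0, 1 }, { 0, 2 }, { 1, 3 }, { 2, 4 },   // face
        { 5, 6 },                                 // shoulders
        { 5, 7 }, { 7, 9 },                       // left arm
        { 6, 8 }, { 8, 10 },                      // right arm
        { 5, 11 }, { 6, 12 }, { 11, 12 },         // torso and hips
        { 11, 13 }, { 13, 15 },                   // left leg
        { 12, 14 }, { 14, 16 }                    // right leg
    };

    // Returns the center, length, and rotation (degrees, counterclockwise from +X)
    // of a bar that spans from (x1, y1) to (x2, y2).
    public static (float MidX, float MidY, float Length, float AngleDeg) BoneTransform(
        float x1, float y1, float x2, float y2)
    {
        float dx = x2 - x1;
        float dy = y2 - y1;
        float length = (float)Math.Sqrt(dx * dx + dy * dy);
        float angleDeg = (float)(Math.Atan2(dy, dx) * 180.0 / Math.PI);
        return ((x1 + x2) * 0.5f, (y1 + y2) * 0.5f, length, angleDeg);
    }
}
```

The returned values correspond to `anchoredPosition`, the X component of `sizeDelta`, and the Z rotation in the loop above, which assumes each bone image has its pivot at the center.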

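Closing

That is it for Part 2.
Next time I plan to write about an application built on this, but it is not finished as of this writing, so who knows when Part 3 will appear...
If you are interested, please like, stock, or follow this article and wait patiently.
I will paste the link here, so if you stock this article you may get a notification.
See you next time.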

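Allowing unsafe code

Edit > Project Settings...
Select Player on the left side of the window that opens
Other Settings
Check the Configuration > Allow unsafe code checkbox (in some Unity versions it is listed under Script Compilation)
Recompile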
