
Running MoveNet with Unity Sentis

Posted at 2024-06-01

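Introduction

Unity Sentis lets you integrate AI models into games and applications.
Unity appears to be promoting it, but because the library is new, there is little information available.

As of May 30, 2024, searching Qiita for "Unity Sentis" returns only two hits.
Previously this was done with Unity Barracuda, but from Unity 6 onward Unity Sentis looks set to become the standard, so it will likely see growing use. I hope this article serves as a reference when that happens.

In this article I try running MoveNet, a machine learning model published by Google, with Unity Sentis.

This page is based on the open beta version, Sentis 1.4.0-pre.3.
The implementation may therefore differ in the latest or official release.

Development environment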

Unity : 2023.2.18f1
Sentis : Sentis 1.4.0-pre.3

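GitHub repository

The code for this article is published on GitHub.

What is MoveNet?

Please see the linked overview page. The model is available from the linked model page.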

1. Installing Sentis

Install it by following the official documentation.

Installation steps (quoted from the official site above)

To add the Sentis package to a Unity project:

Create a new Unity project or open an existing one.

To open the Package Manager, navigate to Window > Package Manager.

Click + and select Add package by name....

Enter com.unity.sentis.

Click Add to add the package to your project.

(1) Create a new Unity project, or open an existing one.
(2) Open the Package Manager.
    Choose Package Manager from the Window menu at the top of the screen.
(3) Click the + button (top left of the window) and select Add package by name.
(4) Enter com.unity.sentis.
(5) Click "Add" to finish.

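2. Code walkthrough

2.1. Detector.cs

This class feeds image data into the machine learning model and retrieves the inference results.
It is the core of the program that runs the model with Unity.Sentis.

At a minimum, this is the only file you need to read.

Full source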

Detector.cs


using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using UnityEngine;
using Unity.Sentis;

namespace MoveNet
{
    //Execute inferences using machine learning model
    public class Detector : IDisposable
    {
        public const int CLASSIFICATION_NUMBER = 17;

        public Detector(ResourceSet resources)
          => AllocateObjects(resources);

        public void Run(Texture source, float Threshold = 0.5f)
        {
            // Not used in this sample
            if (Threshold < 0.0f) _scoreThreshold = 0.0f;
            else if (Threshold > 1.0f) _scoreThreshold = 1.0f;
            else _scoreThreshold = Threshold;

            RunModel(source);
        }

        public IEnumerable<Detection> Detections()
        {
            /*
            Return Detections Result (IEnumerable<Detection>)

            Length of the IEnumerable = 17 (Classification)

            return [struct Detection (position_x, position_y, score)] * 17

            Data structure of <Detection> : See Detection.cs
            */
            return _detectionCache.Cached(new ReadOnlyCollection<float>(_detections), _scoreThreshold);
        }

        public float Threshold => _scoreThreshold;

        public void Dispose()
        {
            _worker?.Dispose();
            _worker = null;

            _buffers.preprocess?.Dispose();
            _buffers.preprocess = null;
        }

        // -------------------------------------------------------------------------------------

        ResourceSet _resources;
        Config _config;
        IWorker _worker;

        (GraphicsBuffer preprocess,
         RenderTexture processedImage) _buffers;

        DetectionCache _detectionCache;
        static float _scoreThreshold;
        static float[] _detections;

        void AllocateObjects(ResourceSet resources)
        {
            // NN model loading
            var nnmodel = ModelLoader.Load(resources.model);

            // Edit a Model
            var editedModel = Functional.Compile(
                RGB =>
                {
                    var sRGB = Functional.Pow(RGB, Functional.Tensor(1 / 2.2f));

                    // Transform values from the range [0, 1] to the range [0, 255].
                    var RGB_255 = Functional.Mul(sRGB, Functional.Tensor(255.0f));

                    return nnmodel.ForwardWithCopy(RGB_255)[0];
                },
                nnmodel.inputs[0]);

            // Private object initialization
            _resources = resources;
            _config = new Config(editedModel);
            _worker = WorkerFactory.CreateWorker(BackendType.GPUCompute, editedModel);

            _buffers.preprocess = new GraphicsBuffer(
                GraphicsBuffer.Target.Structured, _config.InputFootprint, sizeof(float));

            _buffers.processedImage = new RenderTexture(_config.InputWidth, _config.InputWidth, 3);
            _buffers.processedImage.enableRandomWrite = true;
            _buffers.processedImage.Create();

            _detectionCache = new DetectionCache(CLASSIFICATION_NUMBER);
        }

        void RunModel(Texture source)
        {
            // Preprocessing (Sampling)
            var pre = _resources.preprocess;
            pre.SetInt("Size", _config.InputWidth);
            pre.SetTexture(0, "Image", source);
            pre.SetBuffer(0, "Tensor", _buffers.preprocess); 
            pre.SetTexture(0, "processedImage", _buffers.processedImage);
            pre.Dispatch(0, _config.InputWidth / 8, _config.InputWidth / 8, 1);

            float[] ft = new float[_config.InputFootprint];
            _buffers.preprocess.GetData(ft);

            // NN worker invocation
            TensorShape newShape = new TensorShape(_config.InputWidth, _config.InputWidth, 3);
            using (var tensor = new TensorFloat(newShape, ft))
            {
                tensor.Reshape(newShape);

                _worker.Execute(tensor);
            }

            var output = _worker.PeekOutput(_config.OutputName) as TensorFloat;
            output.CompleteOperationsAndDownload();
            _detections = output.ToReadOnlyArray();

            output.Dispose();
            _detectionCache.Invalidate();
        }
    }
}//namespace MoveNet

Constructor

Takes a ResourceSet (which holds the machine learning model and the compute shader) and prepares everything needed for inference.
For details on ResourceSet, see ResourceSet.cs.

Detector.cs (excerpt)

void AllocateObjects(ResourceSet resources)
{
    // NN model loading
    var nnmodel = ModelLoader.Load(resources.model);

    // Edit a Model
    var editedModel = Functional.Compile(
        RGB =>
        {
            var sRGB = Functional.Pow(RGB, Functional.Tensor(1 / 2.2f));

            // Transform values from the range [0, 1] to the range [0, 255].
            var RGB_255 = Functional.Mul(sRGB, Functional.Tensor(255.0f));

            return nnmodel.ForwardWithCopy(RGB_255)[0];
        },
        nnmodel.inputs[0]);

    // Private object initialization
    _resources = resources;
    _config = new Config(editedModel);
    _worker = WorkerFactory.CreateWorker(BackendType.GPUCompute, editedModel);

    _buffers.preprocess = new GraphicsBuffer(
        GraphicsBuffer.Target.Structured, _config.InputFootprint, sizeof(float));

    _buffers.processedImage = new RenderTexture(_config.InputWidth, _config.InputWidth, 3);
    _buffers.processedImage.enableRandomWrite = true;
    _buffers.processedImage.Create();

    _detectionCache = new DetectionCache(CLASSIFICATION_NUMBER);
}

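Loading the machine learning model

var nnmodel = ModelLoader.Load(resources.model);

The model is loaded with Load(ModelAsset) of the ModelLoader class.

Editing the machine learning model

Next, the model is edited, using the Functional and FunctionalExtensions classes.
Image data in Unity has RGB values in the range 0 to 1, whereas MoveNet expects input RGB values in the range 0 to 255.
So a layer that multiplies the incoming RGB values by 255 is added at the very front of MoveNet. The code also applies Functional.Pow with an exponent of 1 / 2.2 (a gamma step) before that multiplication.

Apparently the approach is to modify the model itself rather than convert the image data. I wondered whether it would be enough to leave the model untouched and multiply the image's RGB values by 255, but that does not seem to be the intended way.

If the model expects input RGB values in the range 0 to 1, this step is unnecessary.

Besides ForwardWithCopy() there is also a function called Forward(), but it is documented as editing the model destructively. ForwardWithCopy() copies the model and builds a new one. Since this is only a trial, I used ForwardWithCopy() without thinking hard about computation speed.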

var editedModel = Functional.Compile(
    RGB =>
    {
        var sRGB = Functional.Pow(RGB, Functional.Tensor(1 / 2.2f));

        // Transform values from the range [0, 1] to the range [0, 255].
        var RGB_255 = Functional.Mul(sRGB, Functional.Tensor(255.0f));

        return nnmodel.ForwardWithCopy(RGB_255)[0];
    },
    nnmodel.inputs[0]);
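
To make the arithmetic of the two prepended layers concrete, here is a minimal sketch in plain C# (no Sentis calls) of what they compute for a single channel value. The class and method names are hypothetical and exist only for illustration; the sketch mirrors the math, not the way Sentis executes it.

```csharp
using System;

// Hypothetical helper: mirrors the arithmetic of the layers added in front of the model.
public static class InputScalingSketch
{
    // Same math as Functional.Pow(x, 1 / 2.2) followed by Functional.Mul(x, 255).
    public static float ToModelRange(float channel01)
    {
        double gammaEncoded = Math.Pow(channel01, 1.0 / 2.2); // gamma step
        return (float)(gammaEncoded * 255.0);                 // [0, 1] -> [0, 255]
    }
}
```

For example, InputScalingSketch.ToModelRange(1f) yields 255, and a mid-gray input of 0.5 lands at roughly 186 rather than 127.5 because of the gamma step.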

Initializing the buffers

This prepares the resources needed by the compute shader.

processedImage is not required; I created it to check the image that is fed into the model. When a compute shader writes to a RWTexture, set enableRandomWrite = true.

_buffers.preprocess = new GraphicsBuffer(
    GraphicsBuffer.Target.Structured, _config.InputFootprint, sizeof(float));

_buffers.processedImage = new RenderTexture(_config.InputWidth, _config.InputWidth, 3);
_buffers.processedImage.enableRandomWrite = true;
_buffers.processedImage.Create();

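Execution ( RunModel() )

Image sampling

A compute shader is used to read the RGB data of each pixel from the image.
Here the required values are passed to the compute shader and the sampling is dispatched.

_config.InputWidth is the width (and height) in pixels of the image fed into the model.
Because the compute shader (Preprocess.compute) declares [numthreads(8, 8, 1)], the number of thread groups along each axis is _config.InputWidth / 8.

Detector.cs (excerpt)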

var pre = _resources.preprocess;

pre.SetInt("Size", _config.InputWidth);
pre.SetTexture(0, "Image", source);
pre.SetBuffer(0, "Tensor", _buffers.preprocess); 
pre.SetTexture(0, "processedImage", _buffers.processedImage);
pre.Dispatch(0, _config.InputWidth / 8, _config.InputWidth / 8, 1);
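
The thread group count can be sketched as a small helper. This is an illustration only: the names are hypothetical, and the rounding-up variant is an assumption for sizes that are not a multiple of 8 (the article's code uses plain integer division, which is exact for sizes such as 192 or 256).

```csharp
using System;

// Hypothetical helper: number of thread groups needed to cover `size` pixels per axis.
public static class DispatchSketch
{
    public static int ThreadGroups(int size, int threadsPerGroup)
    {
        if (size <= 0 || threadsPerGroup <= 0)
            throw new ArgumentOutOfRangeException("size and threadsPerGroup must be positive");

        // Ceiling division: equals size / threadsPerGroup when size is an exact multiple.
        return (size + threadsPerGroup - 1) / threadsPerGroup;
    }
}
```

With an input width of 192 this gives 24 groups per axis, the same as 192 / 8. If the size were not a multiple of 8, the shader would also need a bounds check on id before writing, which Preprocess.compute as shown does not have.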

After Dispatch, the sampling result is retrieved with GetData().
_config.InputFootprint is the input image's width × height × 3 channels (RGB). There may be a better way to write this part.

Detector.cs (excerpt)


float[] ft = new float[_config.InputFootprint];
_buffers.preprocess.GetData(ft);
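
As a rough illustration of how the flat float array is laid out, the sketch below computes the footprint and the offset of a pixel's first channel, using the same formula as the compute shader ((id.y * Size + id.x) * 3). The class and method names are hypothetical.

```csharp
// Hypothetical helper: layout of the height x width x 3 buffer filled by the compute shader.
public static class TensorLayoutSketch
{
    public const int Channels = 3; // R, G, B

    // Total number of floats for a square image of the given size.
    public static int Footprint(int size) => size * size * Channels;

    // Index of the R channel of pixel (x, y); G and B follow at +1 and +2.
    public static int Offset(int x, int y, int size) => (y * size + x) * Channels;
}
```

For a 192 × 192 input, Footprint(192) is 110592, and the last pixel's blue channel sits at Offset(191, 191, 192) + 2 = 110591, the final element of the array.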

The sampling result is then reshaped to match the model's input, and the model is executed through an IWorker.
Afterwards the data is copied into _detections, a float array.

Detector.cs (excerpt)

// NN worker invocation
TensorShape newShape = new TensorShape(_config.InputWidth, _config.InputWidth, 3);
using (var tensor = new TensorFloat(newShape, ft))
{
    tensor.Reshape(newShape);

    _worker.Execute(tensor);
}

var output = _worker.PeekOutput(_config.OutputName) as TensorFloat;
output.CompleteOperationsAndDownload();
_detections = output.ToReadOnlyArray();

output.Dispose();
_detectionCache.Invalidate();

Result accessor ( Detections() )

A public function that hands the inference results to other classes.
It processes the raw results with DetectionCache and returns one result per keypoint as a Detection struct, which holds the x coordinate, y coordinate, score, label index, and label name.

Detector.cs (excerpt)

public IEnumerable<Detection> Detections()
{
    return _detectionCache.Cached(new ReadOnlyCollection<float>(_detections), _scoreThreshold);
}
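
DetectionCache itself is not shown in this article, so the following is only a sketch of what such a decoding step might look like, not the actual implementation. It assumes the MoveNet output is a flat array of 17 × 3 floats ordered (y, x, score) per keypoint, and that low-score keypoints are dropped; the struct and class names, and the filtering behavior, are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the article's Detection struct (the real one is in Detection.cs).
public readonly struct KeypointSketch
{
    public readonly float X, Y, Score;
    public readonly int LabelIndex;
    public readonly string LabelName;

    public KeypointSketch(float x, float y, float score, int labelIndex, string labelName)
    {
        X = x; Y = y; Score = score; LabelIndex = labelIndex; LabelName = labelName;
    }
}

public static class MoveNetDecodeSketch
{
    public const int KeypointCount = 17;

    // Keypoint order of the MoveNet output (COCO keypoint order).
    static readonly string[] Names =
    {
        "nose", "left_eye", "right_eye", "left_ear", "right_ear",
        "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
        "left_wrist", "right_wrist", "left_hip", "right_hip",
        "left_knee", "right_knee", "left_ankle", "right_ankle"
    };

    public static List<KeypointSketch> Decode(IReadOnlyList<float> raw, float threshold)
    {
        if (raw == null) throw new ArgumentNullException(nameof(raw));
        if (raw.Count != KeypointCount * 3)
            throw new ArgumentException("Expected 17 x 3 floats.", nameof(raw));

        var result = new List<KeypointSketch>(KeypointCount);
        for (int i = 0; i < KeypointCount; i++)
        {
            float y = raw[i * 3 + 0];      // MoveNet stores y first,
            float x = raw[i * 3 + 1];      // then x,
            float score = raw[i * 3 + 2];  // then the confidence score.

            // Assumption: keypoints below the threshold are skipped.
            if (score < threshold) continue;

            result.Add(new KeypointSketch(x, y, score, i, Names[i]));
        }
        return result;
    }
}
```

Both float[] and ReadOnlyCollection<float> implement IReadOnlyList<float>, so either could be passed to Decode.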

2.2. Preprocess.compute

Converts the color space and packs each pixel's RGB values into a RWStructuredBuffer.
For the color space handling I referred to the following site.

Full source

#pragma kernel Preprocess

sampler2D Image;
RWStructuredBuffer<float> Tensor;
RWTexture2D<float3> processedImage;
uint Size;

#define FLT_EPSILON 1.192092896e-07

float3 PositivePow(float3 base, float3 power)
{
    return pow(max(abs(base), float3(FLT_EPSILON, FLT_EPSILON, FLT_EPSILON)), power);
}

float3 LinearToSRGB(float3 color)
{
    float3 sRGBLo = color * 12.92;
    float3 sRGBHi = (PositivePow(color, float3(1.0 / 2.4, 1.0 / 2.4, 1.0 / 2.4)) * 1.055) - 0.055;
    float3 sRGB = (color <= 0.0031308) ? sRGBLo : sRGBHi;
    return sRGB;
}

[numthreads(8, 8, 1)]
void Preprocess(uint2 id : SV_DispatchThreadID)
{
    // UV
    float2 uv = float2(0.5 + id.x, 0.5 + id.y) / Size;

    // UV gradients
    float2 duv_dx = float2(1.0 / Size, 0);
    float2 duv_dy = float2(0, -1.0 / Size);

    // Texture sample
    float3 rgb = tex2D(Image, uv, duv_dx, duv_dy).rgb;
    rgb = LinearToSRGB(rgb);
    
    // Tensor element output
    uint offs = (id.y * Size + id.x) * 3;
    Tensor[offs + 0] = rgb.r;
    Tensor[offs + 1] = rgb.g;
    Tensor[offs + 2] = rgb.b;
    
    processedImage[id] = float3(rgb.r, rgb.g, rgb.b);
}
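
For readers who want to check the shader's color conversion on the CPU, here is a plain C# sketch of the same piecewise linear-to-sRGB formula for a single channel. The class name is hypothetical, and unlike the shader it does not clamp negative inputs through abs(); it simply follows the linear branch for them.

```csharp
using System;

// Hypothetical CPU-side mirror of LinearToSRGB in Preprocess.compute (one channel).
public static class ColorSpaceSketch
{
    public static float LinearToSrgb(float linear)
    {
        // Linear segment near black, matching the shader's 0.0031308 cutoff.
        if (linear <= 0.0031308f) return linear * 12.92f;

        // Power segment: 1.055 * c^(1/2.4) - 0.055.
        return (float)(1.055 * Math.Pow(linear, 1.0 / 2.4) - 0.055);
    }
}
```

With this formula 0 stays 0, 1 stays 1, and a linear value of 0.5 becomes roughly 0.735.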

2.3. DetectionExecutor.cs

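A class derived from MonoBehaviour that serves as the entry point for running inference.
Clicking the Start button on screen calls public void Run(), which runs the inference.

Calling public IEnumerable<Detection> LatestDetection() returns the latest inference result (this also makes inference on video possible).

Full source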

DetectionExecutor.cs

using MoveNet;
using System.Collections.Generic;
using UnityEngine;

//Execute inference & Intermediary for delivery of result
public class DetectionExecutor : MonoBehaviour
{
    Detector _detector;
    [SerializeField] ResourceSet _resources = null;
    [SerializeField] Video.VideoGrapher _grapher = null;

    int _imageWidth;
    int _imageHeight;

    public int ImageWidth => _imageWidth;
    public int ImageHeight => _imageHeight;

    void Start()
    {
        _detector = new Detector(_resources);
    }

    public void Run()
    {
        _imageWidth = _grapher.ImageWidth;
        _imageHeight = _grapher.ImageHeight;
        _detector.Run(_grapher.CameraTexture);
    }

    public IEnumerable<Detection> LatestDetection() => _detector.Detections();

    void OnDisable() => _detector.Dispose();
}

2.4. MarkerVisualizer.cs

This is also a MonoBehaviour-derived class, and it is the entry point for displaying markers. It distributes the inference result for each keypoint to the corresponding marker.
SetAttributes() is defined in the Marker class.

Excerpt

MarkerVisualizer.cs

public void DisplayMarkers()
{
    var latestDetections = executor.LatestDetection();

    var i = 0;
    foreach (var detection in latestDetections)
    {
        if (i == _markers.Length) break;

        Debug.Log(detection.ToString());
        _markers[i++].SetAttributes(detection, executor.ImageWidth, executor.ImageHeight, _rawImageWidth);
    }

    for (; i < _markers.Length; i++) _markers[i].Hide();
}

2.5. Marker.cs

Places a marker on the (displayed) input image.

Full source

Marker.cs


using UnityEngine;
using UnityEngine.UI;
using TMPro;

namespace MoveNet
{
    sealed class Marker : MonoBehaviour
    {
        RectTransform _xform;
        Image _panel;
        TextMeshProUGUI _label;

        public void Setup()
        {
            _xform = GetComponent<RectTransform>();
            _panel = GetComponentInChildren<Image>();
            _label = GetComponentInChildren<TextMeshProUGUI>();
        }

        public void SetAttributes(in Detection d, int w, int h, float rawImageWidth)
        {
            // Local Position
            // The range of values for d.x and d.y is [0.0, 1.0]
            // Match this to the size of the RawImage.
            var x = (d.x * 2 - 1) * rawImageWidth * 0.5f;
            var y = (d.y * 2 - 1) * rawImageWidth * 0.5f;

            _xform.localPosition = new Vector2(x, y);

            // Label
            _label.text = $"{d.labelName}";

            // Panel color
            var hue = d.labelIndex * 0.073f % 1.0f;
            var color = Color.HSVToRGB(hue, 1, 1);
            color.a = 0.4f;
            _panel.color = color;

            // Enable
            gameObject.SetActive(true);
        }

        public void Hide() => gameObject.SetActive(false);
    }
}
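
The coordinate mapping in SetAttributes() can be tried out in isolation with a sketch like the one below. It is plain C# with hypothetical names, and it reuses rawImageWidth for both axes exactly as the Marker code does, which seems to assume a square RawImage.

```csharp
// Hypothetical helper: maps normalized keypoint coordinates in [0, 1]
// to a local position centered on a square RawImage.
public static class MarkerLayoutSketch
{
    public static (float x, float y) ToLocalPosition(float nx, float ny, float rawImageWidth)
    {
        // [0, 1] -> [-1, 1], then scale by half the image width.
        float x = (nx * 2f - 1f) * rawImageWidth * 0.5f;
        float y = (ny * 2f - 1f) * rawImageWidth * 0.5f;
        return (x, y);
    }
}
```

For a RawImage 400 units wide, the normalized center (0.5, 0.5) maps to (0, 0), and (1, 0) maps to (200, -200).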

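Results and impressions

After feeding in an arbitrary image from the Inspector, I confirmed that the markers were placed nicely.

I am not sure the Right and Left labels are correct. The keypoints labeled L are actually on the right side (right arm, right leg, etc.). Perhaps this is how MoveNet is specified.

It works for now, so I will call it good.
That said, with so little information available, implementing this took real effort. There is still a lot I do not understand, so I hope more official samples become available.

If you know a better way to write this, or of changes in the latest version, feel free to leave a comment.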
