Building a game you play by moving your real body, using pose estimation

Last updated at 2025-05-21, posted at 2025-05-20

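Project overview

The idea is to estimate a person's pose from webcam images (it is video, but each frame taken on its own is an image) and map the result onto bones in Unity, so that real-world movement is reproduced inside the game. That makes a game you play with your whole body possible.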

Project background

I was exploring what could be done with image processing, and this idea simply popped into my head.

Project purpose

I just want to make a game.
It would also be nice if this kind of technology ended up being applied to areas like rehabilitation!

Technologies used (languages, libraries)

Unity 6.0 (6000.0.49f1)
C#
Sentis 2.3.1
MoveNet
Python ~~(forgot the version)~~ -> 3.11.3
pip ~~(forgot the version)~~ -> 25.1.1

Sentis

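Sentis is Unity's inference engine for running AI models locally. It executes models directly inside Unity so that AI features can run in real time. Concretely, you import a model in ONNX format and run it on the platforms Unity supports. Features such as face recognition, dialogue systems, and object detection can then be built into a game or app without depending on the cloud.
A trained model cannot run in Unity on its own, so Sentis provides the "place" where it runs. Training itself is a separate matter.
Think of it as something like the JVM.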

MoveNet

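MoveNet is a very fast and accurate model that detects 17 keypoints of the body. It is published on TF Hub in two variants, Lightning and Thunder.
Lightning targets latency-critical applications (lower compute cost).
Thunder targets applications that need higher accuracy (higher compute cost).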
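To make the "17 keypoints" concrete, here is a minimal sketch of how a single-pose MoveNet result can be read. It assumes the usual output layout of shape [1, 1, 17, 3], where each row is (y, x, score) normalised to [0, 1] and the rows follow the COCO keypoint order. The helper name `decode_keypoints` and the dummy data are made up for illustration and are not part of MoveNet or this project.

```python
# Keypoint order assumed here: the COCO 17-keypoint convention used by MoveNet.
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def decode_keypoints(output):
    """Turn a nested [1][1][17][3] list into {name: (y, x, score)}.

    Hypothetical helper: `output` is the raw model output as nested lists.
    """
    rows = output[0][0]  # drop the batch and person dimensions
    if len(rows) != len(KEYPOINT_NAMES):
        raise ValueError(f"expected {len(KEYPOINT_NAMES)} keypoints, got {len(rows)}")
    return {name: tuple(row) for name, row in zip(KEYPOINT_NAMES, rows)}


# Dummy data standing in for a real inference result.
dummy_output = [[[[i / 100.0, i / 50.0, 0.9] for i in range(17)]]]
keypoints = decode_keypoints(dummy_output)
print(keypoints["left_wrist"])  # (y, x, score) of the left wrist
```

Looking keypoints up by name rather than by index tends to keep later bone-mapping code easier to read.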

Installation

pip install tensorflow tensorflow-hub tf2onnx onnx

ONNX conversion code

import tensorflow as tf
import tensorflow_hub as hub
import tf2onnx
import onnx
import numpy as np

print(f"TensorFlow version: {tf.__version__}")
print(f"tf2onnx version: {tf2onnx.__version__}")
print(f"ONNX version: {onnx.__version__}")

# 1. Load the MoveNet model from TensorFlow Hub
model_url = "https://tfhub.dev/google/movenet/singlepose/lightning/4"
print(f"Loading MoveNet model from: {model_url}")
movenet_loaded_model = hub.load(model_url)
movenet_infer = movenet_loaded_model.signatures['serving_default']

# 2. Create a new Keras model wrapper that takes float32 input
def create_float_input_movenet_model(movenet_signature):
    input_shape = (192, 192, 3)

    input_tensor_float = tf.keras.layers.Input(shape=input_shape, dtype=tf.float32, name='input_image_float')

    # Preprocessing layers:
    # 1. Scale float pixel values from [0,1] to [0,255]
    scaled_input = input_tensor_float * 255.0

    # 2. Cast float [0,255] to int32 (wrapped in a Lambda layer)
    casted_input = tf.keras.layers.Lambda(lambda x: tf.cast(x, dtype=tf.int32), name='cast_to_int32')(scaled_input)

    # Pass the cast input to the original MoveNet model and run inference
    # Wrap the call to the tf.function (movenet_signature) in a Lambda layer
    outputs = tf.keras.layers.Lambda(
        lambda x: movenet_signature(input=x)['output_0'], # the keyword argument 'input' is used here
        name='movenet_inference_layer'
    )(casted_input) # pass casted_input to this Lambda layer

    model = tf.keras.Model(inputs=input_tensor_float, outputs=outputs)
    return model

print("Creating Keras wrapper model for float32 input...")
keras_movenet_float_input = create_float_input_movenet_model(movenet_infer)
print("Keras model created.")

# 3. KerasモデルをONNX形式に変換
onnx_model_path = "movenet_lightning_float_input.onnx"

input_spec = [tf.TensorSpec(shape=[1, 192, 192, 3], dtype=tf.float32, name='input_image_float')]

print(f"Converting Keras model to ONNX: {onnx_model_path}")
model_proto, _ = tf2onnx.convert.from_keras(
    keras_movenet_float_input,
    input_signature=input_spec,
    opset=13,
    output_path=onnx_model_path
)

print(f"ONNX model saved to: {onnx_model_path}")

# 4. Optional: Verify the ONNX model
try:
    onnx_model = onnx.load(onnx_model_path)
    print("\n--- ONNX Model Input Verification ---")
    for input_node in onnx_model.graph.input:
        print(f"Input Name: {input_node.name}")
        print(f"Input Type: {onnx.helper.tensor_dtype_to_string(input_node.type.tensor_type.elem_type)}")
        shape_dims = [d.dim_value for d in input_node.type.tensor_type.shape.dim]
        print(f"Input Shape: {shape_dims}")
except Exception as e:
    print(f"Error verifying ONNX model: {e}")
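The wrapper in the conversion script expects float pixel values in [0, 1], multiplies them by 255, and casts to int32. The sketch below mimics that arithmetic for a single channel value in plain Python, to show what the model ends up seeing. It assumes the cast truncates toward zero (as a float-to-int cast normally does) and that the Unity side really supplies values normalised to [0, 1]. `to_movenet_pixel` is a hypothetical name used only here.

```python
def to_movenet_pixel(value):
    """Mimic the wrapper's preprocessing for one channel value.

    Scales a float in [0, 1] to [0, 255] and truncates to an integer.
    """
    return int(value * 255.0)  # int() truncates toward zero


samples = [0.0, 0.5, 0.999, 1.0]
pixels = [to_movenet_pixel(v) for v in samples]
print(pixels)
```

Note that nothing in the wrapper clamps the input, so values outside [0, 1] would map outside [0, 255].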

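Data flow

Capture images from the webcam (WebCamTexture)
Preprocessing (convert WebCamTexture -> RenderTexture -> Texture2D -> Sentis.Tensor)
Run the MoveNet model (Sentis's Model and Worker handle MoveNet inference)
Get the output Tensor (keypoint coordinate data) from Sentis
Post-processing in a Unity script (convert Tensor coordinates to screen coordinates)
Display (draw keypoints and bones on top of a Raw Image, or drive a 3D model)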
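The post-processing step in the flow above (Tensor coordinates to screen coordinates) can be sketched as follows. The sketch is in Python for brevity, although the project would do this in a C# script. It assumes keypoints arrive as (y, x, score) normalised to [0, 1] with the origin at the top-left of the image, and that the target space has its origin at the bottom-left, as Unity screen space does. The function name, the `min_score` threshold of 0.3, and the use of `None` for low-confidence points are illustrative choices, and cropping or mirroring of the webcam image is ignored.

```python
def to_screen_points(keypoints, width, height, min_score=0.3, flip_y=True):
    """Convert normalised (y, x, score) keypoints to pixel coordinates.

    Returns a list of (px, py) tuples, with None where the score is too low.
    """
    points = []
    for y, x, score in keypoints:
        if score < min_score:
            points.append(None)  # treat low-confidence keypoints as missing
            continue
        px = x * width
        # Flip vertically when the target origin is bottom-left.
        py = (1.0 - y) * height if flip_y else y * height
        points.append((px, py))
    return points


sample = [(0.25, 0.5, 0.9), (0.8, 0.1, 0.1)]
screen = to_screen_points(sample, 640, 480)
print(screen)
```

Keeping one entry per keypoint, even when it is missing, preserves the index-to-joint mapping for the drawing step.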

Classes used

Capture images from the webcam (WebCamTexture)

using UnityEngine.WebCamTexture

Preprocessing (convert WebCamTexture -> RenderTexture -> Texture2D -> Sentis.Tensor)

using UnityEngine.RenderTexture
using UnityEngine.Texture2D
using Unity.Sentis.Tensor

Run the MoveNet model (Sentis's Model and Worker handle MoveNet inference)

using Unity.Sentis.ModelAsset
using Unity.Sentis.Model
using Unity.Sentis.Worker (Sentis 2.x; it replaces the WorkerFactory / IWorker pair from Sentis 1.x)

Get the output Tensor (keypoint coordinate data) from Sentis

using Unity.Sentis.Tensor

Post-processing in a Unity script (convert Tensor coordinates to screen coordinates)

using UnityEngine.Vector2 / UnityEngine.Vector3

Display (draw keypoints and bones on top of a Raw Image, or drive a 3D model)

using UnityEngine.UI.RawImage
using UnityEngine.LineRenderer
using UnityEngine.GL
using UnityEngine.MonoBehaviour
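For the display step, drawing bones comes down to knowing which keypoints to connect. Below is a minimal sketch, again in Python rather than C#, of picking the line segments to draw. The edge list follows the one commonly used in MoveNet visualisation samples, with indices in COCO keypoint order. It assumes a list of 17 screen points in which `None` marks a keypoint that was not detected. That convention and the name `visible_bones` are hypothetical.

```python
# Pairs of keypoint indices to connect (COCO order: 0 = nose, 5 = left shoulder, ...).
SKELETON_EDGES = [
    (0, 1), (0, 2), (1, 3), (2, 4),            # face
    (0, 5), (0, 6), (5, 6),                    # head to shoulders
    (5, 7), (7, 9), (6, 8), (8, 10),           # arms
    (5, 11), (6, 12), (11, 12),                # torso
    (11, 13), (13, 15), (12, 14), (14, 16),    # legs
]


def visible_bones(points, edges=SKELETON_EDGES):
    """Return (start, end) point pairs whose two endpoints are both available."""
    return [
        (points[a], points[b])
        for a, b in edges
        if points[a] is not None and points[b] is not None
    ]


points = [None] * 17
points[5] = (100.0, 300.0)  # left shoulder
points[7] = (90.0, 240.0)   # left elbow
points[9] = (85.0, 180.0)   # left wrist
bones = visible_bones(points)
print(bones)
```

In Unity, each returned pair would correspond to one segment drawn with something like LineRenderer or GL lines.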

Class diagram
