
Machine Learning + Unity for Water Style: Water Dragon Jutsu! (Exaggeration)

Posted at 2018-10-28

Keywords

Unity, Perception Neuron, motion capture, machine learning, neural network, TensorFlow

Motivation

People keep combining Unity and TensorFlow, so I decided to try it myself.

Result

The video opens on YouTube.

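Overview

I have lost count of how many times this has been done, but I wanted to try a Unity + TensorFlow project.

Character recognition has been done to death, so I picked a theme from NARUTO instead: classifying the twelve zodiac hand seals.

For the exact finger shape of each seal, please run a Google image search for:
NARUTO + 印

The search also turns up the sequence that begins 臨兵闘 (Rin, Pyo, To, ...), but the one used here is the sequence that begins 子 (Rat), 丑 (Ox).

Training uses values obtained through motion capture rather than image classification, because the intended use case does not assume the user is facing a camera.

Comparing images of the seals, capturing the rotations of the wrists and all five fingers looks sufficient to tell the twelve seals apart.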

Environment

Anaconda Navigator 1.6.4

TensorFlow 1.4.0

Python 3.4

Unity 2018.2f1

Main references

TensorFlow + Unity: How to set up a custom TensorFlow graph in Unity

The script that the page above appears to link to (the link in the article itself was dead)

CUBE SUGAR CONTAINER

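Preparing the training data

The input consists of the rotation values described above.
I used NeuronRobot, which ships with the Perception Neuron SDK, as the bone hierarchy for reading rotations.
The input data are the Quaternions (x, y, z, w) of the 34 joints shown in the figure, so the input has 34 × 4 = 136 parameters.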
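As a rough illustration of that layout, the sketch below flattens 34 quaternions into one 136-value row. The `flatten_quaternions` helper and the identity-rotation sample are assumptions made for this example; they are not part of the original project.

```python
from typing import List, Tuple

# (x, y, z, w), the same component order as in the article
Quaternion = Tuple[float, float, float, float]

JOINT_COUNT = 34  # number of joints read from the bone hierarchy


def flatten_quaternions(rotations: List[Quaternion]) -> List[float]:
    """Concatenate per-joint (x, y, z, w) tuples into one flat feature row."""
    row: List[float] = []
    for x, y, z, w in rotations:
        row.extend([x, y, z, w])
    return row


# Hypothetical sample: every joint at the identity rotation.
sample = [(0.0, 0.0, 0.0, 1.0)] * JOINT_COUNT
features = flatten_quaternions(sample)
print(len(features))  # 136
```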

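Collecting the training data

The training data is collected from Unity.
The scene used for collection looks like the figure below.

〇 Set up Unity to receive the Perception Neuron data so that it drives the bones.
〇 Run the scene in Play mode in the Unity Editor, form a seal with your hands, and select the matching seal name from the dropdown.
〇 Starting 3 seconds after the Button is pressed, rotation data read from the bones is written to a text file for 10 seconds.
〇 Repeat this several times.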
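The timing rule in the steps above can be sketched as follows. This is plain Python for illustration only (the actual recording runs inside Unity), and the row layout produced by `format_row`, with the seal name first, is an assumption; the article does not show the format of the text file.

```python
from typing import List

RECORD_DELAY_SEC = 3.0      # wait after the button press (from the article)
RECORD_DURATION_SEC = 10.0  # length of the recording window (from the article)


def should_record(elapsed_sec: float) -> bool:
    """Return True while a frame falls inside the recording window."""
    start = RECORD_DELAY_SEC
    end = RECORD_DELAY_SEC + RECORD_DURATION_SEC
    return start <= elapsed_sec < end


def format_row(label: str, features: List[float]) -> str:
    """Build one comma-separated line: seal name, then the rotation values.

    Hypothetical layout, chosen only for this example.
    """
    return ",".join([label] + ["%.6f" % v for v in features])


# Hypothetical frame times, in seconds since the button press.
frame_times = [0.0, 2.9, 3.0, 7.5, 12.9, 13.0]
recorded = [t for t in frame_times if should_record(t)]
print(recorded)  # [3.0, 7.5, 12.9]
print(format_row("子", [0.0, 0.0, 0.0, 1.0]))
```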

Result

This produced 850 training samples per seal.
The same procedure produced 182 test samples per seal.
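These counts give the overall dataset size, and each recorded row also needs a label. A small sketch, assuming the labels are stored as one-hot rows with 12 columns (the training script's label placeholder has 12 columns, but the label file layout itself is not shown in this article). `one_hot` is a helper name made up for this example.

```python
from typing import List

SEALS = ["子", "丑", "寅", "卯", "辰", "巳", "午", "未", "申", "酉", "戌", "亥"]

TRAIN_PER_SEAL = 850  # from the article
TEST_PER_SEAL = 182   # from the article


def one_hot(seal: str) -> List[float]:
    """Return a 12-element row with 1.0 at the index of the given seal."""
    row = [0.0] * len(SEALS)
    row[SEALS.index(seal)] = 1.0
    return row


print(TRAIN_PER_SEAL * len(SEALS))  # 10200 training rows in total
print(TEST_PER_SEAL * len(SEALS))   # 2184 test rows in total
print(one_hot("寅"))
```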

Building the graph

Network structure

This is not a deep network: the input goes straight into a single softmax layer that produces the result.
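In other words, the model computes softmax(xW + b) and nothing else. A minimal sketch of that forward pass in plain Python, written only to illustrate the shapes involved (the actual training uses TensorFlow); the input values are made up.

```python
import math
from typing import List

NUM_FEATURES = 136  # 34 joints x 4 quaternion components
NUM_CLASSES = 12    # twelve seals


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """Compute softmax(x W + b) for one input row.

    W has shape [NUM_FEATURES][NUM_CLASSES]; b has length NUM_CLASSES.
    """
    logits = [
        sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
        for j in range(len(b))
    ]
    return softmax(logits)


# With all-zero weights and biases, every class gets the same probability.
W0 = [[0.0] * NUM_CLASSES for _ in range(NUM_FEATURES)]
b0 = [0.0] * NUM_CLASSES
probs = predict([0.1] * NUM_FEATURES, W0, b0)
print(probs[0])
```

Training then only has to adjust the 136 × 12 weights and the 12 biases.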

Full training script


# coding: utf-8

# In[1]:


# coding: utf-8
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import keras
from sklearn.model_selection import KFold
import random
import os
import os.path as path
# freeze_graph "screenshots" the graph
from tensorflow.python.tools import freeze_graph
# optimize_for_inference lib optimizes this frozen graph
from tensorflow.python.tools import optimize_for_inference_lib
from keras import backend as K


# In[2]:


print(tf.VERSION)


# In[3]:


def export_model(saver, input_node_names, output_node_name):
    # creates the 'out' folder where our frozen graphs will be saved
    if not path.exists('out'):
        os.mkdir('out')

    # an arbitrary name for our graph
    GRAPH_NAME = 'my_graph_name'

    # GRAPH SAVING - '.pbtxt'
    tf.train.write_graph(K.get_session().graph_def, 'out', GRAPH_NAME + '_graph.pbtxt')

    # GRAPH SAVING - '.chkp'
    # KEY: This method saves the graph at its last checkpoint (hence '.chkp')
    saver.save(K.get_session(), 'out/' + GRAPH_NAME + '.chkp')

    # GRAPH SAVING - '.bytes'
    # freeze_graph.freeze_graph(input_graph_path, input_saver_def_path,
                           # input_binary, checkpoint_path, output_node_names,
                           # restore_op_name, filename_tensor_name,
                           # output_frozen_graph_name, clear_devices, "")
    freeze_graph.freeze_graph('out/' + GRAPH_NAME + '_graph.pbtxt', None, False,
                              'out/' + GRAPH_NAME + '.chkp', output_node_name,
                              "save/restore_all", "save/Const:0",
                              'out/frozen_' + GRAPH_NAME + '.bytes', True, "")

    # GRAPH OPTIMIZING
    input_graph_def = tf.GraphDef()
    with tf.gfile.Open('out/frozen_' + GRAPH_NAME + '.bytes', "rb") as f:
        input_graph_def.ParseFromString(f.read())

    output_graph_def = optimize_for_inference_lib.optimize_for_inference(
            input_graph_def, input_node_names, [output_node_name],
            tf.float32.as_datatype_enum)

    with tf.gfile.FastGFile('out/opt_' + GRAPH_NAME + '.bytes', "wb") as f:
        f.write(output_graph_def.SerializeToString())

    print("graph saved!")
    


# In[4]:


train = pd.read_csv("..\\Data\\Gesture\\gesture\\train.csv")
label = np.loadtxt("..\\Data\\Gesture\\gesture\\train_label.csv",delimiter=",")

test = pd.read_csv("..\\Data\\Gesture\\gesture\\test.csv")
test_label = np.loadtxt("..\\Data\\Gesture\\gesture\\test_label.csv",delimiter=",")


# In[5]:


data = train.loc[:,"Robot_LeftHand_X":"Robot_RightHandRing3_W"].values
test_data = test.loc[:,"Robot_LeftHand_X":"Robot_RightHandRing3_W"].values

np.random.seed(42)
np.random.shuffle(data)
np.random.seed(42)
np.random.shuffle(label)


# In[7]:


assert np.shape(data)[0] == np.shape(label)[0], "Data and label count isn't matching."
assert np.shape(test_data)[0] == np.shape(test_label)[0], "Data and label count isn't matching."


# In[9]:


# Parameters
learning_rate = 0.1
batch_size = 250


# In[10]:


# TF graph
x = tf.placeholder(tf.float32, [None, data.shape[1]], name="input_placeholder_x")
y = tf.placeholder(tf.float32, [None, 12])
W = tf.Variable(tf.zeros([data.shape[1], 12]))
b = tf.Variable(tf.zeros([12]))


# In[11]:


pred = tf.nn.softmax(tf.matmul(x, W) + b, name="output_node")
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
init = tf.global_variables_initializer()


# In[46]:


train_x_all = data
train_y_all = label
test_x = test_data
test_y = test_label
print(test_x.shape)


# In[14]:


def run_train(session, train_x, train_y):
    print ("\nStart training")
    session.run(init)
    for epoch in range(10):
        total_batch = int(train_x.shape[0] / batch_size)
        print("total batch =%d"% total_batch)
        for i in range(total_batch):
            batch_x = train_x[i*batch_size:(i+1)*batch_size]
            batch_y = train_y[i*batch_size:(i+1)*batch_size]
            _, c = session.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
            if i % 1000 == 0:
                print ("Epoch #%d step=%d cost=%f" % (epoch, i, c))


# In[ ]:


with tf.Session() as session:
    run_train(session, data, label)
    print ("Test accuracy: %f" % session.run(accuracy, feed_dict={x: test_x, y: test_y}))
    saver = tf.train.Saver()
    export_model(saver, ["input_placeholder_x"], "output_node")


# In[45]:


saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, 'out/my_graph_name.chkp')
    acc = sess.run(accuracy, feed_dict={x: test_x, y: test_y})
    print("Result: {:.2f}%".format(acc * 100))

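Result

The accuracy came out at about 98%.
That is a little low, but since the focus here is on wiring Unity and TensorFlow together, I did not change the graph or tune it further.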
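If the remaining errors ever need investigating, a per-seal confusion matrix is one way to see which seals get mixed up. A minimal sketch in plain Python, assuming true and predicted labels are available as integer class indices 0 to 11; the sample labels are made up for illustration and are not results from this article.

```python
from typing import List

NUM_CLASSES = 12


def confusion_matrix(true_idx: List[int], pred_idx: List[int]) -> List[List[int]]:
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * NUM_CLASSES for _ in range(NUM_CLASSES)]
    for t, p in zip(true_idx, pred_idx):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Made-up labels: one sample of class 2 is mistaken for class 5.
true_labels = [0, 1, 2, 2, 5, 11]
pred_labels = [0, 1, 2, 5, 5, 11]
cm = confusion_matrix(true_labels, pred_labels)
print(cm[2][5])  # 1
print(accuracy(cm))
```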

Function walkthrough

The trained graph is saved inside the with block.


with tf.Session() as session:
    run_train(session, data, label)
    saver = tf.train.Saver()
    export_model(saver, ["input_placeholder_x"], "output_node")

The graph is saved as shown below.

The values passed as input_node_names and output_node_name are the names assigned with name="" when the graph was built.


x = tf.placeholder(tf.float32, [None, data.shape[1]], name="input_placeholder_x")
pred = tf.nn.softmax(tf.matmul(x, W) + b, name="output_node")

These names, together with GRAPH_NAME, whose value is set inside export_model, are also used on the Unity side.


def export_model(saver, input_node_names, output_node_name):
    if not path.exists('out'):
        os.mkdir('out')

    # an arbitrary name for our graph
    GRAPH_NAME = 'my_graph_name'

    # GRAPH SAVING - '.pbtxt'
    tf.train.write_graph(K.get_session().graph_def, 'out', GRAPH_NAME + '_graph.pbtxt')

    # GRAPH SAVING - '.chkp'
    # KEY: This method saves the graph at its last checkpoint (hence '.chkp')
    saver.save(K.get_session(), 'out/' + GRAPH_NAME + '.chkp')

    # GRAPH SAVING - '.bytes'
    # freeze_graph.freeze_graph(input_graph_path, input_saver_def_path,
                           # input_binary, checkpoint_path, output_node_names,
                           # restore_op_name, filename_tensor_name,
                           # output_frozen_graph_name, clear_devices, "")
    freeze_graph.freeze_graph('out/' + GRAPH_NAME + '_graph.pbtxt', None, False,
                              'out/' + GRAPH_NAME + '.chkp', output_node_name,
                              "save/restore_all", "save/Const:0",
                              'out/frozen_' + GRAPH_NAME + '.bytes', True, "")

    # GRAPH OPTIMIZING
    input_graph_def = tf.GraphDef()
    with tf.gfile.Open('out/frozen_' + GRAPH_NAME + '.bytes', "rb") as f:
        input_graph_def.ParseFromString(f.read())

    output_graph_def = optimize_for_inference_lib.optimize_for_inference(
            input_graph_def, input_node_names, [output_node_name],
            tf.float32.as_datatype_enum)

    with tf.gfile.FastGFile('out/opt_' + GRAPH_NAME + '.bytes', "wb") as f:
        f.write(output_graph_def.SerializeToString())

    print("graph saved!")

If the graph is saved correctly, frozen_my_graph_name.bytes is generated.
Place this file under the Resources folder of the Unity project.

To build the graph on the Unity side, the following also need to be imported beforehand.

Unity ML-Agents Toolkit (Beta) (import everything under Assets into the project)
Unity TensorFlow Plugin (available from the "Download here" link)

In addition, add ENABLE_TENSORFLOW to Scripting Define Symbols in Player Settings, and change Scripting Runtime Version to .NET 4.x.

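Loading the trained graph

In Unity, load the trained graph and check how accurate it is in a real use case.
The script loads the graph and displays the estimate for the rotation data read from the bones every frame.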
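The display step boils down to an argmax over the 12 output probabilities followed by a label lookup. A minimal Python sketch of that lookup, for illustration only (the Unity script does this in C#); the probability values are made up.

```python
from typing import List, Tuple

SEALS = ["子", "丑", "寅", "卯", "辰", "巳", "午", "未", "申", "酉", "戌", "亥"]


def most_likely_seal(probs: List[float]) -> Tuple[str, float]:
    """Return the seal name with the highest probability, and that probability."""
    index = max(range(len(probs)), key=lambda j: probs[j])
    return SEALS[index], probs[index]


# Made-up network output: the fifth class (index 4) dominates.
probs = [0.02] * 12
probs[4] = 0.78
name, p = most_likely_seal(probs)
print(name, p)  # 辰 0.78
```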

Full graph-usage script


using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.UI;
using TensorFlow;
public class useGraph : MonoBehaviour {
    // Text component that displays the estimated seal
    public Text text;
    // Labels to display, one per seal
    private string[] inn = new string[12]
    {
        "子",
        "丑",
        "寅",
        "卯",
        "辰",
        "巳",
        "午",
        "未",
        "申",
        "酉",
        "戌",
        "亥",
    };

    // Assign frozen_my_graph_name in the Inspector beforehand
    public TextAsset graphModel;
    // The Python-side input has shape [None, 136], so create a matching 2D array
    float[,] inputTensor = new float[1, 136];

    // Provides the joint data
    public GetBoneData getBoneData;

    TFGraph graph;
    TFSession session;
    TFSession.Runner runner;

    void Start()
    {
        InitGraph();
    }

    /// <summary>
    /// Initializes the graph.
    /// </summary>
    private void InitGraph()
    {
        graph = new TFGraph();

        graphModel = Resources.Load("frozen_my_graph_name") as TextAsset;
        graph.Import(graphModel.bytes);

        // No session is opened here: Estimation() creates and disposes one per call.

    }

    /// <summary>
    /// Estimates the seal.
    /// </summary>
    public void Estimation()
    {
        // Start a session on the graph
        session = new TFSession(graph);
        runner = session.GetRunner();

        // Get the data used for the estimate
        inputTensor = getBoneData.GetDataFloat(inputTensor);
        TFTensor input = inputTensor;

        // Feed the input and fetch the estimate
        // Use the same node names as on the Python side
        runner.AddInput(graph["input_placeholder_x"][0], input);
        runner.Fetch(graph["output_node"][0]);
        float[,] recurrentTensor = runner.Run()[0].GetValue() as float[,];

        // End the session
        session.Dispose();

        // Show the result
        ShowResult(recurrentTensor);

    }

    /// <summary>
    /// Displays the most likely seal as the result.
    /// </summary>
    /// <param name="recurrentTensor"></param>
    private void ShowResult(float[,] recurrentTensor)
    {
        Debug.Log("---------------------------");

        // From the probabilities for the 12 seals, pick the one with the highest probability
        for (int i = 0; i < recurrentTensor.GetLength(0); i++)
        {
            int index = 0;
            float max = 0.0f;
            for (int j = 0; j < recurrentTensor.GetLength(1); j++)
            {
                if (recurrentTensor[i, j] > max)
                {
                    index = j;
                    max = recurrentTensor[i, j];
                }
            }
            text.text = inn[index];
            Debug.Log(index + "," + max);
        }

        Debug.Log("---------------------------");
    }

    void Update () {
        Estimation();
    }

    /// <summary>
    /// Cleanup.
    /// </summary>
    private void OnApplicationQuit()
    {
        if(session != null)
            session.Dispose();
        if(graph != null)
            graph.Dispose();
    }
}

The GetDataFloat function called above looks roughly like this.


public float[,] GetDataFloat(Transform[] positions, Transform[] rotations, float[,] list)
{
    var index = 0;
    // Each position contributes 3 values (x, y, z)
    for (int i = 0; i < positions.Length; i++)
    {
        list[0, index] = positions[i].localPosition.x;
        index++;
        list[0, index] = positions[i].localPosition.y;
        index++;
        list[0, index] = positions[i].localPosition.z;
        index++;
    }
    // Each rotation contributes 4 values (x, y, z, w)
    for (int i = 0; i < rotations.Length; i++)
    {
        list[0, index] = rotations[i].localRotation.x;
        index++;
        list[0, index] = rotations[i].localRotation.y;
        index++;
        list[0, index] = rotations[i].localRotation.z;
        index++;
        list[0, index] = rotations[i].localRotation.w;
        index++;
    }
    return list;
}

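Checking that it works

It behaves as shown in the video near the top of this article.

The trained TensorFlow graph ran in Unity without problems.

Reflections

Yes, the title is an exaggeration.

To actually do what the title says, I think it would need to work more like a rhythm game, using the average over a specific range of frames as the result.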
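One way to picture that frame-range averaging is a sliding window over the per-frame probability vectors. A minimal sketch in plain Python, not something implemented in this article; the class name `WindowedAverager`, the window size, and the sample probabilities are all assumptions for illustration (3 classes are used here only to keep the example short).

```python
from collections import deque
from typing import Deque, List


class WindowedAverager:
    """Keep the last `window` probability vectors and average them."""

    def __init__(self, window: int, num_classes: int = 12) -> None:
        # deque(maxlen=...) drops the oldest frame automatically
        self.frames: Deque[List[float]] = deque(maxlen=window)
        self.num_classes = num_classes

    def add(self, probs: List[float]) -> None:
        self.frames.append(probs)

    def average(self) -> List[float]:
        if not self.frames:
            return [0.0] * self.num_classes
        n = len(self.frames)
        return [sum(f[j] for f in self.frames) / n for j in range(self.num_classes)]

    def best(self) -> int:
        """Index of the class with the highest averaged probability."""
        avg = self.average()
        return max(range(self.num_classes), key=lambda j: avg[j])


averager = WindowedAverager(window=3, num_classes=3)
for frame in ([0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.7, 0.2, 0.1]):
    averager.add(frame)
first_best = averager.best()   # class 0 leads on average despite one noisy frame

averager.add([0.1, 0.8, 0.1])  # the oldest frame drops out of the window
second_best = averager.best()  # class 1 now leads
print(first_best, second_best)  # 0 1
```

Averaging this way smooths out single-frame misclassifications at the cost of a short delay before a new seal is recognized.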

How the data is captured, and whether rotation data alone is enough, also need further thought.

Future work

Maybe wearing a HoloLens...
