
Trying the Maze sample from the ML-Agents book with external training [Added 2019.9.13][Done]

Last updated at 2019-09-12, posted at 2019-09-10

Introduction

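I was trying to get a taste of machine learning while reading "Unityではじめる機械学習／強化学習 ML-Agent実践ゲームプログラミング" (roughly, "Machine Learning / Reinforcement Learning Starting with Unity: ML-Agents Hands-on Game Programming"), but training against the external Python trainer just wouldn't work, so I wrote things up as I went.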

Starting point

I'm checking whether Chapter 3 of the book works in the latest environment (this already smells like a landmine...).

Unity 2019.3b2, Windows version (my machine runs Windows 10, 64-bit)
ML-Agents 0.9.3

What I did

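Images are taken from the book's support site (linked in the Introduction)
Set up the Academy and Brains by cross-checking the book against the errata on the support site
- Assigned the MazeBrain and MazePlayer I created to MazeAcademy's Broadcast Hub
- The main difference from the book is the PlayerBrain's "Discrete Player Actions" section
  - Each Element's Value was set to 0, 1, 2, 3 in order, but since the default value is now fixed at 0, I changed them to 1, 2, 3, 4
  - Updated MazeAgent.cs to match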

MazeAgent.cs

void MoveRobo(int action){
  // ...
  int dx = this.x;
  int dy = this.y;
  if(action == 1){ dy--; } // action 0 -> 1
  if(action == 2){ dy++; } // action 1 -> 2
  if(action == 3){ dx--; } // action 2 -> 3
  if(action == 4){ dx++; } // action 3 -> 4
  // ...
}
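A minimal sketch of the remapped direction handling, written as a standalone C# console program (C# 9+, outside Unity) and assuming the same direction convention as MoveRobo above. NextPosition is a hypothetical helper, not part of the book's code; it just makes explicit that once the actions run from 1 to 4, the value 0 means "don't move".

```csharp
using System;

// Hypothetical helper mirroring the remapped MoveRobo logic:
// 1 = up (y - 1), 2 = down (y + 1), 3 = left (x - 1), 4 = right (x + 1).
// Any other value, including the new default 0, leaves the position unchanged.
(int X, int Y) NextPosition(int x, int y, int action) => action switch
{
    1 => (x, y - 1),
    2 => (x, y + 1),
    3 => (x - 1, y),
    4 => (x + 1, y),
    _ => (x, y), // 0 (or anything unexpected) = no move
};

Console.WriteLine(NextPosition(1, 1, 4)); // (2, 1)
Console.WriteLine(NextPosition(1, 1, 0)); // (1, 1)
```

Keeping the mapping in one pure function makes it easy to check that the PlayerBrain values and the agent code agree after a renumbering like this.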

On the Python side (using Anaconda, with Python 3.6 specified), I ran the following command (the packages were already installed with pip):

mlagents-learn ./config/trainer_config.yaml --run-id=Maze-1 --train

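I just picked an arbitrary run-id for now.
I haven't built a standalone environment yet, so I left out --env; without it, mlagents-learn waits for you to press Play in the Unity Editor.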

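...That's as far as I got, but...

Ugh, it doesn't behave the way I expected...

It does seem to be talking to the external TensorFlow, but the action value sent from the Python (TensorFlow) side is always 0...

So the robot never moves (tears).

How do I get it to send values from 1 to 4...? Do I need to change something in the config???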

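"I'll get to the bottom of this!" ...or so I told myself

Waiting around wasn't getting me anywhere, so I decided to track down the cause on my end.
First, the clue: the debug log I added to the MoveRobo method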

MazeAgent.cs

    // Move the robot
    void MoveRobo(int action)
    {
        int dx = this.x;
        int dy = this.y;
        if(action == 1) { dy--; }
        if(action == 2) { dy++; }
        if(action == 3) { dx--; }
        if(action == 4) { dx++; }
        Debug.Log(string.Format("{0}:{1}:{2}", action, dx, dy)); // the line I added
        if (dx < 0 || 3 < dx || dy < 0 || 2 < dy || MAP[dy, dx] == 9) return;
        SetRoboPosition(dx, dy);
    }
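The bounds check in MoveRobo only works because MAP[dy, dx] is read last: the short-circuiting || stops evaluation as soon as an index is out of range, so the array is never indexed with a bad coordinate. A standalone sketch of that check, assuming a hypothetical 3x4 map (the wall layout is made up; only the bounds 0..3 and 0..2 and the wall value 9 come from the code above):

```csharp
using System;

// Hypothetical 3-row x 4-column map matching the bounds in MoveRobo
// (0 <= dx <= 3, 0 <= dy <= 2); 9 marks a wall. The layout is illustrative only.
int[,] map =
{
    { 0, 0, 0, 9 },
    { 0, 9, 0, 0 },
    { 0, 0, 0, 0 },
};

// True when (dx, dy) is outside the map or is a wall.
// Because every operator is ||, map[dy, dx] is only read once both indices are in range.
bool IsBlocked(int dx, int dy) =>
    dx < 0 || map.GetLength(1) - 1 < dx ||
    dy < 0 || map.GetLength(0) - 1 < dy ||
    map[dy, dx] == 9;

Console.WriteLine(IsBlocked(-1, 0)); // True (out of range, no exception)
Console.WriteLine(IsBlocked(3, 0));  // True (wall)
Console.WriteLine(IsBlocked(0, 0));  // False
```

Deriving the limits from GetLength keeps the check in sync if the map size ever changes, instead of hard-coding 3 and 2.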

It turns out the value here stays 0 the whole time. That means if I trace backwards from here, I should find where the value gets set...

First, the AgentAction method... hm?

MazeAgent.cs

public override void AgentAction(float[] vectorAction, string textAction)
    {
        // ...

        int action = (int)vectorAction[0];
        if (action < 0) return; // huh???

        MoveRobo(action);

        // ...

Wait, it returns if the value is below 0...
Oh, maybe this is because I forgot to update this part...

So I fixed the if statement:

MazeAgent.cs

        if (action < 1) return; // how about this?

With that, I ran training again... no luck, now no log output appears at all.
I'll have to dig deeper.
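Why no log at all? Assuming the trainer keeps sending 0, the new guard (action < 1) returns before MoveRobo is ever called, so the Debug.Log inside MoveRobo never runs. A tiny standalone simulation; MoveRobo and AgentAction here are hypothetical stand-ins that only count calls:

```csharp
using System;
using System.Collections.Generic;

int moveCalls = 0;

// Stand-in for MoveRobo: just counts how often it is reached.
void MoveRobo(int action) => moveCalls++;

// Stand-in for the relevant part of AgentAction with the guard changed to "< 1".
void AgentAction(float[] vectorAction)
{
    int action = (int)vectorAction[0];
    if (action < 1) return; // 0 now means "no action", so MoveRobo is skipped
    MoveRobo(action);
}

// If every step delivers 0, every call exits at the guard.
var steps = new List<float[]> { new[] { 0f }, new[] { 0f }, new[] { 0f } };
foreach (var va in steps) AgentAction(va);
Console.WriteLine(moveCalls); // 0

// A single non-zero action does reach MoveRobo.
AgentAction(new[] { 4f });
Console.WriteLine(moveCalls); // 1
```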

Next, what about the vectorAction array that the action variable comes from?
Just to be sure, I check whether vectorAction[0] is still zero.

MazeAgent.cs

        Debug.Log(string.Format("VA: {0}", vectorAction[0]));
        int action = (int)vectorAction[0];
        if (action < 1) return;

Sure enough, it was zero the whole time.
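Logging only vectorAction[0] hides the rest of the array. A small sketch of a hypothetical formatter that prints every element; in Unity you would pass its result to Debug.Log instead of Console.WriteLine. InvariantCulture keeps the decimal point consistent regardless of the OS locale:

```csharp
using System;
using System.Globalization;

// Hypothetical helper: format every element of the action array, not just index 0.
// InvariantCulture keeps 0.5 from printing as "0,5" on locales with a decimal comma.
string FormatActions(float[] vectorAction) =>
    "VA: [" + string.Join(", ", Array.ConvertAll(vectorAction, v => v.ToString(CultureInfo.InvariantCulture))) + "]";

// In Unity: Debug.Log(FormatActions(vectorAction));
Console.WriteLine(FormatActions(new[] { 0f, 0f, 0f, 0f, 0f })); // VA: [0, 0, 0, 0, 0]
```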

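Time to dig deeper.
Where is AgentAction called from? Tracing it by hand is tedious, so I attach the VS2019 debugger and check the call stack. Set a breakpoint where vectorAction is read, and run!

Looking at the call stack, the key seems to be the action variable (an AgentAction struct) used in the Agent.AgentStep method, so I follow that further.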

Agent.cs

               AgentAction(action.vectorActions, action.textActions, action.customAction);

Where does the action.vectorActions member get updated?
A search shows that it is updated in the Agent.UpdateVectorAction method.

Agent.cs

       public void UpdateVectorAction(float[] vectorActions)
        {
            action.vectorActions = vectorActions;
        }
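The flow traced so far (something calls UpdateVectorAction to store an array, and AgentStep later passes the stored array to AgentAction) can be modeled in a few lines. This is a simplified, hypothetical model built from local functions, not ML-Agents' actual code; it shows that if nothing ever stores a new array, AgentAction keeps seeing the default zeros:

```csharp
using System;

// Hypothetical stand-in for the agent's stored action; all zeros until something is stored.
float[] storedVectorActions = new float[1];
string lastHandled = "";

// Mirrors the idea of Agent.UpdateVectorAction: just remember the received array.
void UpdateVectorAction(float[] vectorActions) => storedVectorActions = vectorActions;

// Stand-in for the user's AgentAction override: record what it was given.
void AgentAction(float[] vectorAction) => lastHandled = $"action={(int)vectorAction[0]}";

// Mirrors the idea of AgentStep: hand the stored array to AgentAction.
void AgentStep() => AgentAction(storedVectorActions);

AgentStep();                    // nothing stored yet, so the defaults are used
Console.WriteLine(lastHandled); // action=0
UpdateVectorAction(new[] { 3f });
AgentStep();
Console.WriteLine(lastHandled); // action=3
```

So a constant 0 in AgentAction points upstream, at whether UpdateVectorAction is being called with real data at all.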

Next, let's look at the callers of Agent.UpdateVectorAction.

Checking Batcher.SendBatchedMessageHelper...

Batcher.cs

           foreach (var brainName in rlInput.AgentActions.Keys)
            {
                if (!m_currentAgents[brainName].Any())
                {
                    continue;
                }

                if (!rlInput.AgentActions[brainName].Value.Any())
                {
                    continue;
                }

                for (var i = 0; i < m_currentAgents[brainName].Count; i++)
                {
                    var agent = m_currentAgents[brainName][i];
                    var action = rlInput.AgentActions[brainName].Value[i];
                    agent.UpdateVectorAction(action.VectorActions.ToArray());
                    agent.UpdateMemoriesAction(action.Memories.ToList());
                    agent.UpdateTextAction(action.TextActions);
                    agent.UpdateValueAction(action.Value);
                    agent.UpdateCustomAction(action.CustomAction);
                }
            }

Oh, it looks up the actions for each brain name in turn.
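A simplified model of that loop, with hypothetical brain names BrainA and BrainB and each agent reduced to its stored action array. It illustrates one consequence of iterating over the brain names that arrived in the input: a brain with no entry there is never visited, so its agents keep whatever action they already had (zeros here). This only shows the shape of the loop; it is not a diagnosis of the Maze problem:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical brains, each with one agent whose stored action starts as zeros.
var agentsByBrain = new Dictionary<string, List<float[]>>
{
    ["BrainA"] = new List<float[]> { new float[1] },
    ["BrainB"] = new List<float[]> { new float[1] },
};

// Suppose the incoming message only carries actions for BrainA.
var receivedActions = new Dictionary<string, List<float[]>>
{
    ["BrainA"] = new List<float[]> { new[] { 2f } },
};

// Same shape as the loop above: iterate over the brain names that arrived.
foreach (var brainName in receivedActions.Keys)
{
    if (!agentsByBrain.TryGetValue(brainName, out var agents) || agents.Count == 0) continue;
    var actions = receivedActions[brainName];
    if (actions.Count == 0) continue;
    for (var i = 0; i < agents.Count; i++)
    {
        agents[i] = actions[i]; // stands in for agent.UpdateVectorAction(...)
    }
}

// BrainB was never visited, so its agent still holds the default zeros.
foreach (var (brain, list) in agentsByBrain)
{
    Console.WriteLine($"{brain}: {list[0][0]}");
}
```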

The vectorActions applied here are the *updated* ones (this is the key point)... which means

Batcher.cs

            var input = m_communicator.Exchange(
                new CommunicatorObjects.UnityOutput
                {
                    RlOutput = m_currentUnityRLOutput
                });
            m_messagesReceived += 1;

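Stopping at this line with a breakpoint and inspecting m_currentUnityRLOutput (displayed in the Immediate Window)...

(*It looks like JSON, but that is just how the protobuf message prints itself via ToString(); Unity and Python actually exchange Protocol Buffers messages over gRPC.)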

m_currentUnityRLOutput
"{ \"agentInfos\": { \"MazeBrain\": { \"value\": [ { \"stackedVectorObservation\": [ 0, 0.333333343 ], \"storedVectorActions\": [ 0, 0, 0, 0, 0 ], \"id\": 13706 } ] }, \"MazePlayer\": { } } }"
    AgentInfos: "{ \"MazeBrain\": { \"value\": [ { \"stackedVectorObservation\": [ 0, 0.333333343 ], \"storedVectorActions\": [ 0, 0, 0, 0, 0 ], \"id\": 13706 } ] }, \"MazePlayer\": { } }"
    GlobalDone: false
    _unknownFields: null
    agentInfos_: "{ \"MazeBrain\": { \"value\": [ { \"stackedVectorObservation\": [ 0, 0.333333343 ], \"storedVectorActions\": [ 0, 0, 0, 0, 0 ], \"id\": 13706 } ] }, \"MazePlayer\": { } }"
    globalDone_: false
    pb::Google.Protobuf.IMessage.Descriptor: {Google.Protobuf.Reflection.MessageDescriptor}

Hm, MazePlayer's entry is empty???

Following it a bit further, I found something odd in the Batcher.SendBrainInfo method.

Its brainKey argument contained two brain names, MazeBrain and MazePlayer, but the Keys of the other argument, agentInfo, contained only MazeBrain. Where did MazePlayer go???
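That mismatch can also be checked in code instead of by eye in the debugger. A hypothetical helper, with agent info reduced to a list of agent IDs (the real agentInfo values are ML-Agents types); the sample data mirrors what the debugger showed above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: brain names that are known but have no (or empty) agent info.
List<string> MissingBrains(IEnumerable<string> known, IDictionary<string, List<int>> infos) =>
    known.Where(name => !infos.TryGetValue(name, out var ids) || ids.Count == 0).ToList();

// Sample data mirroring the debugger output: two brain names, agent info only for MazeBrain.
var brainKeys = new[] { "MazeBrain", "MazePlayer" };
var agentInfo = new Dictionary<string, List<int>>
{
    ["MazeBrain"] = new List<int> { 13706 },
};

Console.WriteLine(string.Join(", ", MissingBrains(brainKeys, agentInfo))); // MazePlayer
```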