Hongo Gakuen Microcomputer Club Advent Calendar 2023

@scratch221171

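Struggling with Reinforcement Learning in Unity

Posted at 2023-12-17, last updated at 2023-12-25

For my graduation thesis I ended up doing reinforcement learning inside Unity, so I decided to record the whole story in this article.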

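Installation and other groundwork

Installing Unity

Install it by following the steps on the official site.
This article uses Unity Hub v3.6.1 and Unity Editor 2022.3.4f1.

Downloading ML-Agents

Go to GitHub and download the latest version.
A zip named ml-agents-develop is downloaded, so extract it.

Then open Unity Hub and select Add > ml-agents-develop > Project.
The first time, it complains that the editor is not installed, so install it with the Visual Studio checkbox unticked.
In my case the download mysteriously would not progress unless I unticked that box; see the article below for details.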

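Actually playing with the assets

Once installation finishes, open the project again and the editor starts (obviously).
I had already tinkered with mine a little, so the Scene may look slightly different from yours.

In the Project window at the bottom left, double-click Assets > ML-Agents > Examples > (any folder) > Scenes > the item with the Unity icon. The Scene opens, so sit back and enjoy the moves of the already-trained AI.
(The gif here shows the PushBlock scene.)

I could watch this forever.
There are plenty of other interesting examples too, such as 2-vs-2 soccer and walking models.

This is fun enough as it is, but I want to build my own environment, train my very own ultimate AI and gloat over it, so I searched around the net for how to do that.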

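Building the reinforcement-learning environment

As expected, the first thing that turns up is the official site.

Unfortunately my English is weak and I can't follow much of it, but with the help of machine translation I'll get the environment built somehow.

First, in Unity Hub, create a new 3D Core project with the same editor version as the ML-Agents project. The name is up to you; here I use "RL", the abbreviation of reinforcement learning.

Downloading the package

Once the project exists, import the ML-Agents assets as described on the site above. It was all in English and went over my head, so I'm noting the steps here.

Importing the assets (using the Package Manager)

In the Unity editor, click Window > Package Manager to open the Package Manager window.

There, click Packages: In Project at the top left and select Unity Registry.
Then pick ML Agents from the long list of candidates shown below and click Install; that takes care of the package for now.
Easier than I expected.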

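Creating the objects

I follow the official guide as is.

Create a floor, a target and an agent (the object the AI controls), and group them under an empty object.

These are simple operations, so I don't expect them to throw errors.

Implementing the agent script

Add a Rigidbody component to the agent.
After that, I copy-pasted the script from the official guide, and the moment I saved it, errors appeared.

Here is the problematic script.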

RollerAgent.cs

using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

public class RollerAgent : Agent
{
    Rigidbody rBody;
    void Start () {
        rBody = GetComponent<Rigidbody>();
    }

    public Transform Target;
    public override void OnEpisodeBegin()
    {
        // If the Agent fell, zero its momentum
        if (this.transform.localPosition.y < 0)
        {
            this.rBody.angularVelocity = Vector3.zero;
            this.rBody.velocity = Vector3.zero;
            this.transform.localPosition = new Vector3( 0, 0.5f, 0);
        }

        // Move the target to a new spot
        Target.localPosition = new Vector3(Random.value * 8 - 4,
                                           0.5f,
                                           Random.value * 8 - 4);
    }
}
public override void CollectObservations(VectorSensor sensor)
{
    // Target and Agent positions
    sensor.AddObservation(Target.localPosition);
    sensor.AddObservation(this.transform.localPosition);

    // Agent velocity
    sensor.AddObservation(rBody.velocity.x);
    sensor.AddObservation(rBody.velocity.z);
}
public float forceMultiplier = 10;
public override void OnActionReceived(ActionBuffers actionBuffers)
{
    // Actions, size = 2
    Vector3 controlSignal = Vector3.zero;
    controlSignal.x = actionBuffers.ContinuousActions[0];
    controlSignal.z = actionBuffers.ContinuousActions[1];
    rBody.AddForce(controlSignal * forceMultiplier);

    // Rewards
    float distanceToTarget = Vector3.Distance(this.transform.localPosition, Target.localPosition);

    // Reached target
    if (distanceToTarget < 1.42f)
    {
        SetReward(1.0f);
        EndEpisode();
    }

    // Fell off platform
    else if (this.transform.localPosition.y < 0)
    {
        EndEpisode();
    }
}

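Unfortunately I'm a novice at C# as well, so I'm lost. Time to call in ChatGPT.

I don't fully follow its answer, but my guess was that public override void CollectObservations(VectorSensor sensor) on line 30 needs to go inside public class RollerAgent : Agent {}. The closing brace on line 29 ends the class, so everything after it sits outside any class, and C# does not allow methods there. So I tried moving it right away.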

using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

-public class RollerAgent : Agent
-{
-    ~
-}
-public override void CollectObservations(VectorSensor sensor)
-    {
-        ~
-    }

+public class RollerAgent : Agent
+{
+    public override void CollectObservations(VectorSensor sensor)
+    {
+        ~
+    }
+}

After fixing it along these lines,

some of the errors disappeared. Mindless copy-pasting really is a bad idea, tee-hee (・ω<)
Let's keep going at this pace.

RollerAgent

using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

public class RollerAgent : Agent
{
    Rigidbody rBody;
    void Start () {
        rBody = GetComponent<Rigidbody>();
    }

    public Transform Target;
    public override void OnEpisodeBegin()
    {
        // If the Agent fell, zero its momentum
        if (this.transform.localPosition.y < 0)
        {
            this.rBody.angularVelocity = Vector3.zero;
            this.rBody.velocity = Vector3.zero;
            this.transform.localPosition = new Vector3( 0, 0.5f, 0);
        }

        // Move the target to a new spot
        Target.localPosition = new Vector3(Random.value * 8 - 4,
                                           0.5f,
                                           Random.value * 8 - 4);
    }
    public override void CollectObservations(VectorSensor sensor)
    {
        // Target and Agent positions
        sensor.AddObservation(Target.localPosition);
        sensor.AddObservation(this.transform.localPosition);

        // Agent velocity
        sensor.AddObservation(rBody.velocity.x);
        sensor.AddObservation(rBody.velocity.z);
    }
    public float forceMultiplier = 10;
    public override void OnActionReceived(ActionBuffers actionBuffers)
    {
        // Actions, size = 2
        Vector3 controlSignal = Vector3.zero;
        controlSignal.x = actionBuffers.ContinuousActions[0];
        controlSignal.z = actionBuffers.ContinuousActions[1];
        rBody.AddForce(controlSignal * forceMultiplier);

        // Rewards
        float distanceToTarget = Vector3.Distance(this.transform.localPosition, Target.localPosition);

        // Reached target
        if (distanceToTarget < 1.42f)
        {
            SetReward(1.0f);
            EndEpisode();
        }

        // Fell off platform
        else if (this.transform.localPosition.y < 0)
        {
            EndEpisode();
        }
    }
}

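Only one error was left. Time for ChatGPT to make another appearance.

So it says. Thinking that a copy-paste could hardly get a declaration wrong, I went back over the guide and found that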

using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using Unity.MLAgents.Actuators;

had, in the copy-paste block later in the guide, quietly turned into

using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

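ActionBuffers is defined in the Unity.MLAgents.Actuators namespace, so without that using directive the compiler cannot resolve it. Once I added using Unity.MLAgents.Actuators;, every error vanished from the console. Here is the final code.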

RollerAgent

using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using Unity.MLAgents.Actuators;

public class RollerAgent : Agent
{
    Rigidbody rBody;
    void Start () {
        rBody = GetComponent<Rigidbody>();
    }

    public Transform Target;
    public override void OnEpisodeBegin()
    {
        // If the Agent fell, zero its momentum
        if (this.transform.localPosition.y < 0)
        {
            this.rBody.angularVelocity = Vector3.zero;
            this.rBody.velocity = Vector3.zero;
            this.transform.localPosition = new Vector3( 0, 0.5f, 0);
        }

        // Move the target to a new spot
        Target.localPosition = new Vector3(Random.value * 8 - 4,
                                           0.5f,
                                           Random.value * 8 - 4);
    }
    public override void CollectObservations(VectorSensor sensor)
    {
        // Target and Agent positions
        sensor.AddObservation(Target.localPosition);
        sensor.AddObservation(this.transform.localPosition);

        // Agent velocity
        sensor.AddObservation(rBody.velocity.x);
        sensor.AddObservation(rBody.velocity.z);
    }
    public float forceMultiplier = 10;
    public override void OnActionReceived(ActionBuffers actionBuffers)
    {
        // Actions, size = 2
        Vector3 controlSignal = Vector3.zero;
        controlSignal.x = actionBuffers.ContinuousActions[0];
        controlSignal.z = actionBuffers.ContinuousActions[1];
        rBody.AddForce(controlSignal * forceMultiplier);

        // Rewards
        float distanceToTarget = Vector3.Distance(this.transform.localPosition, Target.localPosition);

        // Reached target
        if (distanceToTarget < 1.42f)
        {
            SetReward(1.0f);
            EndEpisode();
        }

        // Fell off platform
        else if (this.transform.localPosition.y < 0)
        {
            EndEpisode();
        }
    }
}

Now we can finally move on to the next step.

Final setup in the editor

Select the RollerAgent object in the Hierarchy window (if you followed the official guide, RollerAgent should be a child of an empty object called TrainingArea). The Inspector window then shows the RollerAgent script section.

Drag the Target object from the Hierarchy window onto the Target field (the one showing None (Transform)).

Then add the Decision Requester and Behavior Parameters components.

In Decision Requester:

Set Decision Period to 10.

In Behavior Parameters:

Set Behavior Name to RollerBall,
Vector Observation > Space Size to 8,
and Actions > Continuous Actions to 2.

The environment is now ready to be tested before training.
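The Space Size of 8 is not arbitrary: it has to match the number of floats that CollectObservations adds, which is two 3-component positions plus two velocity components. Here is a minimal plain-Python sketch of that count, run outside Unity. The function name `collect_observations` and the tuple representation are assumptions made for illustration and are not part of ML-Agents.

```python
# Hypothetical stand-in for RollerAgent.CollectObservations, used only for counting.
def collect_observations(target_pos, agent_pos, velocity):
    """Flatten the same values the C# script observes into one list."""
    obs = []
    obs.extend(target_pos)    # Target.localPosition    -> 3 floats
    obs.extend(agent_pos)     # transform.localPosition -> 3 floats
    obs.append(velocity[0])   # rBody.velocity.x        -> 1 float
    obs.append(velocity[2])   # rBody.velocity.z        -> 1 float
    return obs


obs = collect_observations((1.0, 0.5, -2.0), (0.0, 0.5, 0.0), (0.3, 0.0, -0.1))
print(len(obs))  # prints 8, the value entered for Space Size
```

If you later add or remove an AddObservation call in the script, Space Size would need to change by the same amount.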

Testing the environment

The official guide recommends testing the environment, so let's do that.
Insert the following code inside public class RollerAgent : Agent {} in RollerAgent.

public override void Heuristic(in ActionBuffers actionsOut)
{
    var continuousActionsOut = actionsOut.ContinuousActions;
    continuousActionsOut[0] = Input.GetAxis("Horizontal");
    continuousActionsOut[1] = Input.GetAxis("Vertical");
}

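Run the scene and you can steer the ball with the arrow keys; when it touches the cube set as the target, the cube reappears at a random position. You should also see the episode reset when the ball falls off the platform.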
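What you see in this test comes from two checks in OnActionReceived plus the target respawn in OnEpisodeBegin. As a rough sketch, the same rules can be written in plain Python with no Unity involved. The names `step_outcome` and `sample_target` are hypothetical; only the 1.42 threshold, the y < 0 check and the `* 8 - 4` range come from the script.

```python
import math

REACH_DISTANCE = 1.42  # same threshold as the C# script


def step_outcome(agent_pos, target_pos):
    """Return (reward, episode_done) following OnActionReceived's two checks."""
    distance = math.dist(agent_pos, target_pos)
    if distance < REACH_DISTANCE:
        return 1.0, True    # reached target: SetReward(1.0f); EndEpisode()
    if agent_pos[1] < 0:
        return 0.0, True    # fell off the platform: EndEpisode(), no reward
    return 0.0, False       # keep going


def sample_target(rand):
    """Mirror Random.value * 8 - 4 on x and z; rand() must return a float in [0, 1]."""
    return (rand() * 8 - 4, 0.5, rand() * 8 - 4)


print(step_outcome((0.0, 0.5, 0.0), (1.0, 0.5, 0.0)))   # (1.0, True)
print(step_outcome((0.0, -0.1, 0.0), (3.0, 0.5, 3.0)))  # (0.0, True)
```

Note that the target always respawns within -4 to 4 on x and z, while a reward is only ever given for reaching it; falling off simply ends the episode.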

That completes the test.

And according to the official guide, believe it or not, the environment is now fully built.

While we're at it, let's go ahead and train it too.

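Training the AI

Installation

According to the installation guide, the overall procedure is:

Install Unity (2022.3 or later)
Install Python (3.10.12 or later)
Clone the repository (optional)
Install the com.unity.ml-agents Unity package
Install the com.unity.ml-agents.extensions Unity package as well (optional)
Install mlagents-envs
Install the mlagents Python package

Installs all the way down. From what I could find, the third item is only for downloading the sample scenes, so I planned to skip it in this article. Let's get started.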

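1. Installing Unity (2022.3 or later)

This should already be done.

2. Installing Python

This article uses 3.10.12. The guide also recommends managing virtual environments with conda, so install that too.

When installation finishes, a launcher called Anaconda Navigator opens;
start CMD.exe Prompt from it and enter the following command.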

conda create -n mlagents python=3.10.12 && conda activate mlagents

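The individual parts have this syntax:

conda create -n <environment name> python=<version>
&&
conda activate <environment name>

A confirmation prompt appears; press y and then Enter.
The virtual environment is then created. Leave the command prompt open.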
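With the environment activated, it may be worth confirming that the interpreter inside it really is the version you asked for. The following is a small sketch to run with that environment's python; the helper name `is_expected_python` is hypothetical, and the target (3, 10, 12) is simply the version this article installs.

```python
import sys

EXPECTED = (3, 10, 12)  # the Python version this article installs


def is_expected_python(version, expected=EXPECTED):
    """Compare the (major, minor, micro) part of a version tuple with the expected one."""
    return tuple(version[:3]) == expected


# Show the running interpreter's version and whether it matches.
print(sys.version.split()[0], is_expected_python(sys.version_info))
```

If the second value printed is False, the prompt is probably still using a different environment than mlagents.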

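3. Cloning the repository

The environment was already set up earlier, so I would have liked to skip this, but an error came up further on, so I reluctantly clone the repository after all.
From the Anaconda command prompt, enter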

git clone --branch release_21 https://github.com/Unity-Technologies/ml-agents.git

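That completes the clone.

4. Installing `com.unity.ml-agents`

Already done.

5. Installing `com.unity.ml-agents.extensions`

It is not relevant to this project, so skip it.

6. Installing the `mlagents` Python package

If you closed the command prompt opened from Anaconda Navigator earlier, open it again and enter conda activate mlagents.

Now, at last, install ML-Agents (the Python package). First find the path of the cloned ml-agents repository and move into it.

cd <path> (in my case C:\Users\user\ml-agents)

After that, enter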

python -m pip install ./ml-agents-envs
python -m pip install ./ml-agents

to install both packages.
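One way to check that both installs landed in the active environment is to ask Python whether it can locate the modules. This is only a sketch: the helper name `is_importable` is hypothetical, while `mlagents` and `mlagents_envs` are the module names provided by the ml-agents and ml-agents-envs packages.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate the top-level module without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report whether each package installed above can be found.
for name in ("mlagents", "mlagents_envs"):
    print(name, "found" if is_importable(name) else "missing")
```

Run it with the python of the mlagents environment; a "missing" line would suggest the pip commands ran in a different environment.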
