
How to Use Multi-Car Racing (Gym)

Posted at 2021-12-02. Last updated at 2021-12-03.

Introduction

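When training with parallel, distributed reinforcement learning methods such as R2D2 or Gorila, the environment itself also needs to support that setup.

However, OpenAI's Car Racing environment cannot be parallelized.

This article therefore explains how to use Multi-Car Racing, an environment that does support parallelization.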

The code was run on Google Colab Pro.
I also referred to the following tutorial.

My own GitHub is as follows.

Acknowledgments

As noted in the comments section below, 山田@ymd_h kindly helped me, and the code has been substantially revised as a result.

import

!apt update && apt install -y xvfb
!git clone https://github.com/igilitschenski/multi_car_racing.git
!pip install ./multi_car_racing
!pip install -U imgaug==0.2.7 gym-notebook-wrapper

import gym
from gym import wrappers
import gym_multi_car_racing
import numpy as np
import gnwrapper

Wrapper class

class FlatMultiDisplayWrapper(gym.Wrapper):
    def __init__(self, env, nrows=1, ncols=1, empty_cell=0):
        super().__init__(env)
        self.nrows = nrows
        self.ncols = ncols
        self.empty_cell = empty_cell

    def render(self, mode="human", **kwargs):
        stack_images = super().render(mode, **kwargs)
        nstack = stack_images.shape[0]
        frame_shape = stack_images.shape[1:]

        if len(frame_shape) >= 2:
            self.flat_shape = list(frame_shape)
            self.flat_shape[0] = self.flat_shape[0]*self.nrows
            self.flat_shape[1] = self.flat_shape[1]*self.ncols
            frame = np.full(self.flat_shape, self.empty_cell, dtype=np.uint8)

            for i in range(min(nstack, self.nrows*self.ncols)):
                row = i // self.ncols
                col = i % self.ncols
                frame[row*frame_shape[0]:(row+1)*frame_shape[0],
                      col*frame_shape[1]:(col+1)*frame_shape[1]] = stack_images[i]
        else:
            # Occasionally the environment returns a stack of empty images.
            frame = np.full(self.flat_shape, self.empty_cell, dtype=np.uint8)

        return frame

    def step(self, action):
        obs, rew, done, info = super().step(action)
        # The reward does not follow the gym.Env spec (it is per-agent, not a scalar),
        # so convert it. The original value is kept in info.
        info["original_reward"] = rew
        return obs, rew.sum(), done, info
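
The tiling done by render() above can be tried on its own, without the environment. The following is a minimal sketch: tile_frames is a hypothetical helper name (it is not part of the wrapper or of any library), and the 4x6 dummy frames are made-up stand-ins for the per-agent images.

```python
import numpy as np

def tile_frames(stack_images, nrows, ncols, empty_cell=0):
    """Tile a (n, H, W, C) stack of frames into one (nrows*H, ncols*W, C) image.

    Hypothetical helper that mirrors the loop in FlatMultiDisplayWrapper.render().
    """
    nstack = stack_images.shape[0]
    h, w = stack_images.shape[1:3]
    out_shape = (h * nrows, w * ncols) + tuple(stack_images.shape[3:])
    # Cells with no frame keep the empty_cell value.
    frame = np.full(out_shape, empty_cell, dtype=np.uint8)
    for i in range(min(nstack, nrows * ncols)):
        row, col = divmod(i, ncols)
        frame[row * h:(row + 1) * h, col * w:(col + 1) * w] = stack_images[i]
    return frame

# Two dummy RGB frames standing in for the views of two agents (assumed sizes).
frames = np.stack([np.full((4, 6, 3), 10, dtype=np.uint8),
                   np.full((4, 6, 3), 200, dtype=np.uint8)])
tiled = tile_frames(frames, nrows=1, ncols=2)
print(tiled.shape)
```

With nrows=1 and ncols=2, the two frames end up side by side, which is the layout used for the two-agent video in this article.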

Main code

# Each action is [steering, gas, brake].
actions = np.array([[ 0, 0, 0],  # [0]: go straight (no input)
                    [ 0, 1, 0],  # [1]: accelerate
                    [ 0, 0, 1],  # [2]: decelerate (brake)
                    [ 1, 0, 0],  # [3]: turn right
                    [-1, 0, 0]]) # [4]: turn left

env = gym.make("MultiCarRacing-v0", num_agents=2, direction='CCW',
        use_random_direction=True, backwards_flag=True, h_ratio=0.25,
        use_ego_color=False)

env = FlatMultiDisplayWrapper(env, nrows=1, ncols=2)
env = gnwrapper.Monitor(env, './', force=True)

obs = env.reset()
done = False
total_reward = 0

while not done:
    action = np.random.randint(5)
    action = actions[action]
    obs, reward, done, info = env.step([action, action])
    total_reward += reward

# The wrapper sums the per-agent rewards, so this is a single scalar.
print("total score (summed over agents):", total_reward)

env.display()
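
Because the wrapper sums the rewards, per-agent scores have to be accumulated from info["original_reward"]. The following is a minimal sketch of that bookkeeping. StubEnv is a hypothetical stand-in with made-up rewards so that the loop runs without gym; with the real environment, only the loop body would be needed.

```python
import numpy as np

class StubEnv:
    """Hypothetical stand-in for the wrapped two-agent environment.

    It runs for three steps and returns fixed, made-up per-agent rewards.
    """
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return None

    def step(self, action):
        self.t += 1
        rew = np.array([1.0, -0.5])          # made-up per-agent rewards
        info = {"original_reward": rew}      # same key as the wrapper above
        return None, rew.sum(), self.t >= 3, info

env = StubEnv()
env.reset()
done = False
total_reward = 0.0
individual_scores = np.zeros(2)

while not done:
    _, reward, done, info = env.step(None)
    total_reward += reward                          # scalar, summed over agents
    individual_scores += info["original_reward"]    # one entry per agent

print("total score:", total_reward)
print("individual scores:", individual_scores)
```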

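Inputs to and outputs from the environment must be passed in list form, with one entry per agent. Note that with FlatMultiDisplayWrapper applied, step() returns the summed reward, and the per-agent values are kept in info["original_reward"].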

action: [[ 0, 0, 0], [ 0, 0, 1]]
reward: [-83.16498316 -62.96296296]
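
The main loop above sends the same action to both cars. To give each agent its own action, the list can be built with one entry per agent, as in this minimal sketch. sample_joint_action is a hypothetical helper name, and the seed is only there to make the run repeatable.

```python
import numpy as np

# Same discrete action table as in the main code: [steering, gas, brake].
actions = np.array([[ 0, 0, 0],
                    [ 0, 1, 0],
                    [ 0, 0, 1],
                    [ 1, 0, 0],
                    [-1, 0, 0]])

def sample_joint_action(rng, num_agents):
    """Pick an independent random action for each agent (hypothetical helper)."""
    idx = rng.integers(len(actions), size=num_agents)
    return [actions[i] for i in idx]

rng = np.random.default_rng(0)
joint_action = sample_joint_action(rng, num_agents=2)
print(joint_action)  # a list with one [steering, gas, brake] array per agent
```

The resulting list has the same form as the action example above and could be passed to env.step() in place of [action, action].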
