Playing Tic-Tac-Toe (the OX Game) with OpenAI Gym


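Introduction

This is a memo on how to play tic-tac-toe (the OX game) with OpenAI Gym.
I plan to connect this Gym environment to a UI, use it for reinforcement learning, and build more elaborate games on top of it.

I put it together in about an hour with help from ChatGPT.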

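What I built

I created a Gym environment.
It supports a two-player mode and a match against a random agent.

A cell is claimed by entering its number; the numbers map to board positions as follows.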

|0|1|2|
|3|4|5|
|6|7|8|
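
The numbering is row-major, so an action number can be converted to board coordinates with integer division and remainder. A minimal sketch of that mapping follows; the helper names `action_to_cell` and `cell_to_action` are illustrative and do not appear in the script.

```python
def action_to_cell(action):
    """Convert an action number (0-8) to a (row, col) pair."""
    if not 0 <= action <= 8:
        raise ValueError(f"action must be in 0-8, got {action}")
    return divmod(action, 3)  # same as (action // 3, action % 3)


def cell_to_action(row, col):
    """Convert a (row, col) pair back to an action number."""
    return row * 3 + col


print(action_to_cell(4))     # centre cell
print(cell_to_action(2, 0))  # bottom-left cell
```

The range check is an addition for the sketch; it turns an out-of-range number into a clear error instead of an index error later on.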

It runs from the command line, as shown below.

% python gym_oxgame.py
Enter your play mode (0 or 1)
 0: 2 Player
 1: VS Random agent
input : 1
|_|_|_|
|_|_|_|
|_|_|_|

It is the turn of X.
Enter your move (0-8): 4
Player's action: 4
|_|_|_|
|_|X|_|
|_|_|_|

It is the turn of O.
Agent's action: 0
|O|_|_|
|_|X|_|
|_|_|_|

It is the turn of X.
Enter your move (0-8): 2
Player's action: 2
|O|_|X|
|_|X|_|
|_|_|_|

It is the turn of O.
Agent's action: 1
|O|O|X|
|_|X|_|
|_|_|_|

It is the turn of X.
Enter your move (0-8): 6
Player's action: 6
You win!
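
In the session above, X takes cells 4, 2 and 6, which together form the anti-diagonal, so the game ends on X's third move. The same check can be sketched on flat cell numbers; `WIN_LINES` and `has_won` are hypothetical names used only for this illustration, and the environment's own `_is_winner` method works on the 3x3 array instead.

```python
# The eight winning lines, written with the cell numbers 0-8.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def has_won(cells):
    """Return True if the given cells contain at least one full line."""
    owned = set(cells)
    return any(owned.issuperset(line) for line in WIN_LINES)


print(has_won([4, 2]))     # X after two moves
print(has_won([4, 2, 6]))  # X after three moves
print(has_won([0, 1]))     # O after two moves
```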

Code

The Gym environment

import gym
from gym import spaces
from gym.utils import seeding
import numpy as np


class OXGameEnv(gym.Env):
    def __init__(self):
        self.board = np.zeros((3, 3), dtype=np.int32)  # 3x3 board: 0 = empty, 1 = X, -1 = O
        self.current_player = 1  # player 1 (X) moves first
        self.action_space = spaces.Discrete(9)  # actions are the integers 0 to 8
        self.observation_space = spaces.Box(low=-1, high=1, shape=(3, 3), dtype=np.int32)  # state of the board
        self.seed()

    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def reset(self):
        self.board = np.zeros((3, 3), dtype=np.int32)  # match the dtype of observation_space
        self.current_player = 1
        return self.board

    def step(self, action):
        row = action // 3
        col = action % 3

        if self.board[row][col] != 0:
            return self.board, -10, True, {}  # the cell is already taken: penalize and end the episode

        self.board[row][col] = self.current_player

        if self._is_winner(self.current_player):
            reward = 10
            done = True
        elif self._is_board_full():
            reward = 0
            done = True
        else:
            reward = 0
            done = False

        self.current_player = -self.current_player  # switch players

        return self.board, reward, done, {}

    def _is_winner(self, player):
        # Check for a completed row
        for i in range(3):
            if all(self.board[i] == player):
                return True
        # Check for a completed column
        for i in range(3):
            if all(self.board[:, i] == player):
                return True
        # Check for a completed diagonal
        if all(self.board.diagonal() == player) or all(np.fliplr(self.board).diagonal() == player):
            return True
        return False

    def _is_board_full(self):
        return not (self.board == 0).any()

    def render(self, mode='human'):
        for row in self.board:
            print('|', end='')
            for col in row:
                if col == 1:
                    print('X', end='|')
                elif col == -1:
                    print('O', end='|')
                else:
                    print('_', end='|')
            print('')
        print('')

Code that runs the Gym environment

class RandomAgentV1:
    def __init__(self, env):
        self.action_space = env.action_space

    def choose_action(self):
        return self.action_space.sample()  # sample a random action from the action space (may hit an occupied cell)


class RandomAgentV2:
    def __init__(self, env):
        self.env = env

    def choose_action(self):
        valid_actions = np.flatnonzero(self.env.board == 0)  # flat indices (0-8) of the empty cells
        return np.random.choice(valid_actions)  # pick one of the empty cells at random


def vs_player(env):
    _ = env.reset()

    done = False
    while not done:
        env.render()
        print("It is the turn of {}.".format('X' if env.current_player == 1 else 'O'))
        action = int(input("Enter your move (0-8): "))
        observation, reward, done, info = env.step(action)
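        print(f"Reward: {reward}")
        if done:
            if reward == 10:
                # step() has already switched players, so the winner is the other mark
                print("{} wins!".format('O' if env.current_player == 1 else 'X'))
            elif reward == 0:
                print("It's a draw!")
            else:
                print("Illegal move: that cell is already taken. Game over.")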


def vs_random_agent(env):
    agent = RandomAgentV2(env)

    _ = env.reset()
    done = False
    while not done:
        env.render()
        print("It is the turn of {}.".format('X' if env.current_player == 1 else 'O'))
        if env.current_player == 1:
            action = int(input("Enter your move (0-8): "))
            print(f"Player's action: {action}")
        else:
            action = agent.choose_action()
            print(f"Agent's action: {action}")
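        player = env.current_player  # remember who is moving; step() switches players
        obs, reward, done, info = env.step(action)
        if done:
            if reward == 10:
                print("You win!" if player == 1 else "You lose!")
            elif reward == 0:
                print("It's a draw!")
            else:
                print("Illegal move: that cell is already taken. Game over.")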


def main():
    env = OXGameEnv()
    _ = env.reset()

    play_mode = int(input("Enter your play mode (0 or 1)\n 0: 2 Player\n 1: VS Random agent\ninput : "))
    if play_mode == 0:
        vs_player(env)
    elif play_mode == 1:
        vs_random_agent(env)


if __name__ == '__main__':
    main()
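
The random agent has to produce action numbers in the range 0-8, while the board is a 3x3 array, so how the empty cells are indexed matters. As a small illustration (assuming NumPy is installed; the board position is made up for the example), `np.where(...)[0]` yields only the row index of each empty cell, whereas `np.flatnonzero` yields flat indices that match the action numbering.

```python
import numpy as np

# Example position: O (-1) holds cells 0 and 1, X (1) holds cells 2 and 4.
board = np.array([
    [-1, -1, 1],
    [0, 1, 0],
    [0, 0, 0],
], dtype=np.int32)

row_indices = np.where(board == 0)[0]      # row of each empty cell
flat_indices = np.flatnonzero(board == 0)  # action number of each empty cell

print(row_indices.tolist())   # [1, 1, 2, 2, 2]
print(flat_indices.tolist())  # [3, 5, 6, 7, 8]
```

Sampling from the row indices would only ever produce the actions 1 and 2 here, and cell 1 is already occupied; sampling from the flat indices always gives a legal move.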