LoginSignup
0
2

More than 5 years have passed since last update.

OpenAI GymのCopy-v0をSarsaで解く

Last updated at Posted at 2017-08-13

主旨

OpenAI Gym1の課題Copy-v0の解法です.

下記記事の続きでSarsaで解いています.

Q学習で解くバージョンはこちらです

解法

  • Sarsa2
Q(s,a) \leftarrow Q(s,a) + \alpha \left\{ r(s,a) + \gamma Q(s',a') - Q(s,a) \right\}
  • $\epsilon$-greedy法

コード3

import numpy as np
import gym
from gym import wrappers

def run(alpha=0.3, gamma=0.9):
    Q = {}
    env = gym.make("Copy-v0")
    env = wrappers.Monitor(env, '/tmp/copy-v0-sarsa', force=True)
    Gs = []
    for episode in range(10**6):
        x = env.reset()
        done = False
        X, A, R = [], [], [] # States, Actions, Rewards
        while not done:
            if (np.random.random() < 0.01) or (not x in Q):
                a = env.action_space.sample()
            else:
                a = sorted(Q[x].items(), key=lambda _: -_[1])[0][0]
            X.append(x)
            A.append(a)
            if not x in Q:
                Q[x] = {}
            if not a in Q[x]:
                Q[x][a] = 0
            x, r, done, _ = env.step(a)
            R.append(r)
        T = len(X)
        for t in range(T-1, -1, -1):
            if t == T-1:
                x, a, r = X[t], A[t], R[t]
                Q[x][a] += alpha * (r - Q[x][a])
            else:
                x, nx, a, na, r = X[t], X[t+1], A[t], A[t+1], R[t]
                Q[x][a] += alpha * (r + gamma * Q[nx][na] - Q[x][a])
        G = sum(R) # Revenue
        print "Episode: %d, Reward: %d" % (episode, G)
        Gs.append(G)
        if np.mean(Gs[-100:]) > 25.0:
            break

if __name__ == "__main__":
    run()

スコア

[Sarsa] Episode: 5942, Reward: 30
[Q学習] Episode: 30229, Reward: 29

References


  1. Brockman, OpenAI Gym, 2016. 

  2. J.Scholz, Markov Decision Processes and Reinforcement Learning, 2013. 

  3. namakemono, OpenAI Gym Benchmark, 2017. 

0
2
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
2