More than 3 years have passed since last update.

強化学習１０　学習済みのニューラルネットを使ってみる。

Last updated at 2019-11-16Posted at 2019-11-16

強化学習９まで終了済みが前提です。
開発は、jupyter　notebookを使用しています。
VSCodeなどを使用していないので、切り替えが少なくて楽です。

ChainerRLクイックスタートそのままです。
最初に、matplotlibをインストールします。
※これは、クイックインストールに書いてありません。

pip install matplotlib

以下は、Jupyter　notebookからのコピペです。

import chainer
import chainer.functions as F
import chainer.links as L
import chainerrl
import gym
import numpy as np

env = gym.make('CartPole-v0')
print('observation space:', env.observation_space)
print('action space:', env.action_space)

obs = env.reset()
#env.render()
print('initial observation:', obs)

action = env.action_space.sample()
obs, r, done, info = env.step(action)
print('next observation:', obs)
print('reward:', r)
print('done:', done)
print('info:', info)

class QFunction(chainer.Chain):

    def __init__(self, obs_size, n_actions, n_hidden_channels=50):
        super().__init__()
        with self.init_scope():
            self.l0 = L.Linear(obs_size, n_hidden_channels)
            self.l1 = L.Linear(n_hidden_channels, n_hidden_channels)
            self.l2 = L.Linear(n_hidden_channels, n_actions)

    def __call__(self, x, test=False):
        """
        Args:
            x (ndarray or chainer.Variable): An observation
            test (bool): a flag indicating whether it is in test mode
        """
        h = F.tanh(self.l0(x))
        h = F.tanh(self.l1(h))
        return chainerrl.action_value.DiscreteActionValue(self.l2(h))

obs_size = env.observation_space.shape[0]
n_actions = env.action_space.n
q_func = QFunction(obs_size, n_actions)

# Use Adam to optimize q_func. eps=1e-2 is for stability.
optimizer = chainer.optimizers.Adam(eps=1e-2)
optimizer.setup(q_func)

# Set the discount factor that discounts future rewards.
gamma = 0.95

# Use epsilon-greedy for exploration
explorer = chainerrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.3, random_action_func=env.action_space.sample)

# DQN uses Experience Replay.
# Specify a replay buffer and its capacity.
replay_buffer = chainerrl.replay_buffer.ReplayBuffer(capacity=10 ** 6)

# Since observations from CartPole-v0 is numpy.float64 while
# Chainer only accepts numpy.float32 by default, specify
# a converter as a feature extractor function phi.
phi = lambda x: x.astype(np.float32, copy=False)

# Now create an agent that will interact with the environment.
agent = chainerrl.agents.DoubleDQN(
    q_func, optimizer, replay_buffer, gamma, explorer,
    replay_start_size=500, update_interval=1,
    target_update_interval=100, phi=phi)

# Start virtual display
from pyvirtualdisplay import Display
display = Display(visible=0, size=(1024, 768))
display.start()
import os
os.environ["DISPLAY"] = ":" + str(display.display) + "." + str(display.screen)

agent.load('agent')
frames = []
for i in range(3):
    obs = env.reset()
    done = False
    R = 0
    t = 0
    while not done and t < 200:
        frames.append(env.render(mode = 'rgb_array'))
        action = agent.act(obs)
        obs, r, done, _ = env.step(action)
        R += r
        t += 1
    print('test episode:', i, 'R:', R)
    agent.stop_episode()
env.render()

import matplotlib.pyplot as plt
import matplotlib.animation
import numpy as np
from IPython.display import HTML

plt.figure(figsize=(frames[0].shape[1] / 72.0, frames[0].shape[0] / 72.0), dpi = 72)
patch = plt.imshow(frames[0])
plt.axis('off')
animate = lambda i: patch.set_data(frames[i])
ani = matplotlib.animation.FuncAnimation(plt.gcf(), animate, frames=len(frames), interval = 50)
HTML(ani.to_jshtml())

windowsはかなり違うので、強化学習12にまとめて書きます。

１０までの小まとめ。
chainerrlクイックスタートは、ところどころに地雷がありますが、概ね良好でした。chainerrlは、chainerのラッパーなのでしょうか。
魔改造もやりやすく、すぐれものだと思います。tensorflowも今後使ってみますが、当面はchainerrlを使おうかなと思います。
３０くらいまでは、OpenAI　gymをやります。

何でchainerかといえば、prefferd　Networksに期待しているからです。
アメリカにはGoogleなどの研究者に多額の報酬を与えるシステムがありますが、日本にはほとんどありません。インキュベーターとして研究費を出している未踏プロジェクトも時給１６００円です。
prefferedのインターンシップは、２５００円です。しかもいろんな手当もあります。ここに、彼らの本気度があります。
そして、ベンチマークは常に高い。今後に期待しています。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up

強化学習１０ 学習済みのニューラルネットを使ってみる。

強化学習１０　学習済みのニューラルネットを使ってみる。