Tutorial Step 13

Posted at 2024-10-04


puyopuyo.gif
The Puyo Puyo programming AI learning system was developed for self-regulated learning by junior and senior high school students.
How can you build a nice AI on a Chromebook in 10 minutes?
There are three secrets:
1 JupyterLite runs the Python code in the web client
2 The ultra-lightweight machine learning framework DeZero
3 Puyo Puyo's stage and actions are compact

GitHub & X

Puyo Puyo Programming AI learning notes (index)

Tutorial index

Step13 Animation next

13 You want to watch how it plays, as an animation, right?

Something is moving with the Agent trained in Step 12, and we would like to see what is actually happening.
We will modify the official code so we can watch it play.

13.1 agent learned package

The puyopuyo-master folder sits at the same level as the puyopuyo-ai folder. Create an agent_learned_package folder inside puyopuyo-master and copy dezero_emb.py into it.

import os
import shutil

# agent_learned_package lives inside puyopuyo-master, next to the puyopuyo-ai folder
agent_pkg_folder_name = '../puyopuyo-master/agent_learned_package'

if not os.path.isdir(agent_pkg_folder_name):
    os.mkdir(agent_pkg_folder_name)
    shutil.copy('dezero_emb.py', agent_pkg_folder_name)

 
['agent_package', 'css', 'img', 'index.html', 'index_mod.html', 'README.me', 'src']
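The create-and-copy step above can be exercised anywhere with a throwaway directory. This is a sketch using a temporary directory as a stand-in for the real puyopuyo-master path; `os.makedirs(..., exist_ok=True)` is used so re-running the cell never raises:

```python
import os
import shutil
import tempfile

# Stand-in for '../puyopuyo-master/agent_learned_package'
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'dezero_emb.py')
    with open(src, 'w') as f:
        f.write('# dummy module\n')

    pkg = os.path.join(tmp, 'agent_learned_package')
    os.makedirs(pkg, exist_ok=True)   # no error if the folder already exists
    shutil.copy(src, pkg)

    print(os.listdir(pkg))  # ['dezero_emb.py']
```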

13.2 agent.py

This is the trained agent.

  • Output: x position, rotation 0 / 90 / 180 / 270
  • Input: board (list), puyo_color
%%writefile $agent_pkg_folder_name/agent.py
import numpy as np
import random
from puyopuyo import *
import dezero_emb as dezero

class DQNet(dezero.Models.Model):
    def __init__(self):
        super().__init__()
        self.l1 = dezero.L.Linear(128)
        self.l2 = dezero.L.Linear(128)
        self.l3 = dezero.L.Linear(1)

    def forward(self, x):
        x = dezero.F.relu(self.l1(x))
        x = dezero.F.relu(self.l2(x))
        x = self.l3(x)
        return x

class DQNAgent:
    def __init__(self):
        self.action_size = 2
        self.qnet = DQNet()

    def __call__(self, board_list, puyo_c):
        board_list = board_list.to_py()
        board = np.zeros((CFG.Height, CFG.Width), dtype=np.int32)
        for i in range(CFG.Height):
            for j in range(CFG.Width):
                if board_list[i][j] is not None:
                    board[i][j] = int(board_list[i][j]['puyo'])
        puyo = Puyopuyo()
        puyo.centerPuyo = puyo_c[0]
        puyo.movablePuyo = puyo_c[1]

        action = self.learned_agent(board, puyo)
        return [action[0], action[1] * 90]  # rotation index -> degrees

    def learned_agent(self, board, puyo):
        action_list = utils.create_action_list(board)
        candidates = []    # actions whose next board is not game over
        next_boards = []
        next_reward = []
        for action in action_list:
            next_board, reward, done = utils.next_board(board, puyo, action)
            if not done:
                candidates.append(action)
                next_boards.append(next_board)
                next_reward.append(reward)

        if not next_boards:  # no surviving placement: fall back to a default
            return (2, 1)

        next_boards = np.stack(next_boards)
        predictions = self.eval2(next_boards)

        # one-step lookahead: predicted value of the next board plus immediate reward
        next_reward = np.array(next_reward)[:, np.newaxis]
        predictions += dezero.Variable(next_reward)
        index = predictions.data.argmax()
        return candidates[index]

    def boardtostate(self, board):
        # pack each colour plane (values 1..4) into per-row bitmask integers
        cont_b = 2 ** np.arange(CFG.Width, dtype=np.int32)
        planes = [(board == c).astype(np.int32) for c in range(1, 5)]
        board_list = np.concatenate(planes)
        state = board_list.dot(cont_b)
        return state

    def eval(self, board):
        state = self.boardtostate(board)      
        return self.qnet(state)

    def eval2(self, boards):
        states = []
        for i in range(boards.shape[0]):
            state = self.boardtostate(boards[i])
            states.append(state)
        states = np.stack(states)      
        return self.qnet(states)

    def load_model(self,filename):
        self.qnet.load_weights(filename)



Overwriting ../puyopuyo-master/agent_learned_package/agent.py
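The `boardtostate` encoding above splits the board into four one-hot colour planes and dots each row with powers of two, so every row becomes a single bitmask integer. A self-contained sketch of the same idea (Height 12 and Width 6 are assumed here as a typical Puyo Puyo field; the real values come from `CFG`):

```python
import numpy as np

HEIGHT, WIDTH = 12, 6  # assumed stand-ins for CFG.Height / CFG.Width

def board_to_state(board):
    """Pack each colour plane (values 1..4) into per-row bitmask integers."""
    cont_b = 2 ** np.arange(WIDTH, dtype=np.int32)    # [1, 2, 4, 8, 16, 32]
    planes = [(board == c).astype(np.int32) for c in range(1, 5)]
    board_list = np.concatenate(planes)               # shape (4*HEIGHT, WIDTH)
    return board_list.dot(cont_b)                     # shape (4*HEIGHT,)

board = np.zeros((HEIGHT, WIDTH), dtype=np.int32)
board[11, 0] = 1   # a colour-1 puyo in the bottom-left corner
board[11, 1] = 2   # a colour-2 puyo next to it

state = board_to_state(board)
print(state.shape)         # (48,)
print(state[11])           # 1 -> colour-1 plane, bottom row, column 0 set
print(state[HEIGHT + 11])  # 2 -> colour-2 plane, bottom row, column 1 set
```

The compressed state is much smaller than the raw planes, which keeps the `qnet` input compact.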

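`learned_agent` performs a greedy one-step lookahead: for every legal action it scores the resulting board with the trained network, adds the immediate reward, and picks the argmax. A minimal numpy sketch of that selection rule, with dummy stand-ins for `utils.create_action_list`, `utils.next_board`, and the `qnet` predictions:

```python
import numpy as np

# Hypothetical candidates: each action is (x position, rotation index)
actions = [(0, 0), (1, 1), (2, 0), (3, 3)]
rewards = np.array([0.0, 5.0, 1.0, 2.0])   # simulated immediate reward per action
values  = np.array([3.0, 0.5, 4.0, 2.0])   # simulated qnet value of each next board

scores = values + rewards                  # one-step lookahead score
best = actions[int(np.argmax(scores))]
print(best)   # (1, 1): reward 5.0 + value 0.5 = 5.5 is the highest score
```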
13.3 init

Add __init__.js, which loads the agent from the Pyodide side.

%%writefile $agent_pkg_folder_name/__init__.js
create_action = pyodide.runPython(`
    from agent import *
    agent = DQNAgent()
    agent.load_model('puyopuyo.npz')
    agent`);

Writing ../puyopuyo-master/agent_learned_package/__init__.js
shutil.copy("trained_models/puyopuyo.npz",agent_pkg_folder_name)

'../puyopuyo-master/agent_learned_package\\puyopuyo.npz'

13.4 Create the zip file

# shutil.make_archive creates the zip itself; no zipfile import is needed
shutil.make_archive('agent_learned', format='zip', root_dir=agent_pkg_folder_name)
'c:\\Users\\fgtoh\\Documents\\puyopuyo-py\\jupyterlite\\draft\\content\\puyopuyo-ai\\agent_learned.zip'
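`shutil.make_archive` takes the archive path without the extension (`base_name`) and zips the contents of `root_dir`, returning the full archive path. A self-contained sketch on a temporary directory, verified with `zipfile`:

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # build a dummy package folder to archive
    pkg = os.path.join(tmp, 'agent_learned_package')
    os.makedirs(pkg)
    with open(os.path.join(pkg, 'agent.py'), 'w') as f:
        f.write('# agent code\n')

    # base_name has no extension; files inside root_dir go to the archive root
    archive = shutil.make_archive(os.path.join(tmp, 'agent_learned'),
                                  format='zip', root_dir=pkg)

    with zipfile.ZipFile(archive) as zf:
        print(zf.namelist())  # ['agent.py']
```

Because `root_dir` is the package folder itself, the files land at the top level of the zip, which is what the JupyterLite loader expects.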