Pythonで〇×ゲームのAIを一から作成する　その151　置換表によるミニマックス法の枝狩り

Last updated at 2025-02-09Posted at 2025-02-02

目次と前回の記事

これまでに作成したモジュール

以下のリンクから、これまでに作成したモジュールを見ることができます。

リンク	説明
marubatsu.py	Marubatsu、Marubatsu_GUI クラスの定義
ai.py	AI に関する関数
test.py	テストに関する関数
util.py	ユーティリティ関数の定義。現在は `gui_play` のみ定義されている
tree.py	ゲーム木に関する Node、Mbtree クラスの定義
gui.py	GUI に関する処理を行う基底クラスとなる GUI クラスの定義

AI の一覧とこれまでに作成したデータファイルについては、下記の記事を参照して下さい。

α 値と β 値の初期値の変更による枝狩り

これまでは、αβ 法で評価値を計算する際に、α 値と β 値の初期値 として 負の無限大と正の無限大を設定 して計算を行いました。αβ 法では 子ノードの評価値を計算 する際に、親ノードの α 値と β 値をそのまま引き継いで 計算を行うので、下図の深さが 0 ～ 7 の一番上のノードのように、ルートノードから最初の子ノードを次々に辿ったノード の評価値を計算する際の α 値と β 値の初期値 はそれぞれ 負の無限大と正の無限大 となります。

α 狩りは子ノードの評価値が α 以下、β 狩りは子ノードの評価値が β 以上の時に行われるので、上図の 深さが 0 ～ 7 の一番上のノード の評価値が計算された際に α 狩りまたは β 狩りが行われることはありません。

例えば下図の 9 フレーム目の赤枠の最初の子ノードの評価値が 1 と計算されると、α 値 < 評価値 < β 値なので β 狩りは行われず 、下図の α 値の更新が赤字で表示 されます。

その結果、上図で下段の > ボタンをクリックすると下図のように実際に β 狩りが行われず にその 次の子ノードの評価値が計算 されます。

評価値の最小値と最大値がわかっている場合 は、α 値と β 値の 初期値をその値に設定する ことで、上図の 深さが 0 ～ 7 の一番上のノードの評価値が計算された際 に α 狩りまたは β 狩りを行うことが出来るようになる ので、その実装を行ってみることにします。

なお、評価値の最小値と最大値がいくつになるかがわからないような場合はこの手法を用いることはできません。

Mbtree クラスの `calc_score_by_ab` メソッドの修正

下記のプログラムのように、αβ 法で評価値の計算を行う calc_score_by_ab に、α 値と β 値の初期値を 評価値の最小値と最大値で初期化（initialize）するかどうか を代入する init_ab という仮引数を追加 します。

4 行目：デフォルト値を False とする仮引数 init_ab を追加する
5 ~ 7 行目：init_ab が True の場合の α 値と β 値の初期値を計算する。評価値の最小値と最大値は最短の勝利を目指すかどうかを表す shortest_victory 属性が False の場合は -1 と 1、True の場合は -2 と 3 なのでその値を計算する
8 ~ 10 行目：init_ab が False の場合の α 値と β 値をこれまで通り負の無限大と正の無限大に設定する
11 行目：上記で計算した α 値と β 値を実引数に記述して calc_ab_score を呼び出すように修正する¹

 1  from tree import Mbtree
 2  from marubatsu import Marubatsu
 3  
 4  def calc_score_by_ab(self, abroot, debug=False, init_ab=False):       
元と同じなので省略
 5      if init_ab:
 6          alpha = -2 if self.shortest_victory else -1
 7          beta = 3 if self.shortest_victory else 1
 8      else:
 9          alpha = float("-inf")
10          beta = float("inf")
11      calc_ab_score(abroot, alpha, beta)
元と同じなので省略
12      
13  Mbtree.calc_score_by_ab = calc_score_by_ab

行番号のないプログラム

from tree import Mbtree
from marubatsu import Marubatsu

def calc_score_by_ab(self, abroot, debug=False, init_ab=False):          
    def assign_pruned_index(node, index):
        node.pruned_index = index
        self.num_pruned += 1
        for childnode in node.children:
            assign_pruned_index(childnode, index)
        
    def calc_ab_score(node, alpha=float("-inf"), beta=float("inf")):
        self.nodelist_by_score.append(node)
        self.ablist_by_score.append((alpha, beta, None, "start",
                                    self.num_calculated, self.num_pruned))
        if node.mb.status != Marubatsu.PLAYING:
            self.calc_score_of_node(node)
            if node.mb.turn == Marubatsu.CIRCLE:
                alpha = node.score
            else:
                beta = node.score
        else:
            if node.mb.turn == Marubatsu.CIRCLE:
                for childnode in node.children:
                    score = calc_ab_score(childnode, alpha, beta)
                    self.nodelist_by_score.append(node)
                    self.ablist_by_score.append((alpha, beta, score, "score",
                                                self.num_calculated, self.num_pruned))
                    if score > alpha:
                        alpha = score
                    self.nodelist_by_score.append(node)
                    if score >= beta:
                        index = node.children.index(childnode)
                        for prunednode in node.children[index + 1:]:
                            assign_pruned_index(prunednode, len(self.nodelist_by_score) - 1)
                    self.ablist_by_score.append((alpha, beta, None, "update",
                                                self.num_calculated, self.num_pruned))
                    if score >= beta:
                        break
                node.score = alpha
            else:
                for childnode in node.children:
                    score = calc_ab_score(childnode, alpha, beta)
                    self.nodelist_by_score.append(node)
                    self.ablist_by_score.append((alpha, beta, score, "score",
                                                self.num_calculated, self.num_pruned))
                    if score < beta:
                        beta = score
                    self.nodelist_by_score.append(node)
                    if score <= alpha:
                        index = node.children.index(childnode)
                        for prunednode in node.children[index + 1:]:
                            assign_pruned_index(prunednode, len(self.nodelist_by_score) - 1)
                    self.ablist_by_score.append((alpha, beta, None, "update",
                                                self.num_calculated, self.num_pruned))
                    if score <= alpha:
                        break
                node.score = beta

        self.nodelist_by_score.append(node)
        self.num_calculated += 1     
        self.ablist_by_score.append((alpha, beta, None, "end",
                                    self.num_calculated, self.num_pruned))
        node.score_index = len(self.nodelist_by_score) - 1  
        return node.score

    from ai import dprint       
    for node in self.nodelist:
        node.score_index = float("inf")
        node.pruned_index = float("inf")
    self.nodelist_by_score = []
    self.ablist_by_score = []
    self.num_calculated = 0
    self.num_pruned = 0
    if init_ab:
        alpha = -2 if self.shortest_victory else -1
        beta = 3 if self.shortest_victory else 1
    else:
        alpha = float("-inf")
        beta = float("inf")
    calc_ab_score(abroot, alpha, beta)
    total_nodenum = self.num_pruned + self.num_calculated
    ratio = self.num_calculated / total_nodenum * 100
    dprint(debug, "計算したノードの数",  self.num_calculated)
    dprint(debug, "枝狩りしたノードの数",  self.num_pruned)
    dprint(debug, "合計",  total_nodenum)
    dprint(debug, f"割合 {ratio:.1f}%")
    
Mbtree.calc_score_by_ab = calc_score_by_ab

修正箇所

from tree import Mbtree
from marubatsu import Marubatsu

-def calc_score_by_ab(self, abroot, debug=False):       
+def calc_score_by_ab(self, abroot, debug=False, init_ab=False):       
元と同じなので省略
+   if init_ab:
+       alpha = -2 if self.shortest_victory else -1
+       beta = 3 if self.shortest_victory else 1
+   else:
+       alpha = float("-inf")
+       beta = float("inf")
-   calc_ab_score(abroot)
+   calc_ab_score(abroot, alpha, beta)
元と同じなので省略
    
Mbtree.calc_score_by_ab = calc_score_by_ab

下記のプログラムで 評価値の最小値と最大値を -1 と 1 としてαβ 法で評価値を計算したゲーム木のデータをファイルから読み込み、実引数に init_ab=True を記述 することで α 値と β 値の初期値を評価値の最小値と最大値に設定して αβ 法で評価値を計算しなおすと、実行結果のように計算したノードの数の 割合が 3.1 % になる ことがわかります。

mbtree = Mbtree.load("../data/abtree_root")
mbtree.calc_score_by_ab(mbtree.root, init_ab=True, debug=True)

実行結果

計算したノードの数 16811
枝狩りしたノードの数 533135
合計 549946
割合 3.1%

α 値と β 値の初期値を負の無限大と正の無限大 とした場合は以前の記事で示したように 割合は 3.3 % となるので、この工夫によって 若干枝狩りを多く行う ことが出来るようになったことが確認できました。

また、下記のプログラムを実行して先程の赤枠の子ノードの最初の子ノードの評価値が 1 と計算された 9 フレーム目を表示すると、β 値の初期値が 1 であるため下図のように β 狩りの文字が赤字 になります。

from tree import Mbtree_Anim

Mbtree_Anim(mbtree, isscore=True)

実行結果

その結果、上図で下段の > ボタンをクリックすると下図のように β 狩りが行われて残りの子ノードが暗く表示 され、先程よりも 枝狩りがより多く行われる ことが確認できました。

本記事では行いませんが、興味がある方は最短の勝利を考慮した評価値を計算する場合についても確認してみて下さい。

なお、この工夫で 新たに枝狩りが行われる可能性が生じるノード は ほんの一部に過ぎない ので、それほど多くの枝狩りが新たに行われるようになるわけではありませんが、枝狩りの効率が上がることに間違いはないので 評価値の最小値と最大値がわかっている場合はこの手法を取ったほうが良いでしょう。

置換表を利用したミニマックス法での枝狩り

〇× ゲームでは、ゲーム盤上の マークの配置が同じ であればそれまでに行われた着手の手順に関わらず 局面の状況は同じ になります。例えば、下図の局面は (0, 0)、(1, 0)、(2, 0) の順で着手が行われた場合と (2, 0)、(1, 0)、(0, 0) の順で着手が行われた場合の 2 種類の可能性 がありますが、どちらの場合でも引き分けの局面 なので、評価値は 0 になります。

あらかじめ評価値を計算したゲーム木を作成し、そこから作成した 局面と評価値の対応表を使って着手を選択 する ai_gt6 は この性質を利用 しています。

この性質を満たさないゲームは数多くあります。例えばオセロにはどこにも着手を行えない場合は手番をパスするので、ゲーム盤上の配置だけからどちらの手番であるかを判断できない場合があります。そのため、局面の状況を計算するためにはゲーム盤上の配置だけでなく手番の情報が必要になります。将棋には千日手、囲碁にはコウという特殊なルール²があるため、手番に加えてそれ以前の局面の情報が必要です。

ミニマックス法 で評価値を計算する際に、この性質を利用 することで 下記の方法で枝狩りを行うことが出来ます。なお、αβ 法 でも同様の方法をで枝狩りを行うことができますが、下記の方法だけではうまくいかない ので、次回の記事で説明することにします。

それまでに計算した 局面と評価値の対応表を記録 するようにする
ノードの評価値が確定した際 に、局面と評価値の対応表にそのノードの局面と評価値の情報を 記録する
ノードの評価値を計算を開始する際 に、局面と評価値の対応表に そのノードの局面の評価値が記録されている場合 は、記録されている値をそのノードの評価値とし、そのノードの子孫ノードの枝狩りを行う

ミニマックス法などの ゲーム木の探索で利用する このような局面と評価値の対応表のことを、局面を表すデータを評価値に置換することとから、置換表（transposition table）と呼びます。〇× ゲームの場合は、下記の 局面と評価値の対応表と全く同じ方法 で置換表を作成することが出来ます。

置換表を dict で作成 する
局面を表す文字列を計算 する
局面を表す文字列をキー とし、キーの値に評価値を代入 する

〇× ゲーム は以前の記事で説明したように局面の種類は 5478 しかないので、全ての局面の評価値を置換表に記録することが出来ます。そのため置換表を dict を利用して簡単に作成することができます。

一方、オセロ、将棋、囲碁などのようにゲーム木の ノードの数が多すぎる場合 は、置換表に記録するデータが コンピューターのメモリに収まりきらなくなる という問題が発生します。そのような場合は、置換表に記録できるデータの量 をコンピューターの メモリに収まる量に制限 し、その量を超えたデータを記録する際には、古いデータを置換表から削除する という工夫を行なう必要があるので、dict をそのまま利用することはできません。

Mbtree クラスの `calc_score_by_ab` メソッドの修正その 1

Mbtree クラスには、ミニマックス法で評価値を計算する calc_score_by_bf メソッドと calc_score_by_df メソッドが定義されていますが、これらのメソッドで評価値を計算した場合は、Mbtree_Anim で枝狩りの様子を表示することができません。

そこで置換表を使ってミニマックス法で枝狩りを行った際に、Mbtree_Anim で枝狩りの様子を表示できる ように、calc_score_by_ab を ミニマックス法で評価値を計算できるように修正 することにします。具体的には、ミニマックス法で評価値を計算するかどうかを代入する minimax という名前の仮引数を追加 することにします。

なお、calc_score_by_ab がミニマックス法の評価値の計算を行うように 修正する方法は簡単 です。どのように修正すれば良いかについて少し考えてみて下さい。

calc_score_by_ab という名前のメソッドで、ミニマックス法で評価値を計算する点に違和感を感じる人は、メソッドの名前を別の名前に変更して下さい。

αβ 法 は ミニマックス法に対して α 値と β 値を導入することで枝狩りを行う アルゴリズムです。また、αβ 法はミニマックス法で評価値を計算する際に、子ノードの評価値が α 値以下または β 値以上の場合 に枝狩りを行います。従って、calc_score_by_ab でノードの評価値を計算する際に、α 値と β 値の初期値を負の無限大と正の無限大にする ことで 枝狩りが行われなくなる ので、ミニマックス法と同じ処理 が行われるようになります。

下記はそのように calc_score_by_ab を修正したプログラムです。

1 行目：デフォルト値を False とする仮引数 minimax を追加する
9 ~ 11 行目：calc_ab_score でノードの評価値を計算する際に、minimax が True の場合は α 値と β 値に負の無限大と正の無限大を代入する
12 行目：ミニマックス法で評価値を計算したかどうかを表示するようにした

 1  def calc_score_by_ab(self, abroot, debug=False, init_ab=False, minimax=False):
 2      def assign_pruned_index(node, index):
 3          node.pruned_index = index
 4          self.num_pruned += 1
 5          for childnode in node.children:
 6              assign_pruned_index(childnode, index)
 7          
 8      def calc_ab_score(node, alpha=float("-inf"), beta=float("inf")):
 9          if minimax:
10              alpha = float("-inf")
11              beta = float("inf")
元と同じなので省略
12      dprint(debug, "ミニマックス法で計算したか", minimax)
13      dprint(debug, "計算したノードの数",  self.num_calculated)
14      dprint(debug, "枝狩りしたノードの数",  self.num_pruned)
15      dprint(debug, "合計",  total_nodenum)
16      dprint(debug, f"割合 {ratio:.1f}%")
17      
18  Mbtree.calc_score_by_ab = calc_score_by_ab

行番号のないプログラム

def calc_score_by_ab(self, abroot, debug=False, init_ab=False, minimax=False):
    def assign_pruned_index(node, index):
        node.pruned_index = index
        self.num_pruned += 1
        for childnode in node.children:
            assign_pruned_index(childnode, index)
        
    def calc_ab_score(node, alpha=float("-inf"), beta=float("inf")):
        if minimax:
            alpha = float("-inf")
            beta = float("inf")
                
        self.nodelist_by_score.append(node)
        self.ablist_by_score.append((alpha, beta, None, "start",
                                     self.num_calculated, self.num_pruned))
        if node.mb.status != Marubatsu.PLAYING:
            self.calc_score_of_node(node)
            if node.mb.turn == Marubatsu.CIRCLE:
                alpha = node.score
            else:
                beta = node.score
        else:
            if node.mb.turn == Marubatsu.CIRCLE:
                for childnode in node.children:
                    score = calc_ab_score(childnode, alpha, beta)
                    self.nodelist_by_score.append(node)
                    self.ablist_by_score.append((alpha, beta, score, "score",
                                                 self.num_calculated, self.num_pruned))
                    if score > alpha:
                        alpha = score
                    self.nodelist_by_score.append(node)
                    if score >= beta:
                        index = node.children.index(childnode)
                        for prunednode in node.children[index + 1:]:
                            assign_pruned_index(prunednode, len(self.nodelist_by_score) - 1)
                    self.ablist_by_score.append((alpha, beta, None, "update",
                                                 self.num_calculated, self.num_pruned))
                    if score >= beta:
                        break
                node.score = alpha
            else:
                for childnode in node.children:
                    score = calc_ab_score(childnode, alpha, beta)
                    self.nodelist_by_score.append(node)
                    self.ablist_by_score.append((alpha, beta, score, "score",
                                                 self.num_calculated, self.num_pruned))
                    if score < beta:
                        beta = score
                    self.nodelist_by_score.append(node)
                    if score <= alpha:
                        index = node.children.index(childnode)
                        for prunednode in node.children[index + 1:]:
                            assign_pruned_index(prunednode, len(self.nodelist_by_score) - 1)
                    self.ablist_by_score.append((alpha, beta, None, "update",
                                                 self.num_calculated, self.num_pruned))
                    if score <= alpha:
                        break
                node.score = beta

        self.nodelist_by_score.append(node)
        self.num_calculated += 1     
        self.ablist_by_score.append((alpha, beta, None, "end",
                                     self.num_calculated, self.num_pruned))
        node.score_index = len(self.nodelist_by_score) - 1  
        return node.score

    from ai import dprint       
    for node in self.nodelist:
        node.score_index = float("inf")
        node.pruned_index = float("inf")
    self.nodelist_by_score = []
    self.ablist_by_score = []
    self.num_calculated = 0
    self.num_pruned = 0
    if init_ab:
        alpha = -2 if self.shortest_victory else -1
        beta = 3 if self.shortest_victory else 1
    else:
        alpha = float("-inf")
        beta = float("inf")
    calc_ab_score(abroot, alpha, beta)
    total_nodenum = self.num_pruned + self.num_calculated
    ratio = self.num_calculated / total_nodenum * 100
    dprint(debug, "ミニマックス法で計算したか", minimax)
    dprint(debug, "計算したノードの数",  self.num_calculated)
    dprint(debug, "枝狩りしたノードの数",  self.num_pruned)
    dprint(debug, "合計",  total_nodenum)
    dprint(debug, f"割合 {ratio:.1f}%")
    
Mbtree.calc_score_by_ab = calc_score_by_ab

修正箇所

-def calc_score_by_ab(self, abroot, debug=False, init_ab=False, minimax=False):
+def calc_score_by_ab(self, abroot, debug=False, init_ab=False):
    def assign_pruned_index(node, index):
        node.pruned_index = index
        self.num_pruned += 1
        for childnode in node.children:
            assign_pruned_index(childnode, index)
        
    def calc_ab_score(node, alpha=float("-inf"), beta=float("inf")):
+       if minimax:
+           alpha = float("-inf")
+           beta = float("inf")
元と同じなので省略
+   dprint(debug, "ミニマックス法で計算したか", minimax)
    dprint(debug, "計算したノードの数",  self.num_calculated)
    dprint(debug, "枝狩りしたノードの数",  self.num_pruned)
    dprint(debug, "合計",  total_nodenum)
    dprint(debug, f"割合 {ratio:.1f}%")
    
Mbtree.calc_score_by_ab = calc_score_by_ab

上記の修正後に下記のプログラムのように キーワード引数 minimax=True を記述 して calc_score_by_ab を呼び出すと、実行結果のように 枝狩りしたノードの数が 0 と表示され、枝狩りが行われないミニマックス法で評価値の計算が行われた ことが確認できます。

mbtree.calc_score_by_ab(mbtree.root, minimax=True, debug=True)

実行結果

ミニマックス法で計算したか True
計算したノードの数 549946
枝狩りしたノードの数 0
合計 549946
割合 100.0%

Mbtree クラスの `calc_score_by_ab` メソッドの修正その 2

次に、calc_score_by_ab に置換表（transposition table）を利用するかどうかを代入する use_tt という名前の仮引数を追加 し、下記のプログラムのように 置換表を利用したミニマックス法で評価値を計算できる ように calc_score_by_ab を修正します。

なお、Mbtree_Anim で置換表による枝狩りが行われたことを表示できる ように、置換表による枝狩りが行われた場合の アニメーションのフレームの状態を "tt" という文字列で表す ことにします。

1 行目：デフォルト値を False とする仮引数 use_tt を追加する
2 行目：ノードの評価値を計算するローカル関数 calc_ab_score に 置換表（transposition table）を表す dict を代入する仮引数 tt を追加 する。以前の記事で説明したようにデフォルト引数のデフォルト値に dict のようなミュータブルなデータを代入するとバグの原因になりやすいので、通常の仮引数として tt を追加 する。なお、通常の仮引数はデフォルト引数よりも前に記述する必要がある点に注意すること
3 ~ 14、26、27 行目：置換表を利用する use_tt が True の場合の処理を行う
4、5 行目：board_to_str メソッドを呼び出して ゲーム盤を表す文字列を計算し、その文字列のキーが置換表の dict に存在するかどうかを判定する
6 行目：置換表に存在する場合は、そのキーの値をそのノードの評価値とする
7、8、11 ~ 13 行目：評価値が計算されたので、このノードの情報を nodelist_by_score と ab_list_by_score の要素に記録する。この部分の処理はこのノードの評価値が確定した際に行う 21 ~ 25 行目とほぼ同じだが、Mbtree_Anim での表示を区別できるように、11 行目で ablist_by_score に記録する このノードの状態を "tt" とした点が異なる
9、10 行目：置換表にこのノードの評価値が記録されている場合は、このノードの 全ての子ノードの枝狩りが行われる ので、assign_pruned_index を呼び出して 枝狩りが行われたノードの数 と、子ノードとその子孫ノードが枝狩りが行われたことを記録 する。この処理は、11 行目の ablist_by_score の要素に 枝狩りが行われた数を登録する前 に 計算する必要がある 点に注意すること
14 行目：評価値を返り値として返して calc_ab_score の処理を終了す る
17、20 行目：子ノードの評価値を計算するために calc_ab_score を再帰呼び出しする際に、実引数 tt を記述する ように修正する
27、28 行目：このノードの 評価値が確定した際 に、このノードの評価値の情報を置換表に記録 する
30 行目：評価値の計算を開始する際 は 置換表の中身は空 なので、置換表のデータとして 実引数に空の dict を記述 して calc_ab_score を呼び出すように修正する

 1  def calc_score_by_ab(self, abroot, debug=False, init_ab=False, minimax=False, use_tt=False):
元と同じなので省略
 2      def calc_ab_score(node, tt, alpha=float("-inf"), beta=float("inf")):
 3          if use_tt:
 4              boardtxt = node.mb.board_to_str()
 5              if boardtxt in tt:
 6                  node.score = tt[boardtxt]
 7                  self.nodelist_by_score.append(node)
 8                  self.num_calculated += 1     
 9                  for childnode in node.children:
10                      assign_pruned_index(childnode, len(self.nodelist_by_score) - 1)
11                  self.ablist_by_score.append((alpha, beta, None, "tt",
12                                               self.num_calculated, self.num_pruned))
13                  node.score_index = len(self.nodelist_by_score) - 1  
14                  return node.score
元と同じなので省略
15              if node.mb.turn == Marubatsu.CIRCLE:
16                  for childnode in node.children:
17                      score = calc_ab_score(childnode, tt, alpha, beta)
元と同じなので省略
18              else:
19                  for childnode in node.children:
20                      score = calc_ab_score(childnode, tt, alpha, beta)
元と同じなので省略
21          self.nodelist_by_score.append(node)
22          self.num_calculated += 1     
23          self.ablist_by_score.append((alpha, beta, None, "end",
24                                       self.num_calculated, self.num_pruned))
25          node.score_index = len(self.nodelist_by_score) - 1  
26          if use_tt:
27              tt[boardtxt] = node.score        
28          return node.score
29  
元と同じなので省略
30      calc_ab_score(abroot, {}, alpha, beta)
元と同じなので省略
31      
32  Mbtree.calc_score_by_ab = calc_score_by_ab

行番号のないプログラム

def calc_score_by_ab(self, abroot, debug=False, init_ab=False, minimax=False, use_tt=False):
    def assign_pruned_index(node, index):
        node.pruned_index = index
        self.num_pruned += 1
        for childnode in node.children:
            assign_pruned_index(childnode, index)
        
    def calc_ab_score(node, tt, alpha=float("-inf"), beta=float("inf")):
        if use_tt:
            boardtxt = node.mb.board_to_str()
            if boardtxt in tt:
                node.score = tt[boardtxt]
                self.nodelist_by_score.append(node)
                self.num_calculated += 1     
                for childnode in node.children:
                    assign_pruned_index(childnode, len(self.nodelist_by_score) - 1)
                self.ablist_by_score.append((alpha, beta, None, "tt",
                                             self.num_calculated, self.num_pruned))
                node.score_index = len(self.nodelist_by_score) - 1  
                return node.score

        if minimax:
            alpha = float("-inf")
            beta = float("inf")
                
        self.nodelist_by_score.append(node)
        self.ablist_by_score.append((alpha, beta, None, "start",
                                     self.num_calculated, self.num_pruned))
        if node.mb.status != Marubatsu.PLAYING:
            self.calc_score_of_node(node)
            if node.mb.turn == Marubatsu.CIRCLE:
                alpha = node.score
            else:
                beta = node.score
        else:
            if node.mb.turn == Marubatsu.CIRCLE:
                for childnode in node.children:
                    score = calc_ab_score(childnode, tt, alpha, beta)
                    self.nodelist_by_score.append(node)
                    self.ablist_by_score.append((alpha, beta, score, "score",
                                                 self.num_calculated, self.num_pruned))
                    if score > alpha:
                        alpha = score
                    self.nodelist_by_score.append(node)
                    if score >= beta:
                        index = node.children.index(childnode)
                        for prunednode in node.children[index + 1:]:
                            assign_pruned_index(prunednode, len(self.nodelist_by_score) - 1)
                    self.ablist_by_score.append((alpha, beta, None, "update",
                                                 self.num_calculated, self.num_pruned))
                    if score >= beta:
                        break
                node.score = alpha
            else:
                for childnode in node.children:
                    score = calc_ab_score(childnode, tt, alpha, beta)
                    self.nodelist_by_score.append(node)
                    self.ablist_by_score.append((alpha, beta, score, "score",
                                                 self.num_calculated, self.num_pruned))
                    if score < beta:
                        beta = score
                    self.nodelist_by_score.append(node)
                    if score <= alpha:
                        index = node.children.index(childnode)
                        for prunednode in node.children[index + 1:]:
                            assign_pruned_index(prunednode, len(self.nodelist_by_score) - 1)
                    self.ablist_by_score.append((alpha, beta, None, "update",
                                                 self.num_calculated, self.num_pruned))
                    if score <= alpha:
                        break
                node.score = beta

        self.nodelist_by_score.append(node)
        self.num_calculated += 1     
        self.ablist_by_score.append((alpha, beta, None, "end",
                                     self.num_calculated, self.num_pruned))
        node.score_index = len(self.nodelist_by_score) - 1  
        if use_tt:
            tt[boardtxt] = node.score        
        return node.score

    from ai import dprint       
    for node in self.nodelist:
        node.score_index = float("inf")
        node.pruned_index = float("inf")
    self.nodelist_by_score = []
    self.ablist_by_score = []
    self.num_calculated = 0
    self.num_pruned = 0
    if init_ab:
        alpha = -2 if self.shortest_victory else -1
        beta = 3 if self.shortest_victory else 1
    else:
        alpha = float("-inf")
        beta = float("inf")
    calc_ab_score(abroot, {}, alpha, beta)
    total_nodenum = self.num_pruned + self.num_calculated
    ratio = self.num_calculated / total_nodenum * 100
    dprint(debug, "ミニマックス法で計算したか", minimax)
    dprint(debug, "計算したノードの数",  self.num_calculated)
    dprint(debug, "枝狩りしたノードの数",  self.num_pruned)
    dprint(debug, "合計",  total_nodenum)
    dprint(debug, f"割合 {ratio:.1f}%")
    
Mbtree.calc_score_by_ab = calc_score_by_ab

修正箇所

-def calc_score_by_ab(self, abroot, debug=False, init_ab=False, minimax=False):
+def calc_score_by_ab(self, abroot, debug=False, init_ab=False, minimax=False, use_tt=False):
元と同じなので省略
-   def calc_ab_score(node, alpha=float("-inf"), beta=float("inf")):
+   def calc_ab_score(node, tt, alpha=float("-inf"), beta=float("inf")):
+       if use_tt:
+           boardtxt = node.mb.board_to_str()
+           if boardtxt in tt:
+               node.score = tt[boardtxt]
+               self.nodelist_by_score.append(node)
+               self.num_calculated += 1     
+               for childnode in node.children:
+                   assign_pruned_index(childnode, len(self.nodelist_by_score) - 1)
+               self.ablist_by_score.append((alpha, beta, None, "tt",
+                                            self.num_calculated, self.num_pruned))
+               node.score_index = len(self.nodelist_by_score) - 1  
+               return node.score
元と同じなので省略
            if node.mb.turn == Marubatsu.CIRCLE:
                for childnode in node.children:
-                   score = calc_ab_score(childnode, alpha, beta)
+                   score = calc_ab_score(childnode, tt, alpha, beta)
元と同じなので省略
            else:
                for childnode in node.children:
-                   score = calc_ab_score(childnode, alpha, beta)
+                   score = calc_ab_score(childnode, tt, alpha, beta)
元と同じなので省略
        self.nodelist_by_score.append(node)
        self.num_calculated += 1     
        self.ablist_by_score.append((alpha, beta, None, "end",
                                     self.num_calculated, self.num_pruned))
        node.score_index = len(self.nodelist_by_score) - 1  
+       if use_tt:
+           tt[boardtxt] = node.score        
        return node.score

元と同じなので省略
-   calc_ab_score(abroot, alpha, beta)
+   calc_ab_score(abroot, {}, alpha, beta)
元と同じなので省略
    
Mbtree.calc_score_by_ab = calc_score_by_ab

上記の修正後に下記のプログラムを実行すると、実行結果のように 置換表を使ってミニマックス法で評価値を計算 した際の 割合が 2.9 % になることが確認できます。これは、αβ 法 を使って評価値を計算した際の 3.3 % よりも低い数値 です。

mbtree.calc_score_by_ab(mbtree.root, minimax=True, use_tt=True, debug=True)

実行結果

ミニマックス法で計算したか True
計算したノードの数 16168
枝狩りしたノードの数 533778
合計 549946
割合 2.9%

このことから、〇× ゲームの場合 は 置換表を使ってミニマックス法 で ルートノードの評価値を計算 すると、αβ 法と同程度の計算速度の向上が見込める ことがわかりました。

Mbtree_Anim のクラスの `update_ab` メソッドの修正

上記で 置換表による枝狩りが行われた際 に、ノードの状態 として "tt" という文字列を記録する ようにしたので、それに合わせて 置換表による枝狩りが行われたことがわかるように Mbtree_Anim を修正する ことにします。具体的には置換表による枝狩りが行われて評価値が確定した場合は 背景色を薄緑色で表示 し、1 行目の説明文を赤色で「置換表による置換」と表示する ことにします。

下記はそのように update_ab メソッドを修正したプログラムです。

8 行目：そのフレームの状態が置換表による評価値の確定であるかどうかを判定する
9、10 行目：1 行目の説明文を「置換表による置換」にし、文字の色を赤色にする
11 行目：背景色を薄緑色に設定する

 1  from tree import Mbtree_Anim
 2  import matplotlib.patches as patches
 3  
 4  def update_ab(self):
元と同じなので省略
 5      if status == "start":
 6          facecolor = "white"
 7          nodetype += " 処理の開始"
元と同じなので省略
 8      elif status == "tt":
 9          textlist[0] = "置換表による置換"
10          textcolorlist[0] = "red"
11          facecolor = "honeydew"
元と同じなので省略
12  
13  Mbtree_Anim.update_ab = update_ab

行番号のないプログラム

from tree import Mbtree_Anim
import matplotlib.patches as patches

def update_ab(self):
    alpha, beta, score, status, num_calculated, num_pruned = self.mbtree.ablist_by_score[self.play.value]
    maxnode = self.selectednode.mb.turn == Marubatsu.CIRCLE
    acolor = "red" if maxnode else "black"
    bcolor = "black" if maxnode else "red"
                            
    self.abax.clear()
    self.abax.set_xlim(-4, 23)
    self.abax.set_ylim(-1.5, 1.5)
    self.abax.axis("off")

    minus_inf = -3
    plus_inf = 4   
    alphacoord = max(minus_inf, alpha)
    betacoord = min(plus_inf, beta)
    
    color = "lightgray" if maxnode else "aqua"
    rect = patches.Rectangle(xy=(minus_inf, -0.5), width=alphacoord-minus_inf,
                            height=1, fc=color)
    self.abax.add_patch(rect)
    rect = patches.Rectangle(xy=(alphacoord, -0.5), width=betacoord-alphacoord,
                            height=1, fc="yellow")
    self.abax.add_patch(rect)
    color = "aqua" if maxnode else "lightgray"
    rect = patches.Rectangle(xy=(betacoord, -0.5), width=plus_inf-betacoord,
                            height=1, fc=color)
    self.abax.add_patch(rect)

    self.abax.plot(range(minus_inf, plus_inf + 1), [0] * (plus_inf + 1 - minus_inf) , "|-k")
    for num in range(minus_inf, plus_inf + 1):
        if num == minus_inf:
            numtext = "-∞"
        elif num == plus_inf:
            numtext = "∞"
        else:
            numtext = num
        self.abax.text(num, -1, numtext, ha="center")
        
    arrowprops = { "arrowstyle": "->"}
    self.abax.plot(alphacoord, 0, "or")
    self.abax.annotate(f"α = {alpha}", xy=(alphacoord, 0), xytext=(minus_inf, 1),
                    arrowprops=arrowprops, ha="center", c=acolor)
    self.abax.plot(betacoord, 0, "ob")
    self.abax.annotate(f"β = {beta}", xy=(betacoord, 0), xytext=(plus_inf, 1),
                    arrowprops=arrowprops, ha="center", c=bcolor)
    if score is not None:
        self.abax.plot(score, 0, "og")
        self.abax.annotate(f"score = {score}", xy=(score, 0),
                        xytext=((minus_inf + plus_inf) / 2, 1), 
                        arrowprops=arrowprops, ha="center")
            
    facecolorlist = ["aqua", "yellow", "lightgray"]
    textcolorlist = ["black", "black", "black"]
    if maxnode:
        nodetype = f"深さ {self.selectednode.mb.move_count} max node"
        textlist = ["β 狩り (β ≦ score)", "α 値の更新 (α < score < β)", "α 値の更新なし (score ≦ α)"]
        if score is not None:
            if beta <= score:
                textcolorlist[0] = "red"
            elif alpha < score:
                textcolorlist[1] = "red"
            else:
                textcolorlist[2] = "red"
    else:
        nodetype = f"深さ {self.selectednode.mb.move_count} min node"
        textlist = ["α 狩り (score <= α)", "β 値の更新 (α < score < β)", "β 値の更新なし (score ≦ β)"]
        if score is not None :
            if score <= alpha:
                textcolorlist[0] = "red"
            elif score < beta:
                textcolorlist[1] = "red"
            else:
                textcolorlist[2] = "red"
            
    if status == "start":
        facecolor = "white"
        nodetype += " 処理の開始"
    elif status == "score":
        facecolor = "lightyellow"
        nodetype += " 子ノードの評価値"
    elif status == "update":
        facecolor = "lightcyan"
        if maxnode:
            nodetype += " α 値の処理"
        else:
            nodetype += " β 値の処理"
    elif status == "tt":
        textlist[0] = "置換表による置換"
        textcolorlist[0] = "red"
        facecolor = "honeydew"
    else:
        facecolor = "lavenderblush"
        nodetype += " 評価値の確定"
    self.abfig.set_facecolor(facecolor)
    self.abax.text(6, 1, nodetype)   
    for i in range(3):
        rect = patches.Rectangle(xy=(5, 0.3 - i * 0.7), width=0.8, height=0.5, fc=facecolorlist[i])
        self.abax.add_patch(rect)
        self.abax.text(6, 0.4 - i * 0.7, textlist[i], c=textcolorlist[i])    
    
    num_total = num_calculated + num_pruned
    num_ratio = num_calculated / num_total if num_total != 0 else 0
    _, _, _, _, prev_num_calculated, prev_num_pruned = self.mbtree.ablist_by_score[self.prev_frame]
    prev_num_total = prev_num_calculated + prev_num_pruned
    diff_num_calculated = num_calculated - prev_num_calculated
    diff_num_pruned = num_pruned - prev_num_pruned
    diff_num_total = num_total - prev_num_total
    diff_num_ratio = diff_num_calculated / diff_num_total if diff_num_total != 0 else 0

    textlist = [ "計算済", "枝狩り", "合計", "割合" ]
    datalist = [ num_calculated, num_pruned, num_total, f"{num_ratio * 100:.1f}%"]
    diff_datalist = [ f"{diff_num_calculated:+d}", f"{diff_num_pruned:+d}", 
                    f"{diff_num_total:+d}", f"{diff_num_ratio * 100:.1f}%"]
    for i in range(4):
        self.abax.text(15, 1 - i * 0.7, textlist[i])
        self.abax.text(19.5, 1 - i * 0.7, datalist[i], ha="right")
        self.abax.text(22.5, 1 - i * 0.7, diff_datalist[i], ha="right")

Mbtree_Anim.update_ab = update_ab

修正箇所

from tree import Mbtree_Anim
import matplotlib.patches as patches

def update_ab(self):
元と同じなので省略
    if status == "start":
        facecolor = "white"
        nodetype += " 処理の開始"
元と同じなので省略
+   elif status == "tt":
+       textlist[0] = "置換表による置換"
+       textcolorlist[0] = "red"
+       facecolor = "honeydew"
元と同じなので省略

Mbtree_Anim.update_ab = update_ab

上記の修正後に下記のプログラムを実行し、(0, 0)、(1, 0)、(2, 0) の順で着手が行われた局面の 評価値が計算された 1676 フレーム目を表示すると、実行結果のように 赤枠のノードの子ノードが明るく表示される ので、赤枠の局面の 子ノードの評価値がすべて計算された ことが確認できます。

Mbtree_Anim(mbtree, isscore=True)

実行結果

次に、その後のフレームで 同じ局面になる (2, 0)、(1, 0)、(0, 0) の順で着手が行われた局面の 評価値が計算された 32690 フレーム目を表示すると実行結果のような図が表示されます。

なお、下図では 枝狩りが行われた数がわかる ように、直前の 32689 フレーム目を表示した後で上段の > をクリックして 32690 フレーム目を表示した場合の図です。

上図は 以下のように表示される ことから、置換表によって評価値が計算され、その 子孫ノードが枝狩りされたことがわかる ように表示されています。また、枝狩りの右に +1348 と表示されるので 置換表による枝狩り によって 1348 個のノードの枝狩りが行われた ことがわかります。興味がある方は、他の置換表に登録されたノードでも同様の表示が行われることを確認してみて下さい。

背景色が薄緑色で表示 される
数直線の右に 赤字で「置換表による置換」と表示 される
赤枠の 全ての子孫ノード が枝狩りが行われたことを表す さらに暗い色で表示 される

筆者も最初は気が付かなかったのですが、上図には一つ大きな問題 があります。どのような問題があるかについて少し考えてみて下さい。

問題の検証

置換表によって評価値が計算されるノード は、他のノードが "start" の状態のフレームから始まり、"end" の状態のフレームで終わるのと異なり、"tt" 状態のフレームしか存在しません。そのため、下段の <<、<、>、>> による 選択中のノード内の移動の操作を行うことはできません が、それにも関わらず上図ではそれらの 4 つのボタンが緑色で表示 されて クリックできる ようになっています。

また、4 つのボタンをクリックした際に行われる処理は、このノードの "start" や "end" などの状態のフレームを探す という処理ですが、そのような状態のフレームが存在しない ので いずれは list の範囲外のインデックスの要素を参照 しようとするため エラーが発生 します。下記は 32690 フレーム目で下段の << ボタンをクリックした際に表示されるエラーメッセージで、実際に list の範囲外（out of range）のインデックスを参照した という意味のメッセージが表示されています。この問題を修正する方法について少し考えてみて下さい。

略
File c:\Users\ys\ai\marubatsu\151\tree.py:1244, in Mbtree_Anim.create_event_handler.<locals>.change_frame(edge_status, diff, status_list)
   1242 while True:
   1243     frame += diff
-> 1244     node = self.mbtree.nodelist_by_score[frame]
   1245     status = self.mbtree.ablist_by_score[frame][3]
   1246     if node == selectednode and status in status_list:

IndexError: list index out of range

Mbtree_Anim のクラスの `update_gui` メソッドの修正

この問題は フレームの状態が "tt" の場合 に 下段の 4 つのボタンをクリックできないようにする ことで修正できます。ボタンの状態の設定 は update_gui メソッド内で行っているので下記のプログラムのように update_gui メソッドを修正します。

5、8 行目：4 つのボタンが操作できないようするに条件に「フレームの状態が "tt" である」という条件を追加する

 1  def update_gui(self):
元と同じなので省略
 2      if self.abfig is not None:
 3          self.update_ab()
 4          status = self.mbtree.ablist_by_score[self.play.value][3]
 5          disabled = status == "start" or status == "tt"
 6          self.set_button_status(self.node_first_button, disabled=disabled)
 7          self.set_button_status(self.node_prev_button, disabled=disabled)
 8          disabled = status == "end" or status == "tt"
 9          self.set_button_status(self.node_next_button, disabled=disabled)
10          self.set_button_status(self.node_last_button, disabled=disabled)
元と同じなので省略
11      
12  Mbtree_Anim.update_gui = update_gui

行番号のないプログラム

def update_gui(self):
    self.ax.clear()
    self.ax.set_xlim(-1, self.width - 1)
    self.ax.set_ylim(-1, self.height - 1)   
    self.ax.invert_yaxis()
    self.ax.axis("off")   
    
    self.selectednode = self.nodelist[self.play.value]
    self.centernode = self.selectednode
    if self.mbtree.algo == "bf":
        if self.centernode.depth > 0:
            self.centernode = self.centernode.parent
    while self.centernode.depth > 6:
        self.centernode = self.centernode.parent
    if self.centernode.depth <= 4:
        maxdepth = self.centernode.depth + 1
    elif self.centernode.depth == 5:
        maxdepth = 7
    else:
        maxdepth = 9
    self.mbtree.draw_subtree(centernode=self.centernode, selectednode=self.selectednode,
                             anim_frame=self.play.value, isscore=self.isscore, 
                             ax=self.ax, maxdepth=maxdepth, size=self.size)
    if self.abfig is not None:
        self.update_ab()
        status = self.mbtree.ablist_by_score[self.play.value][3]
        disabled = status == "start" or status == "tt"
        self.set_button_status(self.node_first_button, disabled=disabled)
        self.set_button_status(self.node_prev_button, disabled=disabled)
        disabled = status == "end" or status == "tt"
        self.set_button_status(self.node_next_button, disabled=disabled)
        self.set_button_status(self.node_last_button, disabled=disabled)

    disabled = self.play.value == 0
    self.set_button_status(self.prev_button, disabled=disabled)
    disabled = self.play.value == self.nodenum - 1
    self.set_button_status(self.next_button, disabled=disabled)
    
Mbtree_Anim.update_gui = update_gui

修正箇所

def update_gui(self):
元と同じなので省略
    if self.abfig is not None:
        self.update_ab()
        status = self.mbtree.ablist_by_score[self.play.value][3]
-       disabled = status == "start"
+       disabled = status == "start" or status == "tt"
        self.set_button_status(self.node_first_button, disabled=disabled)   
        self.set_button_status(self.node_prev_button, disabled=disabled)
-       disabled = status == "end"
+       disabled = status == "end" or status == "tt"
        self.set_button_status(self.node_next_button, disabled=disabled)                       
        self.set_button_status(self.node_last_button, disabled=disabled)
元と同じなので省略
    
Mbtree_Anim.update_gui = update_gui

上記の修正後に下記のプログラムを実行して 32690 フレーム目を表示すると、実行結果のように 下段の 4 つのボタンが灰色で表示されて操作できなくなった ことが確認できます。

Mbtree_Anim(mbtree, isscore=True)

実行結果

同一局面を考慮した置換表による枝狩り

以前の記事で、〇×ゲームには回転による同一局面と、線対称な 同一局面が存在する ことを説明しました。〇× ゲームの性質 から 同一局面の評価値は同じになる ので、評価値が計算されたノードの情報を 置換表に登録する際 に、同一局面に対しても同じ評価値を置換表に登録する ことで、より多くの枝狩り を行うことが出来るようになります。

同一局面の計算 は、以前の記事で定義した calc_same_boardtexts で行うことができるので、下記のプログラムのように calc_score_by_ab を修正します。

2 ~ 7 行目：calc_same_boardtexts を使って、ノードの同一局面の文字列の list を計算 し、全ての同一局面の評価値を置換表に登録する ように修正する。循環参照を避けるために、3 行目の calc_same_boardtext のインポートは calc_score_by_ab の中に記述した

 1  def calc_score_by_ab(self, abroot, debug=False, minimax=False, init_ab=False, use_tt=False):
元と同じなので省略
 2          if use_tt:
 3              from util import calc_same_boardtexts
 4
 5              boardtxtlist = calc_same_boardtexts(node.mb)
 6              for boardtxt in boardtxtlist:
 7                  tt[boardtxt] = node.score
 8          return node.score
元と同じなので省略
 9      
10  Mbtree.calc_score_by_ab = calc_score_by_ab

行番号のないプログラム

def calc_score_by_ab(self, abroot, debug=False, minimax=False, init_ab=False, use_tt=False):          
    def assign_pruned_index(node, index):
        node.pruned_index = index
        self.num_pruned += 1
        for childnode in node.children:
            assign_pruned_index(childnode, index)
        
    def calc_ab_score(node, tt, alpha=float("-inf"), beta=float("inf")):
        if minimax:
            alpha = float("-inf")
            beta = float("inf")
        if use_tt:
            boardtxt = node.mb.board_to_str()
            if boardtxt in tt:
                node.score = tt[boardtxt]
                if node.mb.turn == Marubatsu.CIRCLE:
                    alpha = node.score
                else:
                    beta = node.score
                self.nodelist_by_score.append(node)
                self.num_calculated += 1     
                for childnode in node.children:
                    assign_pruned_index(childnode, len(self.nodelist_by_score) - 1)
                self.ablist_by_score.append((alpha, beta, None, "tt",
                                             self.num_calculated, self.num_pruned))
                node.score_index = len(self.nodelist_by_score) - 1  
                return node.score
                
        self.nodelist_by_score.append(node)
        self.ablist_by_score.append((alpha, beta, None, "start",
                                    self.num_calculated, self.num_pruned))
        if node.mb.status != Marubatsu.PLAYING:
            self.calc_score_of_node(node)
            if node.mb.turn == Marubatsu.CIRCLE:
                alpha = node.score
            else:
                beta = node.score
        else:
            if node.mb.turn == Marubatsu.CIRCLE:
                for childnode in node.children:
                    score = calc_ab_score(childnode, tt, alpha, beta)
                    self.nodelist_by_score.append(node)
                    self.ablist_by_score.append((alpha, beta, score, "score",
                                                self.num_calculated, self.num_pruned))
                    if score > alpha:
                        alpha = score
                    self.nodelist_by_score.append(node)
                    if score >= beta:
                        index = node.children.index(childnode)
                        for prunednode in node.children[index + 1:]:
                            assign_pruned_index(prunednode, len(self.nodelist_by_score) - 1)
                    self.ablist_by_score.append((alpha, beta, None, "update",
                                                self.num_calculated, self.num_pruned))
                    if score >= beta:
                        break
                node.score = alpha
            else:
                for childnode in node.children:
                    score = calc_ab_score(childnode, tt, alpha, beta)
                    self.nodelist_by_score.append(node)
                    self.ablist_by_score.append((alpha, beta, score, "score",
                                                self.num_calculated, self.num_pruned))
                    if score < beta:
                        beta = score
                    self.nodelist_by_score.append(node)
                    if score <= alpha:
                        index = node.children.index(childnode)
                        for prunednode in node.children[index + 1:]:
                            assign_pruned_index(prunednode, len(self.nodelist_by_score) - 1)
                    self.ablist_by_score.append((alpha, beta, None, "update",
                                                self.num_calculated, self.num_pruned))
                    if score <= alpha:
                        break
                node.score = beta

        self.nodelist_by_score.append(node)
        self.num_calculated += 1     
        self.ablist_by_score.append((alpha, beta, None, "end",
                                    self.num_calculated, self.num_pruned))
        node.score_index = len(self.nodelist_by_score) - 1  
        if use_tt:
            from util import calc_same_boardtexts   
            
            boardtxtlist = calc_same_boardtexts(node.mb)
            for boardtxt in boardtxtlist:
                tt[boardtxt] = node.score
        return node.score

    from ai import dprint       
    for node in self.nodelist:
        node.score_index = float("inf")
        node.pruned_index = float("inf")
    self.nodelist_by_score = []
    self.ablist_by_score = []
    self.num_calculated = 0
    self.num_pruned = 0
    if init_ab:
        alpha = -2 if self.shortest_victory else -1
        beta = 3 if self.shortest_victory else 1
    else:
        alpha = float("-inf")
        beta = float("inf")
    calc_ab_score(abroot, {}, alpha, beta)
    total_nodenum = self.num_pruned + self.num_calculated
    ratio = self.num_calculated / total_nodenum * 100
    dprint(debug, "ミニマックス法で計算したか", minimax)    
    dprint(debug, "計算したノードの数",  self.num_calculated)
    dprint(debug, "枝狩りしたノードの数",  self.num_pruned)
    dprint(debug, "合計",  total_nodenum)
    dprint(debug, f"割合 {ratio:.1f}%")
    
Mbtree.calc_score_by_ab = calc_score_by_ab

修正箇所

def calc_score_by_ab(self, abroot, debug=False, minimax=False, init_ab=False, use_tt=False):
元と同じなので省略
        if use_tt:
+           from util import calc_same_boardtexts

-           tt[boardtxt] = node.score           
+           boardtxtlist = calc_same_boardtexts(node.mb)
+           for boardtxt in boardtxtlist:
+               tt[boardtxt] = node.score
        return node.score
元と同じなので省略
    
Mbtree.calc_score_by_ab = calc_score_by_ab

上記の修正後に下記のプログラムを実行すると、実行結果のように前回の記事 で最善手を着手した子ノードを最初に計算した場合の αβ 法 の計算したノードの数と割合である 2312、0.4 % と ほぼ同じ効率で枝狩りを行える ことが確認できました。

mbtree.calc_score_by_ab(mbtree.root, use_tt=True, minimax=True, debug=True)

実行結果

ミニマックス法で計算したか True
計算したノードの数 2271
枝狩りしたノードの数 547675
合計 549946
割合 0.4%

また、下記のプログラムを実行し、(0, 0) だけに着手 した局面の 同一局面 である (2, 0) だけに着手 した局面のノードが計算された 7485 フレーム目を表示すると、実行結果のように薄緑色で表示され、赤枠の子ノードがすべて暗い色で表示されるため、置換表による置換が行われた ことが確認できます。また、直前の 7484 フレーム目を表示した後で上段の > をクリックして 7485 フレーム目を表示すると、下図のように 59704 個の枝狩りを行われた ことが確認できます。

Mbtree_Anim(mbtree, isscore=True)

実行結果

置換表を利用した AI の定義と確認

置換表を利用したミニマックス法 によって 強解決の AI を作成できるか どうかを 確認するため に、ai_mmdfs_tt という名前の AI を定義 する事にします。この関数は、以前に作成した ミニマックス法で計算した評価値を使って着手を選択する AI である ai_mmdfs を下記のプログラムのように 修正して定義 します。また、その際に 最短の勝利を考慮した評価値で計算できる ように shortest_victory という仮引数を追加 することにします。

5 行目：デフォルト値を False とする仮引数 shortest_victory を追加する
7 行目：ゲーム盤の評価値を計算するローカル関数 mm_search に、置換表を代入する仮引数 tt を追加する
11 ~ 16 行目：決着がついた局面の評価値を shortest_victory の値を考慮して計算する。なお、この部分で計算した評価値を置換表に登録してもかまわないが、この計算はすぐに行うことができる ので置換表に登録しても 処理の効率が向上しない 点と、プログラムの記述が少しだけ長くなる 点を考慮して 置換表には登録しないことにした。置換表に登録したい方はそのように実装してもかまわないが、その場合は 18 ~ 20 行目の処理をこの処理の前に記述する必要がある点に注意すること
18 ~ 20 行目：局面を表す文字列を計算し、置換表に登録されている場合はその評価値を返すように修正する
27 行目：mm_search を再帰呼び出しする際に実引数 tt を記述するように修正する
28 ~ 35 行目：元のプログラムでは子ノードの評価値の最大値または最小値を計算して返り値として返していたが、評価値を同一局面の置換表に登録する必要がある ので score という変数に評価値を代入し、32 ~ 34 行目で同一局面の評価値を置換表に登録 した後で 35 行目で返り値として返す ように修正する³
37 行目：最初に mm_search を呼び出す際に、置換表として空の dict を実引数に記述する ように修正する

 1  from ai import ai_by_score, dprint
 2  from copy import deepcopy
 3  
 4  @ai_by_score
 5  def ai_mmdfs_tt(mb, debug=False, shortest_victory=False):
 6      count = 0
 7      def mm_search(mborig, tt):
 8          nonlocal count
 9          count += 1
10  
11          if mborig.status == Marubatsu.CIRCLE:
12              return (11 - mborig.move_count) / 2 if shortest_victory else 1
13          elif mborig.status == Marubatsu.CROSS:
14              return (mborig.move_count - 10) / 2 if shortest_victory else -1
15          elif mborig.status == Marubatsu.DRAW:
16              return 0
17          
18          boardtxt = mborig.board_to_str()
19          if boardtxt in tt:
20              return tt[boardtxt]
21          
22          legal_moves = mborig.calc_legal_moves()
23          score_list = []
24          for x, y in legal_moves:
25              mb = deepcopy(mborig)
26              mb.move(x, y)
27              score_list.append(mm_search(mb, tt))
28          if mborig.turn == Marubatsu.CIRCLE:
29              score = max(score_list)
30          else:
31              score = min(score_list)
32      boardtxtlist = calc_same_boardtexts(mborig)
33      for boardtxt in boardtxtlist:
34          tt[boardtxt] = score
35      return score
36      
37      score = mm_search(mb, {})
38      dprint(debug, "count =", count)
39      if mb.turn == Marubatsu.CIRCLE:
40          score *= -1
41      return score

行番号のないプログラム

from ai import ai_by_score, dprint
from copy import deepcopy

@ai_by_score
def ai_mmdfs_tt(mb, debug=False, shortest_victory=False):
    count = 0
    def mm_search(mborig, tt):
        nonlocal count
        count += 1

        if mborig.status == Marubatsu.CIRCLE:
            return (11 - mborig.move_count) / 2 if shortest_victory else 1
        elif mborig.status == Marubatsu.CROSS:
            return (mborig.move_count - 10) / 2 if shortest_victory else -1
        elif mborig.status == Marubatsu.DRAW:
            return 0
        
        boardtxt = mborig.board_to_str()
        if boardtxt in tt:
            return tt[boardtxt]
        
        legal_moves = mborig.calc_legal_moves()
        score_list = []
        for x, y in legal_moves:
            mb = deepcopy(mborig)
            mb.move(x, y)
            score_list.append(mm_search(mb, tt))
        if mborig.turn == Marubatsu.CIRCLE:
            score = max(score_list)
        else:
            score = min(score_list)
        boardtxtlist = calc_same_boardtexts(mborig)
        for boardtxt in boardtxtlist:
            tt[boardtxt] = score
        return score
    
    score = mm_search(mb, {})
    dprint(debug, "count =", count)
    if mb.turn == Marubatsu.CIRCLE:
        score *= -1
    return score

修正箇所

from ai import ai_by_score, dprint
from copy import deepcopy

@ai_by_score
-def ai_mmdfs(mb, debug=False):
+def ai_mmdfs_tt(mb, debug=False, shortest_victory=False):
    count = 0
-   def mm_search(mborig):
+   def mm_search(mborig, tt):
        nonlocal count
        count += 1

        if mborig.status == Marubatsu.CIRCLE:
-           return 1
+           return (11 - mborig.move_count) / 2 if shortest_victory else 1
        elif mborig.status == Marubatsu.CROSS:
-           return -1
+           return (mborig.move_count - 10) / 2 if shortest_victory else -1
        elif mborig.status == Marubatsu.DRAW:
            return 0
        
+       boardtxt = mborig.board_to_str()
+       if boardtxt in tt:
+           return tt[boardtxt]
        
        legal_moves = mborig.calc_legal_moves()
        score_list = []
        for x, y in legal_moves:
            mb = deepcopy(mborig)
            mb.move(x, y)
-           score_list.append(mm_search(mb))
+           score_list.append(mm_search(mb, tt))
        if mborig.turn == Marubatsu.CIRCLE:
-           return max(score_list)        
+           score = max(score_list)
        else:
-           return min(score_list)
+           score = min(score_list)
+       boardtxtlist = calc_same_boardtexts(mborig)
+       for boardtxt in boardtxtlist:
+           tt[boardtxt] = score
+       return score
    
-   score = mm_search(mb)
+   score = mm_search(mb, {})
    dprint(debug, "count =", count)
    if mb.turn == Marubatsu.CIRCLE:
        score *= -1
    return score

上記の定義後に Check_solved クラスの is_strongly_solved メソッドを利用してこの関数が 強解決の AI であるかどうかを確認 します。実行結果から最短の勝利を考慮した評価値を計算しない場合とする場合の両方で 強解決の AI であることが確認できました。

from util import Check_solved

Check_solved.is_strongly_solved(ai_mmdfs_tt)
Check_solved.is_strongly_solved(ai_mmdfs_tt, params={"shortest_victory": True})

実行結果

100%|██████████| 431/431 [00:05<00:00, 80.79it/s] 
431/431 100.00%
100%|██████████| 431/431 [00:04<00:00, 91.44it/s] 
431/431 100.00%

このことから、置換表を利用したミニマックス法で正しい計算を行うことが出来る ことが確認できました。

置換表の効率

上記で 〇× ゲーム の ルートノードの評価値 を、置換表を使ったミニマックス法で計算 すると αβ 法より高い効率で枝狩りを行うことが出来る ことが確認できたので、置換表を利用したミニマックス法の効率が αβ 法の効率より高いと 思った方がいるかもしれません が、常にそのようになるわけではありません。

置換表 は 同一局面の子孫ノードの枝狩りを行う という手法なので、評価値を計算するノード をルートノードとする 部分木の中 に、同一局面が多く存在しない場合 は 多くの枝狩りを行うことはできません。

〇×ゲーム は、全体的にみると 着手が少ない程同一局面の数が多くなる という性質があります。例えば、1 手目の局面 には 9 種類の局面 がありますが、その中で 隅への着手 が行われた 4 つの局面は同一局面 です。同様に、辺への着手 が行われた 4 つの局面は同一局面 です。従って、ゲーム開始時の ルートノードの評価値を計算する際 に、その子ノードである 深さ 1 の 9 個の子ノード のうち 3 + 3 = 6 つのノード で 置換表による枝狩りを行うことが出来ます。そのため、ルートノードの評価値 を計算する際に、先程の実行結果のように 非常に多くの割合の枝狩りを行うことができます。

一方、〇× ゲーム では 着手を行えば行うほど、子孫ノードの中の同一局面の割合が減る という傾向があるので、深いノードの評価値を計算 する際の 置換表による枝狩りの効率は落ちていく ことになります。

そのことを確認するために、下記のプログラムで 深さ 0 ~ 8 の最初のノード に対して、置換表を利用したミニマックス法で評価値の計算 を行ってみることにします。

for depth in range(9):
    print("depth", depth)
    mbtree.calc_score_by_ab(mbtree.nodelist_by_depth[depth][0], minimax=True, use_tt=True, debug=True)

実行結果

depth 0
ミニマックス法で計算したか True
計算したノードの数 2271
枝狩りしたノードの数 547675
合計 549946
割合 0.4%
depth 1
ミニマックス法で計算したか True
計算したノードの数 1820
枝狩りしたノードの数 57885
合計 59705
割合 3.0%
depth 2
ミニマックス法で計算したか True
計算したノードの数 1056
枝狩りしたノードの数 7176
合計 8232
割合 12.8%
depth 3
ミニマックス法で計算したか True
計算したノードの数 250
枝狩りしたノードの数 1099
合計 1349
割合 18.5%
depth 4
ミニマックス法で計算したか True
計算したノードの数 124
枝狩りしたノードの数 110
合計 234
割合 53.0%
depth 5
ミニマックス法で計算したか True
計算したノードの数 34
枝狩りしたノードの数 7
合計 41
割合 82.9%
depth 6
ミニマックス法で計算したか True
計算したノードの数 7
枝狩りしたノードの数 1
合計 8
割合 87.5%
depth 7
ミニマックス法で計算したか True
計算したノードの数 1
枝狩りしたノードの数 0
合計 1
割合 100.0%
depth 8
ミニマックス法で計算したか True
計算したノードの数 2
枝狩りしたノードの数 0
合計 2
割合 100.0%

次に下記のプログラムで 深さ 0 ~ 8 の最初のノード に対して、αβ 法で評価値の計算 を行ってみることにします。なお、置換表は利用しないので use_tt=True は記述しません。

for depth in range(9):
    print("depth", depth)
    mbtree.calc_score_by_ab(mbtree.nodelist_by_depth[depth][0], debug=True)

実行結果

depth 0
ミニマックス法で計算したか False
計算したノードの数 18297
枝狩りしたノードの数 531649
合計 549946
割合 3.3%
depth 1
ミニマックス法で計算したか False
計算したノードの数 2338
枝狩りしたノードの数 57367
合計 59705
割合 3.9%
depth 2
ミニマックス法で計算したか False
計算したノードの数 749
枝狩りしたノードの数 7483
合計 8232
割合 9.1%
depth 3
ミニマックス法で計算したか False
計算したノードの数 336
枝狩りしたノードの数 1013
合計 1349
割合 24.9%
depth 4
ミニマックス法で計算したか False
計算したノードの数 65
枝狩りしたノードの数 169
合計 234
割合 27.8%
depth 5
ミニマックス法で計算したか False
計算したノードの数 32
枝狩りしたノードの数 9
合計 41
割合 78.0%
depth 6
ミニマックス法で計算したか False
計算したノードの数 6
枝狩りしたノードの数 2
合計 8
割合 75.0%
depth 7
ミニマックス法で計算したか False
計算したノードの数 1
枝狩りしたノードの数 0
合計 1
割合 100.0%
depth 8
ミニマックス法で計算したか False
計算したノードの数 2
枝狩りしたノードの数 0
合計 2
割合 100.0%

下記は上記の結果を表にしたものです。左に置換表を利用したミニマックス法のデータを 表記し、αβ 法の結果 は比較のために 割合のみを右端の列に表記 します。

表から評価値を利用したミニマックス法では 深さが深くなるほど割合が大きく増えていく ことから、枝狩りの効率が大きく減る ことが確認できます。ただし、深さが深い程部分木のノードの数が急激に減る ため、割合が増えても計算したノードの数は減少する ことも確認できました。

αβ 法も同様の傾向はありますが、深さが 0 の場合は割合が高い、深さが 1、5 ~ 8 の場合はほぼ同じ、深さが 2 ~ 4 の場合は割合が低くなっているので、どちらがより枝狩りを効率よく行えるかは一概には言えない ことが確認できました。

深さ	計算したノードの数	合計	割合	αβ 法の割合
0	2271	549946	0.4 %	3.3 %
1	1820	59705	3.0 %	3.9 %
2	1056	8232	12.8 %	9.1 %
3	250	1349	18.5 %	24.9 %
4	124	234	53.0 %	27.8 %
5	34	41	82.9 %	78.0 %
6	7	8	87.5 %	75.0 %
7	1	1	100.0 %	100.0 %
8	2	2	100.0 %	100.0 %

なお、置換表を利用したミニマックス法の効率 は ゲームによって大きく性質が異なります。例えば、〇× ゲーム、オセロ、囲碁のような、着手を行うたびにゲーム盤が埋まっていく ようなゲームの場合は、手数が増えると子孫ノードの同一局面の割合が減って 置換表の効率が落ちていきます が、将棋やチェスのような 盤面上のコマを移動する ようなゲームの場合は、手数が増えても子孫ノードの同一局面の割合は大きく減らない ので 手数と置換表の効率には大きな関係はない という性質があります。

このように 置換表による枝狩りの効率 は 局面によってばらつきがあります が、ほとんどの場合は置換表を利用したほうが効率が上がるので 一般的には利用したほうが良いでしょう。

今回の記事のまとめ

今回の記事では、αβ 法で α 値と β 値の初期値を評価値の最小値と最大値に設定することで枝狩りの効率を少しだけ上げることが出来ることを示しました。

また、ミニマックス法で置換表を利用した枝狩りを行う方法について紹介しました。

ところで、今回修正した calc_score_by_ab を利用して下記のプログラムのように 置換表を利用した αβ 法の計算を行える と思った人はいないでしょうか？下記のプログラムの実行結果から 割合が 0.2 % と置換表を利用しない αβ 法の場合の 3.3 % から 大きく減っている のでうまく計算が出来ているように思えるかもしれませんが、実際には今回修正した calc_score_by_ab では置換表を利用して αβ 法の計算を正しく行うことはできません。その理由については次回の記事で説明しますが、余裕がある方は考えてみて下さい。

mbtree.calc_score_by_ab(mbtree.root, use_tt=True, debug=True)

実行結果

ミニマックス法で計算したか False
計算したノードの数 999
枝狩りしたノードの数 548947
合計 549946
割合 0.2%

本記事で入力したプログラム

リンク	説明
marubatsu.ipynb	本記事で入力して実行した JupyterLab のファイル
tree.py	本記事で更新した tree_new.py
ai.py	本記事で更新した ai_new.py

次回の記事

本記事ではそのような修正は行いませんが、この修正によって calc_ab_score を呼び出す際に必ず実引数に α 値と β 値が記述されるようになるので、calc_ab_score の仮引数 alpha と beta を通常の仮引数に変更することができます ↩
長くなるので本記事では説明しません。興味がある方は調べてみて下さい ↩
ai.py に ai_mmdfs_tt の定義を記述する際には、循環参照を避けるために from util import calc_same_boardtexts を ai_mmdfs_tt の定義の中に記述する必要があります ↩

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up

Pythonで〇×ゲームのAIを一から作成する その151 置換表によるミニマックス法の枝狩り

目次と前回の記事

これまでに作成したモジュール

α 値と β 値の初期値の変更による枝狩り

Mbtree クラスの calc_score_by_ab メソッドの修正

置換表を利用したミニマックス法での枝狩り

Mbtree クラスの calc_score_by_ab メソッドの修正 その 1

Mbtree クラスの calc_score_by_ab メソッドの修正 その 2

Mbtree_Anim のクラスの update_ab メソッドの修正

問題の検証

Mbtree_Anim のクラスの update_gui メソッドの修正

同一局面を考慮した置換表による枝狩り

置換表を利用した AI の定義と確認

置換表の効率

今回の記事のまとめ

本記事で入力したプログラム

次回の記事

Pythonで〇×ゲームのAIを一から作成する　その151　置換表によるミニマックス法の枝狩り

Mbtree クラスの `calc_score_by_ab` メソッドの修正

Mbtree クラスの `calc_score_by_ab` メソッドの修正その 1

Mbtree クラスの `calc_score_by_ab` メソッドの修正その 2

Mbtree_Anim のクラスの `update_ab` メソッドの修正

Mbtree_Anim のクラスの `update_gui` メソッドの修正