Pythonで〇×ゲームのAIを一から作成する　その184　反復深化を利用した Principal Variation Search

Last updated at 2025-08-12Posted at 2025-08-08

目次と前回の記事

これまでに作成したモジュール

以下のリンクから、これまでに作成したモジュールを見ることができます。

リンク	説明
marubatsu.py	Marubatsu、Marubatsu_GUI クラスの定義
ai.py	AI に関する関数
test.py	テストに関する関数
util.py	ユーティリティ関数の定義
tree.py	ゲーム木に関する Node、Mbtree クラスなどの定義
gui.py	GUI に関する処理を行う基底クラスとなる GUI クラスの定義

AI の一覧とこれまでに作成したデータファイルについては、下記の記事を参照して下さい。

反復深化を利用した Principal Variation Search

前回の記事では、反復深化を利用した move ordering の手法を紹介し、多くのゲームでは move ordering を行う反復深化の効率は深さ制限探索よりも高くなる ことを紹介しました。

move ordering を利用した他の 枝狩りの効率化のアルゴリズム として、以前の記事で紹介した下記の スカウト法 があります。

move ordering を行う
先頭の子ノードの評価値の計算処理 を αβ 法と同じ 方法で計算する
2 番目以降の子ノード の評価値の計算処理を以下の手順で行う。以下は $i$（$i ≧ 2$）番目の子ノードの評価値 $s_i$ を計算する場合の処理の説明である
- max ノード の場合は以下の手順で計算を行う
  1. ウィンドウを (α 値, α 値 + 1) とした αβ 法 で $s_i$ を計算し、$s$ を更新する
  2. $s$ の値に応じて以下の処理を行う
    - β 値 ≦ $\boldsymbol{s}$ の場合は $s$ をこのノードの評価値として処理を終了するという、β 狩り の処理を行う
    - α 値 < $\boldsymbol{s}$ < β 値 の場合は ウィンドウを (α 値, β 値) とした αβ 法と同じ方法 で 計算しなおす。$s$ や α 値の更新、β 狩りの処理も αβ 法と同じ
    - $\boldsymbol{s}$ ≦ α 値 の場合は何もしない
- min ノード の場合は以下の手順で計算を行う
  1. ウィンドウを (β 値 - 1, β 値) とした αβ 法 で $s_i$ を計算し、$s$ を更新する
  2. $s$ の値に応じて以下の処理を行う
    - $\boldsymbol{s}$ ≦ α 値 の場合は $s$ をこのノードの評価値として処理を終了するという、β 狩り の処理を行う
    - α 値 < $\boldsymbol{s}$ < β 値 の場合は ウィンドウを (α 値, β 値) とした αβ 法と同じ方法 で 計算しなおす。$s$ や β 値の更新、α 狩りの処理も αβ 法と同じ
    - β 値 ≦ $\boldsymbol{s}$ の場合は何もしない

以前の記事で実装したスカウト法で最善手を計算する ai_scout は、深さの上限を制限せずに ゲーム木のすべてのノードに対して計算を行っていましたが、ミニマックス法や αβ 法と同様にスカウト法に対しても深さの上限を設定した 深さ制限探索で計算を行うことができます。そのため、スカウト法に対しても 深さの上限を 1 から増やしながら深さ制限探索を行うという、反復深化 を行うことができます。また、前回の記事で紹介した 直前の深さ制限探索の情報 を元に move ordering を行う αβ 法の反復深化と同様の方法で、スカウト法の反復深化 でも move ordering を行う ことができます。

スカウト法の反復深化 では、一つ前に行われた深さ制限探索の置換表に記録された principal variation を着手したノードに対して 通常のウィンドウで αβ 法の計算 を行い、それ以外のノード に対しては基本的には null window search を行います。そのため、この手法をスカウト法ではなく principal variation search、またはその頭文字を取って PVS と呼ぶ場合が多い ようです。なお、スカウト法と PVS は 本質的に同じアルゴリズム なので、Wikipedia では下記のリンク先の記事¹のように同じ項目で説明されているようです。

また、Chess Programming Wiki の下記の PVS の記事では「ほとんどの PVS の実装はネガスカウト法と類似している（Most PVS implementations are similar to Reinefeld's NegaScout）」、「それらは同じものである（They are identical）」などのように説明されているようです。

そこで、本記事でもこのアルゴリズムを以後は PVS と表記することにします。

`ai_pvs_dls` の実装

最初に深さ制限探索で PVS で最善手を計算 する ai_pvs_dls を実装することにします。その際に PVS のアルゴリズム は move ordering を行う αβ 法と共通する部分が多い ので ai_abs_dls を修正しながら実装 することにします。また、以前の記事で実装した 深さの上限を制限しないスカウト法 で処理を行う ai_scout も参考にします。

仮引数 `use_tt` の削除

ai_pvs_dls の 仮引数 は ai_abs_dls を基本的に踏襲する ことにしますが、PVS は move ordering を前提 とするアルゴリズムなので 置換表の利用は必須 であるため、置換表の利用の有無を表す仮引数 use_tt を削除 することにします。

下記は、ai_abs_dls の処理の中で、仮引数 use_tt を削除 し、use_tt が True の場合 の条件式のブロックを すべて実行する ように修正したプログラムです。

6 行目：仮引数 use_tt を削除する
11 ~ 24 行目、35 ~ 44 行目：if use_tt: を削除し、そのブロックを常に実行するようにする
30 行目の下にあった、use_tt が False の場合に boardtxt を計算する処理は、10 行目で boardtxt が常に計算されるようになったので削除した

 1  from marubatsu import Marubatsu
 2  from ai import ai_by_mmscore, dprint
 3  from copy import deepcopy
 4  
 5  @ai_by_mmscore
 6  def ai_pvs_dls(mb, debug=False, timelimit_pc=None, maxdepth=1, eval_func=None,
 7                 eval_params={}, tt=None, tt_for_mo=None):
 8      count = 0
 9      def ab_search(mborig, depth, tt, alpha=float("-inf"), beta=float("inf")):
元と同じなので省略
10          boardtxt = mborig.board_to_str()
11          if boardtxt in tt:
12              lower_bound, upper_bound, _ = tt[boardtxt]
13              if lower_bound == upper_bound:
14                  return lower_bound
15              elif upper_bound <= alpha:
16                  return upper_bound
17              elif beta <= lower_bound:
18                  return lower_bound
19              else:
20                  alpha = max(alpha, lower_bound)
21                  beta = min(beta, upper_bound)
22          else:
23              lower_bound = min_score
24              upper_bound = max_score
25              
26          alphaorig = alpha
27          betaorig = beta
28  
29          legal_moves = mborig.calc_legal_moves()
30          if tt_for_mo is not None:
31              if boardtxt in tt_for_mo:
32                  _, _, bestmove = tt_for_mo[boardtxt]
33                  index = legal_moves.index(bestmove)
34                  legal_moves[0], legal_moves[index] = legal_moves[index], legal_moves[0]
元と同じなので省略  
35          boardtxtlist = calc_same_boardtexts(mborig, bestmove)
36          if score <= alphaorig:
37              upper_bound = score
38          elif score < betaorig:
39              lower_bound = score
40              upper_bound = score
41          else:
42              lower_bound = score
43          for boardtxt, move in boardtxtlist:
44              tt[boardtxt] = (lower_bound, upper_bound, move)
45          return score
元と同じなので省略

行番号のないプログラム

from marubatsu import Marubatsu
from ai import ai_by_mmscore, dprint
from copy import deepcopy

@ai_by_mmscore
def ai_pvs_dls(mb, debug=False, timelimit_pc=None, maxdepth=1, eval_func=None,
               eval_params={}, tt=None, tt_for_mo=None):
    count = 0
    def ab_search(mborig, depth, tt, alpha=float("-inf"), beta=float("inf")):
        nonlocal count
        if timelimit_pc is not None and perf_counter() >= timelimit_pc:
            raise RuntimeError("time out")
        
        count += 1
        if mborig.status != Marubatsu.PLAYING or depth == maxdepth:
            return eval_func(mborig, calc_score=True, **eval_params)
        
        boardtxt = mborig.board_to_str()
        if boardtxt in tt:
            lower_bound, upper_bound, _ = tt[boardtxt]
            if lower_bound == upper_bound:
                return lower_bound
            elif upper_bound <= alpha:
                return upper_bound
            elif beta <= lower_bound:
                return lower_bound
            else:
                alpha = max(alpha, lower_bound)
                beta = min(beta, upper_bound)
        else:
            lower_bound = min_score
            upper_bound = max_score
        
        alphaorig = alpha
        betaorig = beta

        legal_moves = mborig.calc_legal_moves()
        if tt_for_mo is not None:
            if boardtxt in tt_for_mo:
                _, _, bestmove = tt_for_mo[boardtxt]
                index = legal_moves.index(bestmove)
                legal_moves[0], legal_moves[index] = legal_moves[index], legal_moves[0]
        if mborig.turn == Marubatsu.CIRCLE:
            score = float("-inf")
            for x, y in legal_moves:
                mb = deepcopy(mborig)
                mb.move(x, y)
                abscore = ab_search(mb, depth + 1, tt, alpha, beta)
                if abscore > score:
                    bestmove = (x, y)
                score = max(score, abscore)
                if score >= beta:
                    break
                alpha = max(alpha, score)
        else:
            score = float("inf")
            for x, y in legal_moves:
                mb = deepcopy(mborig)
                mb.move(x, y)
                abscore = ab_search(mb, depth + 1, tt, alpha, beta)
                if abscore < score:
                    bestmove = (x, y)
                score = min(score, abscore)
                if score <= alpha:
                    break
                beta = min(beta, score)   
            
        from util import calc_same_boardtexts

        boardtxtlist = calc_same_boardtexts(mborig, bestmove)
        if score <= alphaorig:
            upper_bound = score
        elif score < betaorig:
            lower_bound = score
            upper_bound = score
        else:
            lower_bound = score
        for boardtxt, move in boardtxtlist:
            tt[boardtxt] = (lower_bound, upper_bound, move)
        return score
                
    min_score = float("-inf")
    max_score = float("inf")
    
    if tt is None:
        tt = {}
    score = ab_search(mb, depth=0, tt=tt, alpha=min_score, beta=max_score)
    dprint(debug, "count =", count)
    return score, count

修正箇所

from marubatsu import Marubatsu
from ai import ai_by_mmscore, dprint
from copy import deepcopy

@ai_by_mmscore
def ai_pvs_dls(mb, debug=False, timelimit_pc=None, maxdepth=1, eval_func=None,
-              eval_params={}, use_tt=False, tt=None, tt_for_mo=None):
+              eval_params={}, tt=None, tt_for_mo=None):
    count = 0
    def ab_search(mborig, depth, tt, alpha=float("-inf"), beta=float("inf")):
元と同じなので省略
        boardtxt = mborig.board_to_str()
-       if use_tt:
-           boardtxt = mborig.board_to_str()
-           if boardtxt in tt:
-               lower_bound, upper_bound = tt[boardtxt]
-               if lower_bound == upper_bound:
-                   return lower_bound
-               elif upper_bound <= alpha:
-                   return upper_bound
-               elif beta <= lower_bound:
-                   return lower_bound
-               else:
-                   alpha = max(alpha, lower_bound)
-                   beta = min(beta, upper_bound)
-           else:
-               lower_bound = min_score
-               upper_bound = max_score        
+       if boardtxt in tt:
+           lower_bound, upper_bound, _ = tt[boardtxt]
+           if lower_bound == upper_bound:
+               return lower_bound
+           elif upper_bound <= alpha:
+               return upper_bound
+           elif beta <= lower_bound:
+               return lower_bound
+           else:
+               alpha = max(alpha, lower_bound)
+               beta = min(beta, upper_bound)
+       else:
+           lower_bound = min_score
+           upper_bound = max_score
            
        alphaorig = alpha
        betaorig = beta

        legal_moves = mborig.calc_legal_moves()
        if tt_for_mo is not None:
-           if not use_tt:            
-               boardtxt = mborig.board_to_str()
            if boardtxt in tt_for_mo:
                _, _, bestmove = tt_for_mo[boardtxt]
                index = legal_moves.index(bestmove)
                legal_moves[0], legal_moves[index] = legal_moves[index], legal_moves[0]
元と同じなので省略  
-       if use_tt:
-           boardtxtlist = calc_same_boardtexts(mborig, bestmove)
-           if score <= alphaorig:
-               upper_bound = score
-           elif score < betaorig:
-               lower_bound = score
-               upper_bound = score
-           else:
-               lower_bound = score
-           for boardtxt, move in boardtxtlist:
-              tt[boardtxt] = (lower_bound, upper_bound, move)
+       boardtxtlist = calc_same_boardtexts(mborig, bestmove)
+       if score <= alphaorig:
+           upper_bound = score
+       elif score < betaorig:
+           lower_bound = score
+           upper_bound = score
+       else:
+           lower_bound = score
+       for boardtxt, move in boardtxtlist:
+           tt[boardtxt] = (lower_bound, upper_bound, move)
        return score
元と同じなので省略

スカウト法に関する修正

次に、スカウト法の処理を行う ように ai_pvs_dls を修正します。具体的には、ai_scout の中で スカウト法の処理を行う下記の処理 を ai_pvs_dls で行うように修正します。

        if mborig.turn == Marubatsu.CIRCLE:
            x, y = legal_moves[0]
            mb = deepcopy(mborig)
            mb.move(x, y)
            score = ab_search(mb, tt, alpha, beta)
            alpha = max(alpha, score)
            if score < beta:
                for x, y in legal_moves[1:]:
                    mb = deepcopy(mborig)
                    mb.move(x, y)
                    score = max(score, ab_search(mb, tt, alpha, alpha + 1))
                    if score >= beta:
                        break
                    elif score > alpha:
                        score = max(score, ab_search(mb, tt, alpha, beta))
                        if score >= beta:
                            break
                        alpha = max(alpha, score)
        else:
            x, y = legal_moves[0]
            mb = deepcopy(mborig)
            mb.move(x, y)
            score = ab_search(mb, tt, alpha, beta)
            beta = min(beta, score)
            if score > alpha:
                for x, y in legal_moves[1:]:
                    mb = deepcopy(mborig)
                    mb.move(x, y)
                    score = min(score, ab_search(mb, tt, beta - 1, beta))
                    if score <= alpha:
                        break
                    elif score < beta:
                        score = min(score, ab_search(mb, tt, alpha, beta))
                        if score <= alpha:
                            break
                        beta = min(beta, score)

下記は、そのように ai_pvs_dls を修正したプログラムです。なお、下記の 説明と修正箇所 は、元の ai_pvs_dls ではなく、上記の ai_scout のプログラムとの違い です。

10 行目：ai_scout の場合と異なり、ai_pvs_dls では最善手を記録する必要があるので bestmove に最初の子ノードの合法手を代入する
15 ~ 18、22 ~ 25 行目：子ノードの αβ 値が score を超えていた場合に最善手を表す bestmove を更新する処理を追加した
30 行目の min ノードの場合の処理の修正は、max ノードの場合の処理と同様

 1  @ai_by_mmscore
 2  def ai_pvs_dls(mb, debug=False, timelimit_pc=None, maxdepth=1, eval_func=None, eval_params={},
 3             tt=None, tt_for_mo=None):
元と同じなので省略
 4          if mborig.turn == Marubatsu.CIRCLE:
 5              x, y = legal_moves[0]
 6              mb = deepcopy(mborig)
 7              mb.move(x, y)
 8              score = ab_search(mb, depth + 1, tt, alpha, beta)
 9              alpha = max(alpha, score)
10              bestmove = (x, y)
11              if score < beta:
12                  for x, y in legal_moves[1:]:
13                      mb = deepcopy(mborig)
14                      mb.move(x, y)
15                      abscore = ab_search(mb, depth + 1, tt, alpha, alpha + 1)
16                      if abscore > score:
17                          bestmove = (x, y)
18                      score = max(score, abscore)
19                      if score >= beta:
20                          break
21                      elif score > alpha:
22                          abscore = ab_search(mb, depth + 1, tt, alpha, beta)
23                          if abscore > score:
24                              bestmove = (x, y)
25                          score = max(score, abscore)
26                          if score >= beta:
27                              break
28                          alpha = max(alpha, score)                    
29          else:
30              # min ノードの場合の処理。max ノードの場合と同様なので省略
元と同じなので省略

行番号のないプログラム

@ai_by_mmscore
def ai_pvs_dls(mb, debug=False, timelimit_pc=None, maxdepth=1, eval_func=None, eval_params={},
           tt=None, tt_for_mo=None):
    count = 0
    def ab_search(mborig, depth, tt, alpha=float("-inf"), beta=float("inf")):
        nonlocal count
        if timelimit_pc is not None and perf_counter() >= timelimit_pc:
            raise RuntimeError("time out")
        
        count += 1
        if mborig.status != Marubatsu.PLAYING or depth == maxdepth:
            return eval_func(mborig, calc_score=True, **eval_params)
        
        boardtxt = mborig.board_to_str()
        if boardtxt in tt:
            lower_bound, upper_bound, _ = tt[boardtxt]
            if lower_bound == upper_bound:
                return lower_bound
            elif upper_bound <= alpha:
                return upper_bound
            elif beta <= lower_bound:
                return lower_bound
            else:
                alpha = max(alpha, lower_bound)
                beta = min(beta, upper_bound)
        else:
            lower_bound = min_score
            upper_bound = max_score
        
        alphaorig = alpha
        betaorig = beta

        legal_moves = mborig.calc_legal_moves()
        if tt_for_mo is not None:
            if boardtxt in tt_for_mo:
                _, _, bestmove = tt_for_mo[boardtxt]
                index = legal_moves.index(bestmove)
                legal_moves[0], legal_moves[index] = legal_moves[index], legal_moves[0]
        if mborig.turn == Marubatsu.CIRCLE:
            x, y = legal_moves[0]
            mb = deepcopy(mborig)
            mb.move(x, y)
            score = ab_search(mb, depth + 1, tt, alpha, beta)
            alpha = max(alpha, score)
            bestmove = (x, y)
            if score < beta:
                for x, y in legal_moves[1:]:
                    mb = deepcopy(mborig)
                    mb.move(x, y)
                    abscore = ab_search(mb, depth + 1, tt, alpha, alpha + 1)
                    if abscore > score:
                        bestmove = (x, y)
                    score = max(score, abscore)
                    if score >= beta:
                        break
                    elif score > alpha:
                        abscore = ab_search(mb, depth + 1, tt, alpha, beta)
                        if abscore > score:
                            bestmove = (x, y)
                        score = max(score, abscore)
                        if score >= beta:
                            break
                        alpha = max(alpha, score)                    
        else:
            x, y = legal_moves[0]
            mb = deepcopy(mborig)
            mb.move(x, y)
            score = ab_search(mb, depth + 1, tt, alpha, beta)
            beta = min(beta, score)
            bestmove = (x, y)
            if score > alpha:
                for x, y in legal_moves[1:]:
                    mb = deepcopy(mborig)
                    mb.move(x, y)
                    abscore = ab_search(mb, depth + 1, tt, beta - 1, beta)
                    if abscore < score:
                        bestmove = (x, y)
                    score = min(score, abscore)
                    if score <= alpha:
                        break
                    elif score < beta:
                        abscore = ab_search(mb, depth + 1, tt, alpha, beta)
                        if abscore < score:
                            bestmove = (x, y)
                        score = min(score, abscore)
                        if score <= alpha:
                            break
                        beta = min(beta, score)   
                    
        from util import calc_same_boardtexts

        boardtxtlist = calc_same_boardtexts(mborig, bestmove)
        if score <= alphaorig:
            upper_bound = score
        elif score < betaorig:
            lower_bound = score
            upper_bound = score
        else:
            lower_bound = score
        for boardtxt, move in boardtxtlist:
            tt[boardtxt] = (lower_bound, upper_bound, move)
        return score
                
    min_score = float("-inf")
    max_score = float("inf")
    
    if tt is None:
        tt = {}
    score = ab_search(mb, depth=0, tt=tt, alpha=min_score, beta=max_score)
    dprint(debug, "count =", count)
    return score, count

修正箇所

@ai_by_mmscore
def ai_pvs_dls(mb, debug=False, timelimit_pc=None, maxdepth=1, eval_func=None, eval_params={},
           tt=None, tt_for_mo=None):
元と同じなので省略
        if mborig.turn == Marubatsu.CIRCLE:
            x, y = legal_moves[0]
            mb = deepcopy(mborig)
            mb.move(x, y)
            score = ab_search(mb, depth + 1, tt, alpha, beta)
            alpha = max(alpha, score)
+           bestmove = (x, y)
            if score < beta:
                for x, y in legal_moves[1:]:
                    mb = deepcopy(mborig)
                    mb.move(x, y)
-                   score = max(score, ab_search(mb, tt, alpha, alpha + 1))
+                   abscore = ab_search(mb, depth + 1, tt, alpha, alpha + 1)
+                   if abscore > score:
+                       bestmove = (x, y)
+                   score = max(score, abscore)
                    if score >= beta:
                        break
                    elif score > alpha:
-                       score = max(score, ab_search(mb, tt, alpha, alpha + 1)
+                       abscore = ab_search(mb, depth + 1, tt, alpha, beta)
+                       if abscore > score:
+                           bestmove = (x, y)
+                       score = max(score, abscore)
                        if score >= beta:
                            break
                        alpha = max(alpha, score)                    
        else:
            x, y = legal_moves[0]
            mb = deepcopy(mborig)
            mb.move(x, y)
            score = ab_search(mb, depth + 1, tt, alpha, beta)
            beta = min(beta, score)
+           bestmove = (x, y)
            if score > alpha:
                for x, y in legal_moves[1:]:
                    mb = deepcopy(mborig)
                    mb.move(x, y)
-                   score = min(score, ab_search(mb, tt, alpha, alpha + 1))
+                   abscore = ab_search(mb, depth + 1, tt, beta - 1, beta)
+                   if abscore < score:
+                       bestmove = (x, y)
+                   score = min(score, abscore)
                    if score <= alpha:
                        break
                    elif score < beta:
-                   score = min(score, ab_search(mb, tt, alpha, alpha + 1))
+                       abscore = ab_search(mb, depth + 1, tt, alpha, beta)
+                       if abscore < score:
+                           bestmove = (x, y)
+                       score = min(score, abscore)
                        if score <= alpha:
                            break
                        beta = min(beta, score)
元と同じなので省略

精度が 100 % の move ordering を行う PVS の処理の効率

最初に 精度が 100 % の move ordering が行われた場合の ai_pvs_dls の 処理の効率 を、同じ局面に対して下記の処理を行い、それぞれの場合で 計算したノードの数を比較 することで検証することにします。

なお、下記の手順 2、3 の move ordering の 精度が 100 % になる理由 は、手順 1 と同じ深さの上限 で αβ 法と PVS で計算を行うため、手順 1 で計算した 置換表 には手順 2、3 で計算を行うすべてのノードに対する 最善手が記録される からです。

深さの上限を 9 とした置換表を利用する αβ 法で計算を行う
1 の置換表を利用した move ordering を行う深さの上限を 9 とした αβ 法で計算を行う
1 の置換表を利用した深さの上限を 9 とした PVS で計算を行う

下記はゲーム開始時の局面に対する上記の処理を行うプログラムです。なお、下記では静的評価関数は ai14s を利用しましたが、深さの上限 を〇× ゲームの ゲーム木の深さと同じ 9 としたので、静的評価関数はゲームの決着がついた局面の評価値を計算します。従って、決着のついた局面の評価値を正しく計算できる静的評価関数 であれば、静的評価関数は他の評価関数でも 結果は変わりません。

from ai import ai_abs_dls, ai14s

eval_params = {"minimax": True}

mb = Marubatsu()
result_abs = ai_abs_dls(mb, maxdepth=9, eval_func=ai14s, eval_params=eval_params,
                        use_tt=True, analyze=True)
print(result_abs["count"])

tt = result_abs["tt"]
result_abs_mo = ai_abs_dls(mb, maxdepth=9, eval_func=ai14s, eval_params=eval_params, 
                           use_tt=True, tt_for_mo=tt, analyze=True)
print(result_abs_mo["count"])

result_pvs = ai_pvs_dls(mb, maxdepth=9, eval_func=ai14s, eval_params=eval_params,
                        tt_for_mo=tt, analyze=True)
print(result_pvs["count"])

実行結果

1175
538
538

実行結果から、ゲーム開始時の局面 の場合では精度が 100 % の move ordering を行った αβ 法と PVS の効率が同じ であることが確認できました。以前の記事で説明したように、PVS と同じ処理を行う スカウト法 は move ordering の精度が高い程、効率が良くなる ことが 期待できます が、必ず良くなるとは限りません。実際に上記の場合は効率は同じになります。

下記は、ゲーム開始直後に (0, 0) に着手を行った局面 に対して上記と同様の処理を行うプログラムです。実行結果から、この場合は 573 と 569 のように、ほんの少しですが PVS が最も効率が高くなる ことが確認できます。

mb = Marubatsu()
mb.move(0, 0)
result_abs = ai_abs_dls(mb, maxdepth=9, eval_func=ai14s, eval_params=eval_params,
                        use_tt=True, analyze=True)
print(result_abs["count"])

tt = result_abs["tt"]
result_abs_mo = ai_abs_dls(mb, maxdepth=9, eval_func=ai14s, eval_params=eval_params, 
                           use_tt=True, tt_for_mo=tt, analyze=True)
print(result_abs_mo["count"])

result_pvs = ai_pvs_dls(mb, maxdepth=9, eval_func=ai14s, eval_params=eval_params,
                    tt_for_mo=tt, analyze=True)
print(result_pvs["count"])

実行結果

876
573
569

反復深化を利用した PVS の実装

PVS を行う関数が定義できたので、反復深化を利用した PVS を行う ai_pvs_iddfs を定義します。基本的には 前回の記事で修正した move ordering を行う αβ 法の反復深化 を行う ai_ab_iddfs と同じ ですが、下記の点が異なります。

PVS は置換表の利用と move ordering が前提なので、仮引数 use_tt、share_tt、moveordering を削除する
ai_abs_dls の代わりに ai_pvs_dls を呼び出す

下記はそのように ai_pvs_iddfs を定義したプログラムです。

3、4 行目：関数名を ai_pvs_iddfs に修正し、仮引数 use_tt、share_tt、moveordering を削除する
7 行目：常に tt_for_mo を空の dict で初期化するように修正する
13 行目：ai_abs_dls の代わりに ai_pvs_dls を呼び指すように修正する
仮引数 moveordering を無くしたので以下の修正を行う
- 15 行目の実引数を analyze=analyze or moveordering から analyze=True に修正する
- 16 行目にあった if analyze or moveordering を削除してそのブロックの処理を常に行うようにし、対応する else: とそのブロックを削除する

 1  from time import perf_counter
 2  
 3  def ai_pvs_iddfs(mb, debug=False, timelimit=10, eval_func=None,
 4                   eval_params={}, analyze=False):   
 5      starttime = perf_counter()
 6      timelimit_pc = starttime + timelimit
 7      tt_for_mo = {}
 8      bestmove = mb.calc_legal_moves()[0]
 9      totalcount = 0
10      resultlist = []
11      for maxdepth in range(9 - mb.move_count):
12          try:
13              result = ai_pvs_dls(mb, maxdepth=maxdepth, timelimit_pc=timelimit_pc,
14                                  eval_func=eval_func, eval_params=eval_params,
15                                  analyze=True, tt_for_mo=tt_for_mo)
元と同じなので省略
16      candidate = result["candidate"]
17      bestmove = result["bestmove"]
元と同じなので省略

行番号のないプログラム

from time import perf_counter

def ai_pvs_iddfs(mb, debug=False, timelimit=10, eval_func=None,
                eval_params={}, analyze=False):   
    starttime = perf_counter()
    timelimit_pc = starttime + timelimit
    tt_for_mo = {}
    bestmove = mb.calc_legal_moves()[0]
    totalcount = 0
    resultlist = []
    for maxdepth in range(9 - mb.move_count):
        try:
            result = ai_pvs_dls(mb, maxdepth=maxdepth, timelimit_pc=timelimit_pc,
                                eval_func=eval_func, eval_params=eval_params,
                                analyze=True, tt_for_mo=tt_for_mo)
        except RuntimeError:
            dprint(debug, "time out")
            break
        tt_for_mo = result["tt"]
        totaltime = perf_counter() - starttime
        resultlist.append(result)
        candidate = result["candidate"]
        bestmove = result["bestmove"]
        count = result["count"]
        time = result["time"]
        totalcount += count
        dprint(debug, f"maxdepth: {maxdepth}, time: {time:6.2f}/{totaltime:6.2f} ms, count {count:5}/{totalcount:5}, bestmove: {bestmove}, candidate: {candidate}")
    dprint(debug, f"totaltime: {perf_counter() - starttime: 6.2f} ms")
    if analyze:
        return resultlist
    else:
        return bestmove

修正箇所

from time import perf_counter

-def ai_ab_iddfs(mb, debug=False, timelimit=10, eval_func=None,
+def ai_pvs_iddfs(mb, debug=False, timelimit=10, eval_func=None,
-                eval_params={}, use_tt=True, share_tt=True, analyze=False):   
+                 eval_params={}, analyze=False):   
    starttime = perf_counter()
    timelimit_pc = starttime + timelimit
-   if moveordering:
-       use_tt = True
-       share_tt = True
-       tt_for_mo = {}
-   else:
-       tt_for_mo = None    
+   tt_for_mo = {}
    bestmove = mb.calc_legal_moves()[0]
    totalcount = 0
    resultlist = []
    for maxdepth in range(9 - mb.move_count):
        try:
-           result = ai_abs_dls(mb, maxdepth=maxdepth, timelimit_pc=timelimit_pc,
-                               eval_func=eval_func, eval_params=eval_params,
-                               use_tt=use_tt, share_tt=share_tt,
-                               analyze=analyze or moveordering, tt_for_mo=tt_for_mo)        
+           result = ai_pvs_dls(mb, maxdepth=maxdepth, timelimit_pc=timelimit_pc,
+                               eval_func=eval_func, eval_params=eval_params,
+                               analyze=True, tt_for_mo=tt_for_mo)

元と同じなので省略
-       if analyze or moveordering:
-           candidate = result["candidate"]
-           bestmove = result["bestmove"]
-           count = result["count"]
-           time = result["time"]
-           totalcount += count
-           dprint(debug, f"maxdepth: {maxdepth}, time: {time:6.2f}/{totaltime:6.2f} ms, count {count:5}/{totalcount:5}, bestmove: {bestmove}, candidate: {candidate}")
+       candidate = result["candidate"]
+       bestmove = result["bestmove"]
+       count = result["count"]
+       time = result["time"]
+       totalcount += count
+       dprint(debug, f"maxdepth: {maxdepth}, time: {time:6.2f}/{totaltime:6.2f} ms, count {count:5}/{totalcount:5}, bestmove: {bestmove}, candidate: {candidate}")
-       else:
-           bestmove = result
-           dprint(debug, f"maxdepth: {maxdepth}, time: {totaltime:6.2f} ms, bestmove: {bestmove}")
元と同じなので省略

上記の定義後に、下記のプログラムで ゲーム開始時の局面 に対して ai_ab_iddfs と ai_pvs_iddfs を実行して 結果を比較 すると、実行結果のように計算したノードの数の合計が 2652 と 2646 になり、PVS のほうがほんの少しだけ効率が良い ことが確認できました。

from ai import ai_ab_iddfs

mb = Marubatsu()
ai_ab_iddfs(mb, timelimit=10, debug=True, eval_func=ai14s,
            eval_params=eval_params, analyze=True, moveordering=True)
ai_pvs_iddfs(mb, timelimit=10, debug=True, eval_func=ai14s,
             eval_params=eval_params, analyze=True);

実行結果（後で説明しますが、実行結果が下記と若干異なる場合があります）

maxdepth: 0, time:   0.00/  0.00 ms, count     9/    9, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 1, time:   0.01/  0.01 ms, count    33/   42, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 2, time:   0.01/  0.03 ms, count    66/  108, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 3, time:   0.02/  0.05 ms, count   160/  268, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 4, time:   0.03/  0.07 ms, count   269/  537, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 5, time:   0.03/  0.10 ms, count   447/  984, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 6, time:   0.09/  0.20 ms, count   578/ 1562, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 7, time:   0.05/  0.25 ms, count   548/ 2110, bestmove: (0, 1), candidate: [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1), (0, 2), (1, 2), (2, 2)]
maxdepth: 8, time:   0.07/  0.31 ms, count   542/ 2652, bestmove: (0, 0), candidate: [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1), (0, 2), (1, 2), (2, 2)]
totaltime:   0.31 ms
maxdepth: 0, time:   0.00/  0.00 ms, count     9/    9, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 1, time:   0.00/  0.00 ms, count    36/   45, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 2, time:   0.00/  0.01 ms, count    67/  112, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 3, time:   0.01/  0.02 ms, count   163/  275, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 4, time:   0.02/  0.04 ms, count   270/  545, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 5, time:   0.05/  0.09 ms, count   447/  992, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 6, time:   0.04/  0.13 ms, count   566/ 1558, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 7, time:   0.03/  0.16 ms, count   546/ 2104, bestmove: (1, 2), candidate: [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1), (0, 2), (1, 2), (2, 2)]
maxdepth: 8, time:   0.03/  0.19 ms, count   542/ 2646, bestmove: (1, 1), candidate: [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1), (0, 2), (1, 2), (2, 2)]
totaltime:   0.19 ms

残念ながら 〇× ゲーム では PVS の利用によって効率はほとんど向上しない ようですが、下記の Chess Programming Wiki によると、チェスの場合 は PVS の反復深化によって約 10 % ほど処理速度が減る（we expect to save about 10% of a search effort）ようです。

本記事の最後で説明しますが、上記の 実行結果が若干異なる 場合があります。いずれの場合 でも PVS のほうが少しだけ効率が良いか、ほぼ同じ効率 になるという結果になります。

前向きな枝狩り

これまでに紹介した αβ 法、スカウト法、PVS などの枝狩りの手法は 枝狩りの有無によって得られる評価値が変化しない アルゴリズムで、そのような枝狩りのアルゴリズムを 後ろ向きの枝狩り（backward pruning）と呼びます。ゲーム木のルートノードを先頭と考えた場合に、探索を行う際に リーフノード（または深さの上限のノード）を最初に計算 し、そこから ルートノードに向かって後ろ向きに計算を行いながら枝狩りを行う ことが「後ろ向き」の由来です。αβ 法などの 後ろ向きの枝狩り のアルゴリズムで行われる α 狩りや β 狩りなどの 枝狩り は、評価値の計算に絶対に影響を及ぼさないことが確認できた場合 に行われるので、静的評価関数が計算した評価値に対する 正確なミニマックス値²を計算することができます。

一方、枝狩りを行うことで得られる評価値が変化する可能性がある 枝狩りのアルゴリズムの事を 前向きの枝狩り（forward pruning）と呼びます。前向きの枝狩りでは ルートノードから深さの浅い順に計算した評価値を元に枝狩りを行う ので「前向き」と呼びます。前向きの枝狩りでは、枝狩りを行っても 評価値の計算に影響を及ぼす確率が低いと考えられる 子ノードに対して枝狩りを行います。影響を及ぼさない確率を 100 % としないことで 後ろ向きの枝狩りより多くの枝狩りを行う ことができるので、以下のような利点と欠点が生じます。

利点：より多くの枝狩りを行うことで、同じ時間 で 後ろ向きの枝狩りよりも深くまで 探索を行うことができる
欠点：枝狩りを行ったノードの中に 評価値の計算に影響を及ぼすノードが含まれている可能性がある ため、正確な評価値が計算できない場合が生じる

わかりづらいと思いますので、将棋を具体例に挙げて説明します。将棋では 多くの局面で合法手が数十存在 します。そのようなゲームでミニマックス法で評価値を計算すると、深さが 1 増えるごと にゲーム木の ノードの数が数十倍 になってしまうため、あまり 多くの深さまでの探索を行うことができません。また、αβ 法 などの 前向きな枝狩り を行うアルゴリズムを使っても、同じ時間で探索できる 深さは 2 倍くらい にしかなりません。

人間が将棋のような 合法手が数十以上 も存在するような ゲームを遊ぶ場合 に、すべての合法手を検討する人はほとんどいない と思います。人間は合法手の中で 最善手の可能性が高そうな一部の合法手に絞って考える ことで、検討する 局面の数を大幅に減らし、より先の局面まで検討できるようにする のが一般的でしょう。この最善手でない 可能性が高い（100 % ではない）合法手を検討しない という考え方が 前向きな枝狩り に相当します。

実際に将棋では、ほとんどの局面 で 最善手と最善手に近い合法手 は 数個しかない ので、例えば前向きの枝狩りによって 検討する合法手の数を 1/10 に減らすことができれば、深さの上限が $\boldsymbol{d}$ の深さ制限探索で 計算するノードの数 が $\boldsymbol{10^d}$ 分の 1 のように 大幅に減ります。

前向きの枝狩りは、計算される評価値が変化 することで 評価値の精度が低くなる 可能性が生じますが、より深くまでゲーム木を探索できる ようになることで 評価値の精度を高める ことができます。前向きの枝狩りによって 総合的に評価値の精度が高まるか どうかは ゲームの性質によります。深くまで探索することで大きく精度が高まる ようなゲームの場合は 前向きの枝狩りが重要 となりますが、前向きの枝狩りに向いていないゲームもあります。

人間の場合は、合法手が最善手でない可能性が高いかどうかは、その人のゲームに対する経験や直観などから判断しますが、AI の場合は何らかの方法でその判断を行います。また、その 判断の精度が高い程 計算される 評価値の精度が高まります。

実際に使われる 前向きな枝狩りの手法 として、Prob Cut やその改良版である Multi Prob Cut という手法があります。詳細は長くなるので省略しますが、Prob Cut は 深さの上限が浅い探索 で計算された評価値が、深さの上限が深い探索 で計算された 評価値と近い値になる ことを利用した前向きの枝狩りです。また、深さの上限が浅い探索の計算方法として反復深化の手法が利用される場合があります。興味がある方は、下記の Chess Programming Wiki の Prob Cut の記事を参考にして下さい。

〇×ゲームにはうまく適用できないので本記事では紹介しませんが、チェスや将棋などのゲームでは 他にも様々な前向きな枝狩り の手法が考えられています。興味がある方は下記の Chess Programming Wiki の枝狩りの記事の中の、前向きな枝狩りを表す forward pruning techniques の項目などを参考に調べてみると良いでしょう。

`ai_by_mmscore` での PV の計算

以下の内容はおまけです。前回の記事で 置換表から PV を計算する方法 を紹介したので、@ai_by_mmscore がデコレートした関数に対して analyze=True をキーワード引数に記述して呼び出した際に__PV を計算する__ ように ai_by_mmscore を改良することにします。

下記は、そのように ai_by_mmscore を修正したプログラムです。

11 ~ 25 行目：PV は share_tt が True でなければ計算できないので、その場合に計算するように修正する
13 行目：PV を計算するためには、mb_orig の局面から PV の着手を行わなければならないので mb_orig を mb にコピーする
14 ~ 25 行目：前回の記事で PV を計算するプログラムと同じ。ただし、print によるメッセージの表示は削除した。なお、前回と前々回の記事では 24 行目が elif mb.status != Marubatsu.PLAYING となっていて間違っていたので else: に修正した（前回と前々回の記事は修正済）
26、27 行目：share_tt が False の場合は、PV として mb_orig の最善手しか計算できないので、PV には計算した最善手である bestmove のみを要素とする list を代入する
33 行目：元は bestmove の値を最善手として記録していたが、14 ~ 25 行目の処理で bestmove の値が PV の値で変化するので bestmove を記録してはいけない点に注意すること。mb_orig の局面の最善手は share_tt の値に関わらず PV の最初の要素に代入されるようにした、それを記録するように修正した
36 行目：返り値の dict の "PV" のキーに、PV の値を代入する

 1  from copy import deepcopy
 2  from functools import wraps
 3  from time import perf_counter
 4  from random import choice
 5  
 6  def ai_by_mmscore(eval_func):
 7      @wraps(eval_func)
 8      def wrapper(mb_orig, debug=False, *args, rand=True, share_tt=True,
 9                  analyze=False, calc_score=False, **kwargs):       
元と同じなので省略
10          if analyze:
11              if share_tt:
12                  PV = []
13                  mb = deepcopy(mb_orig)
14                  while mb.status == Marubatsu.PLAYING:
15                      PV.append(bestmove)
16                      x, y = bestmove
17                      if mb.board[x][y] != Marubatsu.EMPTY:
18                          print("そのマスには着手済みです")
19                          break
20                      mb.move(x, y)
21                      boardtxt = mb.board_to_str()
22                      if boardtxt in tt:
23                          _, _, bestmove = tt[boardtxt]
24                      else:
25                          break                
26              else:
27                  PV = bestmove
28              return {
29                  "candidate": best_moves,
30                  "score_by_move": score_by_move,
31                  "tt": tt,
32                  "time": perf_counter() - starttime,
33                  "bestmove": PV[0],
34                  "score": best_score,
35                  "count": totalcount,
36                  "PV": PV,
37              }
元と同じなので省略

行番号のないプログラム

from copy import deepcopy
from functools import wraps
from time import perf_counter
from random import choice

def ai_by_mmscore(eval_func):
    @wraps(eval_func)
    def wrapper(mb_orig, debug=False, *args, rand=True, share_tt=True,
                analyze=False, calc_score=False, **kwargs):       
        if calc_score:
            score, count = eval_func(mb_orig, debug, *args, **kwargs)
            return score
        
        starttime = perf_counter()
        dprint(debug, "Start ai_by_mmscore")
        dprint(debug, mb_orig)
        legal_moves = mb_orig.calc_legal_moves()
        dprint(debug, "legal_moves", legal_moves)
        maxnode = mb_orig.turn == Marubatsu.CIRCLE
        best_score = float("-inf") if maxnode else float("inf")
        best_moves = []
        tt = {} if share_tt else None
        totalcount = 0
        if analyze:
            score_by_move = {}
        for move in legal_moves:
            dprint(debug, "=" * 20)
            dprint(debug, "move", move)
            mb = deepcopy(mb_orig)
            x, y = move
            mb.move(x, y)
            dprint(debug, mb)
            score, count = eval_func(mb, debug, tt=tt, *args, **kwargs)
            totalcount += count
            dprint(debug, "score", score, "best score", best_score)
            if analyze:
                score_by_move[move] = score
            
            if (maxnode and best_score < score) or (not maxnode and best_score > score):
                best_score = score
                best_moves = [move]
                dprint(debug, "UPDATE")
                dprint(debug, "  best score", best_score)
                dprint(debug, "  best moves", best_moves)
            elif best_score == score:
                best_moves.append(move)
                dprint(debug, "APPEND")
                dprint(debug, "  best moves", best_moves)

        dprint(debug, "=" * 20)
        dprint(debug, "Finished")
        dprint(debug, "best score", best_score)
        dprint(debug, "best moves", best_moves)
        bestmove = choice(best_moves) if rand else best_moves[0]
        if analyze:
            if share_tt:
                PV = []
                mb = deepcopy(mb_orig)
                while mb.status == Marubatsu.PLAYING:
                    PV.append(bestmove)
                    x, y = bestmove
                    if mb.board[x][y] != Marubatsu.EMPTY:
                        print("そのマスには着手済みです")
                        break
                    mb.move(x, y)
                    boardtxt = mb.board_to_str()
                    if boardtxt in tt:
                        _, _, bestmove = tt[boardtxt]
                    else:
                        break                
            else:
                PV = bestmove
            return {
                "candidate": best_moves,
                "score_by_move": score_by_move,
                "tt": tt,
                "time": perf_counter() - starttime,
                "bestmove": PV[0],
                "score": best_score,
                "count": totalcount,
                "PV": PV,
            }
        else:
            return bestmove
        
    return wrapper

修正箇所

from copy import deepcopy
from functools import wraps
from time import perf_counter
from random import choice

def ai_by_mmscore(eval_func):
    @wraps(eval_func)
    def wrapper(mb_orig, debug=False, *args, rand=True, share_tt=True,
                analyze=False, calc_score=False, **kwargs):       
元と同じなので省略
        if analyze:
+           if share_tt:
+               PV = []
+               mb = deepcopy(mb_orig)
+               while mb.status == Marubatsu.PLAYING:
+                   PV.append(bestmove)
+                   x, y = bestmove
+                   if mb.board[x][y] != Marubatsu.EMPTY:
+                       print("そのマスには着手済みです")
+                       break
+                   mb.move(x, y)
+                   boardtxt = mb.board_to_str()
+                   if boardtxt in tt:
+                       _, _, bestmove = tt[boardtxt]
+                   else:
+                       break                
+           else:
+               PV = bestmove
            return {
                "candidate": best_moves,
                "score_by_move": score_by_move,
                "tt": tt,
                "time": perf_counter() - starttime,
-               "bestmove": bestmove,
+               "bestmove": PV[0],
                "score": best_score,
                "count": totalcount,
                "PV": PV,
            }
元と同じなので省略

次に、下記のプログラムで修正した ai_by_mmscore を用いて ai_pvs_dls を再定義 する必要があります。ai_pvs_dls のプログラムの内容は先ほどと同じなので折りたたみました。

ai_pvs_dls の再定義

@ai_by_mmscore
def ai_pvs_dls(mb, debug=False, timelimit_pc=None, maxdepth=1, eval_func=None, eval_params={},
           tt=None, tt_for_mo=None):

    count = 0
    def ab_search(mborig, depth, tt, alpha=float("-inf"), beta=float("inf")):
        nonlocal count
        if timelimit_pc is not None and perf_counter() >= timelimit_pc:
            raise RuntimeError("time out")
        
        count += 1
        if mborig.status != Marubatsu.PLAYING or depth == maxdepth:
            return eval_func(mborig, calc_score=True, **eval_params)
        
        boardtxt = mborig.board_to_str()
        if boardtxt in tt:
            lower_bound, upper_bound, _ = tt[boardtxt]
            if lower_bound == upper_bound:
                return lower_bound
            elif upper_bound <= alpha:
                return upper_bound
            elif beta <= lower_bound:
                return lower_bound
            else:
                alpha = max(alpha, lower_bound)
                beta = min(beta, upper_bound)
        else:
            lower_bound = min_score
            upper_bound = max_score
        
        alphaorig = alpha
        betaorig = beta

        legal_moves = mborig.calc_legal_moves()
        if tt_for_mo is not None:
            if boardtxt in tt_for_mo:
                _, _, bestmove = tt_for_mo[boardtxt]
                index = legal_moves.index(bestmove)
                legal_moves[0], legal_moves[index] = legal_moves[index], legal_moves[0]
        if mborig.turn == Marubatsu.CIRCLE:
            x, y = legal_moves[0]
            mb = deepcopy(mborig)
            mb.move(x, y)
            score = ab_search(mb, depth + 1, tt, alpha, beta)
            alpha = max(alpha, score)
            bestmove = (x, y)
            if score < beta:
                for x, y in legal_moves[1:]:
                    mb = deepcopy(mborig)
                    mb.move(x, y)
                    abscore = ab_search(mb, depth + 1, tt, alpha, alpha + 1)
                    if abscore > score:
                        bestmove = (x, y)
                    score = max(score, abscore)
                    if score >= beta:
                        break
                    elif score > alpha:
                        abscore = ab_search(mb, depth + 1, tt, alpha, beta)
                        if abscore > score:
                            bestmove = (x, y)
                        score = max(score, abscore)
                        if score >= beta:
                            break
                        alpha = max(alpha, score)                    
        else:
            x, y = legal_moves[0]
            mb = deepcopy(mborig)
            mb.move(x, y)
            score = ab_search(mb, depth + 1, tt, alpha, beta)
            beta = min(beta, score)
            bestmove = (x, y)
            if score > alpha:
                for x, y in legal_moves[1:]:
                    mb = deepcopy(mborig)
                    mb.move(x, y)
                    abscore = ab_search(mb, depth + 1, tt, beta - 1, beta)
                    if abscore < score:
                        bestmove = (x, y)
                    score = min(score, abscore)
                    if score <= alpha:
                        break
                    elif score < beta:
                        abscore = ab_search(mb, depth + 1, tt, alpha, beta)
                        if abscore < score:
                            bestmove = (x, y)
                        score = min(score, abscore)
                        if score <= alpha:
                            break
                        beta = min(beta, score)   
                    
        from util import calc_same_boardtexts

        boardtxtlist = calc_same_boardtexts(mborig, bestmove)
        if score <= alphaorig:
            upper_bound = score
        elif score < betaorig:
            lower_bound = score
            upper_bound = score
        else:
            lower_bound = score
        for boardtxt, move in boardtxtlist:
            tt[boardtxt] = (lower_bound, upper_bound, move)
        return score
                
    min_score = float("-inf")
    max_score = float("inf")
    
    if tt is None:
        tt = {}
    score = ab_search(mb, depth=0, tt=tt, alpha=min_score, beta=max_score)
    dprint(debug, "count =", count)
    return score, count

上記の再定義後に、下記のプログラムで ゲーム開始時の局面 に対して深さの上限を 9 として ai_pvs_dls を実行して 最善手、PV、PV を着手し続けた場合の局面の一覧 を表示すると、実行結果のように引き分けになる PV が正しく計算できている ことが確認できます。

result = ai_pvs_dls(mb, eval_func=ai14s, eval_params=eval_params, maxdepth=9, analyze=True)
print(result["bestmove"], result["PV"])
for x, y in result["PV"]:
    print(f"({x}, {y})")
    mb.move(x, y)
    print(mb)

実行結果（最初の着手はランダムなので下記と実行結果が異なる場合があります）

(2, 2) [(2, 2), (1, 1), (2, 1), (2, 0), (0, 2), (1, 2), (1, 0), (0, 1), (0, 0)]
(2, 2)
Turn x
...
...
..O

(1, 1)
Turn o
...
.X.
..o

(2, 1)
Turn x
...
.xO
..o

(2, 0)
Turn o
..X
.xo
..o

(0, 2)
Turn x
..x
.xo
O.o

(1, 2)
Turn o
..x
.xo
oXo

(1, 0)
Turn x
.Ox
.xo
oxo

(0, 1)
Turn o
.ox
Xxo
oxo

(0, 0)
winner draw
Oox
xxo
oxo

`ai_ab_iddfs` の計算結果が変わる理由とその修正

今回の記事を書いた後で、ai_ab_iddfs の計算結果の count の値 が JupyterLab を起動して実行する度に毎回変わる ことに気が付きました。下記は、先程行った 今回の記事の ai_ab_iddfs の計算結果 です。なお、ai_pvs_iddfs の計算結果は省略しました。

maxdepth: 0, time:   0.00/  0.00 ms, count     9/    9, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 1, time:   0.01/  0.01 ms, count    33/   42, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 2, time:   0.01/  0.03 ms, count    66/  108, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 3, time:   0.02/  0.05 ms, count   160/  268, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 4, time:   0.03/  0.07 ms, count   269/  537, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 5, time:   0.03/  0.10 ms, count   447/  984, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 6, time:   0.09/  0.20 ms, count   578/ 1562, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 7, time:   0.05/  0.25 ms, count   548/ 2110, bestmove: (0, 1), candidate: [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1), (0, 2), (1, 2), (2, 2)]
maxdepth: 8, time:   0.07/  0.31 ms, count   542/ 2652, bestmove: (0, 0), candidate: [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1), (0, 2), (1, 2), (2, 2)]
totaltime:   0.31 ms

下記は前回の記事での ai_ab_iddfs の計算結果で、上記と count の値が若干異なる ことが確認できます。このように ai_ab_iddfs の計算結果の count の値 は JupyterLab を起動（または再起動） すると 実行するたびに毎回異なります。実行結果は示しませんが ai_pvs_iddfs も同じ理由で毎回 count の値が異なる ので興味がある方は確認して下さい。

maxdepth: 0, time:   0.00/  0.00 ms, count     9/    9, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 1, time:   0.01/  0.01 ms, count    33/   42, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 2, time:   0.01/  0.02 ms, count    66/  108, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 3, time:   0.03/  0.05 ms, count   162/  270, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 4, time:   0.02/  0.07 ms, count   272/  542, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 5, time:   0.03/  0.10 ms, count   446/  988, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 6, time:   0.04/  0.14 ms, count   578/ 1566, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 7, time:   0.03/  0.17 ms, count   530/ 2096, bestmove: (1, 2), candidate: [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1), (0, 2), (1, 2), (2, 2)]
maxdepth: 8, time:   0.03/  0.20 ms, count   533/ 2629, bestmove: (0, 0), candidate: [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1), (0, 2), (1, 2), (2, 2)]
totaltime:   0.20 ms

この原因について調べてみたところ、calc_sameboardtxt の 返り値を set としたことが原因 であることが判明しました。その原因を初心者が自分で見つけるのは難しいと思いましたのでその理由と修正方法について補足します。

set の性質

set は list や dict と異なり、set に 要素を追加した順番 と、set から 要素を取り出す順番が一致しない という性質を持ちます。下記の公式の Python のドキュメントの set の説明に「A set object is an unordered collection」のように 整列されていない（unordered） とあることから、これは Python の仕様 のようです。

例えば下記のプログラムで set に 5 つの文字列を追加 した後で set の内容を表示 すると、追加した文字列の順番 と 表示された set の中身 の 順番が一致しない ことが確認できます。

a = set()
a.add("o....")
a.add(".o...")
a.add("..o..")
a.add("...o.")
a.add("....o")
print(a)

実行結果（下記と実行結果が異なる場合があります）

{'....o', 'o....', '..o..', '.o...', '...o.'}

また、下記のプログラムの実行結果のように for 文で set から要素を取り出す と、上記で表示された順番 で取り出されます。

for data in a:
    print(data)

実行結果

....o
o....
..o..
.o...
...o.

なお、下記のプログラムのように 上記と同じ処理を行っても実行結果が変わらない ことから、set に登録された データの順番が後から変わることはありません。

for data in a:
    print(data)

実行結果

....o
o....
..o..
.o...
...o.

JupyterLab を再起動した場合の set の要素の順番

set の要素の順番 は JupyterLab を新しく起動 するか、再起動 して Python のプログラムを再実行するたび に 同じデータであっても 下記のプログラムのように 変化します。この後で説明しますが、この性質が ai_ab_iddfs の計算結果の count の値 が JupyterLab を起動して実行するたびに毎回変わる 原因です。

# JupyterLab を再起動後に下記のプログラムを実行する
a = set()
a.add("o....")
a.add(".o...")
a.add("..o..")
a.add("...o.")
a.add("....o")
print(a)

実行結果（下記と実行結果が異なる場合があります）

{'...o.', '..o..', '....o', '.o...', 'o....'}

下記は先ほどの実行結果です。上記と見比べてみて下さい。

{'....o', 'o....', '..o..', '.o...', '...o.'}

下記のプログラムのように JupyterLab を再起動せず に 同じ set を作り直した場合 は 同じ順番で登録 されます。このことから set の要素の順番 は JupterLab を起動するたびに変化する何らかの値によって決まる のではないかと思われます。

# JupyterLab を再起動せずにに下記のプログラムを実行する
a = set()
a.add("o....")
a.add(".o...")
a.add("..o..")
a.add("...o.")
a.add("....o")
print(a)

実行結果

{'...o.', '..o..', '....o', '.o...', 'o....'}

set の要素の順番の違いによる置換表の変化

ai_abs_dls や ai_pvs_dls 内で 置換表にデータを記録 する際に、下記のプログラムのよう calc_same_boardtext で計算された set からデータを取り出して置換表に登録 します。この時に、set の要素の順番が変化 すると 異なるデータが置換表に登録される場合 が生じます。

        boardtxtlist = calc_same_boardtexts(mborig, bestmove)
略
        for boardtxt, move in boardtxtlist:
            tt[boardtxt] = (lower_bound, upper_bound, move)

具体的を上げて説明します。ゲーム開始後に (1, 1) に着手を行った局面 に対する 最善手は四隅のマス なのでその中の (0, 0) が最善手として計算された場合の処理 を行ってみることにします。下記は、その条件で上記のプログラムを実行するプログラムです。置換表のデータは空の dict で初期化 し、lower_bound と upper_bound の値は今回の検証では重要ではないのでそれぞれ 0 を代入しました。

from util import calc_same_boardtexts
from marubatsu import Marubatsu

mb = Marubatsu()
mb.move(1, 1)
print(mb)
boardtxtlist = calc_same_boardtexts(mb, (0, 0))
print("boardtxtlist")
print(boardtxtlist)
tt = {}
print("登録後の置換表の値")
for boardtxt, move in boardtxtlist:
    tt[boardtxt] = (0, 0, move)
    print(tt)

実行結果

Turn x
...
.O.
...

boardtxtlist
{('....o....', (2, 2)), ('....o....', (2, 0)), ('....o....', (0, 0)), ('....o....', (0, 2))}
登録後の置換表の値
{'....o....': (0, 0, (2, 2))}
{'....o....': (0, 0, (2, 0))}
{'....o....': (0, 0, (0, 0))}
{'....o....': (0, 0, (0, 2))}

この局面は、8 種類の同一局面 が 完全に同じ局面 になりますが、それぞれの 同一局面に対する (0, 0) の座標 は いずれかの四隅の座標 になるので、上記の実行結果の boardtxtlist の次の行で表示される返り値の set には 真ん中に着手が行われた局面を表す文字列 と 四隅の座標 を表す 4 つの要素が代入 されます。その後でその 4 つのデータ が set から順番に取り出されて置換表に記録 されますが、4 つのデータの局面は同じ なので、tt['....o....'] のデータが 上書き されて 最後に set から取り出されたデータのみが置換表に登録 されます。上記の場合は置換表の中身が {'....o....': (0, 0, (0, 2))} になります。

次に、JupyterLab を再起動して下記の先程と同じプログラムを実行 すると、下記の実行結果のように set の要素の順番が変わる ため、置換表 の中身が 上記とは異なる {'....o....': (0, 0, (2, 0))} になります。

# JupyterLab を再起動後に下記のプログラムを実行する
from marubatsu import Marubatsu
from util import calc_same_boardtexts

mb = Marubatsu()
mb.move(1, 1)
print(mb)
boardtxtlist = calc_same_boardtexts(mb, (0, 0))
print("boardtxtlist")
print(boardtxtlist)
tt = {}
print("登録後の置換表の値")
for boardtxt, move in boardtxtlist:
    tt[boardtxt] = (0, 0, move)
    print(tt)

実行結果

Turn x
...
.O.
...

boardtxtlist
{('....o....', (0, 0)), ('....o....', (2, 2)), ('....o....', (0, 2)), ('....o....', (2, 0))}
登録後の置換表の値
{'....o....': (0, 0, (0, 0))}
{'....o....': (0, 0, (2, 2))}
{'....o....': (0, 0, (0, 2))}
{'....o....': (0, 0, (2, 0))}

置換表 に登録された 局面の最善手が変化 すると move ordering も変わってしまう ため、枝狩りが変化 して同じ局面に対して 計算されるノードの数が変化 します。これが、ai_ab_iddfs の計算結果の count の値 が JupyterLab を起動（または再起動） して実行する度に 毎回変わる原因 です。

ただし、上記のように set の順番が変わっても 置換表に登録されるデータが最善手であることに変わりはない ので calc_same_boardtxt や置換表にデータを登録する 処理が間違っているわけではありません。また、move ordering が変化すると計算するノードの数は変わりますが、最善手が計算される点は変わらない ので 現状の ai_abs_dls や ai_pvs_dls が 間違った計算を行うわけではありません。

`calc_same_boardtxt` の修正

上記で説明したように現状のプログラムが間違った計算を行うわけではありませんが、先程のように 置換表の同じ局面のデータ に対して 複数回のデータを代入して上書きする処理 は 無駄な処理 なので修正することにします。

calc_same_boardtxt が計算するデータを set とした理由の一つ は、集合を表す set には同じ要素が重複して登録されない という性質があるからです。calc_same_boardtxt は 最初は局面を表すデータのみ を要素として持つ set を計算していたので、さきほどのゲーム開始直後に (1, 1) に着手した局面のように、8 つの同一局面が一致した場合 は計算された set の要素は 1 つだけ になります。

一方、前回の記事の修正によって calc_same_boardtxt が計算する set の要素が (局面のデータを表す文字列, 合法手) という tuple に変化 したため、先程の例では 同じ局面に対して異なる合法手が記録 された 4 つの要素を持つ set が計算される ようになってしまいました。

そこで、calc_same_boardtxt の返り値 を set ではなく、ゲーム盤を表す文字列をキー とし、その キーの値に合法手を代入 する dict として計算するように修正することにします。そのように修正することで、calc_same_boardtxt が計算した dict には、同じ局面に対して一つの合法手だけが記録 されるようになるので、先程のような同じ局面に対して合法手が変化するという問題はなくなります。

下記は、そのように calc_sameboardtxt を修正したプログラムです。

5 行目：boardtexts を「mb の局面を表す文字列」のキーの値に move を代入した dict で初期化するように修正する
12 行目：同一局面と、同一局面に対応する合法手を計算した後で、boardtexts の「同一局面を表す文字列」のキーの値に「計算した合法手」を代入するように修正する。

 1  def calc_same_boardtexts(mb, move=None):
元と同じなので省略
 2      if move is None:
 3          boardtexts = set([mb.board_to_str()])
 4      else:
 5          boardtexts = { mb.board_to_str(): move }
 6      for xa, xb, xc, ya, yb, yc, xa2, xb2, xc2, ya2, yb2, yc2 in data:
元と同じなので省略
 7          if move is None:
 8              boardtexts.add(txt)
 9          else:
10              x, y = move
11              x, y = xa2 * 2 + xb2 * x + xc2 * y, ya2 * 2 + yb2 * x + yc2 * y
12              boardtexts[txt] = (x, y)
13      return boardtexts

行番号のないプログラム

def calc_same_boardtexts(mb, move=None):
    data = [ [ 0,  0,  1, 1, -1,  0,  1,  0, -1, 0,  1,  0],
             [ 1, -1,  0, 1,  0, -1] * 2,
             [ 1,  0, -1, 0,  1,  0,  0,  0,  1, 1, -1,  0],
             [ 1, -1,  0, 0,  0,  1] * 2,
             [ 0,  1,  0, 1,  0, -1] * 2,
             [ 1,  0, -1, 1, -1,  0] * 2,
             [ 0,  0,  1, 0,  1,  0] * 2, ]
    if move is None:
        boardtexts = set([mb.board_to_str()])
    else:
        boardtexts = { mb.board_to_str(): move }
    for xa, xb, xc, ya, yb, yc, xa2, xb2, xc2, ya2, yb2, yc2 in data:
        txt = ""
        for x in range(mb.BOARD_SIZE):
            for y in range(mb.BOARD_SIZE):
                txt += mb.board[xa * 2 + xb * x + xc * y][ya * 2 + yb * x + yc * y]
        if move is None:
            boardtexts.add(txt)
        else:
            x, y = move
            x, y = xa2 * 2 + xb2 * x + xc2 * y, ya2 * 2 + yb2 * x + yc2 * y
            boardtexts[txt] = (x, y)
    return boardtexts

修正箇所

def calc_same_boardtexts(mb, move=None):
元と同じなので省略
    if move is None:
        boardtexts = set([mb.board_to_str()])
    else:
-       boardtexts = set([(mb.board_to_str(), move)])
+       boardtexts = { mb.board_to_str(): move }
    for xa, xb, xc, ya, yb, yc, xa2, xb2, xc2, ya2, yb2, yc2 in data:
元と同じなので省略
        if move is None:
            boardtexts.add(txt)
        else:
            x, y = move
            x, y = xa2 * 2 + xb2 * x + xc2 * y, ya2 * 2 + yb2 * x + yc2 * y
-           boardtexts.add((txt, (x, y)))
+           boardtexts[txt] = (x, y)
    return boardtexts

上記の修正後に、先程と同じ処理を行う下記のプログラムを実行すると、実行結果のように 1 つの要素を持つ dict が返り値として返され、置換表への登録処理が 1 回だけ行われる ようになったことが確認できます。なお、返り値が dict になった ので、9 行目を dict からキーとキーの値を変数に代入するように修正 する必要がある点に注意して下さい。

 1  mb = Marubatsu()
 2  mb.move(1, 1)
 3  print(mb)
 4  boardtxtlist = calc_same_boardtexts(mb, (0, 0))
 5  print("boardtxtlist")
 6  print(boardtxtlist)
 7  tt = {}
 8  print("登録後の置換表の値")
 9  for boardtxt, move in boardtxtlist.items():
10      tt[boardtxt] = (0, 0, move)
11      print(tt)

行番号のないプログラム

mb = Marubatsu()
mb.move(1, 1)
print(mb)
boardtxtlist = calc_same_boardtexts(mb, (0, 0))
print("boardtxtlist")
print(boardtxtlist)
tt = {}
print("登録後の置換表の値")
for boardtxt, move in boardtxtlist.items():
    tt[boardtxt] = (0, 0, move)
    print(tt)

修正箇所

mb = Marubatsu()
mb.move(1, 1)
print(mb)
boardtxtlist = calc_same_boardtexts(mb, (0, 0))
print("boardtxtlist")
print(boardtxtlist)
tt = {}
print("登録後の置換表の値")
-for boardtxt, move in boardtxtlist:
+for boardtxt, move in boardtxtlist.items():
    tt[boardtxt] = (0, 0, move)
    print(tt)

実行結果

Turn x
...
.O.
...

boardtxtlist
{'....o....': (0, 0)}
登録後の置換表の値
{'....o....': (0, 0, (0, 0))}

`ai_abs_dls` の修正

上記の修正にあわせて ai_abs_dls を下記のプログラムのように修正します。なお、プログラムは示しませんが、ai_pvs_dls に対しても同様の修正を行う必要があります。その修正は ai_new.py に反映させてあります。

8 行目：上記で修正した calc_same_boardtexts を利用するように、util.py からの calc_same_boardtexts のインポートをコメントにする。なお、この処理は util.py の calc_same_boardtexts を今回の記事の内容で修正した後は再び必要になるので、今回の記事の ai_new.py や次回以降の ai.py では再びコメントを外すことにする
12 行目：boardtxtlist が set から dict になったのでそれに合わせた修正を行う

 1  from ai import ai_by_mmscore, dprint
 2  from time import perf_counter
 3  from copy import deepcopy
 4  
 5  @ai_by_mmscore
 6  def ai_abs_dls(mb, debug=False, timelimit_pc=None, maxdepth=1, eval_func=None,
 7                 eval_params={}, use_tt=False, tt=None):
元と同じなので省略
 8  #       from util import calc_same_boardtexts
 9  
10          if use_tt:
11              boardtxtlist = calc_same_boardtexts(mborig, bestmove)
元と同じなので省略
12              for boardtxt, move in boardtxtlist.items():
13                  tt[boardtxt] = (lower_bound, upper_bound, move)
14          return score
元と同じなので省略

行番号のないプログラム

from ai import ai_by_mmscore, dprint
from time import perf_counter
from copy import deepcopy

@ai_by_mmscore
def ai_abs_dls(mb, debug=False, timelimit_pc=None, maxdepth=1, eval_func=None,
               eval_params={}, use_tt=False, tt=None):           
    count = 0
    def ab_search(mborig, depth, tt, alpha=float("-inf"), beta=float("inf")):
        nonlocal count
        if timelimit_pc is not None and perf_counter() >= timelimit_pc:
            raise RuntimeError("time out")
        
        count += 1
        if mborig.status != Marubatsu.PLAYING or depth == maxdepth:
            return eval_func(mborig, calc_score=True, **eval_params)
        
        if use_tt:
            boardtxt = mborig.board_to_str()
            if boardtxt in tt:
                lower_bound, upper_bound, _ = tt[boardtxt]
                if lower_bound == upper_bound:
                    return lower_bound
                elif upper_bound <= alpha:
                    return upper_bound
                elif beta <= lower_bound:
                    return lower_bound
                else:
                    alpha = max(alpha, lower_bound)
                    beta = min(beta, upper_bound)
            else:
                lower_bound = min_score
                upper_bound = max_score
        
        alphaorig = alpha
        betaorig = beta

        legal_moves = mborig.calc_legal_moves()
        if mborig.turn == Marubatsu.CIRCLE:
            score = float("-inf")
            for x, y in legal_moves:
                mb = deepcopy(mborig)
                mb.move(x, y)
                abscore = ab_search(mb, depth + 1, tt, alpha, beta)
                if abscore > score:
                    bestmove = (x, y)
                score = max(score, abscore)
                if score >= beta:
                    break
                alpha = max(alpha, score)
        else:
            score = float("inf")
            for x, y in legal_moves:
                mb = deepcopy(mborig)
                mb.move(x, y)
                abscore = ab_search(mb, depth + 1, tt, alpha, beta)
                if abscore < score:
                    bestmove = (x, y)
                score = min(score, abscore)
                if score <= alpha:
                    break
                beta = min(beta, score)   
            
#       from util import calc_same_boardtexts

        if use_tt:
            boardtxtlist = calc_same_boardtexts(mborig, bestmove)
            if score <= alphaorig:
                upper_bound = score
            elif score < betaorig:
                lower_bound = score
                upper_bound = score
            else:
                lower_bound = score
            for boardtxt, move in boardtxtlist.items():
                tt[boardtxt] = (lower_bound, upper_bound, move)
        return score
                
    min_score = float("-inf")
    max_score = float("inf")
    
    if tt is None:
        tt = {}
    score = ab_search(mb, depth=0, tt=tt, alpha=min_score, beta=max_score)
    dprint(debug, "count =", count)
    return score, count

修正箇所

from ai import ai_by_mmscore, dprint
from time import perf_counter
from copy import deepcopy

@ai_by_mmscore
def ai_abs_dls(mb, debug=False, timelimit_pc=None, maxdepth=1, eval_func=None,
               eval_params={}, use_tt=False, tt=None):
元と同じなので省略
+#      from util import calc_same_boardtexts

        if use_tt:
            boardtxtlist = calc_same_boardtexts(mborig, bestmove)
元と同じなので省略
-           for boardtxt, move in boardtxtlist:
+           for boardtxt, move in boardtxtlist.items():
                tt[boardtxt] = (lower_bound, upper_bound, move)
        return score
元と同じなので省略

次に、上記で修正した ai_abs_dls を利用するように ai_ab_iddfs を再定義する必要があります。プログラムは同じなので折りたたみました。

ai_abs_dlsの再定義

from time import perf_counter

def ai_ab_iddfs(mb, debug=False, timelimit=10, eval_func=None, eval_params={},
               use_tt=True, share_tt=True, analyze=False, moveordering=True):
    starttime = perf_counter()
    timelimit_pc = starttime + timelimit
    if moveordering:
        use_tt = True
        share_tt = True
        tt_for_mo = {}
    else:
        tt_for_mo = None
    bestmove = mb.calc_legal_moves()[0]
    totalcount = 0
    resultlist = []
    for maxdepth in range(9 - mb.move_count):
        try:
            result = ai_abs_dls(mb, maxdepth=maxdepth, timelimit_pc=timelimit_pc,
                                eval_func=eval_func, eval_params=eval_params,
                                use_tt=use_tt, share_tt=share_tt,
                                analyze=analyze or moveordering, tt_for_mo=tt_for_mo)
        except RuntimeError:
            dprint(debug, "time out")
            break
        if moveordering:
            tt_for_mo = result["tt"]
        totaltime = perf_counter() - starttime
        resultlist.append(result)
        if analyze or moveordering:
            candidate = result["candidate"]
            bestmove = result["bestmove"]
            count = result["count"]
            time = result["time"]
            totalcount += count
            dprint(debug, f"maxdepth: {maxdepth}, time: {time:6.2f}/{totaltime:6.2f} ms, count {count:5}/{totalcount:5}, bestmove: {bestmove}, candidate: {candidate}")
        else:
            bestmove = result
            dprint(debug, f"maxdepth: {maxdepth}, time: {totaltime:6.2f} ms, bestmove: {bestmove}")
    dprint(debug, f"totaltime: {perf_counter() - starttime: 6.2f} ms")
    if analyze:
        return resultlist
    else:
        return bestmove

上記の修正後に下記のプログラムで 先程と度同じ条件 で ai_ab_iddfs を実行すると実行結果のような表示が行われます。

from ai import ai_ab_iddfs, ai14s

mb = Marubatsu()
eval_params = {"minimax": True}
ai_ab_iddfs(mb, timelimit=10, debug=True, eval_func=ai14s,
            eval_params=eval_params, analyze=True, moveordering=True);

実行結果

maxdepth: 0, time:   0.00/  0.00 ms, count     9/    9, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 1, time:   0.01/  0.01 ms, count    33/   42, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 2, time:   0.02/  0.03 ms, count    66/  108, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 3, time:   0.03/  0.05 ms, count   158/  266, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 4, time:   0.02/  0.08 ms, count   269/  535, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 5, time:   0.03/  0.11 ms, count   458/  993, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 6, time:   0.04/  0.14 ms, count   584/ 1577, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 7, time:   0.03/  0.18 ms, count   546/ 2123, bestmove: (1, 0), candidate: [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1), (0, 2), (1, 2), (2, 2)]
maxdepth: 8, time:   0.03/  0.21 ms, count   545/ 2668, bestmove: (2, 2), candidate: [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1), (0, 2), (1, 2), (2, 2)]
totaltime:   0.21 ms

次に、上記のプログラムを ai_new.py と util_new.py に反映 し、JupyterLab を再起動 して下記のプログラムを実行すると、上記と同じ実行結果が表示 されるので、JupyterLab の再起動 によって ai_ab_iddfs の結果が変化しなくなった ことが確認できます。

# JupyterLab を再起動後に下記のプログラムを実行する
from ai_new import ai_ab_iddfs, ai14s
from marubatsu import Marubatsu

mb = Marubatsu()
eval_params = {"minimax": True}
ai_ab_iddfs(mb, timelimit=10, debug=True, eval_func=ai14s,
            eval_params=eval_params, analyze=True, moveordering=True);

実行結果

maxdepth: 0, time:   0.00/  0.00 ms, count     9/    9, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 1, time:   0.01/  0.01 ms, count    33/   42, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 2, time:   0.00/  0.01 ms, count    66/  108, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 3, time:   0.01/  0.02 ms, count   158/  266, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 4, time:   0.02/  0.04 ms, count   269/  535, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 5, time:   0.03/  0.07 ms, count   458/  993, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 6, time:   0.04/  0.10 ms, count   584/ 1577, bestmove: (1, 1), candidate: [(1, 1)]
maxdepth: 7, time:   0.03/  0.14 ms, count   546/ 2123, bestmove: (1, 0), candidate: [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1), (0, 2), (1, 2), (2, 2)]
maxdepth: 8, time:   0.03/  0.17 ms, count   545/ 2668, bestmove: (0, 2), candidate: [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1), (0, 2), (1, 2), (2, 2)]
totaltime:   0.17 ms

興味がある方は、ai_pvs_iddfs の結果も変化しなくなる ことを確認してみて下さい。

今回の記事のまとめ

今回の記事では Principal Variation Search と 反復深化を利用した PVS について紹介し、その実装を行いました。今回の記事でミニマックス法によるゲーム木の探索と、その効率化のための枝狩りの手法の説明は終了しますが、他にも様々な手法がありますので、興味がある方は調べてみて下さい。

本記事で入力したプログラム

リンク	説明
marubatsu.ipynb	本記事で入力して実行した JupyterLab のファイル
ai.py	本記事で更新した ai_new.py
util.py	本記事で更新した util_new.py

次回の記事

スカウト法は多くの場合は max ノードと min ノードの処理を一つにまとめるネガスカウト法で実装されるので、リンク先の記事のタイトルは Negascout になっています ↩
ゲームの決着がついていない局面に対して静的評価関数が計算した評価値には誤差があり、後ろ向きの枝狩りでのその誤差が無くなるわけではない点に注意して下さい ↩

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up

Pythonで〇×ゲームのAIを一から作成する その184 反復深化を利用した Principal Variation Search

目次と前回の記事

これまでに作成したモジュール

反復深化を利用した Principal Variation Search

ai_pvs_dls の実装

仮引数 use_tt の削除

スカウト法に関する修正

精度が 100 % の move ordering を行う PVS の処理の効率

反復深化を利用した PVS の実装

前向きな枝狩り

ai_by_mmscore での PV の計算

ai_ab_iddfs の計算結果が変わる理由とその修正

set の性質

JupyterLab を再起動した場合の set の要素の順番

set の要素の順番の違いによる置換表の変化

calc_same_boardtxt の修正

ai_abs_dls の修正

今回の記事のまとめ

本記事で入力したプログラム

次回の記事

Pythonで〇×ゲームのAIを一から作成する　その184　反復深化を利用した Principal Variation Search

`ai_pvs_dls` の実装

仮引数 `use_tt` の削除

`ai_by_mmscore` での PV の計算

`ai_ab_iddfs` の計算結果が変わる理由とその修正

`calc_same_boardtxt` の修正

`ai_abs_dls` の修正