Pythonで〇×ゲームのAIを一から作成する　その173　ai_by_mmscore の定義と深さ制限探索による AI の実装

Last updated at 2025-06-02Posted at 2025-05-27

目次と前回の記事

これまでに作成したモジュール

以下のリンクから、これまでに作成したモジュールを見ることができます。

リンク	説明
marubatsu.py	Marubatsu、Marubatsu_GUI クラスの定義
ai.py	AI に関する関数
test.py	テストに関する関数
util.py	ユーティリティ関数の定義
tree.py	ゲーム木に関する Node、Mbtree クラスなどの定義
gui.py	GUI に関する処理を行う基底クラスとなる GUI クラスの定義

AI の一覧とこれまでに作成したデータファイルについては、下記の記事を参照して下さい。

深さ制限探索による AI の定義

今回の記事では、前回の記事で説明した 深さ制限探索 を利用した AI を実装します。

下記は前回の記事で説明した深さ制限探索アルゴリズムの再掲です。

ベースとなる αβ 法やスカウト法などの ゲーム木の探索 でノード $N$ に対するミニマックス値の近似値の計算を行う際に、下記の処理を行う。

深さの上限 $\boldsymbol{d}$ を決める
$N$ をルートノードとした、深さが $\boldsymbol{d}$ までのノード から構成される 部分木 に対して下記の計算を行う
- リーフノード以外 のノードの評価値の計算は、ベースとなるゲーム木の探索と同じ方法 で行う
- リーフノード の評価値の計算は 静的評価関数で行う

本記事で利用する用語

今回の記事でも前回の記事に引き続き下記の用語を使用します。

深さ制限探索 は、ゲーム木を探索する任意のアルゴリズム に対して適用することができます。本記事では深さ制限探索を適用する ゲーム木の探索 アルゴリズムの事を ベースとなるアルゴリズム と表記することにします。

任意の局面 に対して 局面の状況を表す評価値 を計算する関数のことを 評価関数 と呼びます。また、局面の情報だけから 評価値を計算する評価関数のことを 静的評価関数、ゲーム木の探索 を用いて評価値を計算する関数のことを 探索ベースの評価関数 と表記します。

前回の記事で説明したように、本記事では 深さの上限を $\boldsymbol{d}$、静的評価関数を eval_func とする 深さ制限探索(depth limited search) のことを下記のように表記することにします。なお、この表記法は 本記事独自 のもので、一般的な表記法ではありません。

DLS(ベースとなるアルゴリズム, d, eval_func)

`@ai_by_score` を利用した深さ制限探索の実装方法

下記は前回の記事で説明した、静的評価関数 を @ai_by_score でラップ した場合に行なわれる処理の再掲です。

@ai_by_score を利用して 定義された関数 が行う処理は、ラップされた静的評価関数を eval_func と表記すると下記のように表記できる。

DLS(ネガマックス法, 1, eval_func)

上記は 深さ 1 までのゲーム木の探索を行うので、深さの上限を 1 とする 深さ制限探索 と考えることができます。

下記は前回の記事で説明した、探索ベースの評価関数 を @ai_by_score でラップ した場合に行なわれる処理の再掲です。

ルートノード に対しては、探索ベースの評価関数を利用した DLS(ネガマックス法, 1, 探索ベースの評価関数) で処理を行う
深さ 1 のそれぞれの子ノード に対しては、ゲームの決着がついた局面の評価値を計算できる静的評価関数を利用した DLS(探索アルゴリズム, ∞, 静的評価関数) で処理を行う

@ai_by_score を利用 した 深さ制限探索の AI の実装方法 は、上記とよく似ています。具体的には、深さの上限を制限 した 探索ベースの評価関数 を @ai_by_score でラップ します。下記はその場合に行われる処理で、上記と異なる部分を太字にしました。

ルートノードに対しては、深さの上限を制限した 探索ベースの評価関数を利用した DLS(ネガマックス法, 1, 探索ベースの評価関数) で処理を行う
深さ 1 のそれぞれの子ノードに対しては、任意の局面 の評価値を計算できる静的評価関数を利用した DLS(探索アルゴリズム, 深さの上限, 静的評価関数) で処理を行う

深さの上限を制限 する 探索ベースの評価関数 を @ai_by_score によってラップ することで、深さ制限探索 によって着手を選択する AI の関数を定義 することができる。

深さ制限探索で着手を選択する場合の深さの上限の性質

深さ 1 のノードに対して行われる 深さ制限探索 の 深さの上限を $\boldsymbol{d}$ と表記すると、上記の処理は ルートノードから $\boldsymbol{1 + d}$ までの深さ のノードの探索を行うので、全体としてみる と 深さの上限を $\boldsymbol{d + 1}$ とした 深さ制限探索 が行われることになります。

$\boldsymbol{d ≧ 0}$ なので、深さ制限探索によって着手を選択する AI での ルートノードからの深さの上限は必ず 1 以上 になるということを意味します。そのようになる理由は、ルートノードの 最善手 を 子ノードの評価値によって計算 するためには、必ず 深さが 1 である ルートノードの 子ノードの評価値を計算する必要がある からです。

@ai_by_score が 2 種類の探索アルゴリズムを利用する理由 は、前回の記事で説明したように 最善手が複数存在する場合 にその中から ランダムに着手を選択できる ようにするためです。最善手の中から 1 つを選択できれば良い 場合は、1 種類の探索アルゴリズム で実装することができますが、その場合でも 子ノードの評価値を計算する必要があるので 深さの上限は必ず 1 以上 になります。

ルートノードの 子ノードの評価値によって最善手を計算 する AI は、深さの上限が 1 以上の深さ制限探索 を行う。

なお、ノードの評価値を計算 する場合は 深さの上限 は 0 でもかまいません。実際に 静的評価関数 は 深さの上限が 0 の深さ制限探索とみなすことができます。

`ai_by_mmscore` の定義

上記で説明した方法で @ai_by_score を利用して深さ制限探索の AI を定義することをできますが、その方法にはいくつかの問題があるため、本記事では 別の ai_by_mmscore という関数を定義 し @ai_by_mmscore を利用して深さ制限探索の AI を定義 することにします。

具体的には、探索ベースの評価関数 を @ai_by_score でラップ した場合に いくつかの問題が発生 します。深さ制限探索 は 探索ベースのゲーム木の探索を行う ので @ai_by_score を利用して定義すると 同じ問題が発生 します。どのような問題が発生するかについて少し考えてみて下さい。

`@ai_by_score` の問題点

@ai_by_score は前回の記事で説明したように ネガマックス法 で ルートノードの子ノードの探索 を行うため、子ノードの局面の評価値 として 直前の手番のプレイヤーが有利になるほど大きな値 を計算します。

ネガマックス法の評価関数は、前回の記事でネガマックス法の実装例で示したように、その局面の手番 のプレイヤーが有利になるほど大きな値を計算し、計算された 子ノードの評価値の符号を反転 するのが一般的ですが、@ai_by_score の評価関数 はそれとは 異なる評価値を計算 します。

@ai_by_score はネガマックス法でルートノードの評価値の計算を行いますが、評価関数が上記の逆に 直前の手番 のプレイヤーが有利になるほど大きな値を計算することで、計算された子ノードの評価値の 符号を反転しなくても良い ようにします。

そのことが原因で、ai_mmdfs や ai_ab_all などの、ゲーム木の探索 によって局面の ミニマックス値を計算 する場合に @ai_by_score を利用すると、評価関数が 評価値を 返り値として 返す処理を行う際 に、計算した ミニマックス値を 直前の手番のプレイヤーが有利になるほど大きな値になるように 変換する処理が必要 となります。

具体的には、ミニマックス値 は先手である 〇の手番のプレイヤーが有利なほど大きな値 を計算するので、〇の手番の場合 は計算した ミニマックス値の符号を反転 させる必要があります。その処理は ai_mmdfs などの関数の 最後に記述された下記のプログラム で行います。

    if mb.turn == Marubatsu.CIRCLE:
        score *= -1
    return score

上記の符号を反転する処理 は 直観的にわかりづらい ので、ミニマックス値を計算する評価関数 を利用できる ai_by_mmscore という 関数を定義 することにします。

ai_by_mmscore が行う処理は、以下の点を除けば ai_by_score とほぼ同じ です。名前の mmscore は ミニマックス値（mini-max score）によって着手を選択する ことが由来です。

評価関数が計算する評価値 が ミニマックス値 である
ルートノードの子ノードの計算 をネガマックス法ではなく、ミニマックス法で行う

置換表の共有の有無を表す仮引数の追加

前回の記事で @ai_by_score を使って定義された AI の関数が子ノードの評価値を計算する際に 置換表を共有する方法 を紹介しましたが、その方法は ai_mmdfs_tt(mb, tt={}, debug=True) のように キーワード引数 tt={} を記述して呼び出す いうものでした。

この方法には以下のような問題があります。

置換表を共有する際に記述する tt={} の 意味がわかりづらい
置換表を共有することで処理時間が大幅に減るため、置換表を共有する場合 は キーワード引数の記述を省略 できるようにし、置換表を共有しない場合 に何らかの キーワード引数を記述 するようにしたほうが便利である

この問題を解決する方法として、ai_by_score の ラッパー関数 wrapeer に True が代入 された場合に 置換表を共有（share） する処理を行う share_tt という仮引数を追加 し、下記のように ai_by_score を修正するという方法を考えた人がいるかもしれませんが、この方法ではうまくいかない場合 があります。

8 行目：デフォルト値を True とする share_tt という仮引数を追加する
10 行目：ローカル変数 tt に share_tt が True の場合は空の dict を、そうでない場合は None を代入する
20 行目：評価関数を呼び出す際にキーワード引数 tt=tt を記述するように修正することで、share_tt が True の場合は tt に代入された置換表を表す dict が共有され、None の場合は置換表が共有されなくなる

 1  from ai import dprint
 2  from functools import wraps
 3  from random import choice
 4  from copy import deepcopy
 5  
 6  def ai_by_score(eval_func):
 7      @wraps(eval_func)
 8      def wrapper(mb_orig, debug=False, *args, rand=True, share_tt=True,
 9                  analyze=False, calc_score=False, **kwargs):
元と同じなので省略
10          tt = {} if share_tt else None
11          if analyze:
12              score_by_move = {}
13          for move in legal_moves:
14              dprint(debug, "=" * 20)
15              dprint(debug, "move", move)
16              mb = deepcopy(mb_orig)
17              x, y = move
18              mb.move(x, y)
19              dprint(debug, mb)
20              score = eval_func(mb, debug, tt=tt, *args, **kwargs)
元と同じなので省略

行番号のないプログラム

from ai import dprint
from functools import wraps
from random import choice
from copy import deepcopy

def ai_by_score(eval_func):
    @wraps(eval_func)
    def wrapper(mb_orig, debug=False, *args, rand=True, share_tt=True,
                analyze=False, calc_score=False, **kwargs):
        if calc_score:
            return eval_func(mb_orig, debug, *args, **kwargs)

        dprint(debug, "Start ai_by_score")
        dprint(debug, mb_orig)
        legal_moves = mb_orig.calc_legal_moves()
        dprint(debug, "legal_moves", legal_moves)
        best_score = float("-inf")
        best_moves = []
        tt = {} if share_tt else None
        if analyze:
            score_by_move = {}
        for move in legal_moves:
            dprint(debug, "=" * 20)
            dprint(debug, "move", move)
            mb = deepcopy(mb_orig)
            x, y = move
            mb.move(x, y)
            dprint(debug, mb)
            score = eval_func(mb, debug, tt=tt, *args, **kwargs)
            dprint(debug, "score", score, "best score", best_score)
            if analyze:
                score_by_move[move] = score
            
            if best_score < score:
                best_score = score
                best_moves = [move]
                dprint(debug, "UPDATE")
                dprint(debug, "  best score", best_score)
                dprint(debug, "  best moves", best_moves)
            elif best_score == score:
                best_moves.append(move)
                dprint(debug, "APPEND")
                dprint(debug, "  best moves", best_moves)

        dprint(debug, "=" * 20)
        dprint(debug, "Finished")
        dprint(debug, "best score", best_score)
        dprint(debug, "best moves", best_moves)
        if analyze:
            return {
                "candidate": best_moves,
                "score_by_move": score_by_move,
            }
        elif rand:   
            return choice(best_moves)
        else:
            return best_moves[0]
        
    return wrapper

修正箇所

from ai import dprint
from functools import wraps
from random import choice
from copy import deepcopy

def ai_by_score(eval_func):
    @wraps(eval_func)
-   def wrapper(mb_orig, debug=False, *args, rand=True,
+   def wrapper(mb_orig, debug=False, *args, rand=True, share_tt=True,
                analyze=False, calc_score=False, **kwargs):
元と同じなので省略
+       tt = {} if share_tt else None
        if analyze:
            score_by_move = {}
        for move in legal_moves:
            dprint(debug, "=" * 20)
            dprint(debug, "move", move)
            mb = deepcopy(mb_orig)
            x, y = move
            mb.move(x, y)
            dprint(debug, mb)
-           score = eval_func(mb, debug, tt=tt, *args, **kwargs)
+           score = eval_func(mb, debug, tt=tt, *args, **kwargs)
元と同じなので省略

`@ai_by_score` の修正がうまくいく例

まず、この方法で うまくいく場合 について説明します。

上記の修正後に、修正した @ai_by_score を利用して 下記のプログラムのように 探索ベースの評価関数 を利用した ai_abs_all を再定義 します。ai_abs_all の定義そのものは何も変更していないのでプログラムは折りたたみました。

ai_abs_all の再定義

from marubatsu import Marubatsu

@ai_by_score
def ai_abs_all(mb, debug, shortest_victory=False, init_ab=False,
               use_tt=False, tt=None, ai_for_mo=None, params={},
               sort_allnodes=False, calc_count=False):           
    count = 0
    def ab_search(mborig, tt, alpha=float("-inf"), beta=float("inf")):
        nonlocal count
        count += 1
        if mborig.status == Marubatsu.CIRCLE:
            return (11 - mborig.move_count) / 2 if shortest_victory else 1
        elif mborig.status == Marubatsu.CROSS:
            return (mborig.move_count - 10) / 2 if shortest_victory else -1
        elif mborig.status == Marubatsu.DRAW:
            return 0
        
        if use_tt:
            boardtxt = mborig.board_to_str()
            if boardtxt in tt:
                lower_bound, upper_bound = tt[boardtxt]
                if lower_bound == upper_bound:
                    return lower_bound
                elif upper_bound <= alpha:
                    return upper_bound
                elif beta <= lower_bound:
                    return lower_bound
                else:
                    alpha = max(alpha, lower_bound)
                    beta = min(beta, upper_bound)
            else:
                lower_bound = min_score
                upper_bound = max_score
        
        alphaorig = alpha
        betaorig = beta

        legal_moves = mborig.calc_legal_moves()
        if ai_for_mo is not None:
            if sort_allnodes:
                score_by_move = ai_for_mo(mborig, analyze=True, **params)["score_by_move"]
                score_by_move_list = sorted(score_by_move.items(), key=lambda x:x[1], reverse=True)
                legal_moves = [x[0] for x in score_by_move_list]
            else:
                legal_moves = mborig.calc_legal_moves()
                bestmove = ai_for_mo(mborig, rand=False, **params)
                index = legal_moves.index(bestmove)
                legal_moves[0], legal_moves[index] = legal_moves[index], legal_moves[0]
        if mborig.turn == Marubatsu.CIRCLE:
            score = float("-inf")
            for x, y in legal_moves:
                mb = deepcopy(mborig)
                mb.move(x, y)
                score = max(score, ab_search(mb, tt, alpha, beta))
                if score >= beta:
                    break
                alpha = max(alpha, score)
        else:
            score = float("inf")
            for x, y in legal_moves:
                mb = deepcopy(mborig)
                mb.move(x, y)
                score = min(score, ab_search(mb, tt, alpha, beta))
                if score <= alpha:
                    break
                beta = min(beta, score)   
            
        from util import calc_same_boardtexts

        if use_tt:
            boardtxtlist = calc_same_boardtexts(mborig)
            if score <= alphaorig:
                upper_bound = score
            elif score < betaorig:
                lower_bound = score
                upper_bound = score
            else:
                lower_bound = score
            for boardtxt in boardtxtlist:
                tt[boardtxt] = (lower_bound, upper_bound)
        return score
                
    min_score = -2 if shortest_victory else -1
    max_score = 3 if shortest_victory else 1
    alpha = min_score if init_ab else float("-inf")
    beta = max_score if init_ab else float("inf")

    if tt is None:
        tt = {}
    score = ab_search(mb, tt=tt, alpha=alpha, beta=beta)
    dprint(debug, "count =", count)
    if calc_count:
        return count
    if mb.turn == Marubatsu.CIRCLE:
        score *= -1
    return score

下記のプログラムで ゲーム開始時の局面 に対して再定義した ai_abs_all を実行すると、実行結果のように 着手が選択 されます。

mb = Marubatsu()
ai_abs_all(mb, share_tt=False)

実行結果（実行結果はランダムなので下記の異なる表示が行われる場合があります）

(1, 2)

また、下記のプログラムで 置換表を共有する場合としない場合 で 強解決の AI であるか どうかを判定すると、いずれも強解決の AI である ことが確認できます。

置換表を共有する場合

from util import Check_solved
Check_solved.is_strongly_solved(ai_abs_all, params={"use_tt":True})

100%|██████████| 431/431 [00:02<00:00, 190.94it/s]
431/431 100.00%

置換表を共有しない場合

Check_solved.is_strongly_solved(ai_abs_all, params={"use_tt":True, "share_tt":False})

100%|██████████| 431/431 [00:03<00:00, 136.31it/s]
431/431 100.00%

また、前回の記事と同様に、下記のプログラムで ai2s との対戦を 100 回ずつ行うと、処理時間 が 26 秒と 92 秒となり 置換表を共有したほうが大幅に短くなる ことが確認できます。

from ai import ai2s, ai_match

ai_match(ai=[ai_abs_all, ai2s], params=[{"use_tt": True}, {}], match_num=100)
ai_match(ai=[ai_abs_all, ai2s], params=[{"use_tt": True, "share_tt": False}, {}], match_num=100)

実行結果

ai_abs_all VS ai2s
100%|██████████| 100/100 [00:26<00:00,  3.83it/s]
count     win    lose    draw
o          95       0       5
x          75       0      25
total     170       0      30

ratio     win    lose    draw
o       95.0%    0.0%    5.0%
x       75.0%    0.0%   25.0%
total   85.0%    0.0%   15.0%

ai_abs_all VS ai2s
100%|██████████| 100/100 [01:32<00:00,  1.08it/s]
count     win    lose    draw
o          96       0       4
x          72       0      28
total     168       0      32

ratio     win    lose    draw
o       96.0%    0.0%    4.0%
x       72.0%    0.0%   28.0%
total   84.0%    0.0%   16.0%

`@ai_by_score` の修正がうまくいかない例

次に、@ai_by_score を利用して、下記のプログラムのように常に評価値として 0 を計算する 静的評価関数 を利用した ai2s を再定義 します。

@ai_by_score
def ai2s(mb, debug=False):
    return 0

上記の実行後に下記のプログラムでゲーム開始時の局面に対して ai2s で着手を選択すると、実行結果のような エラーが発生 します。

ai2s(mb)

実行結果

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[8], line 1
----> 1 ai2s(mb)

Cell In[1], line 29
     27 mb.move(x, y)
     28 dprint(debug, mb)
---> 29 score = eval_func(mb, debug, tt=tt, *args, **kwargs)
     30 dprint(debug, "score", score, "best score", best_score)
     31 if analyze:

TypeError: ai2s() got an unexpected keyword argument 'tt'

エラーメッセージは、ai2s が予期せぬ（unexpected）キーワード引数（keyword argument）tt を持つという意味で、実際に ai2s は置換表を利用しない静的評価関数 なので 仮引数 tt を持ちません。

先程の修正によって @ai_by_score が ラップできる評価関数 は以下のような性質を持つようになります。

ラップする評価関数は 仮引数 tt によって 置換表を共有 することができる、探索ベースの評価関数 である必要がある
置換表を利用しない 静的評価関数は利用できない

このように、先程の修正 によって @ai_by_score は 置換表を利用しない、静的評価関数 を ラップすることができなくなる という問題が発生します。

これから定義する ai_by_mmscore がラップする関数は ミニマックス値を計算 する 探索ベースの評価関数 です。従って、それらの関数を 仮引数 tt によって 置換表を共有 できるように 定義する ことで、上記の問題は発生しなくなります。

`ai_by_mmscore` の定義

ai_by_mmscore と ai_by_score の違いは以下の通りです。

ラップする 評価関数が計算する値 が ミニマックス値 である
ルートノードの子ノードの計算 をネガマックス法ではなく、ミニマックス法で行う
ラップする評価関数は 探索ベースの評価関数 で、置換表を共有するかどうかを表す 仮引数 share_tt を持つ

ai_by_mmscore は ai_by_score を修正 することで下記のプログラムのように定義できます。なお、share_tt に関する修正は先ほど説明済なので省略します。

1 行目：関数名を ai_by_mmscore に修正した
5 行目：mb_orig が max ノードであるかどうかを判定してローカル変数に代入する
6 行目：最善手の評価値の初期値をノードの種類に応じた値で初期化する
17 行目：12 行目で計算した子ノードのミニマックス値がこれまでの最善手よりも優れているかを、ミニマックス法でノードの種類に応じた条件式で判定するように修正する。なお、and 演算子のほうが or 演算子よりも優先順位が高いので条件式の中の () は不要だが、わかりやすさを重視して () を記述した

 1  def ai_by_mmscore(eval_func):
 2      @wraps(eval_func)
 3      def wrapper(mb_orig, debug=False, *args, rand=True, share_tt=True,
 4                  analyze=False, calc_score=False, **kwargs):
元と同じなので省略
 5          maxnode = mb_orig.turn == Marubatsu.CIRCLE
 6          best_score = float("-inf") if maxnode else float("inf")
 7          best_moves = []
 8          tt = {} if share_tt else None
 9          if analyze:
10              score_by_move = {}
11          for move in legal_moves:
元と同じなので省略
12              score = eval_func(mb, debug, tt=tt, *args, **kwargs)
13              dprint(debug, "score", score, "best score", best_score)
14              if analyze:
15                  score_by_move[move] = score
16              
17              if (maxnode and best_score < score) or (not maxnode and best_score > score):
18                  best_score = score
元と同じなので省略

行番号のないプログラム

def ai_by_mmscore(eval_func):
    @wraps(eval_func)
    def wrapper(mb_orig, debug=False, *args, rand=True, share_tt=True,
                analyze=False, calc_score=False, **kwargs):
        if calc_score:
            return eval_func(mb_orig, debug, *args, **kwargs)

        dprint(debug, "Start ai_by_mmscore")
        dprint(debug, mb_orig)
        legal_moves = mb_orig.calc_legal_moves()
        dprint(debug, "legal_moves", legal_moves)
        maxnode = mb_orig.turn == Marubatsu.CIRCLE
        best_score = float("-inf") if maxnode else float("inf")
        best_moves = []
        tt = {} if share_tt else None
        if analyze:
            score_by_move = {}
        for move in legal_moves:
            dprint(debug, "=" * 20)
            dprint(debug, "move", move)
            mb = deepcopy(mb_orig)
            x, y = move
            mb.move(x, y)
            dprint(debug, mb)
            score = eval_func(mb, debug, tt=tt, *args, **kwargs)
            dprint(debug, "score", score, "best score", best_score)
            if analyze:
                score_by_move[move] = score
            
            if (maxnode and best_score < score) or (not maxnode and best_score > score):
                best_score = score
                best_moves = [move]
                dprint(debug, "UPDATE")
                dprint(debug, "  best score", best_score)
                dprint(debug, "  best moves", best_moves)
            elif best_score == score:
                best_moves.append(move)
                dprint(debug, "APPEND")
                dprint(debug, "  best moves", best_moves)

        dprint(debug, "=" * 20)
        dprint(debug, "Finished")
        dprint(debug, "best score", best_score)
        dprint(debug, "best moves", best_moves)
        if analyze:
            return {
                "candidate": best_moves,
                "score_by_move": score_by_move,
            }
        elif rand:   
            return choice(best_moves)
        else:
            return best_moves[0]
        
    return wrapper

修正箇所

-def ai_by_score(eval_func):
+def ai_by_mmscore(eval_func):
    @wraps(eval_func)
    def wrapper(mb_orig, debug=False, *args, rand=True, share_tt=True,
                analyze=False, calc_score=False, **kwargs):
元と同じなので省略
+       maxnode = mb_orig.turn == Marubatsu.CIRCLE
-       best_score = float("-inf")
+       best_score = float("-inf") if maxnode else float("inf")
        best_moves = []
        tt = {} if share_tt else None
        if analyze:
            score_by_move = {}
        for move in legal_moves:
元と同じなので省略
            score = eval_func(mb, debug, tt=tt, *args, **kwargs)
            dprint(debug, "score", score, "best score", best_score)
            if analyze:
                score_by_move[move] = score
            
-           if maxnode and best_score < score:
+           if (maxnode and best_score < score) or (not maxnode and best_score > score):
                best_score = score
元と同じなので省略

`ai_abs_all` の再定義

ai_by_mmscore は 仮引数 tt を持ち、ミニマックス値を計算 する 探索ベースの評価関数 をラップすることができるので、それらの性質を持つ αβ 法 で着手を選択する ai_abs_all を ai_by_mmscore を利用して下記のプログラムのように 再定義 することにします。

1 行目：デコレータ式を @ai_by_mmscore に修正した
6 行目の後にあった ミニマックス値を変換する処理を削除 して 常にミニマックス値を返す ように修正した

1  @ai_by_mmscore
2  def ai_abs_all(mb, debug, shortest_victory=False, init_ab=False,
3                 use_tt=False, tt=None, ai_for_mo=None, params={},
4                 sort_allnodes=False, calc_count=False):           
元と同じなので省略
5      if calc_count:
6          return count
7      return score

行番号のないプログラム

@ai_by_mmscore
def ai_abs_all(mb, debug, shortest_victory=False, init_ab=False,
               use_tt=False, tt=None, ai_for_mo=None, params={},
               sort_allnodes=False, calc_count=False):           
    count = 0
    def ab_search(mborig, tt, alpha=float("-inf"), beta=float("inf")):
        nonlocal count
        count += 1
        if mborig.status == Marubatsu.CIRCLE:
            return (11 - mborig.move_count) / 2 if shortest_victory else 1
        elif mborig.status == Marubatsu.CROSS:
            return (mborig.move_count - 10) / 2 if shortest_victory else -1
        elif mborig.status == Marubatsu.DRAW:
            return 0
        
        if use_tt:
            boardtxt = mborig.board_to_str()
            if boardtxt in tt:
                lower_bound, upper_bound = tt[boardtxt]
                if lower_bound == upper_bound:
                    return lower_bound
                elif upper_bound <= alpha:
                    return upper_bound
                elif beta <= lower_bound:
                    return lower_bound
                else:
                    alpha = max(alpha, lower_bound)
                    beta = min(beta, upper_bound)
            else:
                lower_bound = min_score
                upper_bound = max_score
        
        alphaorig = alpha
        betaorig = beta

        legal_moves = mborig.calc_legal_moves()
        if ai_for_mo is not None:
            if sort_allnodes:
                score_by_move = ai_for_mo(mborig, analyze=True, **params)["score_by_move"]
                score_by_move_list = sorted(score_by_move.items(), key=lambda x:x[1], reverse=True)
                legal_moves = [x[0] for x in score_by_move_list]
            else:
                legal_moves = mborig.calc_legal_moves()
                bestmove = ai_for_mo(mborig, rand=False, **params)
                index = legal_moves.index(bestmove)
                legal_moves[0], legal_moves[index] = legal_moves[index], legal_moves[0]
        if mborig.turn == Marubatsu.CIRCLE:
            score = float("-inf")
            for x, y in legal_moves:
                mb = deepcopy(mborig)
                mb.move(x, y)
                score = max(score, ab_search(mb, tt, alpha, beta))
                if score >= beta:
                    break
                alpha = max(alpha, score)
        else:
            score = float("inf")
            for x, y in legal_moves:
                mb = deepcopy(mborig)
                mb.move(x, y)
                score = min(score, ab_search(mb, tt, alpha, beta))
                if score <= alpha:
                    break
                beta = min(beta, score)   
            
        from util import calc_same_boardtexts

        if use_tt:
            boardtxtlist = calc_same_boardtexts(mborig)
            if score <= alphaorig:
                upper_bound = score
            elif score < betaorig:
                lower_bound = score
                upper_bound = score
            else:
                lower_bound = score
            for boardtxt in boardtxtlist:
                tt[boardtxt] = (lower_bound, upper_bound)
        return score
                
    min_score = -2 if shortest_victory else -1
    max_score = 3 if shortest_victory else 1
    alpha = min_score if init_ab else float("-inf")
    beta = max_score if init_ab else float("inf")

    if tt is None:
        tt = {}
    score = ab_search(mb, tt=tt, alpha=alpha, beta=beta)
    dprint(debug, "count =", count)
    if calc_count:
        return count
    return score

修正箇所

-@ai_by_score
+@ai_by_mmscore
def ai_abs_all(mb, debug, shortest_victory=False, init_ab=False,
               use_tt=False, tt=None, ai_for_mo=None, params={},
               sort_allnodes=False, calc_count=False):           
元と同じなので省略
    if calc_count:
        return count
-   if mb.turn == Marubatsu.CIRCLE:
-       score *= -1
    return score

上記の修正後に下記のプログラムを実行すると、実行結果から 再定義した ai_abs_all が以下の場合に 強解決の AI であることが確認 できます。

置換表を利用しない場合
置換表を利用し、共有する場合
置換表を利用し、共有しない場合

Check_solved.is_strongly_solved(ai_abs_all)
Check_solved.is_strongly_solved(ai_abs_all, params={"use_tt":True})
Check_solved.is_strongly_solved(ai_abs_all, params={"use_tt":True, "share_tt": False})

実行結果

100%|██████████| 431/431 [00:03<00:00, 115.10it/s]
431/431 100.00%
100%|██████████| 431/431 [00:02<00:00, 207.38it/s]
431/431 100.00%
100%|██████████| 431/431 [00:03<00:00, 131.60it/s]
431/431 100.00%

`ai_mmdfs_all` の定義と `ai_scout`、`ai_mtdf` の再定義

ミニマックス法 で着手を選択する AI の関数は、置換表を利用しない ai_mmdfs と 利用する ai_mmdfs_tt の 2 種類をこれまで定義しましたが、先程説明した理由から置換表を利用しない ai_mmdfs を ai_by_mmscore で再定義することはできません。そこで、ミニマックス法 で 置換表の利用の有無を選択 できる ai_mmdfs_all を ai_mmdfs_tt を修正 して下記のプログラムのように定義することにします。

1 行目：デコレータ式を ai_by_mmscore に修正した
2 行目：デフォルト値を False とする仮引数 use_tt を追加した
6、8 ~ 13 行目：use_tt が True の場合のみ置換表を利用するように修正した
19 行目の後にあったミニマックス値を変換する処理を削除して常にミニマックス値を返すように修正した

 1  @ai_by_mmscore
 2  def ai_mmdfs_all(mb, debug=False, use_tt=False, tt=None, shortest_victory=False):
 3      count = 0
 4      def mm_search(mborig, tt):
元と同じなので省略
 5          boardtxt = mborig.board_to_str()
 6          if use_tt and boardtxt in tt:
 7              return tt[boardtxt]
元と同じなので省略
 8          if use_tt:
 9              from util import calc_same_boardtexts
10  
11              boardtxtlist = calc_same_boardtexts(mborig)
12              for boardtxt in boardtxtlist:
13                  tt[boardtxt] = score
14          return score
15      
16      if tt is None:
17          tt = {}
18      score = mm_search(mb, tt)
19      dprint(debug, "count =", count)
20      return score

行番号のないプログラム

@ai_by_mmscore
def ai_mmdfs_all(mb, debug=False, use_tt=False, tt=None, shortest_victory=False):
    count = 0
    def mm_search(mborig, tt):
        nonlocal count
        count += 1

        if mborig.status == Marubatsu.CIRCLE:
            return (11 - mborig.move_count) / 2 if shortest_victory else 1
        elif mborig.status == Marubatsu.CROSS:
            return (mborig.move_count - 10) / 2 if shortest_victory else -1
        elif mborig.status == Marubatsu.DRAW:
            return 0
        
        boardtxt = mborig.board_to_str()
        if use_tt and boardtxt in tt:
            return tt[boardtxt]
        
        legal_moves = mborig.calc_legal_moves()
        score_list = []
        for x, y in legal_moves:
            mb = deepcopy(mborig)
            mb.move(x, y)
            score_list.append(mm_search(mb, tt))
        if mborig.turn == Marubatsu.CIRCLE:
            score = max(score_list)
        else:
            score = min(score_list)
            
        if use_tt:
            from util import calc_same_boardtexts

            boardtxtlist = calc_same_boardtexts(mborig)
            for boardtxt in boardtxtlist:
                tt[boardtxt] = score
        return score
    
    if tt is None:
        tt = {}
    score = mm_search(mb, tt)
    dprint(debug, "count =", count)
    return score

修正箇所

-@ai_by_score
+@ai_by_mmscore
-def ai_mmdfs_tt(mb, debug=False, use_tt=False, tt=None, shortest_victory=False):
+def ai_mmdfs_all(mb, debug=False, use_tt=False, tt=None, shortest_victory=False):
    count = 0
    def mm_search(mborig, tt):
元と同じなので省略
        boardtxt = mborig.board_to_str()
-       if boardtxt in tt:
+       if use_tt and boardtxt in tt:
            return tt[boardtxt]
元と同じなので省略
-       from util import calc_same_boardtexts

-       boardtxtlist = calc_same_boardtexts(mborig)
-       for boardtxt in boardtxtlist:
-           tt[boardtxt] = score
+       if use_tt:
+           from util import calc_same_boardtexts

+           boardtxtlist = calc_same_boardtexts(mborig)
+           for boardtxt in boardtxtlist:
+               tt[boardtxt] = score
        return score
    
    if tt is None:
        tt = {}
    score = mm_search(mb, tt)
    dprint(debug, "count =", count)
-   if mb.turn == Marubatsu.CIRCLE:
-       score *= -1
    return score

先程と同様に、下記のプログラムを実行することで、実行結果から ai_mmdfs_all が先程と同じ 3 種類の場合に 強解決の AI であることが確認 できます。

Check_solved.is_strongly_solved(ai_mmdfs_all)
Check_solved.is_strongly_solved(ai_mmdfs_all, params={"use_tt":True})
Check_solved.is_strongly_solved(ai_mmdfs_all, params={"use_tt":True, "share_tt": False})

実行結果

100%|██████████| 431/431 [00:15<00:00, 28.04it/s] 
431/431 100.00%
100%|██████████| 431/431 [00:02<00:00, 171.20it/s]
431/431 100.00%
100%|██████████| 431/431 [00:05<00:00, 78.79it/s] 
431/431 100.00%

スカウト法の ai_scout と MTD(f) 法の ai_mtdf も ai_by_mmscore を利用して下記のプログラムのように 再定義 することができます。修正箇所はいずれも以下の 2 点のみなので説明は省略します。

デコレータ式を ai_by_mmscore に修正する
ミニマックス値を変換する処理を削除して常にミニマックス値を返すように修正する

ai_scoutの再定義

@ai_by_mmscore
def ai_scout(mb, debug=False, shortest_victory=False,
             init_ab=False, use_tt=False, tt=None, ai_for_mo=None,
             params={}, sort_allnodes=False, calc_count=False):        
    count = 0
    def ab_search(mborig, tt, alpha=float("-inf"), beta=float("inf")):
        nonlocal count
        count += 1
        if mborig.status == Marubatsu.CIRCLE:
            return (11 - mborig.move_count) / 2 if shortest_victory else 1
        elif mborig.status == Marubatsu.CROSS:
            return (mborig.move_count - 10) / 2 if shortest_victory else -1
        elif mborig.status == Marubatsu.DRAW:
            return 0
        
        if use_tt:
            boardtxt = mborig.board_to_str()
            if boardtxt in tt:
                lower_bound, upper_bound = tt[boardtxt]
                if lower_bound == upper_bound:
                    return lower_bound
                elif upper_bound <= alpha:
                    return upper_bound
                elif beta <= lower_bound:
                    return lower_bound
                else:
                    alpha = max(alpha, lower_bound)
                    beta = min(beta, upper_bound)
            else:
                lower_bound = min_score
                upper_bound = max_score
        
        alphaorig = alpha
        betaorig = beta

        legal_moves = mborig.calc_legal_moves()
        if ai_for_mo is not None:
            if sort_allnodes:
                score_by_move = ai_for_mo(mborig, analyze=True, **params)["score_by_move"]
                score_by_move_list = sorted(score_by_move.items(), key=lambda x:x[1], reverse=True)
                legal_moves = [x[0] for x in score_by_move_list]
            else:
                legal_moves = mborig.calc_legal_moves()
                bestmove = ai_for_mo(mborig, rand=False, **params)
                index = legal_moves.index(bestmove)
                legal_moves[0], legal_moves[index] = legal_moves[index], legal_moves[0]

        if mborig.turn == Marubatsu.CIRCLE:
            x, y = legal_moves[0]
            mb = deepcopy(mborig)
            mb.move(x, y)
            score = ab_search(mb, tt, alpha, beta)
            alpha = max(alpha, score)
            if score < beta:
                for x, y in legal_moves[1:]:
                    mb = deepcopy(mborig)
                    mb.move(x, y)
                    score = max(score, ab_search(mb, tt, alpha, alpha + 1))
                    if score >= beta:
                        break
                    elif score > alpha:
                        score = max(score, ab_search(mb, tt, alpha, beta))
                        if score >= beta:
                            break
                        alpha = max(alpha, score)
        else:
            x, y = legal_moves[0]
            mb = deepcopy(mborig)
            mb.move(x, y)
            score = ab_search(mb, tt, alpha, beta)
            beta = min(beta, score)
            if score > alpha:
                for x, y in legal_moves[1:]:
                    mb = deepcopy(mborig)
                    mb.move(x, y)
                    score = min(score, ab_search(mb, tt, beta - 1, beta))
                    if score <= alpha:
                        break
                    elif score < beta:
                        score = min(score, ab_search(mb, tt, alpha, beta))
                        if score <= alpha:
                            break
                        beta = min(beta, score)   
            
        if use_tt:
            from util import calc_same_boardtexts

            boardtxtlist = calc_same_boardtexts(mborig)
            if score <= alphaorig:
                upper_bound = score
            elif score < betaorig:
                lower_bound = score
                upper_bound = score
            else:
                lower_bound = score
            for boardtxt in boardtxtlist:
                tt[boardtxt] = (lower_bound, upper_bound)
        return score
                
    min_score = -2 if shortest_victory else -1
    max_score = 3 if shortest_victory else 1
    alpha = min_score if init_ab else float("-inf")
    beta = max_score if init_ab else float("inf")

    if tt is None:
        tt = {}
    score = ab_search(mb, tt=tt, alpha=alpha, beta=beta)
    dprint(debug, "count =", count)
    if calc_count:
        return count
    return score

ai_scoutの再定義

@ai_by_mmscore
def ai_mtdf(mb, debug=False, shortest_victory=False,
             init_ab=False, use_tt=False, tt=None, f=0, ai_for_mo=None,
             params={}, sort_allnodes=False, calc_count=False):           
    count = 0
    def ab_search(mborig, tt, alpha=float("-inf"), beta=float("inf")):
        nonlocal count
        count += 1
        if mborig.status == Marubatsu.CIRCLE:
            return (11 - mborig.move_count) / 2 if shortest_victory else 1
        elif mborig.status == Marubatsu.CROSS:
            return (mborig.move_count - 10) / 2 if shortest_victory else -1
        elif mborig.status == Marubatsu.DRAW:
            return 0
        
        if use_tt:
            boardtxt = mborig.board_to_str()
            if boardtxt in tt:
                lower_bound, upper_bound = tt[boardtxt]
                if lower_bound == upper_bound:
                    return lower_bound
                elif upper_bound <= alpha:
                    return upper_bound
                elif beta <= lower_bound:
                    return lower_bound
                else:
                    alpha = max(alpha, lower_bound)
                    beta = min(beta, upper_bound)
            else:
                lower_bound = min_score
                upper_bound = max_score
        
        alphaorig = alpha
        betaorig = beta

        legal_moves = mborig.calc_legal_moves()
        if ai_for_mo is not None:
            if sort_allnodes:
                score_by_move = ai_for_mo(mborig, analyze=True, **params)["score_by_move"]
                score_by_move_list = sorted(score_by_move.items(), key=lambda x:x[1], reverse=True)
                legal_moves = [x[0] for x in score_by_move_list]
            else:
                legal_moves = mborig.calc_legal_moves()
                bestmove = ai_for_mo(mborig, rand=False, **params)
                index = legal_moves.index(bestmove)
                legal_moves[0], legal_moves[index] = legal_moves[index], legal_moves[0]
        if mborig.turn == Marubatsu.CIRCLE:
            score = float("-inf")
            for x, y in legal_moves:
                mb = deepcopy(mborig)
                mb.move(x, y)
                score = max(score, ab_search(mb, tt, alpha, beta))
                if score >= beta:
                    break
                alpha = max(alpha, score)
        else:
            score = float("inf")
            for x, y in legal_moves:
                mb = deepcopy(mborig)
                mb.move(x, y)
                score = min(score, ab_search(mb, tt, alpha, beta))
                if score <= alpha:
                    break
                beta = min(beta, score)   
            
        from util import calc_same_boardtexts

        if use_tt:
            boardtxtlist = calc_same_boardtexts(mborig)
            if score <= alphaorig:
                upper_bound = score
            elif score < betaorig:
                lower_bound = score
                upper_bound = score
            else:
                lower_bound = score
            for boardtxt in boardtxtlist:
                tt[boardtxt] = (lower_bound, upper_bound)
        return score
                
    min_score = -2 if shortest_victory else -1
    max_score = 3 if shortest_victory else 1
    lbound = min_score if init_ab else float("-inf")
    ubound = max_score if init_ab else float("inf")

    if tt is None:
        tt = {}
    dprint(debug, "      count | ウィンドウ | αβ 値 |      type |    MM の範囲")
    while lbound != ubound:
        beta = f + 1 if lbound == f else f
        prevcount = count
        f = ab_search(mb, tt, alpha=beta - 1, beta=beta)
        if f >= beta:
            lbound = f
            type = "fail high"
        else:
            ubound = f
            type = "fail low "
        dprint(debug, f"{count - prevcount:5.0f}/{count:5.0f} |  ({beta - 1:2.0f}, {beta:2.0f}) |    {f:2.0f} | {type} | [{lbound:4.0f}, {ubound:4.0f}]")
 
    score = f
            
    if calc_count:
        return count
    return score

先程と同様に、下記のプログラムを実行することで、実行結果から ai_scout と ai_mtdf が先程と同じ 3 種類の場合に 強解決の AI であることが確認 できます。なお、スカウト法では最善手を推定するための関数として ai14s を利用しました。

from ai import ai14s

Check_solved.is_strongly_solved(ai_scout, params={"ai_for_mo": ai14s})
Check_solved.is_strongly_solved(ai_scout, params={"ai_for_mo": ai14s, "use_tt":True})
Check_solved.is_strongly_solved(ai_scout, params={"ai_for_mo": ai14s, "use_tt":True, "share_tt": False})
Check_solved.is_strongly_solved(ai_mtdf, params={"f": 0})
Check_solved.is_strongly_solved(ai_mtdf, params={"f": 0, "use_tt":True})
Check_solved.is_strongly_solved(ai_mtdf, params={"f": 0, "use_tt":True, "share_tt": False})

実行結果

100%|██████████| 431/431 [00:05<00:00, 81.57it/s] 
431/431 100.00%
100%|██████████| 431/431 [00:03<00:00, 112.05it/s]
431/431 100.00%
100%|██████████| 431/431 [00:05<00:00, 85.48it/s] 
431/431 100.00%
100%|██████████| 431/431 [00:04<00:00, 93.00it/s] 
431/431 100.00%
100%|██████████| 431/431 [00:01<00:00, 249.91it/s]
431/431 100.00%
100%|██████████| 431/431 [00:02<00:00, 182.14it/s]
431/431 100.00%

深さ制限探索の AI の実装

下記は 深さ制限探索アルゴリズムの再掲 です。

深さの上限 $\boldsymbol{d}$ を決める
$N$ をルートノードとした、深さが $\boldsymbol{d}$ までのノード から構成される 部分木 に対して下記の計算を行う
- リーフノード以外 のノードの評価値の計算は、ベースとなるゲーム木の探索と同じ方法 で行う
- リーフノード の評価値の計算は 静的評価関数で行う

上記から、深さ制限探索を行う関数 は 深さの上限_ と 静的評価関数 を代入する 仮引数が必要 になることがわかります。

先程再定義した ai_by_mmscore を修正 して深さ制限探索の AI を実装する場合に、深さ制限探索で利用する 静的評価関数 として ai1s ~ ai14s をそのまま 利用することはできません。その理由について少し考えてみて下さい。

その理由は以下のように、@ai_by_mmscore と @ai_by_score がラップする評価関数が計算する 評価値の性質が異なる からです。

@ai_by_mmscore がラップする 評価関数 は評価値として、先手である 〇の手番が有利になるほど大きな値 である ミニマックス値を計算 する
@ai_by_score がラップする 評価関数 は評価値として、直前の手番のプレイヤーが有利になるほど大きな値 を計算する

`ai_by_score` の修正

この問題を解決するためには、@ai_by_score を利用して定義された関数が calc_score=True を記述して呼び出された場合に、ミニマックス値を計算できる ように修正する必要があります。そこで ai_by_score のラッパー関数 wrapper を下記のプログラムのように、True が代入 された場合に ミニマックス値を返す仮引数を追加 するように修正することにします。

なお、先程追加した share_tt に関する処理があるとエラーが発生するので、下記はそれらの処理を削除したプログラムに対する修正です。

3 行目：ラッパー関数にデフォルト値を False とする仮引数 minimax を追加する
5 ~ 8 行目：mb_orig の局面の評価値を計算して返す処理を行う際に、minimax が True の場合 は先手の〇の手番の場合に評価値の符号を反転させることで、常にミニマックス値を返すように修正 する

1  def ai_by_score(eval_func):
2      @wraps(eval_func)
3      def wrapper(mb_orig, debug=False, *args, rand=True,
4                  analyze=False, calc_score=False, minimax=False, **kwargs):
5          if calc_score:
6              score = eval_func(mb_orig, debug, *args, **kwargs)
7              if minimax and mb_orig.turn == Marubatsu.CIRCLE:
8                  score *= -1
9              return score
元と同じなので省略

行番号のないプログラム

def ai_by_score(eval_func):
    @wraps(eval_func)
    def wrapper(mb_orig, debug=False, *args, rand=True,
                analyze=False, calc_score=False, minimax=False, **kwargs):
        if calc_score:
            score = eval_func(mb_orig, debug, *args, **kwargs)
            if minimax and mb_orig.turn == Marubatsu.CIRCLE:
                score *= -1
            return score

        dprint(debug, "Start ai_by_score")
        dprint(debug, mb_orig)
        legal_moves = mb_orig.calc_legal_moves()
        dprint(debug, "legal_moves", legal_moves)
        best_score = float("-inf")
        best_moves = []
        if analyze:
            score_by_move = {}
        for move in legal_moves:
            dprint(debug, "=" * 20)
            dprint(debug, "move", move)
            mb = deepcopy(mb_orig)
            x, y = move
            mb.move(x, y)
            dprint(debug, mb)
            score = eval_func(mb, debug, *args, **kwargs)
            dprint(debug, "score", score, "best score", best_score)
            if analyze:
                score_by_move[move] = score
            
            if best_score < score:
                best_score = score
                best_moves = [move]
                dprint(debug, "UPDATE")
                dprint(debug, "  best score", best_score)
                dprint(debug, "  best moves", best_moves)
            elif best_score == score:
                best_moves.append(move)
                dprint(debug, "APPEND")
                dprint(debug, "  best moves", best_moves)

        dprint(debug, "=" * 20)
        dprint(debug, "Finished")
        dprint(debug, "best score", best_score)
        dprint(debug, "best moves", best_moves)
        if analyze:
            return {
                "candidate": best_moves,
                "score_by_move": score_by_move,
            }
        elif rand:   
            return choice(best_moves)
        else:
            return best_moves[0]
        
    return wrapper

修正箇所

def ai_by_score(eval_func):
    @wraps(eval_func)
    def wrapper(mb_orig, debug=False, *args, rand=True,
-               analyze=False, calc_score=False, **kwargs):
+               analyze=False, calc_score=False, minimax=False, **kwargs):
        if calc_score:
-           return eval_func(mb_orig, debug, *args, **kwargs)
+           score = eval_func(mb_orig, debug, *args, **kwargs)
+           if minimax and mb_orig.turn == Marubatsu.CIRCLE:
+               score *= -1
+           return score
元と同じなので省略

`ai14s` の再定義

上記の修正後に、修正した ai_by_score を使って下記のプログラムのように ai14s を再定義 します。

ai14sの再定義

from pprint import pprint
from ai import Markpat

@ai_by_score
def ai14s(mb, debug=False, score_victory=300, score_sure_victory=200,
          score_defeat=-100, score_special=100, score_201=2,
          score_102=0.5, score_012=-1):
    # 評価値の合計を計算する変数を 0 で初期化する
    score = 0        

    # 自分が勝利している場合
    if mb.status == mb.last_turn:
        return score_victory

    markpats = mb.count_markpats()
    if debug:
        pprint(markpats)
    # 相手が勝利できる場合は評価値を加算する
    if markpats[Markpat(last_turn=0, turn=2, empty=1)] > 0:
        score = score_defeat * markpats[Markpat(last_turn=0, turn=2, empty=1)]
    # 次の自分の手番で自分が必ず勝利できる場合
    elif markpats[Markpat(last_turn=2, turn=0, empty=1)] >= 2:
        return score_sure_victory
    
    # 斜め方向に 〇×〇 が並び、いずれかの辺の 1 つのマスのみに × が配置されている場合
    if mb.board[1][1] == Marubatsu.CROSS and \
        (mb.board[0][0] == mb.board[2][2] == Marubatsu.CIRCLE or \
        mb.board[2][0] == mb.board[0][2] == Marubatsu.CIRCLE) and \
        (mb.board[1][0] == Marubatsu.CROSS or \
        mb.board[0][1] == Marubatsu.CROSS or \
        mb.board[2][1] == Marubatsu.CROSS or \
        mb.board[1][2] == Marubatsu.CROSS) and \
        mb.move_count == 4:
        return score_special    

    # 次の自分の手番で自分が勝利できる場合は評価値に score_201 を加算する
    if markpats[Markpat(last_turn=2, turn=0, empty=1)] == 1:
        score += score_201
    # 「自 1 敵 0 空 2」1 つあたり score_102 だけ、評価値を加算する
    score += markpats[Markpat(last_turn=1, turn=0, empty=2)] * score_102
    # 「自 0 敵 1 空 2」1 つあたり score_201 だけ、評価値を減算する
    score += markpats[Markpat(last_turn=0, turn=1, empty=2)] * score_012
    
    # 計算した評価値を返す
    return score

上記の修正によって、ai14s がキーワード引数 minimax=True を記述することで 先手の〇の手番の場合に評価値の符号が反転する ようになったことを確認することにします。

ゲーム開始時 の 〇の手番 に対しては ai14s は 評価値として 0 を返しますが、下記のプログラムのように minimax=True を記述 すると、-0.0 のように符号が反転する ことが確認できます。なお、表示は異なりますが、0.0 と -0.0 はどちらも 0 で 同じ数値 を表します。

print(ai14s(mb, calc_score=True))
print(ai14s(mb, calc_score=True, minimax=True))

実行結果

0.0
-0.0

下記は (1, 1) を着手した × の手番 の場合で、実行結果から どちらも同じ評価値が計算 されることが確認できます。

mb.move(1, 1)
print(ai14s(mb, calc_score=True))
print(ai14s(mb, calc_score=True, minimax=True))

実行結果

2.0
2.0

念のためさらに (0, 0) に着手した 〇の手番 の場合を確認すると、実行結果から 評価値の符号が反転する ことが確認できます。

mb.move(0, 0)
print(ai14s(mb, calc_score=True))
print(ai14s(mb, calc_score=True, minimax=True))

実行結果

2.0
-2.0

`ai_abs_dls` の定義

深さ制限探索 を行う AI として、αβ 法をベース とした深さ制限探索で着手を選択する ai_abs_dls という関数を ai_abs_all を修正 して定義することにします。なお、名前の dls は深さ制限探索を表す depth limited search の略です。

ai_abs_all との違い は以下の通りです。

深さの上限を代入 する仮引数 maxdepth を追加 する
静的評価関数 と、静的評価関数が必要とする パラメータ を代入する仮引数 eval_func、eval_params を追加 する
評価値は静的評価関数で計算するため shortest_victory は利用できないので削除 する
評価値の最小値と最大値 は 静的評価関数によって異なる ため α 値と β 値の初期値を評価値の最小値と最大値に設定することはできないので init_ab を削除 する
深さ制限探索では move ordering はまだ説明していない反復深化という手法を利用して行うのが一般的なので、move ordering に関する処理は削除 する

下記は ai_abs_dls の定義です。

1 行目：デコレータ式を ai_by_mmscore に修正する
2、3 行目：関数名を ai_abs_dls に修正し、仮引数 maxdepth、eval_func、eval_params を追加し、仮引数 shortest_victory、init_ab、move ordering に関する仮引数を削除する。なお、追加する仮引数は仮引数 debug の後に記述する必要があるのでデフォルト引数とした
5 行目：mb をルートノードとした部分木での、ab_search で評価値を計算する mborig の深さを代入する仮引数 depth を追加する
8、9 行目：決着がついているか、mborig の深さを表す depth が深さの上限を表す maxdepth 以上の場合 に静的評価関数 eval_func の キーワード引数に calc_score=True を記述 して呼び出すことで mborig の局面の評価値を計算 する
12、14 行目：ab_search の再帰呼び出し によって 深さが 1 増えた子ノードの評価値を計算 するので、深さを表す実引数に depth + 1 を記述 するように修正する
15、16 行目：評価値の下界と上界を表す変数に負の無限大と正の無限大を代入するように修正する
20 行目：mb のミニマックス値を計算 する処理を行う際に、mb の深さを表す実引数に 0 を記述し、α 値と β 値の初期値に評価値の下界と上界を記述するように修正する
move ordering に関する処理を削除する

 1  @ai_by_mmscore
 2  def ai_abs_dls(mb, debug=False, maxdepth=1, eval_func=None, eval_params={},
 3                 use_tt=False, tt=None, calc_count=False):           
 4      count = 0
 5      def ab_search(mborig, depth, tt, alpha=float("-inf"), beta=float("inf")):
 6          nonlocal count
 7          count += 1
 8          if mborig.status != Marubatsu.PLAYING or depth == maxdepth:
 9              return eval_func(mborig, calc_score=True, **eval_params)
元と同じなので省略
10          legal_moves = mborig.calc_legal_moves()
11          if mborig.turn == Marubatsu.CIRCLE:
元と同じなので省略
12                  score = max(score, ab_search(mb, depth + 1, tt, alpha, beta))
元と同じなので省略
13          else:
元と同じなので省略
14                  score = min(score, ab_search(mb, depth + 1, tt, alpha, beta))
元と同じなので省略
15      min_score = float("-inf")
16      max_score = float("inf")
17  
18      if tt is None:
19          tt = {}
20      score = ab_search(mb, depth=0, tt=tt, alpha=min_score, beta=max_score)
21      dprint(debug, "count =", count)
22      if calc_count:
23          return count
24      return score

行番号のないプログラム

@ai_by_mmscore
def ai_abs_dls(mb, debug=False, maxdepth=1, eval_func=None, eval_params={},
               use_tt=False, tt=None, calc_count=False):           
    count = 0
    def ab_search(mborig, depth, tt, alpha=float("-inf"), beta=float("inf")):
        nonlocal count
        count += 1
        if mborig.status != Marubatsu.PLAYING or depth == maxdepth:
            return eval_func(mborig, calc_score=True, **eval_params)
        
        if use_tt:
            boardtxt = mborig.board_to_str()
            if boardtxt in tt:
                lower_bound, upper_bound = tt[boardtxt]
                if lower_bound == upper_bound:
                    return lower_bound
                elif upper_bound <= alpha:
                    return upper_bound
                elif beta <= lower_bound:
                    return lower_bound
                else:
                    alpha = max(alpha, lower_bound)
                    beta = min(beta, upper_bound)
            else:
                lower_bound = min_score
                upper_bound = max_score
        
        alphaorig = alpha
        betaorig = beta

        legal_moves = mborig.calc_legal_moves()
        if mborig.turn == Marubatsu.CIRCLE:
            score = float("-inf")
            for x, y in legal_moves:
                mb = deepcopy(mborig)
                mb.move(x, y)
                score = max(score, ab_search(mb, depth + 1, tt, alpha, beta))
                if score >= beta:
                    break
                alpha = max(alpha, score)
        else:
            score = float("inf")
            for x, y in legal_moves:
                mb = deepcopy(mborig)
                mb.move(x, y)
                score = min(score, ab_search(mb, depth + 1, tt, alpha, beta))
                if score <= alpha:
                    break
                beta = min(beta, score)   
            
        from util import calc_same_boardtexts

        if use_tt:
            boardtxtlist = calc_same_boardtexts(mborig)
            if score <= alphaorig:
                upper_bound = score
            elif score < betaorig:
                lower_bound = score
                upper_bound = score
            else:
                lower_bound = score
            for boardtxt in boardtxtlist:
                tt[boardtxt] = (lower_bound, upper_bound)
        return score
                
    min_score = float("-inf")
    max_score = float("inf")

    if tt is None:
        tt = {}
    score = ab_search(mb, depth=0, tt=tt, alpha=min_score, beta=max_score)
    dprint(debug, "count =", count)
    if calc_count:
        return count
    return score

修正箇所

-@ai_by_score
+@ai_by_mmscore
-def ai_abs_all(mb, debug=False, shortest_victory=False, init_ab=False,
-               use_tt=False, tt=None, ai_for_mo=None, params={},
-               sort_allnodes=False, calc_count=False):
+def ai_abs_dls(mb, debug, maxdepth=1, eval_func=None, eval_params={},
+              use_tt=False, tt=None, calc_count=False):           
    count = 0
-   def ab_search(mborig, tt, alpha=float("-inf"), beta=float("inf")):
+   def ab_search(mborig, depth, tt, alpha=float("-inf"), beta=float("inf")):
        nonlocal count
        count += 1
-       if mborig.status == Marubatsu.CIRCLE:
-           return (11 - mborig.move_count) / 2 if shortest_victory else 1
-       elif mborig.status == Marubatsu.CROSS:
-           return (mborig.move_count - 10) / 2 if shortest_victory else -1
-       elif mborig.status == Marubatsu.DRAW:
-           return 0
+       if mborig.status != Marubatsu.PLAYING or depth == maxdepth:
+           return eval_func(mborig, calc_score=True, **eval_params)
元と同じなので省略
        legal_moves = mborig.calc_legal_moves()
-       if ai_for_mo is not None:
-           if sort_allnodes:
-               score_by_move = ai_for_mo(mborig, analyze=True, **params)["score_by_move"]
-               score_by_move_list = sorted(score_by_move.items(), key=lambda x:x[1], reverse=True)
-               legal_moves = [x[0] for x in score_by_move_list]
-           else:
-               legal_moves = mborig.calc_legal_moves()
-               bestmove = ai_for_mo(mborig, rand=False, **params)
-               index = legal_moves.index(bestmove)
-               legal_moves[0], legal_moves[index] = legal_moves[index], legal_moves[0]
        if mborig.turn == Marubatsu.CIRCLE:
元と同じなので省略
-               score = max(score, ab_search(mb, tt, alpha, beta))
+               score = max(score, ab_search(mb, depth + 1, tt, alpha, beta))
元と同じなので省略
        else:
元と同じなので省略
-               score = min(score, ab_search(mb, tt, alpha, beta))
+               score = min(score, ab_search(mb, depth + 1, tt, alpha, beta))
元と同じなので省略
-   min_score = -2 if shortest_victory else -1
-   max_score = 3 if shortest_victory else 1
+   min_score = float("-inf")
+   max_score = float("inf")
-   alpha = min_score if init_ab else float("-inf")
-   beta = max_score if init_ab else float("inf")

    if tt is None:
        tt = {}
-   score = ab_search(mb, tt=tt, alpha=alpha, beta=beta)
+   score = ab_search(mb, depth=0, tt=tt, alpha=min_score, beta=max_score)
    dprint(debug, "count =", count)
    if calc_count:
        return count
    return score

`@ai_by_mmscore` を利用して定義した関数の処理の注意点

先程説明した、深さ制限探索で着手を選択 する場合の 深さの上限の性質 は @ai_by_mmscore を利用して 定義した関数にも当てはまります。

そのため、@ai_by_mmscore を利用して定義された関数 はの 仮引数 calc_score に代入された値によって、仮引数 maxdepth に代入された 深さの上限を表す数値の意味 が下記のように 異なる点に注意して下さい。

@ai_by_mmscore を利用して定義された関数 は、仮引数 calc_score に代入された値によって下記のような 異なる処理を行う。

calc_score=True の場合は、ルートノードから 深さ maxdepth まで のノードの探索によって ルートノードのミニマックス値を計算 する。

calc_score=False の場合 は、下記の理由からルートノードから 深さ maxdepth + 1 まで のノードの探索によって 最善手を計算 する。

calc_score=False の場合は、ルートノードの 子ノードの評価値 を深さ制限探索で 計算する際 の 深さの上限が maxdepth となる
ルートノードの 子ノードの深さは 1 なので、深さが maxdepth + 1 まで のノードの探索が行われる

動作の確認

上記の性質から、〇× ゲーム木 のゲーム木の 最大の深さは 9 なので、ai_abs_dls で ゲーム開始時 の最善手を計算する際に 設定できる深さの上限は 0 ~ 8 となります。なお、深さの上限として 9 以上の値 を設定して計算することもできますが、深さが 10 のノードは存在しないのでその場合は 深さの上限を 8 と設定した場合と同じ処理 が行われます。

キーワード引数 calc_score=True を記述して呼び出した場合は、深さの上限として 0 ~ 9 を設定でき、10 以上を設定した場合は 9 を設定した場合と同じ処理を行います。

下記のプログラムで、ゲーム開始時の局面 に対して 深さの上限を 0 ~ 8、静的評価関数 を ai14s とした場合の ai_abs_dls の動作を確認 することにします。なお、@ai_by_score を利用して定義 された ai14s を 静的関数として利用する場合 は 呼び出す際のパラメータ に {"minimax": True} を記述して ミニマックス値を計算する必要がある ことを忘れないように注意して下さい。なお、mb に対して先程着手を行ったので 1 行目の mb.restart() でゲーム開始時の局面に戻す処理を行っています。

mb.restart()
for maxdepth in range(9):
    print(maxdepth, ai_abs_dls(mb, maxdepth=maxdepth, eval_func=ai14s,
          eval_params={"minimax": True}))

実行結果

0 (1, 1)
1 (1, 1)
2 (1, 1)
3 (1, 1)
4 (1, 1)
5 (1, 1)
6 (1, 1)
7 (0, 2)
8 (0, 2)

実行結果から 深さの上限が 0 ~ 6 まで は (1, 1) のみ が、7 と 8 の場合は それ以外の合法手も選択されること が確認できました。

なお、深さが 7 と 8 の場合は下記のプログラムのようにそれぞれ 10 回実行することで、ランダムな合法手が選択 されることがわかります。

for maxdepth in range(7, 9):
    for _ in range(10):
        print(maxdepth, ai_abs_dls(mb, maxdepth=maxdepth, eval_func=ai14s, eval_params={"minimax": True}))

実行結果

7 (2, 2)
7 (0, 1)
7 (0, 1)
7 (0, 2)
7 (1, 0)
7 (1, 0)
7 (2, 0)
7 (0, 0)
7 (2, 1)
7 (2, 0)
8 (0, 2)
8 (1, 2)
8 (2, 1)
8 (1, 0)
8 (2, 1)
8 (2, 2)
8 (0, 0)
8 (1, 1)
8 (1, 2)
8 (1, 0)

深さ制限探索で深さを増やすとどのような処理が行われるかについては次回の記事で説明します。深さが 7 と 8 の場合に上記のように異なる合法手が選択される理由についても次回の記事で説明するので、余裕がある方はその理由について少し考えてみて下さい。

なお、ミニマックス法、スカウト法、mtd(f) をベースとした深さ制限探索を行う関数も同様の方法で定義できますが、長くなったので次回以降の記事で定義することにします。

今回の記事のまとめ

今回の記事ではミニマックス値を計算する評価関数を @ai_by_score でラップする場合の問題点を説明し、その問題を解決する ai_by_mmscore を定義しました。また、それを利用してこれまでに作成したミニマックス値を計算する AI を再定義しました。

次に、@ai_by_mmscore を利用して αβ 法をベースとした深さ制限探索によって着手を選択する ai_abs_dls を定義しました。次回の記事では定義した ai_abs_dls が行う処理の性質について説明します。

本記事で入力したプログラム

リンク	説明
marubatsu.ipynb	本記事で入力して実行した JupyterLab のファイル
ai.py	本記事で更新した ai_new.py

次回の記事

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up

Pythonで〇×ゲームのAIを一から作成する その173 ai_by_mmscore の定義と深さ制限探索による AI の実装

目次と前回の記事

これまでに作成したモジュール

深さ制限探索による AI の定義

本記事で利用する用語

@ai_by_score を利用した深さ制限探索の実装方法

深さ制限探索で着手を選択する場合の深さの上限の性質

ai_by_mmscore の定義

@ai_by_score の問題点

置換表の共有の有無を表す仮引数の追加

@ai_by_score の修正がうまくいく例

@ai_by_score の修正がうまくいかない例

ai_by_mmscore の定義

ai_abs_all の再定義

ai_mmdfs_all の定義と ai_scout、ai_mtdf の再定義

深さ制限探索の AI の実装

ai_by_score の修正

ai14s の再定義

ai_abs_dls の定義

@ai_by_mmscore を利用して定義した関数の処理の注意点

動作の確認

今回の記事のまとめ

本記事で入力したプログラム

次回の記事

Pythonで〇×ゲームのAIを一から作成する　その173　ai_by_mmscore の定義と深さ制限探索による AI の実装

`@ai_by_score` を利用した深さ制限探索の実装方法

`ai_by_mmscore` の定義

`@ai_by_score` の問題点

`@ai_by_score` の修正がうまくいく例

`@ai_by_score` の修正がうまくいかない例

`ai_by_mmscore` の定義

`ai_abs_all` の再定義

`ai_mmdfs_all` の定義と `ai_scout`、`ai_mtdf` の再定義

`ai_by_score` の修正

`ai14s` の再定義

`ai_abs_dls` の定義

`@ai_by_mmscore` を利用して定義した関数の処理の注意点