More than 3 years have passed since last update.

コンピュータとオセロ対戦11　～機械学習、方針決定～

Last updated at 2021-10-02Posted at 2021-10-01

前回

今回の目標

sklearnを使い、機械学習対戦オセロを作りたいと思います。
いくつかの方針を考えましたが、いずれも手順としては

教師データを作成する
教師データから学習する
正解率を確認し、方針の評価をする

という三段階で行いました。
具体的な説明は後ほど行います。

ここから本編

方針1　1hand_2eslect

この方法は上手くいきませんでした。

オセロには様々な状況があり、例えば今の状態から置ける場所も全くなかったり一通りあったり二通りあったり三通りあったりします。
そこで、常に「一番多くひっくりかえせる場所」「二番目に多くひっくりかえせる場所」のどちらかにしか置かないと仮定し以下のアルゴリズムを考えました。またこの時、対戦相手が異常に弱かったり強かったりした場合、正確なデータが取れないと思ったので相手側も同じように打ち返してきます。

次の手で、「一番多くひっくりかえせる場所」「二番目に多くひっくりかえせる場所」を探す。
そのうちどちらかにランダムで置く。
たまたま勝った時のデータを記録しておく。
機械学習。

ここでデータとは、具体的には

現在のスコア
一番多くひっくりかえせる場所に置いた時のスコア
二番目に多くひっくりかえせる場所に置いた時のスコア
「一番多くひっくりかえせる場所」と「二番目に多くひっくりかえせる場所」、どちらに置いたか

の四つです。ここでスコアとは、「自分の駒の数 - 相手の駒の数」のことです。
たまたま勝てた時のデータを集めるのだから、何か特徴が抽出できるはずです。

ここで、なぜこんな方法をとろうとしているか説明します。
前述したとおり、オセロにおいて、とれる手数の数は一定ではありませんし、前のターンでは「置ける」と判断されていた場所でも次の自分のターンでは置けなくなる、という状況だってあります。
この機械学習モデルは当然後々実装するわけなので、「置けない」ところに「置け！」と指示されても困るのです。
そのため、先に選択肢を与え、「どれを選びますか？」という方式をとりました。

データづくり

方針1で説明した1～3の手順の部分です。
まずはデータを作らなければいけません。
Pythonでは実行に時間がかかると思ったのでC言語で。

mainの上に書くやつ

あまり今までと変わらないです。put関数やcheck関数が見やすくなりましたが、本題ではないうえProcessing編で解説しているのでここでは触れません。
NUMは試行回数で、今回の場合1000回勝つまで繰り返します。

data.c

# include <stdio.h>
# include <stdlib.h>
# include <time.h>
# include <stdbool.h>

# define NONE 0
# define WHITE 1
# define BLACK 2
# define SIZE 8
# define PLAYER BLACK
// #define PLAYER WHITE
# define NUM 1000

void setup(void);
void think(int num);
void put(int line, int col, char now[SIZE][SIZE]);
void printb(void);
void copy(char parent[SIZE][SIZE], char child[SIZE][SIZE]);
void print(FILE * fp);
bool check(int line, int col);
bool check_all(void);
bool win(void);
int cal_score(char now[SIZE][SIZE]);

char board[SIZE][SIZE] = {NONE};
char score_arr[SIZE * SIZE - 4][3] = {0};
char point[SIZE * SIZE - 4] = {0};
int turn;

copy

文字通りコピー関数。

data.c

void copy(char parent[SIZE][SIZE], char child[SIZE][SIZE]) {
    for (int i = 0; i < SIZE; i++)
        for (int j = 0; j < SIZE; j++)
            child[i][j] = parent[i][j];
}

cal_score

文字通りスコアを計算する関数。

data.c

int cal_score(char now[SIZE][SIZE]){
    int my = turn, opp = 3 - turn;
    int score = 0;

    for (int i = 0; i < SIZE; i++)
        for (int j = 0; j < SIZE; j++){
            if (now[i][j] == my) score++;
            else if (now[i][j] == opp) score--;
        }
    
    return score;
}

think

この関数の中で、方針1で説明した手順

次の手で、「一番多くひっくりかえせる場所」「二番目に多くひっくりかえせる場所」を探す。
そのうちどちらかにランダムで置く。
たまたま勝った時のデータを記録しておく。
機械学習。

のうち1～3までをしています。

data.c

void think(int num){
    char line[2], col[2];
    char board_leaf[SIZE][SIZE];
    int score, top_score = -SIZE * SIZE, sec_score = top_score;
    int i, j;
    bool one = false, two = false;

    score_arr[num][0] = cal_score(board);

    for (i = 0; i < SIZE; i++){
        for (j = 0; j < SIZE; j++){
            if (check(i, j)){
                copy(board, board_leaf);
                put(i, j, board_leaf);
                score = cal_score(board_leaf);
                if (score > top_score){
                    top_score = score;
                    line[1] = line[0];
                    col[1] = col[0];
                    line[0] = i, col[0] = j;
                    score_arr[num][2] = score_arr[num][1];
                    score_arr[num][1] = score;
                    one = true;
                }else if (score > sec_score){
                    sec_score = score;
                    line[1] = i, col[1] = j;
                    score_arr[num][2] = score;
                    two = true;
                }
            }
        }
    }

    if (one && !two) {
        for (i = 0; i < 3; i++)
            score_arr[num][i] = 0;
        point[num] = 0;
        put(line[0], col[0], board);
    }else if (two){
        int place = rand() % 2;
        put(line[place], col[place], board);
        point[num] = place;
    }

    score_arr[num][3] = turn;
}

print

データの記録。

data.c

void print(FILE * fp){
    int my = 3 - PLAYER;

    for (int i = 0; i < SIZE * SIZE - 4; i++){
        if (score_arr[i][3] == my){
            fprintf(fp, "%3d%3d%3d", score_arr[i][0], score_arr[i][1], score_arr[i][2]);
            fprintf(fp, "%d\n", point[i]);
        }
    }
}

main

特筆すべきことはないですが一応。

data.c

void main(void) {
    double start = clock();

    int turn_num;
    bool can, old_can;
    FILE * fp;
    if (PLAYER == BLACK) fp = fopen("whitedata.txt", "w");
    else fp = fopen("blackdata.txt", "w");

    if (fp == NULL){
        printf("text file error\n");
        exit(-1);
    }

    for (int num = 0; num < NUM; num++){
        
        setup();
        srand((unsigned int)time(NULL) + num);

        while (!win()){
            turn = BLACK;
            can = true, old_can = true;
            turn_num = 0;
            setup();
            // printb();
            while ((can = check_all()) || old_can){
                if (can){
                    think(turn_num);
                    // printb();
                    turn_num++;
                }
                old_can = can;
                turn = 3 - turn;
            }
        }

        if (turn_num == SIZE * SIZE - 4)
          print(fp);
    }

    fclose(fp);
    printf("実行時間: %.4lf[s]\n", (clock() - start) / 1000000);
}

機械学習

本題である学習は、sklearnを使いPythonで行いました。
どの方法が適しているか調べる意味でも様々な方法を使いました。

read.py

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

x = []
y = []

with open("whitedata.txt", "r") as data:
    for num in data:
        x.append([])
        x[-1].append(int(num[0:3]))
        x[-1].append(int(num[3:6]))
        x[-1].append(int(num[6:9]))
        y.append(int(num[-2]))
        if all([x[-1][i] == 0 for i in range(3)]):
            del x[-1]
            del y[-1]
        
# print(x)
# print(y)

x_train, x_test, y_train, y_test = train_test_split(x, y,
                                                    test_size=0.3,
                                                    random_state=0)

model = LinearRegression()
model.fit(x_train, y_train)
print("LinearRegression:")
print("train score:\t", model.score(x_train, y_train))
print("test score:\t", model.score(x_test, y_test), "\n")

""" 省略 """

model = LinearSVC()
model.fit(x_train, y_train)
print("LinearSVC:")
print("train score:\t", model.score(x_train, y_train))
print("test score:\t", model.score(x_test, y_test))

学習結果

正解率がおよそ0.5が多いです。一見1より大きい正解率が出ていますが、よく見ると末尾に「e-07」などがついているので大丈夫です。
今回、0か1かを当てる問題だったので、ほぼ学習していないことがわかります。
方針1に書いた

現在のスコア
一番多くひっくりかえせる場所に置いた時のスコア
二番目に多くひっくりかえせる場所に置いた時のスコア
「一番多くひっくりかえせる場所」と「二番目に多くひっくりかえせる場所」、どちらに置いたか

の四つのデータには相関関係はありませんでした。

LinearRegression:
train score:	 0.0002736707737531763
test score:	 -0.0007700840363100703 

LogisticRegression:
train score:	 0.5044421020930583
test score:	 0.49174376390678065 

Ridge:
train score:	 0.00027367077373408044
test score:	 -0.0007700758839097599 

DecisionTreeClassifier:
train score:	 0.5234653415650253
test score:	 0.4985361283522661 

KNeighborsClassifier:
train score:	 0.5564924961100236
test score:	 0.5024007495022836 

LinearSVC:
train score:	 0.4963609898107715
test score:	 0.5072022485068509

少し変えてみる 1hand_3select

アルゴリズムを以下のように変更し、実行してみました(プログラムは省略)。

次の手で、「一番多くひっくりかえせる場所」「二番目に多くひっくりかえせる場所」「三番目に多くひっくりかえせる場所」を探す。
そのうちどれかにランダムで置く。
たまたま勝った時のデータを記録しておく。
機械学習。

ここでデータとは、

現在のスコア
一番多くひっくりかえせる場所に置いた時のスコア
二番目に多くひっくりかえせる場所に置いた時のスコア
三番目に多くひっくりかえせる場所に置いた時のスコア
「一番多くひっくりかえせる場所」と「二番目に多くひっくりかえせる場所」と「三番目に多くひっくりかえせる場所」、どこに置いたか

ここでスコアとは、「自分の駒の数 - 相手の駒の数」のこと。

学習結果

正解率はおよそ0.33となり、あまり学習しませんでした。
やはり勝利するにおいて「現在のスコア」「たくさんひっくりかえせた時のスコア」と、「何番目に多くひっくりかえせるか」の間には特に関係がないようです。
マイナスはよく分かりません。

LinearRegression:
train score:	 0.001085326957450894
test score:	 -0.0031733477118673914 

LogisticRegression:
train score:	 0.35084033613445376
test score:	 0.3735294117647059 

Ridge:
train score:	 0.0010853269328233717
test score:	 -0.0031728892849689494 

DecisionTreeClassifier:
train score:	 0.446218487394958
test score:	 0.34215686274509804 

KNeighborsClassifier:
train score:	 0.39915966386554624
test score:	 0.3362745098039216 

LinearSVC:
train score:	 0.35252100840336137
test score:	 0.37549019607843137

さらに変えてみる 1hand_3select_n

プログラムは省略しますが、上に加えて現在のターン数も参考にして学習させてみました。
何の成果も得られませんでした。

LinearRegression:
train score:	 0.0008553690780881418
test score:	 -0.0009434933282730373 

LogisticRegression:
train score:	 0.3441234792861354
test score:	 0.3303738902088283 

Ridge:
train score:	 0.0008553690780020995
test score:	 -0.0009434910964263299 

DecisionTreeClassifier:
train score:	 0.38308591028458117
test score:	 0.3294985619607353 

KNeighborsClassifier:
train score:	 0.4714614931132429
test score:	 0.343878954607978 

LinearSVC:
train score:	 0.3390856959108205
test score:	 0.33324996873827684

方針2 3history

この方法も上手くいきませんでした。

次に考えたのは、過去のスコアから次に置く場所を決定するアルゴリズムです。
具体的には、過去三回のスコアから判別します。

データづくり

やっていることとしては上の説明を読めば分かると思うので、プログラムの説明はしません。

think

data.c

void think(int num){
    char line[3], col[3];
    char board_leaf[SIZE][SIZE];
    int score, top_score = -SIZE * SIZE, sec_score = top_score;
    int thr_score = top_score;
    int i, j;
    bool one = false, two = false, thr = false;

    for (i = 0; i < SIZE; i++){
        for (j = 0; j < SIZE; j++){
            if (check(i, j)){
                copy(board, board_leaf);
                put(i, j, board_leaf);
                score = cal_score(board_leaf);
                if (score > top_score){
                    top_score = score;
                    line[1] = line[0];
                    col[1] = col[0];
                    line[0] = i, col[0] = j;
                    one = true;
                }else if (score > sec_score){
                    sec_score = score;
                    line[1] = i, col[1] = j;
                    two = true;
                }else if (score > thr_score){
                    thr_score = score;
                    line[2] = i, col[2] = j;
                    thr = true;
                }
            }
        }
    }

    if (one && !two) {
        point[num] = -1;
        put(line[0], col[0], board);
    }else if (one && two && !thr){
        point[num] = -1;
        int place = rand() % 2;
        put(line[place], col[place], board);
    }else if (thr){
        int place = rand() % 3;
        put(line[place], col[place], board);
        point[num] = place;
    }

    score_his[num][3] = turn;
}

print

data.c

void print(FILE * fp){
    int my = 3 - PLAYER;

    for (int i = 0; i < SIZE * SIZE - 4; i++){
        if (score_his[i][3] == my){
            // fprintf(fp, "%3d", i);
            for (int j = 0; j < 3; j++)
                fprintf(fp, "%3d", score_his[i][j]);
            fprintf(fp, "%d\n", point[i]);
        }
    }
}

main

data.c

void main(void) {
    double start = clock();

    int turn_num;
    bool can, old_can;
    FILE * fp;
    if (PLAYER == BLACK) fp = fopen("whitedata.txt", "w");
    else fp = fopen("blackdata.txt", "w");

    if (fp == NULL){
        printf("text file error\n");
        exit(-1);
    }

    for (int num = 0; num < NUM; num++){
        
        setup();
        srand((unsigned int)time(NULL) + num);

        while (!win()){
            turn = BLACK;
            can = true, old_can = true;
            turn_num = 0;
            setup();
            // printb();
            while ((can = check_all()) || old_can){
                if (can){
                    if (turn_num)
                        for (int i = 0; i < 3; i++)
                            score_his[turn_num][i] = score_his[turn_num - 1][i];
                    for (int i = 0; i < 2; i++)
                        score_his[turn_num][i] = score_his[turn_num][i + 1];
                    score_his[turn_num][2] = cal_score(board);
                    think(turn_num);
                    // printb();
                    turn_num++;
                }
                old_can = can;
                turn = 3 - turn;
            }
        }

        if (turn_num == SIZE * SIZE - 4)
          print(fp);
    }

    fclose(fp);
    printf("実行時間: %.4lf[s]\n", (clock() - start) / 1000000);
}

機械学習

read.py

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

x = []
y = []

with open("whitedata.txt", "r") as data:
    for num in data:
        x.append([])
        for i in range(3):
            x[-1].append(int(num[0:3]))
            num = num[3:]
        y.append(int(num.strip("\n")))
        if y[-1] == -1:
            del x[-1]
            del y[-1]
        
# print(x)
# print(y)

x_train, x_test, y_train, y_test = train_test_split(x, y,
                                                    test_size=0.3,
                                                    random_state=0)

model = LinearRegression()
model.fit(x_train, y_train)
print("LinearRegression:")
print("train score:\t", model.score(x_train, y_train))
print("test score:\t", model.score(x_test, y_test), "\n")

""" 省略 """

model = LinearSVC()
model.fit(x_train, y_train)
print("LinearSVC:")
print("train score:\t", model.score(x_train, y_train))
print("test score:\t", model.score(x_test, y_test))

学習結果

何の成果も得られませんでした。

LinearRegression:
train score:	 0.00017720325419889882
test score:	 -7.086520582322287e-05 

LogisticRegression:
train score:	 0.3408530909489131
test score:	 0.3386561062851303 

Ridge:
train score:	 0.00017720325419523508
test score:	 -7.08642913171964e-05 

DecisionTreeClassifier:
train score:	 0.35328259322126704
test score:	 0.3409555442003066 

KNeighborsClassifier:
train score:	 0.4045337567759952
test score:	 0.33891159938681653 

LinearSVC:
train score:	 0.3346657175710453
test score:	 0.34006131834440473

方針3　3history_n

この方法も上手くいきませんでした。

今度は、「今までに選んだ1～3番目の記録」から、「今から1～3番目にひっくりかえせる場所のうちどれを選ぶか？」を決定することを考えました。

データづくり

省略します。

機械学習

省略します。

学習結果

履歴もあまり関係ないようです。

LinearRegression:
train score:	 0.0006311622547798823
test score:	 -0.0007779856590277578 

LogisticRegression:
train score:	 0.33259612112963594
test score:	 0.32270441915850756 

Ridge:
train score:	 0.0006311622501095071
test score:	 -0.0007778664263304869 

DecisionTreeClassifier:
train score:	 0.34484518543722353
test score:	 0.31913204551468644 

KNeighborsClassifier:
train score:	 0.32936372915957807
test score:	 0.33037840698597515 

LinearSVC:
train score:	 0.33259612112963594
test score:	 0.32270441915850756

方針4 turnnum_and_num

この方法も上手くいきませんでした。

「今が何ターン目なのか」、「自分の駒の数」、「相手の駒の数」から「1～3番目にひっくりかえせる場所のうちどれを選ぶか？」を考えてみました。

データづくり

省略します。

機械学習

省略します。

学習結果

LinearRegression:
train score:	 0.00016355140663726342
test score:	 -0.00039965597186197854 

LogisticRegression:
train score:	 0.3406409202958094
test score:	 0.3309049079754601 

Ridge:
train score:	 0.00016359837351365147
test score:	 -0.00039964174666518026 

DecisionTreeClassifier:
train score:	 0.3515529991783073
test score:	 0.3290644171779141 

KNeighborsClassifier:
train score:	 0.3533607230895645
test score:	 0.3384969325153374 

LinearSVC:
train score:	 0.33377156943303204
test score:	 0.33312883435582824

方針5 customscore

この方法も上手くいきませんでした。

ボード上の四隅など、とっていると有利になれる場所に重みを付け、その累計を「カスタムスコア」と名付けました。
そのカスタムスコアから予測してみます。
また、今更ですが正解率の低かったLinearSVCと、そもそも分類ではないLinearRegressionとRidge抜きで学習させてみます。

データづくり

具体的なスコア内容です。

data.c

for (int i = 0; i < SIZE; i++){
    for (int j = 0; j < SIZE; j++){
        board_score[i][j] = 1;
        if (i == j || i + j == SIZE - 1)
            board_score[i][j] = 5;
        if ((i == 0 && j == 0)
         || (i == 0 && j == SIZE - 1)
         || (i == SIZE - 1 && j == 0)
         || (i == SIZE - 1 && j == SIZE - 1))
            board_score[i][j] = 10;
    }
}

機械学習

省略します。

学習結果

LogisticRegression:
train score:	 0.3382401439064608
test score:	 0.3382064913262451 

DecisionTreeClassifier:
train score:	 0.35739769150052464
test score:	 0.3285534415221041 

KNeighborsClassifier:
train score:	 0.4144805876180483
test score:	 0.33855623950755453

方針6 turnnum_and_num_

方針4は、「今が何ターン目なのか」、「自分の駒の数」、「相手の駒の数」から「1～3番目にひっくりかえせる場所のうちどれを選ぶか？」を考えました。
しかし一度プログラムミスで「今が何ターン目なのか」、「置いた後の自分の駒の数」、「置いた後の相手の駒の数」から「1～3番目にひっくりかえせる場所のうちどれを選ぶか？」を考えるとやや正解率が上がったため、これを使ってみたいと思います。

データづくり

省略します。

機械学習

省略します。

学習結果

今までよりはマシな結果であることがわかります。

LogisticRegression:
train score:	 0.38835591599907693
test score:	 0.39589262364433503 

DecisionTreeClassifier:
train score:	 0.41779579995384564
test score:	 0.38643181293746637 

KNeighborsClassifier:
train score:	 0.39761975406323147
test score:	 0.36374125067302515

フルバージョン

方針の隣に書いていた文字列が、11フォルダ内のディレクトリ名に対応しています。

実際にやってみた

方針を決めただけなので、まだ完成ではありません。

次回は

いろいろ上手く行きませんでしたが、上手く行かない方法が分かったので今後の参考になりそうです。
次回は、今回決めた方針を元に、学習方法を最適化するなどして完成形を目指します。

次回

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up

コンピュータとオセロ対戦11 ～機械学習、方針決定～

今回の目標

ここから本編

方針1 1hand_2eslect

データづくり

mainの上に書くやつ

copy

cal_score

think

print

main

機械学習

学習結果

少し変えてみる 1hand_3select

学習結果

さらに変えてみる 1hand_3select_n

方針2 3history

データづくり

think

print

main

機械学習

学習結果

方針3 3history_n

データづくり

機械学習

学習結果

方針4 turnnum_and_num

データづくり

機械学習

学習結果

方針5 customscore

データづくり

機械学習

学習結果

方針6 turnnum_and_num_

データづくり

機械学習

学習結果

フルバージョン

実際にやってみた

次回は

コンピュータとオセロ対戦11　～機械学習、方針決定～

方針1　1hand_2eslect

方針3　3history_n