@kanekanekaneko(金子濯)in

株式会社プライム・ブレインズ

オセロで学ぶ遺伝的アルゴリズム

Last updated at 2025-06-29Posted at 2025-06-29

0. 評価値

「評価値」は文脈によって微妙に異なる意味で用いられる言葉ですが、本記事ではオセロにおいて、次の自分のコマをどこに置けば良いかを判断するための0~99の整数という意味で使います。

例えば、下図のような評価値が設定されていて1ターン目の黒番のプレイヤーは評価値に従ってプレイするとすると、丸で囲った次におけるマスのうち最も評価値の高いオレンジの丸で囲ったセルにコマを置くことになります。

評価値に従ってプレイするプレイヤーの勝率を高くしたい時、評価値はどのように決めれば良いでしょうか。

1. 遺伝的アルゴリズムによる勝てる評価値の作成

遺伝的アルゴリズムとは、生物の進化を参考に考案されたアルゴリズムです。
複数のパラメータを調整することで最適解を求めたい場合などに適しています。
交配と突然変異を十分な世代の間繰り返すことで、染色体に見立てたパラメータが次第に最適化されていきます。

1-1. 処理全体の流れ

今回は

課題：勝率を最大化すること
パラメータ：各マスの評価値

として、遺伝的アルゴリズムを利用して下記の手順に従って評価値を洗練させていきます。

ランダムな評価値を30パターン用意する。それぞれのパターンを個体と呼ぶ
すべての個体はランダムな手を差す相手との対戦を100戦行い、その勝率を記録する
勝率によるランキング選択(後述)により親を選び、一様交叉(後述)により次世代の個体を作る。これを30回繰り返す
次世代の個体は低い確率で突然変異(後述)を起こす。突然変異が起きた個体は、遺伝子の一部をランダムに変更する
2~4を50世代繰り返す

ここからは処理の流れをより詳細に解説していきます。
1-1, 1-2は準備段階の処理であり、本記事の本題である遺伝的アルゴリズムとは関係が薄いので飛ばしていただいていも問題なく理解出来ます。

1-2. ランダムな評価値を30パターン用意する

オセロの盤面のサイズに合わせて、64の要素を持つgenes配列を0~99のランダムな整数で初期化します。
実装はc言語で行います。

1-2.c

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SIZE 8
#define POP_SIZE 30

typedef struct {
    int genes[SIZE * SIZE];
} Individual;

int random_val() {
    return rand() % 100;
}

void random_individual(Individual* ind) {
    for (int i = 0; i < SIZE * SIZE; ++i) {
        ind->genes[i] = random_val();
    }
}

int main() {
    srand((unsigned int)time(NULL));
    Individual population[POP_SIZE];

    for (int i = 0; i < POP_SIZE; ++i) {
        random_individual(&population[i]);
    }
    // 確認のために出力
    for (int i = 0; i < POP_SIZE; ++i) {
        for (int y = 0; y < SIZE; ++y) {
            for (int x = 0; x < SIZE; ++x) {
                printf("genes[%2d] = %2d", SIZE * y + x, population[i].genes[SIZE * y + x]);
                if (x != SIZE - 1) printf(" | ");
            }
            printf("\n");
        }
        printf("-------------------------------------------------------------------------------------------------------------------------------------\n");
    }

    return 0;
}

出力から1個体抜き出したものが下記です。genes配列に初期値が設定されていることが確認できました。

genes[ 0] = 24 | genes[ 1] = 18 | genes[ 2] = 99 | genes[ 3] = 15 | genes[ 4] = 74 | genes[ 5] = 78 | genes[ 6] = 98 | genes[ 7] = 38
genes[ 8] = 75 | genes[ 9] = 70 | genes[10] = 55 | genes[11] = 75 | genes[12] =  5 | genes[13] = 11 | genes[14] = 20 | genes[15] = 81
genes[16] = 46 | genes[17] = 48 | genes[18] = 39 | genes[19] = 22 | genes[20] = 71 | genes[21] = 95 | genes[22] = 93 | genes[23] = 42
genes[24] = 48 | genes[25] = 51 | genes[26] = 25 | genes[27] = 50 | genes[28] = 20 | genes[29] = 15 | genes[30] = 54 | genes[31] = 54
genes[32] = 27 | genes[33] = 82 | genes[34] = 71 | genes[35] = 87 | genes[36] = 29 | genes[37] = 22 | genes[38] =  2 | genes[39] = 52
genes[40] = 42 | genes[41] =  9 | genes[42] = 55 | genes[43] = 93 | genes[44] = 76 | genes[45] = 37 | genes[46] = 92 | genes[47] = 97
genes[48] = 39 | genes[49] =  4 | genes[50] = 53 | genes[51] = 20 | genes[52] = 74 | genes[53] = 16 | genes[54] = 52 | genes[55] = 76
genes[56] = 61 | genes[57] = 58 | genes[58] = 31 | genes[59] = 62 | genes[60] = 63 | genes[61] = 90 | genes[62] = 72 | genes[63] =  9

1-3. ランダムな対戦相手との対戦

次に、各個体をランダムな対戦相手と100回対戦させます。
各試合黒番または白番のどちらでプレイするかはランダムに決めています。

score関数の返り値として、100試合行った勝利数(そのまま勝率(%))を得ることができます。

1-3.c

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "../othello/othello_auto.h"
#include "1-2.h"

#define BLACK 1
#define WHITE 2
#define SIZE 8
#define PLAYERS 2

#define POP_SIZE 30
#define BATTLES 100

int calc_score(int result, int color) {
    switch (result) {
        case BLACK:
            return (color == BLACK) ? 1 : 0;
        case WHITE:
            return (color == WHITE) ? 1 : 0;
        default:
            return 0;
    }
}

int score(const Individual* ind, int enemies[BATTLES][SIZE * SIZE], int color) {
    int score = 0;

    for (int b = 0; b < BATTLES; ++b) {
        int rating_value_input[PLAYERS][SIZE * SIZE];
        for (int i = 0; i < SIZE * SIZE; i++) {
            rating_value_input[0][i] = (color == BLACK) ? ind->genes[i] : enemies[b][i];
            rating_value_input[1][i] = (color == BLACK) ? enemies[b][i] :  ind->genes[i];
        }

        score += calc_score(battle(rating_value_input), color);
    }
    return score;
}

int main() {
    srand((unsigned int)time(NULL));
    Individual population[POP_SIZE];
    int scores[POP_SIZE];
    int enemies[BATTLES][SIZE * SIZE] = {0};

    for (int i = 0; i < POP_SIZE; ++i) {
        random_individual(&population[i]);
    }
    // ランダムな評価値を持った対戦相手位を100体作成
    for (int e = 0; e < BATTLES; ++e) {
        for (int i = 0; i < SIZE * SIZE; ++i) {
            enemies[e][i] = random_val();
        }
    }
    // 各個体の勝率を記録
    for (int i = 0; i < POP_SIZE; ++i) {
        scores[i] = score(&population[i], enemies, (rand() % 2 == 0) ? BLACK : WHITE);
    }
    // 確認用に出力
    for (int i = 0; i < POP_SIZE; ++i) {
        printf("%2d, ", scores[i]);
    }
    printf("\n");
    return 0;
}

実行すると、30個体がそれぞれ100試合した結果の勝率(%)を確認できます。

48, 42, 48, 59, 27, 41, 60, 37, 27, 57, 32, 44, 54, 49, 30, 56, 54, 63, 57, 48, 50, 38, 75, 48, 60, 58, 48, 42, 52, 37,

score関数は、引数としてその個体、100試合分対戦相手としてのランダムに初期化された評価値、その個体がプレイする番(黒なら1, 白なら2)を受け取ります。

battle関数の中身は長くなるのでここでの紹介は省略しますが、個体と対戦相手が持つ二つの評価値を受け取り、評価値に従ってプレイした結果の勝者(黒なら1, 白なら2)を返却します。

battle関数の作成にあたっては、上記のブログで紹介されているC言語によるオセロの実装を参考にさせていただきました。
本記事の最後にコード全体を載せるので、battle関数の中身を確認したい方はそちらをご覧ください。

1-4. 勝率による選択

次に、勝率による次世代の親の 選択(Selection) を行います。
選択は遺伝的アルゴリズムの根幹に関わる部分なので、詳細に解説します。

選択では、より環境に適応できている個体が子孫を残すことで、次世代にはさらに環境に適応した個体が生まれることを期待します。
今回の問題に当てはめて考えると、単純に考えれば勝率の最も高かった2個体を親にするのが良いアイデアに思えますが、実はそうではありません。

選択は確率的に行われる必要があります。
勝率の高い個体は親に選ばれやすく、そうでない個体は選ばれなくくなるという重みづけはありますが、勝率の低い個体にも親になるチャンスを残しておくことが大切です。

これは、遺伝子の多様性を保つためです。
多様性の少ない遺伝子プールを持つ個体群では、親世代と子世代の特徴がほとんど同じになってしまい、新たな探索を突然変異のみに頼ることになってしまいます。

環境に最も適している個体しか親になれない世界

もし、現実世界の選択が「環境に最も適している個体が必ず親になり、そうでない個体は子孫を残せない」というルールで行われていたとしたら、いまだに陸で生きる動物などいなかったのではないかと思います。最初の生き物が水中で誕生したのであれば、陸で生きるための機能は水中で生きるためにはいらない機能です。

環境に適していない個体でも親になれる可能性のある世界

水中で生きる上では不利な特徴を持った個体でも子孫を残すことができていたから、言い換えると、今生きている環境で必ずしも有利とは言えない特徴を持った個体でも子孫を残すことが出来たから、現実世界には今様々な生き物が生きているのです。

1-5. 一様交叉

交叉とは、二体の親の染色体から子供の染色体を作成することです。
一様交叉では、染色体配列のそれぞれの要素が50%の確率で親Aと同じに、50%の確率で親Bと同じになるように交叉します。
他にも色々な交叉方式があり、交叉方式と課題には相性があります。

1-5.c

void crossover(const Individual* p1, const Individual* p2, Individual* child) {
    for (int i = 0; i < SIZE * SIZE; ++i) {
            child->genes[i] = (rand() % 2) ? p1->genes[i] : p2->genes[i];
    }
}

上記のように、非常にシンプルな実装ができるため今回は一様交叉を選びました。
他の交叉方法に興味がある方は、下記のページが色々な種類の交叉方法が載っていて楽しいです。

1-6. 突然変異

交叉した後、一定の低い確率で突然変異に見立てて染色体の一部の要素の値を変更します。

突然変異率が低すぎるとなかなか新しい特徴を持った個体が生まれず収束までに時間がかかります。
一方、高すぎても親の良い形質を引き継ぐことが難しくなります。
そのため、課題によってアドホックに突然変異率を調整するのが一般的です。

突然変異も下記のようにシンプルに実装しました。

1-6.c

void mutate(Individual* ind, double mutation_rate) {
    for (int i = 0; i < SIZE * SIZE; ++i) {
        if ((double)rand() / RAND_MAX < mutation_rate) {
            ind->genes[i] = random_val();
        }
    }
}

2. 実行結果を確認

50世代経過後に一番勝率が高い個体の評価値をオセロの盤面に重ねてみました。
オセロのセオリー通り、角のマスは高く、角を囲む三つのマスは低い評価値になっています。

世代ごとの各個体の勝率と平均の勝率をプロットすると下記のグラフのようになりました。
最初はランダムに初期化された評価値同士の対戦のため勝率は50%程度になっていますが、世代を重ねるにつれて勝率が高くなっていくのが確認できます。

3. 実装全体と実行コマンドの紹介

最後に、実装全体と、C言語に馴染みはないけどすぐに動かしてみたいという方のために実行コマンドの紹介です。

3-1. othello_auto.c（オセロの自動対戦を行う）

上記で紹介されているオセロの人対人対戦用のプログラムを参考に、評価値を2組受け取って下記の勝利結果を返すbattle関数を実装しました。

黒が勝利：return 1
白が勝利：return 2
引き分け：return 0

othello_auto.c

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SIZE 8
#define EMPTY 0
#define BLACK 1
#define WHITE 2

#define PLAYERS 2
#define DRAW 0

int board[SIZE][SIZE];
int rating_value[PLAYERS][SIZE][SIZE];

int dx[] = { 1, 1, 0, -1, -1, -1,  0, 1 };
int dy[] = { 0, 1, 1,  1,  0, -1, -1, -1 };

// 盤面初期化
void init_board() {
    for (int y = 0; y < SIZE; y++)
        for (int x = 0; x < SIZE; x++)
            board[y][x] = EMPTY;

    board[3][3] = WHITE;
    board[4][4] = WHITE;
    board[3][4] = BLACK;
    board[4][3] = BLACK;
}

// 指定マスに置けるか
int can_put(int x, int y, int color) {
    if (board[y][x] != EMPTY) return 0;

    int opponent = (color == BLACK) ? WHITE : BLACK;
    for (int dir = 0; dir < 8; dir++) {
        int nx = x + dx[dir], ny = y + dy[dir];
        int found = 0;

        while (nx >= 0 && nx < SIZE && ny >= 0 && ny < SIZE) {
            if (board[ny][nx] == opponent) {
                found = 1;
            } else if (board[ny][nx] == color && found) {
                return 1;
            } else {
                break;
            }
            nx += dx[dir];
            ny += dy[dir];
        }
    }

    return 0;
}

// 石をひっくり返す
void flip_stones(int x, int y, int color) {
    int opponent = (color == BLACK) ? WHITE : BLACK;

    for (int dir = 0; dir < 8; dir++) {
        int nx = x + dx[dir], ny = y + dy[dir];
        int found = 0;

        while (nx >= 0 && nx < SIZE && ny >= 0 && ny < SIZE && board[ny][nx] == opponent) {
            nx += dx[dir];
            ny += dy[dir];
            found = 1;
        }

        if (found && nx >= 0 && nx < SIZE && ny >= 0 && ny < SIZE && board[ny][nx] == color) {
            nx = x + dx[dir];
            ny = y + dy[dir];
            while (board[ny][nx] == opponent) {
                board[ny][nx] = color;
                nx += dx[dir];
                ny += dy[dir];
            }
        }
    }
}

// パスすべきか
int has_valid_move(int color) {
    for (int y = 0; y < SIZE; y++)
        for (int x = 0; x < SIZE; x++)
            if (can_put(x, y, color)) return 1;
    return 0;
}

// 空白マスのカウント
int has_empty_cell() {
    for (int y = 0; y < SIZE; y++)
        for (int x = 0; x < SIZE; x++)
            if (board[y][x] == EMPTY) return 1;
    return 0;
}

// 石を置く
int put_stone(int x, int y, int color) {
    if (!can_put(x, y, color)) return 0;
    board[y][x] = color;
    flip_stones(x, y, color);
    return 1;
}

// 自動で石を置く
int put_stone_ai(int color) {
    int best_x = -1, best_y = -1, best_rating_value = -1;
    for (int y = 0; y < SIZE; y++) {
        for (int x = 0; x < SIZE; x++) {
            if (can_put(x, y, color) && rating_value[color - 1][y][x] > best_rating_value) {
                best_x = x;
                best_y = y;
                best_rating_value = rating_value[color - 1][y][x];
            }
        }
    }
    if (best_x != -1 && best_y != -1) {
        board[best_y][best_x] = color;
        flip_stones(best_x, best_y, color);
        return 1;
    } else {
        return 0;
    }
}

int result() {
    int black = 0, white = 0;
    for (int y = 0; y < SIZE; y++)
        for (int x = 0; x < SIZE; x++) {
            if (board[y][x] == BLACK) black++;
            if (board[y][x] == WHITE) white++;
        }

    if (black > white) return BLACK;
    else if (white > black) return WHITE;
    else return DRAW;
}

int battle(int rating_value_input[PLAYERS][SIZE * SIZE]) {
    init_board();
    for (int player = 0; player < PLAYERS; ++player)
        for (int i = 0; i < SIZE * SIZE; ++i)
            rating_value[player][i / SIZE][i % SIZE] = rating_value_input[player][i];

    int turn = BLACK;
    int pass_count = 0;

    while (1) {
        if (!has_empty_cell()) break;

        if (!has_valid_move(turn)) {
            pass_count++;
            if (pass_count >= 2) break;
            turn = (turn == BLACK) ? WHITE : BLACK;
            continue;
        }

        pass_count = 0;

        if (!put_stone_ai(turn)) {
            printf("エラー\n");
            continue;
        }

        turn = (turn == BLACK) ? WHITE : BLACK;
    }

    return result();
}

3-2. othello_auto.h（battle関数を遺伝的アルゴリズム側から呼び出すためのヘッダファイル）

othello_auto.h

#ifndef OTHELLO_AUTO
#define OTHELLO_AUTO

#define SIZE 8
#define EMPTY 0
#define BLACK 1
#define WHITE 2
#define DRAW 0

#define PLAYERS 2

int battle(int rating_value_input[PLAYERS][SIZE * SIZE]);

#endif

3-3. main.c

遺伝的アルゴリズムのメインの処理を行うファイルです。
オセロの対戦部分は先述したbattle関数を呼び出して行います。

main.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <math.h>
#include "othello_auto.h"

#define POP_SIZE 30
#define GENERATIONS 50
#define BATTLES 100
#define MUTATION_RATE 0.01

typedef struct {
    int genes[SIZE * SIZE];
} Individual;

int random_val() {
    return rand() % 100;
}

void random_individual(Individual* ind) {
    for (int i = 0; i < SIZE * SIZE; ++i) {
        ind->genes[i] = random_val();
    }
}

int calc_score(int result, int color) {
    switch (result) {
        case BLACK:
            return (color == BLACK) ? 1 : 0;
        case WHITE:
            return (color == WHITE) ? 1 : 0;
        default:
            return 0;
    }
}

int score(const Individual* ind, int enemies[BATTLES][SIZE * SIZE], int color) {
    int score = 0;

    for (int b = 0; b < BATTLES; ++b) {
        int rating_value_input[PLAYERS][SIZE * SIZE];
        for (int i = 0; i < SIZE * SIZE; i++) {
            rating_value_input[0][i] = (color == BLACK) ? ind->genes[i] : enemies[b][i];
            rating_value_input[1][i] = (color == BLACK) ? enemies[b][i] :  ind->genes[i];
        }
        score += calc_score(battle(rating_value_input), color);
    }
    return score;
}

void crossover(const Individual* p1, const Individual* p2, Individual* child) {
    for (int i = 0; i < SIZE * SIZE; ++i) {
            child->genes[i] = (rand() % 2) ? p1->genes[i] : p2->genes[i];
    }
}

void mutate(Individual* ind, double mutation_rate) {
    for (int i = 0; i < SIZE * SIZE; ++i) {
        if ((double)rand() / RAND_MAX < mutation_rate) {
            ind->genes[i] = random_val();
        }
    }
}

const Individual* ranking_selection(Individual population[], int scores[]) {
    // 個体のインデックスを保持する配列（0〜POP_SIZE-1）
    int indices[POP_SIZE];
    for (int i = 0; i < POP_SIZE; ++i) {
        indices[i] = i;
    }

    // 勝率に従って昇順にソート
    for (int i = 0; i < POP_SIZE - 1; ++i) {
        for (int j = 0; j < POP_SIZE - 1 - i; ++j) {
            if (scores[indices[j]] > scores[indices[j + 1]]) {
                int temp = indices[j];
                indices[j] = indices[j + 1];
                indices[j + 1] = temp;
            }
        }
    }

    // ランクに比例した重みをつける（最下位:1, 最上位:POP_SIZE）
    int rank_weights[POP_SIZE];
    int total_rank = 0;
    for (int i = 0; i < POP_SIZE; ++i) {
        rank_weights[i] = i + 1;
        total_rank += rank_weights[i];
    }

    int pick = rand() % total_rank;
    int current = 0;
    for (int i = 0; i < POP_SIZE; ++i) {
        current += rank_weights[i];
        if (current > pick) {
            return &population[indices[i]];
        }
    }

    // 念のため最後の個体を返す（到達しないはず）
    return &population[indices[POP_SIZE - 1]];
}

int ga(double mutation_rate) {
    Individual population[POP_SIZE];
    Individual new_population[POP_SIZE];
    int scores[POP_SIZE];

    // 初期化
    for (int i = 0; i < POP_SIZE; ++i) {
        random_individual(&population[i]);
    }
    int enemies[BATTLES][SIZE * SIZE] = {0};
    int history[POP_SIZE][GENERATIONS] = {0};
    int best_index = 0;
    int total_generation = 0;
    for (int generation = 0; generation < GENERATIONS; ++generation) {
        int best_score = 0;
        ++total_generation;

        for (int e = 0; e < BATTLES; ++e) {
            for (int i = 0; i < SIZE * SIZE; ++i) {
                enemies[e][i] = random_val();
            }
        }

        // 勝率評価
        for (int i = 0; i < POP_SIZE; ++i) {
            scores[i] = score(&population[i], enemies, BLACK);
            history[i][generation] = scores[i];
            if (scores[i] > best_score) {
                best_score = scores[i];
                best_index = i;
            }
        }

        printf("世代 %d: (勝率: %d)\n", generation, best_score);

        // 次世代の作成
        for (int i = 0; i < POP_SIZE; ++i) {
            const Individual* parent1;
            const Individual* parent2;

            parent1 = ranking_selection(population, scores);
            parent2 = ranking_selection(population, scores);

            crossover(parent1, parent2, &new_population[i]);

            mutate(&new_population[i], mutation_rate);
        }

        // 世代交代
        memcpy(population, new_population, sizeof(population));
    }

    for (int y = 0; y < SIZE; ++y) {
        for (int x = 0; x < SIZE; ++x) {
            printf("%2d, ", population[best_index].genes[SIZE * y + x]);
        }
        printf("\n");
    }
    
    return 0;
}

int main() {
    srand((unsigned int)time(NULL));
    ga(MUTATION_RATE);

    return 0;
}

3-4. コンパイル

main.c1、othello_auto.cを指定してコンパイルします。

gcc ./main.c ./othello_auto.c

3-5. 実行

コンパイルによってa.outという名前で実行ファイルが作られるので、下記のように実行します。

./a.out

3-6. 出力

1世代を経るごとに、その世代で最も勝率の高かった個体の勝率を出力します。
50世代終了後に50世代目の個体のうち最も勝率が高かった個体の評価値を出力します。

main.cのhistory配列を出力すると各世代の全個体の勝率を出力することが出来ます。

世代 0: (勝率: 73)
世代 1: (勝率: 74)
世代 2: (勝率: 72)
世代 3: (勝率: 84)
.
.
.
世代 47: (勝率: 93)
世代 48: (勝率: 90)
世代 49: (勝率: 96)
63,  8, 95, 66, 63, 74, 27, 87, 
 8,  4, 98, 20, 22, 62, 18, 30, 
72, 79, 32, 33, 57, 46, 90, 79, 
42, 11, 87, 23, 54, 44, 17, 43, 
39, 86, 43, 71, 22,  8, 17, 47, 
65, 45, 55, 70, 18, 15, 68,  4, 
35, 14, 76, 80, 38, 11, 53,  0, 
76, 20, 44, 75, 97, 24, 35, 62,

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up