
The neural networks in gnubg


Posted at 2021-03-01 (last updated at 2021-03-01)

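I figured it was about time to start my next project, so I have begun studying deep reinforcement learning. What I want to build is, of course, a backgammon AI. It looks like a manageable subject to learn on, and above all I have the motivation for it!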

When thinking about where to start, the first thing that comes to mind is GNU Backgammon. I looked into what kind of neural network (NN) our predecessors used.

# /usr/games/gnubg -t
GNU Backgammon 1.06.002 Jan 25 2020
Copyright (C) 1999, 2000, 2001, 2002, 2003, 2004 by Gary Wong.
Copyright (C) 2018 by Gary Wong and the AUTHORS; for details type `show version'.
This program comes with ABSOLUTELY NO WARRANTY; for details type `show warranty'.
This is free software, and you are welcome to redistribute it under certain conditions; type `show copying' for details.

You can find out which NNs are in use by entering the following command.

(No game) show engine
 * Contact neural network evaluator:
   - version 1.00, 250 inputs, 128 hidden units.

 * Crashed neural network evaluator:
   - version 1.00, 250 inputs, 128 hidden units.

 * Race neural network evaluator:
   - version 1.00, 214 inputs, 128 hidden units.

 * In memory 1-sided bearoff database evaluator
   - generated by GNU Backgammon
   - up to 15 chequers on 6 points (54264 positions) per player
   - database includes gammon distributions

 * In memory 2-sided bearoff database evaluator
   - generated by GNU Backgammon
   - up to 6 chequers on 6 points (924 positions) per player
   - database includes both cubeful and cubeless equities

 * Weights file and databases installed in:
   - /usr/share/gnubg

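Three NNs are used for position evaluation, one for each kind of position: positions with contact, "crashed" positions (I will look into what that means later), and races.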

Still, the description is terse. It says there are 128 hidden units, but how many layers does the network actually have?

neuralnet.c

static void
Evaluate(const neuralnet * pnn, const float arInput[], float ar[], float arOutput[], float *saveAr)
{
    const unsigned int cHidden = pnn->cHidden;
    unsigned int i, j;
    float *prWeight;

    /* Calculate activity at hidden nodes */
    for (i = 0; i < cHidden; i++)
        ar[i] = pnn->arHiddenThreshold[i];

    prWeight = pnn->arHiddenWeight;

    for (i = 0; i < pnn->cInput; i++) {
        float const ari = arInput[i];

        if (ari == 0.0f)
            prWeight += cHidden;
        else {
            float *pr = ar;

            if (ari == 1.0f)
                for (j = cHidden; j; j--)
                    *pr++ += *prWeight++;
            else
                for (j = cHidden; j; j--)
                    *pr++ += *prWeight++ * ari;
        }
    }

    if (saveAr)
        memcpy(saveAr, ar, cHidden * sizeof(*saveAr));

    for (i = 0; i < cHidden; i++)
        ar[i] = sigmoid(-pnn->rBetaHidden * ar[i]);

    /* Calculate activity at output nodes */
    prWeight = pnn->arOutputWeight;

    for (i = 0; i < pnn->cOutput; i++) {
        float r = pnn->arOutputThreshold[i];

        for (j = 0; j < cHidden; j++)
            r += ar[j] * *prWeight++;

        arOutput[i] = sigmoid(-pnn->rBetaOutput * r);
    }
}

Well, what do you know. There is exactly one hidden layer. Was it really that simple...

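Note that the 250 inputs encode not only the positions of the checkers but also several pieces of "backgammon knowledge". See this for details.
There are five outputs: the probabilities of winning, winning a gammon, winning a backgammon, losing a gammon, and losing a backgammon.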
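To get a feel for how small this network is, the sketch below counts the weights and thresholds of a fully connected network with one hidden layer. The helper `param_count` is hypothetical, and the count assumes that `Evaluate()` uses no parameters beyond the four arrays it reads (`arHiddenWeight`, `arHiddenThreshold`, `arOutputWeight`, `arOutputThreshold`).

```c
#include <stddef.h>

/* Number of weights and thresholds in a fully connected network
 * with a single hidden layer. */
static size_t param_count(size_t n_in, size_t n_hidden, size_t n_out)
{
    return n_in * n_hidden + n_hidden   /* input -> hidden weights, hidden thresholds */
         + n_hidden * n_out + n_out;    /* hidden -> output weights, output thresholds */
}
```

With 250 inputs, 128 hidden units, and 5 outputs, `param_count(250, 128, 5)` gives 32000 + 128 + 640 + 5 = 32773 parameters. If the race network with its 214 inputs also has five outputs (an assumption, since `show engine` does not list the output count), it would have 28165.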

So a network like this plays at "World Class" strength. Is backgammon really such a simple game? Is that truly the case? I can't quite bring myself to believe it.
