
Video poker: an experiment to see whether AI can beat the game.

Last updated at 2024-07-15. Posted at 2024-07-15.

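The AI Training Adventure

Prologue
One day, a young programmer named Viktor found himself absorbed in casino games. He was always looking for a new challenge, and lately the game that had caught his interest was video poker. The rules are simple: you are dealt cards at random from the deck and may exchange cards only once. Yet the depth of strategy behind those rules fascinated him.

Viktor decided to find out whether AI could beat the game. Drawing on his skills and passion, he resolved to build an AI model that would maximize the win rate.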

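Data Collection
First, Viktor needed to generate data for the AI to learn from. He wrote a program that shuffled the deck, dealt random hands, and recorded how each hand was evaluated. He simulated thousands of games and collected the hands together with their evaluation results.

His computer shuffled and dealt cards day and night. Viktor was convinced that this large volume of data was what the AI needed in order to learn.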

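Building the Model
Once the data was collected, Viktor turned to building the AI model. Using TensorFlow, he designed a neural network meant to understand card combinations and learn the best exchange strategy.

His model took the hand as a 54-dimensional vector and output another 54-dimensional vector indicating which cards to exchange. (The code reserves 54 slots, although its 52-card deck only ever uses indices 0 to 51.) To improve accuracy, Viktor added more layers and introduced dropout layers to guard against overfitting.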
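As a minimal sketch of the input encoding described above, the following turns a five-card hand into a 54-slot multi-hot vector. The helper names `card_index` and `hand_to_vector` are hypothetical; the card numbering (rank index plus 13 times the suit index) follows the article's code, and a plain Python list stands in for the NumPy array.

```python
RANKS = ['2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K', 'A']
SUITS = ['hearts', 'diamonds', 'clubs', 'spades']
VECTOR_SIZE = 54  # size used in the article; only indices 0..51 are reachable


def card_index(card):
    """Map a (rank, suit) tuple to an integer in 0..51."""
    rank, suit = card
    return RANKS.index(rank) + SUITS.index(suit) * len(RANKS)


def hand_to_vector(hand):
    """Hypothetical helper: multi-hot encode a hand as a list of 0/1 values."""
    vector = [0] * VECTOR_SIZE
    for card in hand:
        vector[card_index(card)] = 1
    return vector


hand = [('A', 'spades'), ('K', 'spades'), ('2', 'hearts'), ('7', 'clubs'), ('7', 'diamonds')]
vector = hand_to_vector(hand)
marked = [i for i, v in enumerate(vector) if v]
print(marked)  # slots set to 1, one per card in the hand
```

The output vector of the model uses the same layout, so slot i of the prediction refers to the same card as slot i of the input.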

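Training
Viktor began training the model on the collected data. He split the data into a training set and a test set so that he could check how well the model predicted on data it had never seen. He then raised the number of epochs and adjusted the batch size to get the best performance out of the model.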
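The article's code performs this split with scikit-learn's `train_test_split` and `test_size=0.2`. The sketch below is a dependency-free illustration of the same idea, not the article's implementation; the function name `split_dataset` and the stand-in data are assumptions.

```python
import random


def split_dataset(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of samples and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled items become the held-out test set
    return shuffled[n_test:], shuffled[:n_test]


samples = list(range(1000))  # stand-in for (input, output) training pairs
train, test = split_dataset(samples)
print(len(train), len(test))
```

Fixing the seed makes the split reproducible, which helps when comparing runs with different epoch counts or batch sizes.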

While training ran, the hum of the computer's fan filled his room. As he waited for results, Viktor went over the program again and again, making small adjustments.

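Testing and Results
At last, training was complete. Viktor evaluated the model's accuracy on the test data and was satisfied with the result. Next, he had the model play 100 games so he could calculate its win rate.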

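The model exchanges cards based on its predictions, and the resulting hand is then evaluated. Viktor watched to see whether it could build hands that met the winning condition.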
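One way to turn the model's 54 output values into an exchange decision is to threshold them and consider only the cards that are actually in the hand. This is a sketch under assumptions: the function name `cards_to_exchange`, the 0.5 threshold (chosen to mirror rounding a sigmoid output), and the example probabilities are all illustrative.

```python
def cards_to_exchange(probabilities, hand_codes, threshold=0.5):
    """Return the card codes in the hand whose predicted exchange probability exceeds threshold."""
    return [code for code in hand_codes if probabilities[code] > threshold]


# Illustrative prediction: one value per slot of the 54-dimensional output
probabilities = [0.0] * 54
probabilities[0] = 0.9    # card in the hand, predicted "exchange"
probabilities[18] = 0.2   # card in the hand, predicted "keep"
probabilities[40] = 0.95  # card NOT in the hand, so it must be ignored

hand_codes = [0, 18, 31, 50, 51]
to_exchange = cards_to_exchange(probabilities, hand_codes)
kept = [code for code in hand_codes if code not in to_exchange]
print(to_exchange, kept)
```

Filtering by `hand_codes` matters because nothing stops the network from assigning a high value to a card the player does not hold.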

When the 100 games were over, Viktor calculated the model's win rate. This was where his effort paid off: the win rate was remarkable.

Result: Win Rate: 64.00%

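Epilogue
Viktor felt satisfied that he had managed to beat video poker with AI. Eager for the next challenge, he took the lessons of this experience to heart.

His adventure had only just begun. The next challenge awaited him, and his journey to explore new possibilities with AI was the start of an endless adventure.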

Explanation

Deck generation and encoding:

As in the previous article, generate the deck and define functions that encode and decode cards.
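A quick way to gain confidence in the encoding is a round-trip check: every card should decode back to itself, and the 52 codes should be exactly 0 through 51. The sketch below restates the article's `encode_card` and `decode_card` so that it runs on its own; the check itself is an addition, not part of the original code.

```python
suits = ['hearts', 'diamonds', 'clubs', 'spades']
ranks = ['2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K', 'A']
deck = [(rank, suit) for suit in suits for rank in ranks]


def encode_card(card):
    """Card tuple -> integer code (rank index + 13 * suit index)."""
    rank, suit = card
    return ranks.index(rank) + suits.index(suit) * len(ranks)


def decode_card(value):
    """Integer code -> card tuple (inverse of encode_card)."""
    suit = suits[value // len(ranks)]
    rank = ranks[value % len(ranks)]
    return (rank, suit)


codes = [encode_card(card) for card in deck]
round_trip_ok = all(decode_card(encode_card(card)) == card for card in deck)
print(round_trip_ok, min(codes), max(codes))
```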

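Training data generation:

First deal a hand and choose at random which cards to exchange.
Evaluate the new hand, and save the sample as training data only if it is a winning hand (One Pair or better).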
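The two steps above can be sketched as follows. This is a simplified illustration rather than the article's implementation: the win check `has_pair_or_better` only looks for repeated ranks and ignores straights and flushes, and the names `generate_winning_samples`, `stock`, and `discard_positions` are hypothetical. Replacement cards are drawn from the cards left after the deal, so the final hand never contains duplicates.

```python
import random
from collections import Counter

RANKS = ['2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K', 'A']
SUITS = ['hearts', 'diamonds', 'clubs', 'spades']
DECK = [(rank, suit) for suit in SUITS for rank in RANKS]


def has_pair_or_better(hand):
    """Simplified win check: True if any rank repeats (ignores straights and flushes)."""
    counts = Counter(rank for rank, _ in hand)
    return max(counts.values()) >= 2


def generate_winning_samples(num_deals, rng):
    """Deal, discard at random, redraw, and keep only deals that end in a win."""
    samples = []
    for _ in range(num_deals):
        shuffled = rng.sample(DECK, len(DECK))
        hand, stock = shuffled[:5], shuffled[5:]
        num_changes = rng.randint(1, 5)
        discard_positions = set(rng.sample(range(5), num_changes))
        kept = [card for i, card in enumerate(hand) if i not in discard_positions]
        final_hand = kept + stock[:num_changes]
        if has_pair_or_better(final_hand):
            discarded = [hand[i] for i in sorted(discard_positions)]
            samples.append((hand, discarded))
    return samples


rng = random.Random(0)
samples = generate_winning_samples(1000, rng)
print(f"kept {len(samples)} of 1000 deals")
```

Because the discards are random, the kept samples record exchanges that happened to win, not exchanges known to be optimal.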

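Building and training the neural network:

Use a tf.keras.Sequential model to build a network with three hidden Dense layers (256, 128, and 64 units) and a 54-unit sigmoid output layer.
Train the model on the winning data and evaluate it.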

Win-rate calculation:

Simulate 100 games using the trained model.
Evaluate the result of each game and count the wins.
Calculate and display the win rate.
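With only 100 games, the measured win rate carries noticeable sampling noise. As a rough sketch, the following computes a normal-approximation 95% interval around a win rate, assuming the games are independent; the function name `win_rate_interval` is hypothetical and the calculation is not part of the article's code.

```python
import math


def win_rate_interval(wins, games, z=1.96):
    """Return (rate, low, high) using the normal approximation to the binomial."""
    rate = wins / games
    std_err = math.sqrt(rate * (1 - rate) / games)
    low = max(0.0, rate - z * std_err)
    high = min(1.0, rate + z * std_err)
    return rate, low, high


rate, low, high = win_rate_interval(64, 100)
print(f"{rate:.2%} (95% interval roughly {low:.1%} to {high:.1%})")
# prints: 64.00% (95% interval roughly 54.6% to 73.4%)
```

Running more games narrows the interval, since the standard error shrinks with the square root of the number of games.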

This code trains the neural network using only winning hands, then uses the trained model to play games and calculates the win rate.

Result: Win Rate: 64.00%

Code for The AI Training Adventure.

import random
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Build the deck
suits = ['hearts', 'diamonds', 'clubs', 'spades']
ranks = ['2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K', 'A']
deck = [(rank, suit) for suit in suits for rank in ranks]

def encode_card(card):
    rank, suit = card
    return ranks.index(rank) + suits.index(suit) * len(ranks)

def decode_card(value):
    suit = suits[value // len(ranks)]
    rank = ranks[value % len(ranks)]
    return (rank, suit)

def deal_hand(deck, hand_size=5):
    return random.sample(deck, hand_size)

def evaluate_hand(hand):
    if not hand:
        return "No Hand"
    
    rank_counts = {rank: 0 for rank in ranks}
    suit_counts = {suit: 0 for suit in suits}
    for card in hand:
        rank, suit = card
        rank_counts[rank] += 1
        suit_counts[suit] += 1

    if 4 in rank_counts.values():
        return "Four of a Kind"
    elif 3 in rank_counts.values() and 2 in rank_counts.values():
        return "Full House"
    elif 5 in suit_counts.values():
        return "Flush"
    elif len(hand) == 5 and sorted([ranks.index(card[0]) for card in hand]) == list(range(min([ranks.index(card[0]) for card in hand]), max([ranks.index(card[0]) for card in hand]) + 1)):
        return "Straight"
    elif 3 in rank_counts.values():
        return "Three of a Kind"
    elif list(rank_counts.values()).count(2) == 2:
        return "Two Pair"
    elif 2 in rank_counts.values():
        return "One Pair"
    else:
        return "High Card"

def generate_training_data(num_samples=10000):  # larger dataset size
    inputs = []
    outputs = []
    for _ in range(num_samples):
        deck_copy = deck[:]
        hand = deal_hand(deck_copy)
        encoded_hand = [encode_card(card) for card in hand]
        input_vector = np.zeros(54)
        for card in encoded_hand:
            input_vector[card] = 1

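        # Randomly choose which cards to exchange
        num_changes = random.randint(1, 5)
        change_indices = random.sample(range(5), num_changes)
        output_vector = np.zeros(54)
        for idx in change_indices:
            output_vector[encoded_hand[idx]] = 1

        # Draw replacements only from cards not already dealt,
        # so the new hand cannot contain a card that was in the original hand
        remaining = [card for card in deck_copy if card not in hand]
        new_hand = deal_hand(remaining, num_changes)
        new_hand += [hand[i] for i in range(5) if i not in change_indices]
        result = evaluate_hand(new_hand)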

        if result in ["One Pair", "Two Pair", "Three of a Kind", "Straight", "Flush", "Full House", "Four of a Kind"]:
            inputs.append(input_vector)
            outputs.append(output_vector)

    return np.array(inputs), np.array(outputs)

# Generate the data
X, y = generate_training_data(10000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the TensorFlow model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation='relu', input_shape=(54,)),  # increased number of units
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(54, activation='sigmoid')
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # tuned learning rate
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Training
history = model.fit(X_train, y_train, epochs=50, batch_size=64, validation_split=0.2, verbose=0)  # tuned epochs and batch size

# Test
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f'Test Accuracy: {test_acc * 100:.2f}%')

# Win-rate calculation
def play_game(model):
    deck_copy = deck[:]
    hand = deal_hand(deck_copy)
    encoded_hand = [encode_card(card) for card in hand]
    input_vector = np.zeros(54)
    for card in encoded_hand:
        input_vector[card] = 1

    prediction = model.predict(np.array([input_vector]), verbose=0)
    predicted_changes = np.round(prediction).astype(int)

    # Only cards actually in the hand can be exchanged; ignore predictions for other cards
    change_indices = [code for code in encoded_hand if predicted_changes[0][code] == 1]
    # Draw replacements from cards not already dealt so the final hand has exactly five distinct cards
    remaining = [card for card in deck_copy if card not in hand]
    new_hand = deal_hand(remaining, len(change_indices))
    new_hand += [hand[i] for i in range(5) if encode_card(hand[i]) not in change_indices]

    result = evaluate_hand(new_hand)
    return result in ["One Pair", "Two Pair", "Three of a Kind", "Straight", "Flush", "Full House", "Four of a Kind"]

# Play 100 games
num_games = 100
num_wins = sum(play_game(model) for _ in range(num_games))
win_rate = num_wins / num_games * 100
print(f'Win Rate: {win_rate:.2f}%')
