[2048] Letting an AI play a game with a dedicated MCP server x AI agent

Posted at 2025-04-16

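First, the gameplay video

This is what it looks like at normal (1x) speed.

It will be a while before it plays smoothly and racks up high scores.
For a first step, I think this is about right.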

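Introduction

Nice to meet you.
I'm Fukki (ふっきー).

The Model Context Protocol (MCP) has been a hot topic for a while now, and people have found all sorts of uses for it. There are plenty of excellent articles explaining MCP itself, so I'll leave that to them and share one use case here.

Personally, I expect entertainment, and games in particular, to be what brings MCP into the mainstream. For example: streaming an LLM playing a game and watching it improve, letting viewers help the LLM through chat comments, or streaming 24 hours a day, 365 days a year. (Anthropic's Pokémon example already exists.)

I hope that someday anyone can do this with their favorite game, in their favorite style. To get the ball rolling myself, I picked "2048", a game whose controls and rules are relatively simple, and built a dedicated MCP server and a gaming agent to play it.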

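Architecture

I considered several designs for the gaming agent. Pairing an MCP server specialized for one particular game with a general-purpose gaming agent looks like the best fit for playing many different games.

So I implemented a 2048 MCP server that exposes board recognition and game controls as tools, and a gaming agent that uses those tools. Both are implemented with LangChain/LangGraph.

The overall picture looks like this.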

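Implementation

Here are the implementations of the dedicated 2048 MCP server and the gaming agent.
Some of the code uses my own modules, but you should still be able to grasp the overall approach.

2048 MCP server

The MCP server is built with FastMCP.
Reference: https://github.com/modelcontextprotocol/python-sdk

The key points of the MCP server implementation are:

Make it the interface to the game
Use an LLM inside the MCP server

Originally, I planned to have the gaming agent recognize the board and to expose only the up/down/left/right controls as MCP tools. However, as mentioned above, keeping the gaming agent general-purpose requires the 2048 MCP server to act as the interface to the game, so board recognition is also done on the MCP side.

Recognizing the board requires image recognition. Traditional image processing might be able to do it, but I decided to leave it to an LLM. The result is a somewhat confusing arrangement in which the MCP server used by an LLM itself uses an LLM.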

py 2048_mcp_server.py

#!/usr/bin/env python3

from pynput.keyboard import Controller, Key
from pydantic import BaseModel, Field
from mcp.server.fastmcp import FastMCP

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.prompts import HumanMessagePromptTemplate

from llm.base_llm import BaseLLM
from llm.models import OPENAI_4o_MINI
from tools import get_cropped_screenshot_base64

# Create our MCP server instance
mcp = FastMCP("pcgame-control")
keyboard = Controller()


class LLM(BaseLLM):
    class BoardInfo(BaseModel):
        """
        Board information for the game.
        """

        board: list[list[int]] = Field(description="The game board. 4 x 4 grid.")

    def __init__(self, llm):
        system_prompt = """
        Please analyze the game board and set into list.

        ex)
        [
          [0, 0, 0, 0],
          [2, 0, 0, 4],
          [0, 0, 2, 32],
          [0, 0, 32, 64]
        ]
        """
        structure = self.BoardInfo

        super().__init__(
            llm=llm, system_prompt=system_prompt, output_structure=structure
        )

    @property
    def prompt(self):
        return ChatPromptTemplate.from_messages(
            [
                (
                    "system",
                    self.system_prompt,
                ),
                self._get_human_message_template(),
            ]
        )

    def _get_human_message_template(self):
        return HumanMessagePromptTemplate.from_template(
            ["{input}", self._get_image_template()]
        )

    def _get_image_template(self):
        encoded_image = get_cropped_screenshot_base64()
        return {"image_url": {"url": f"data:image/png;base64,{encoded_image}"}}

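    def run(self, query: str) -> BoardInfo:
        # Build the prompt (this captures a fresh screenshot) and run the model.
        # The result is assumed to be a BoardInfo, given output_structure above.
        chain = self.prompt | self.model
        output = chain.invoke(query)
        return output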

    def to_str(self, output: BoardInfo) -> str:
        return str(output.board)


llm = LLM(OPENAI_4o_MINI)


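@mcp.tool()
async def get_board_info() -> str:
    """
    Reads the current 2048 board from a screenshot and returns it as a
    string of 4 rows x 4 columns, where 0 means an empty cell.
    """
    query = "Please analyze the game board and set into list."
    response = llm.run(query)
    board_str = llm.to_str(response)
    # Do not print() to stdout here: with the stdio transport, stdout carries
    # the MCP protocol messages, so stray output can corrupt the stream.
    return board_str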


@mcp.tool()
async def move_up() -> str:
    """
    Presses and releases the up arrow key.
    """
    keyboard.press(Key.up)
    keyboard.release(Key.up)  # release so the key is not left held down
    return "Up key pressed."


@mcp.tool()
async def move_down() -> str:
    """
    Presses and releases the down arrow key.
    """
    keyboard.press(Key.down)
    keyboard.release(Key.down)
    return "Down key pressed."


@mcp.tool()
async def move_left() -> str:
    """
    Presses and releases the left arrow key.
    """
    keyboard.press(Key.left)
    keyboard.release(Key.left)
    return "Left key pressed."


@mcp.tool()
async def move_right() -> str:
    """
    Presses and releases the right arrow key.
    """
    keyboard.press(Key.right)
    keyboard.release(Key.right)
    return "Right key pressed."


if __name__ == "__main__":
    # Launch the MCP server on stdio transport by default.
    mcp.run(transport="stdio")
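
Because the board comes from an LLM reading a screenshot, it can occasionally be malformed. The following is a minimal sketch, not part of the original implementation, of how the string returned by `get_board_info` could be parsed and sanity-checked before it is used. The function name `parse_board` is hypothetical, and the checks assume the standard 2048 rules (a 4 x 4 grid whose cells are 0 or a power of two starting at 2).

```python
import ast


def parse_board(board_str: str) -> list[list[int]]:
    """Parse a string such as "[[0, 2, 0, 0], ...]" and validate it as a 2048 board."""
    board = ast.literal_eval(board_str)  # evaluates Python literals only, not code
    if not isinstance(board, list) or len(board) != 4:
        raise ValueError("board must have exactly 4 rows")
    for row in board:
        if not isinstance(row, list) or len(row) != 4:
            raise ValueError("each row must have exactly 4 cells")
        for cell in row:
            # bool is a subclass of int, so reject it explicitly.
            if not isinstance(cell, int) or isinstance(cell, bool):
                raise ValueError(f"cell is not an integer: {cell!r}")
            # Valid cells are 0 (empty) or a power of two that is at least 2.
            is_power_of_two = cell >= 2 and (cell & (cell - 1)) == 0
            if cell != 0 and not is_power_of_two:
                raise ValueError(f"invalid tile value: {cell}")
    return board
```

If validation fails, one option is simply to call the tool again, since a fresh screenshot is taken on every call.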

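Gaming agent

The gaming agent recognizes the board, decides the next move, and performs it.
The key points of the implementation are:

Don't use a reasoning model
Use a separate LLM/agent for each step of a move

Both are for speed. Recognizing the board takes a human a fraction of a second, but an LLM needs anywhere from a few seconds to well over ten. If it then had to reason about the next move, perform it, and repeat all of that every turn, the game would progress unbearably slowly.

I'm curious how high the score could go, but that would probably mean letting it play overnight, so this time I prioritized play speed over high score.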
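
To see which step actually dominates the turn time, each step can be wrapped in a small timer. This is a rough sketch and not part of the original code: `timed` is a hypothetical helper, and `fake_observe` merely stands in for a slow vision call.

```python
import asyncio
import time
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def timed(label: str, step: Callable[[], Awaitable[T]]) -> tuple[T, float]:
    """Await one step and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = await step()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result, elapsed


async def demo() -> None:
    async def fake_observe() -> str:
        await asyncio.sleep(0.05)  # stand-in for a slow board-recognition call
        return "[[0, 0, 0, 0], [2, 0, 0, 4], [0, 0, 2, 32], [0, 0, 32, 64]]"

    board, seconds = await timed("observe", fake_observe)
    print(board, seconds)


if __name__ == "__main__":
    asyncio.run(demo())
```

With the real agent, the same wrapper could be applied to the observe, decide, and move steps, for example `await timed("observe", player.observe)`.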

from pydantic import BaseModel, Field
from langchain_core.messages import HumanMessage
from langgraph.prebuilt import create_react_agent


class DirectionClass(BaseModel):
    direction: str = Field(description="move direction: up, down, left, right")


class Player:
    def __init__(self, llm, tools):
        self.llm = llm
        self.tools = tools
        self.observe_agent = create_react_agent(self.llm, self.tools)
        self.move_agent = create_react_agent(self.llm, self.tools)

    async def observe(self) -> str:
        output = await self.observe_agent.ainvoke(
            {
                "messages": [
                    (
                        "human",
                        "observe 2048 board correctly",
                    ),
                ]
            }
        )
        res = output["messages"][-1].content
        print(res)
        return res

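    async def decide(self, query: str) -> str:
        # Ask for a structured answer so the reply is a single direction.
        _llm = self.llm.with_structured_output(DirectionClass)
        prompt = [HumanMessage(query)]
        res = await _llm.ainvoke(prompt)
        print(res)
        # Return only the direction string (e.g. "up") so that move() receives
        # "move up" rather than the repr of the whole DirectionClass object.
        return res.direction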

    async def move(self, query: str) -> str:
        output = await self.move_agent.ainvoke(
            {
                "messages": [
                    (
                        "human",
                        f"move {query}",
                    ),
                ]
            }
        )
        res = output["messages"][-1].content
        return res
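
The `direction` field above is a free-form `str`, so nothing stops the model from answering with something other than the four directions. As a sketch of one way to tighten this (an assumption on my part, not the original code), the field can be declared as a `Literal`, so that pydantic rejects any other value. The class name `Direction` is hypothetical.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class Direction(BaseModel):
    # Only these four values pass validation.
    direction: Literal["up", "down", "left", "right"] = Field(
        description="Direction in which to slide the tiles."
    )


ok = Direction(direction="left")
print(ok.direction)

try:
    Direction(direction="diagonal")  # not one of the four allowed values
except ValidationError as exc:
    print("rejected field:", exc.errors()[0]["loc"])
```

A model like this could be passed to `with_structured_output` in place of `DirectionClass`; the allowed values then also appear in the generated JSON schema.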

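Entry point

The main function fetches the tools from the MCP server, so I'm including that implementation too.
As you can see, it calls the game-playing steps inside an infinite loop. Since I haven't implemented any end-of-game check or break, it's a game of darkness that doesn't end until someone (my wallet) dies.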

from langchain_mcp_adapters.client import MultiServerMCPClient

from llm.models import OPENAI_4o_MINI
from player import Player

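_2048_MCP = {
    "command": "python",
    # Path to the MCP server script; adjust it to wherever the server file
    # shown earlier (2048_mcp_server.py) is saved.
    "args": ["./2048_mcp_server.py"],
    "transport": "stdio",
}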


async def main():
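    # Note: this context-manager form matches the langchain-mcp-adapters
    # release available when this was written. Newer releases may instead
    # expect `client = MultiServerMCPClient(...)` and `await client.get_tools()`.
    async with MultiServerMCPClient({"2048": _2048_MCP}) as client:
        tools = client.get_tools()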

        player = Player(OPENAI_4o_MINI, tools)
        while True:
            board_info = await player.observe()
            direction = await player.decide(board_info)
            await player.move(direction)


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())
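
The loop above has no exit condition. A minimal sketch of one possible exit check follows. It assumes the board is available as a nested list of integers (the `observe` step above returns the agent's free text, so some parsing would be needed first), and `is_game_over`, `should_stop`, and `max_turns` are hypothetical names and values, not part of the original code.

```python
def is_game_over(board: list[list[int]]) -> bool:
    """Return True when no move can change the board."""
    size = len(board)
    for r in range(size):
        for c in range(size):
            if board[r][c] == 0:
                return False  # an empty cell means some tile can still slide
            if c + 1 < size and board[r][c] == board[r][c + 1]:
                return False  # a horizontal merge is available
            if r + 1 < size and board[r][c] == board[r + 1][c]:
                return False  # a vertical merge is available
    return True


def should_stop(board: list[list[int]], turn: int, max_turns: int = 200) -> bool:
    """Stop when the game is over or a turn budget (a cost cap) is reached."""
    return turn >= max_turns or is_game_over(board)
```

Inside the `while True:` loop, a counter plus `if should_stop(board, turn): break` would end the run either at game over or at the turn budget, whichever comes first.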

That's it for the implementation.

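Closing thoughts

I've read a lot about MCP so far, but I hadn't seen anyone build a game-specific MCP server and have an AI play the game through it, so I wrote this up.

Let me declare right here that I'm the pioneer (lol).

Beyond entertainment, I suspect this could also serve as a model benchmark.
Treating the game score as a proxy for model accuracy would make comparisons easy for many people to picture.

That said, at the moment processing takes so long that action games are a poor fit, and games with complex controls look tough as well. If game developers released official MCP servers, I feel the situation would change dramatically.

Please give it a try yourselves.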
:::message
Some games do not allow automated play, so check the terms of service first.
:::

By the way, here is the final score.

Next, I'll try improving the accuracy.

That's all.
Thank you very much for reading to the end!!

↓ I don't have many friends, so please follow me
https://x.com/fky_create
