Building a Discord BOT Where AI Reacts to Your Food Photos [Local LLM]

Last updated at 2025-06-17 / Posted at 2025-06-17

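Introduction

The Discord server I run has a text channel for posting food photos.

Server members occasionally add reactions, but honestly, it is a little sad when your own food photo gets only a few.

So this time, I will build a Discord BOT that automatically adds reactions to posted photos.

There are already plenty of tutorial-style articles explaining how to build an auto-reaction Discord BOT. But however much you want reactions, getting one with a completely unrelated emoji is just irritating.

This is where AI, the current trend, comes in. The ChatGPT and Gemini APIs cost money, though, so this time I will use a local LLM to analyze the images. The workflow is simple: the local LLM picks the most fitting emoji, and the Discord BOT adds them as reactions.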

Environment

Python 3.10.6
discord.py 2.3.2
Ollama 0.6.0
Gemma 3 12B

Requests are sent to Gemma 3 running on Ollama. Gemma 3 is multimodal, so it can recognize images.

Setup

・Directory structure

pkg/
├── main.py
├── modules/
│   └── local_llm.py
└── cogs/
    └── auto_reaction.py

・Installing the required packages

pip install discord.py emoji ollama Pillow

Creating the Cog

pkg > main.py

# pkg > main.py

from discord.ext import commands
import discord, asyncio

intents = discord.Intents.all()
TOKEN = """token"""
guilds = """list of server IDs"""
bot = commands.Bot(command_prefix='h!', intents = intents)


async def main():
    INITIAL_EXTENSIONS = [
        'cogs.auto_reaction',
    ]

    for cog in INITIAL_EXTENSIONS:
        await bot.load_extension(cog)

    await bot.start(TOKEN)


asyncio.run(main())

pkg > cogs > auto_reaction.py

# pkg > cogs > auto_reaction.py

import discord
from discord import app_commands
from discord.ext import commands
import emoji

import asyncio, base64, io  # asyncio is needed for asyncio.to_thread below
from PIL import Image

from modules.local_llm import ConnectLLM


class AutoReaction(commands.Cog):

    def __init__(self, bot):
        self.bot = bot
        self.cllm = ConnectLLM()
        self.ctx_menu = app_commands.ContextMenu(name="AddReaction", callback=self.add_reaction_command)
        self.bot.tree.add_command(self.ctx_menu)

    @commands.Cog.listener()
    async def on_ready(self):
        print('Successfully loaded : AutoReaction')

    @commands.Cog.listener()
    async def on_message(self, message: discord.Message):
        target_text_channel = [
            # List of IDs of the text channels where the BOT should add reactions
        ]
        
        if message.channel.id not in target_text_channel:
            return
        
        # Skip messages without attachments
        if not message.attachments:
            return
        
        await self.add_emoji_to_post_with_image(message)

    async def add_reaction_command(self, interaction: discord.Interaction, message: discord.Message):
        if not message.attachments:
            # No attachment: reply right away and stop here
            await interaction.response.send_message("This message has no image attached.", ephemeral=True)
            return

        # Attachment present: defer first, because the LLM takes a while to answer
        await interaction.response.defer(ephemeral=True)
        res = await self.add_emoji_to_post_with_image(message)

        # After defer(), results must be sent through the followup webhook
        if res:
            await interaction.followup.send("Reactions added.", ephemeral=True)
        else:
            await interaction.followup.send("Failed to add reactions.", ephemeral=True)

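    async def add_emoji_to_post_with_image(self, message: discord.Message):
        # content_type can be None, so guard before checking for an image MIME type
        content_type = message.attachments[0].content_type
        if content_type is None or 'image' not in content_type:
            return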

        img = await message.attachments[0].read()

        def process_image(img_bytes):
            with Image.open(io.BytesIO(img_bytes)) as image:
                output_buffer = io.BytesIO()
                image.save(output_buffer, format='PNG')
                return output_buffer.getvalue()
            
        png_bytes = await asyncio.to_thread(process_image, img)

        result = await self.cllm.get_reaction(base64.b64encode(png_bytes).decode('utf-8'))

        if result is None:
            return
        
        for s in result:
            if emoji.emoji_count(s) == 1:
                await message.add_reaction(s)
            
        return True


async def setup(bot):
    # discord.py 2.x awaits setup(), and add_cog() is a coroutine
    await bot.add_cog(AutoReaction(bot))

pkg > modules > local_llm.py

# pkg > modules > local_llm.py

from ollama import AsyncClient

import os


class ConnectLLM:

    def __init__(self):
        self.model = "gemma3:12b"
        self.client = AsyncClient(host="http://your-server-ip:11434")

    async def get_reaction(self, img_b64: str) -> str|None:
        system_prompt = """
            You are an expert in outputting the most relevant Emoji from a given photo.
            You are an expert in outputting from one to a maximum of five most relevant Emoji for a given photo in one continuous line.
            Do not output any useless output other than Emoji.
            Output only Emoji.
        """
        # The prompt asks the model to output between one and five relevant
        # emoji for the photo on a single line, and nothing other than emoji.

        chat_base = [
            {"role": "system", "content": system_prompt},
        ]

        try:
            response = await self.client.chat(
                model=self.model,
                messages= chat_base + [{
                    "role": "user",
                    "content": "Please classify the given sentence into one of the following Emoji.",
                    "images": [img_b64],
                }],
            )
        except Exception as e:
            print(e)
            return None

        if response.done:
            return response.message.content
        return None

Key points

from ollama import AsyncClient

    def __init__(self):
        self.model = "gemma3:12b"
        self.client = AsyncClient(host="http://your-server-ip:11434")

    async def get_reaction(self, img_b64: str) -> str|None:
        response = await self.client.chat(
        # rest omitted

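First, when the ollama-python library needs to reach an Ollama server running on another machine, you either set that machine's address in the "OLLAMA_HOST" environment variable or create a Client with the host specified explicitly.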
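The two options can be combined. Below is a minimal sketch, using only the standard library, of picking the host from the environment with a fallback. `resolve_ollama_host` and `DEFAULT_OLLAMA_HOST` are hypothetical names that do not appear in the article's code; 11434 is Ollama's default port.

```python
import os

# Assumed fallback: an Ollama server on the same machine, default port.
DEFAULT_OLLAMA_HOST = "http://localhost:11434"


def resolve_ollama_host(env=None) -> str:
    """Return the Ollama base URL from OLLAMA_HOST, or a local default."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "").strip()
    if not host:
        return DEFAULT_OLLAMA_HOST
    # Accept values such as "192.168.0.10:11434" that lack a scheme.
    if "://" not in host:
        host = f"http://{host}"
    return host


print(resolve_ollama_host({"OLLAMA_HOST": "192.168.0.10:11434"}))
```

The result could then be passed as `AsyncClient(host=resolve_ollama_host())` instead of hard-coding the address in `ConnectLLM.__init__`.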

Next, the image sent to Ollama is converted to PNG and then Base64-encoded, which is done as follows.

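    async def add_emoji_to_post_with_image(self, message: discord.Message):
        # content_type can be None, so guard before checking for an image MIME type
        content_type = message.attachments[0].content_type
        if content_type is None or 'image' not in content_type:
            return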

        img = await message.attachments[0].read()

        def process_image(img_bytes):
            with Image.open(io.BytesIO(img_bytes)) as image:
                output_buffer = io.BytesIO()
                image.save(output_buffer, format='PNG')
                return output_buffer.getvalue()
            
        png_bytes = await asyncio.to_thread(process_image, img)

        result = await self.cllm.get_reaction(base64.b64encode(png_bytes).decode('utf-8'))

        if result is None:
            return
        
        for s in result:
            if emoji.emoji_count(s) == 1:
                await message.add_reaction(s)
            
        return True
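To make the encoding step concrete, here is a minimal sketch of how the Base64 text ends up in the chat message. `build_image_message` is a hypothetical helper, not part of the article's code; the dictionary keys mirror the ones passed to `client.chat` in local_llm.py.

```python
import base64


def build_image_message(image_bytes: bytes, prompt: str) -> dict:
    """Build a user message carrying one Base64-encoded image."""
    # b64encode returns bytes, so decode to str before placing it in the message.
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {"role": "user", "content": prompt, "images": [encoded]}


# Dummy bytes stand in for the PNG data produced by process_image().
message = build_image_message(b"\x89PNG dummy bytes", "Output only Emoji.")
print(message["images"][0])
```

Keeping the message construction in one small function like this also makes it easy to check the payload without a running Ollama server.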

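Processing several images would take quite a long time given the specs of my local machine, so only the first image is read. Also, the response from Gemma3 sometimes contains whitespace, as shown below, so the emoji module is used to check whether each character is an emoji.

Figure: Response from Gemma3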
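The filtering idea can be tried without Discord or the emoji package. The sketch below is a standard-library approximation, not the check the BOT uses: it treats characters in Unicode category "So" (Symbol, other) as emoji, which covers most pictographs but also some non-emoji symbols. `looks_like_emoji` and `pick_reactions` are hypothetical names, and the cap of five follows the system prompt.

```python
import unicodedata


def looks_like_emoji(ch: str) -> bool:
    # Rough stand-in for emoji.emoji_count(ch) == 1 (an approximation only).
    return unicodedata.category(ch) == "So"


def pick_reactions(reply: str, limit: int = 5) -> list[str]:
    """Keep emoji-like characters, dropping whitespace, text and duplicates."""
    picked: list[str] = []
    for ch in reply:
        if looks_like_emoji(ch) and ch not in picked:
            picked.append(ch)
        if len(picked) == limit:
            break
    return picked


print(pick_reactions(" 🥩 🧀\n🌿 ok 😋 "))
```

Dropping duplicates and capping the count avoids sending redundant add_reaction requests when the model repeats itself or ignores the limit in the prompt.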

The command added to the context menu provides the same functionality. However, Gemma3 takes time to recognize the image and return a response, so the interaction is deferred with defer.

    async def add_reaction_command(self, interaction: discord.Interaction, message: discord.Message):
        if not message.attachments:
            # No attachment: reply right away and stop here
            await interaction.response.send_message("This message has no image attached.", ephemeral=True)
            return

        # Attachment present: defer first, because the LLM takes a while to answer
        await interaction.response.defer(ephemeral=True)
        res = await self.add_emoji_to_post_with_image(message)

        # After defer(), results must be sent through the followup webhook
        if res:
            await interaction.followup.send("Reactions added.", ephemeral=True)
        else:
            await interaction.followup.send("Failed to add reactions.", ephemeral=True)
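Since a local model may take a long time or stall, one option is to bound the wait so that the deferred interaction always receives a follow-up. The following is only a sketch under assumptions: `call_with_timeout` and `fake_llm` are hypothetical names, and `fake_llm` stands in for `self.cllm.get_reaction`.

```python
import asyncio


async def call_with_timeout(coro, timeout_sec: float):
    """Await coro, returning None if it does not finish in time."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_sec)
    except asyncio.TimeoutError:
        return None


async def fake_llm(delay_sec: float, reply: str) -> str:
    # Stand-in for the real model call; it only sleeps and echoes the reply.
    await asyncio.sleep(delay_sec)
    return reply


async def demo():
    fast = await call_with_timeout(fake_llm(0.01, "🍣"), timeout_sec=1.0)
    slow = await call_with_timeout(fake_llm(1.0, "🍣"), timeout_sec=0.05)
    return fast, slow


print(asyncio.run(demo()))
```

Returning None on timeout matches how the Cog already treats a failed LLM call, so the failure message path would be reused as it is.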

Checking that it works

As shown below, when an image is posted, the BOT automatically adds fitting reactions.

🥩🧀🌿🍽️😋

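Wrap-up

This time I built a BOT that reacts to images, but you can of course also build one that reacts to text, or one that chats text-to-text. On top of that, a local LLM costs nothing however much you use it (apart from electricity), so if you have a reasonably powerful machine, why not use one to liven up your server?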
