How to run Gemma locally as a multimodal API server for image recognition

Last updated at 2026-07-05. Posted at 2026-07-05.

This article shows how to install Ollama on a MacBook (Apple Silicon) and use a multimodal model from the Gemma family to build a local, OpenAI-compatible API server that can perform image recognition.

Target environment

macOS
Apple Silicon (M series)
Ollama
OpenAI SDK (Python / Node.js)

Installing Ollama

brew install ollama

Verifying the installation

ollama --version

API endpoint

http://localhost:11434
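
Before sending requests, it can help to confirm that something is actually listening on this endpoint. The following is a minimal sketch using only the Python standard library, assuming the server has already been started (for example with `ollama serve`) and that the root URL answers with HTTP 200 while Ollama is running. The helper names `api_urls` and `is_server_up` are hypothetical and chosen only for this illustration.

```python
import urllib.request

OLLAMA_BASE = "http://localhost:11434"  # endpoint given in this article


def api_urls(base: str = OLLAMA_BASE) -> dict:
    """Return the URLs used in this article for a given base address."""
    base = base.rstrip("/")
    return {
        "root": base + "/",                     # simple reachability check
        "openai": base + "/v1",                 # base_url for OpenAI SDKs
        "chat": base + "/v1/chat/completions",  # chat completions route
    }


def is_server_up(base: str = OLLAMA_BASE, timeout: float = 2.0) -> bool:
    """Return True if the root URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(api_urls(base)["root"], timeout=timeout) as r:
            return r.status == 200
    except OSError:  # connection refused, timeout, DNS failure, HTTP error
        return False
```

Calling `is_server_up()` returns `False` instead of raising when the server is not running, so it can serve as a guard at the top of a script.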

Downloading the model

ollama pull gemma3:12b
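
To check from a script that the pull succeeded, one option is to look the model up in Ollama's list of local models. This is a rough sketch that assumes the native `/api/tags` route returns JSON of the form `{"models": [{"name": "..."}]}`. The helpers `has_model` and `fetch_tags` are hypothetical names introduced here.

```python
import json
import urllib.request


def has_model(tags: dict, name: str) -> bool:
    """Check a parsed /api/tags response for a model with the given name."""
    return any(m.get("name") == name for m in tags.get("models", []))


def fetch_tags(base: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Fetch the list of locally available models (assumed response shape)."""
    with urllib.request.urlopen(base + "/api/tags", timeout=timeout) as r:
        return json.load(r)
```

With the server running, `has_model(fetch_tags(), "gemma3:12b")` should be true once the download has finished.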

Base64-encoding the image

IMG_DATA=$(base64 -i "$HOME/Desktop/sample.png")
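
The same encoding step can be done in Python, which also makes it easy to pick the MIME type from the file extension rather than hard-coding `image/png`. The sketch below is one way to do it with the standard library. `to_data_url` is a hypothetical helper name, and guessing the type from the extension is an assumption of this example, not something the article prescribes.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: str) -> str:
    """Encode an image file as a data URL usable in an image_url field."""
    mime, _ = mimetypes.guess_type(path)  # e.g. "image/png" for *.png
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string has the same `data:image/png;base64,...` shape that the request in this article uses.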

Image recognition via the OpenAI-compatible API

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{
    \"model\":\"gemma3:12b\",
    \"messages\":[
      {
        \"role\":\"user\",
        \"content\":[
          {\"type\":\"text\",\"text\":\"Please describe this image.\"},
          {
            \"type\":\"image_url\",
            \"image_url\":{
              \"url\":\"data:image/png;base64,${IMG_DATA}\"
            }
          }
        ]
      }
    ]
  }"
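
Embedding the whole Base64 string in the shell command works for small images, but larger files may run into the operating system's limit on command-line length, and the escaped quotes are easy to get wrong. As an alternative, the request body can be built as a Python dictionary and posted with the standard library. This is only a sketch: `build_payload` and `post_chat` are hypothetical helper names, and the 60-second timeout is an assumed default.

```python
import json
import urllib.request


def build_payload(model: str, prompt: str, data_url: str) -> dict:
    """Build a chat completions body with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            }
        ],
    }


def post_chat(payload: dict, base: str = "http://localhost:11434",
              timeout: float = 60.0) -> dict:
    """POST the payload to the local server and return the parsed JSON reply."""
    req = urllib.request.Request(
        base + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as r:
        return json.load(r)
```

Because `json.dumps` handles all quoting, the prompt can contain quotes or newlines without any manual escaping.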

Example response

{
  "id": "chatcmpl-437",
  "object": "chat.completion",
  "model": "gemma3:12b",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The image shows a MacBook desktop. A browser, Visual Studio Code, and a terminal are open, and a GitHub page is displayed."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 1087,
    "completion_tokens": 86,
    "total_tokens": 1173
  }
}
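
When the response is handled as plain JSON rather than through an SDK, the fields of interest are the assistant text, the finish reason, and the token usage. The sketch below pulls those out of a parsed response. `summarize` is a hypothetical helper, and treating any `finish_reason` other than `"stop"` as an incomplete answer is a simplifying assumption.

```python
def summarize(resp: dict) -> dict:
    """Extract the answer text, completion status, and token count."""
    choice = resp["choices"][0]
    usage = resp.get("usage", {})
    return {
        "text": choice["message"]["content"],
        # "stop" means the model ended its answer on its own
        "finished": choice.get("finish_reason") == "stop",
        "total_tokens": usage.get("total_tokens"),
    }
```

Applied to the example above, `summarize` would report the description text, `finished` as true, and a total of 1173 tokens.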

Using it from Python

import base64
from openai import OpenAI

# Point the OpenAI SDK at the local Ollama server.
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the SDK; Ollama does not use the value
)

# Read the image and Base64-encode it for use in a data URL.
with open("sample.png", "rb") as f:
    image = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gemma3:12b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please describe this image."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{image}"
                    },
                },
            ],
        }
    ],
)

# Print only the assistant's text.
print(response.choices[0].message.content)
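
For repeated use, the call can be wrapped in a small function that takes the client as an argument and applies a per-request timeout. This is a sketch under a few assumptions: `describe_image` is a hypothetical helper, the input is assumed to be a PNG file, and the 60-second default is an illustrative value. The `client` argument is expected to be an `OpenAI` instance configured with the local `base_url`.

```python
import base64
from pathlib import Path


def describe_image(client, path: str,
                   prompt: str = "Please describe this image.",
                   model: str = "gemma3:12b",
                   timeout: float = 60.0) -> str:
    """Send one PNG image plus a prompt and return the assistant's text."""
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{encoded}"},
                    },
                ],
            }
        ],
        timeout=timeout,  # per-request timeout in seconds
    )
    return response.choices[0].message.content
```

Passing the client in, rather than creating it inside the function, keeps the connection settings in one place and makes the function easy to exercise with a stub object.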

Operational tips

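Resizing images to roughly 1024 to 1200 px speeds up inference
The first inference takes longer, so setting timeout=60 or so in the SDK is recommended
Because the API is OpenAI-compatible, existing code can be reused almost as is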
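
To make the resizing tip concrete, the sketch below computes target dimensions that keep the aspect ratio and never upscale. It assumes the pixel figure refers to the longer side of the image, which the tip itself does not specify, and `fit_within` is a hypothetical helper name. The actual pixel resampling would be done with an imaging library of your choice and is not shown here.

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple:
    """Scale (width, height) so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; do not upscale
    # Multiply before dividing to keep the arithmetic as exact as possible.
    new_w = max(1, round(width * max_side / longest))
    new_h = max(1, round(height * max_side / longest))
    return new_w, new_h
```

For example, a 4000 x 3000 photo maps to 1024 x 768, while an 800 x 600 image is left unchanged.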

Summary

Combining Ollama and Gemma makes it easy to build a local VLM (vision-language model) environment that runs entirely on a Mac.

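Fully offline (after the initial model download)
OpenAI-compatible API
Usable from Python and Node.js
A secure image recognition environment that never sends confidential data to external services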
