@vfujiin

Trying Vision, Tools, and Embedding with a local LLM using Ollama

Last updated at 2025-05-25, posted at 2025-05-22

Introduction

This is a record of checking how Tools, embedding, and Vision behave with Ollama. Ollama is more convenient to work with if it can act as an OpenAI-compatible API endpoint, so I checked that as well.

Test environment

Mac mini M1 2020
16 GB memory
macOS Sequoia 15.3.2
ollama 0.6.2

Verification

Setting up the test environment

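Go to the Ollama site, click the Download button, and install the build for your environment. I also tried this on platforms other than macOS, and the procedure on Windows was almost identical. For Linux, the page tells you to run a command. The Download page for each platform shows how simple it is.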

If you want to run it as a Docker container, docker hub/ollama looks like the way to go; it is linked from the official GitHub/ollama. I have not tried it yet.

OpenAI-compatible API

Gemma3:12b

First, as a basic check, I confirmed that the OpenAI-compatible API works. I used the gemma3:12b model.

Running the following curl command in a terminal worked as expected. The trailing | jq . is there only to pretty-print the JSON; drop it if you do not need it.

curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
   "model": "gemma3:12b",
   "messages": [
      {"role": "user", "content": "tell me about magic in two sentences."}
   ]
 }' | jq .

Here is the response. Unless you limit the amount of generated text, it takes quite a long time on a Mac mini M1 2020 with 16 GB of memory. The response time made it feel practically unusable.

{
  "id": "chatcmpl-353",
  "object": "chat.completion",
  "created": 1743849975,
  "model": "gemma3:12b",
  "system_fingerprint": "fp_ollama",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Magic, whether through illusions, folklore, or spiritual beliefs, represents the ability to manipulate reality in ways that defy natural laws. It often evokes a sense of wonder, mystery, and the potential for extraordinary experiences."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 17,
    "completion_tokens": 43,
    "total_tokens": 60
  }
}
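
The response time noted above grows with the amount of generated text, so one option is to cap the output with max_tokens, a field the OpenAI-compatible endpoint accepts. The following is a minimal sketch, not a tuned setup: the helper names build_chat_payload and ask are made up for illustration, the value 100 is arbitrary, and the prompt is assumed to contain no double quotes or backslashes because no JSON escaping is done.

```shell
# Build a chat-completions request body with an output cap.
# Arguments: $1 = model, $2 = prompt, $3 = max_tokens.
# Assumption: the prompt needs no JSON escaping (no quotes or backslashes).
build_chat_payload() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}], "max_tokens": %d}\n' \
    "$1" "$2" "$3"
}

# Send the body to a local Ollama server and pretty-print the reply.
# "-d @-" makes curl read the request body from stdin.
ask() {
  build_chat_payload "$@" |
    curl -s http://localhost:11434/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d @- | jq .
}
```

For example, ask gemma3:12b "tell me about magic in two sentences." 100 would send the same question as above with the reply capped at 100 tokens. A capped reply may stop mid-sentence, in which case finish_reason is expected to be "length" rather than "stop".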

Next is the result of asking the same question in Japanese.

curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
   "model": "gemma3:12b",
   "messages": [
      {"role": "user", "content": "手品について２センテンスで解説して。"}
   ]
 }' | jq .

The response reads as natural Japanese.

{
  "id": "chatcmpl-406",
  "object": "chat.completion",
  "created": 1743850006,
  "model": "gemma3:12b",
  "system_fingerprint": "fp_ollama",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "手品は、観客を欺くことを目的とした技術と演出を組み合わせたパフォーマンスです。巧妙なトリックや心理的な操作によって、超自然的な現象に見せかけ、観客を楽しませます。"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 45,
    "total_tokens": 65
  }
}

Llama3.1:8b

For reference, I compared gemma3:12b with llama3.1:8b. The results above are from gemma3:12b; the following are from llama3.1:8b.

curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
   "model": "llama3.1",
   "messages": [
      {"role": "user", "content": "tell me about magic in two sentences."}
   ]
 }' | jq .

Here is the response.

{
  "id": "chatcmpl-60",
  "object": "chat.completion",
  "created": 1743850438,
  "model": "llama3.1",
  "system_fingerprint": "fp_ollama",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Magic is a concept that has been present in human culture for thousands of years, often representing the mysterious and supernatural forces that are beyond human control, and can take many forms including spell-casting, divination, and illusions. Despite its roots in mythology and superstition, modern concepts of magic have also evolved to encompass various genres of performance art, trickery, and fantasy storytelling."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 77,
    "total_tokens": 95
  }
}

Next, I asked in Japanese.

curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
   "model": "llama3.1",
   "messages": [
      {"role": "user", "content": "手品について２センテンスで解説して。"}
   ]
 }' | jq .

The response is in Japanese, but it reads unnaturally.

{
  "id": "chatcmpl-112",
  "object": "chat.completion",
  "created": 1747835887,
  "model": "llama3.1",
  "system_fingerprint": "fp_ollama",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "手品は、見る目にもなくする物や予期せぬ変化によって楽しみを感じさせる形の上手く掛け合わせた技術やアクションの総称です。古代から続く多くの国の伝統文化の中には、使われていました。"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 62,
    "total_tokens": 84
  }
}

Tools

Llama3.1 8b, English

Next, I tried tools. Models that support tools can be found in the list of tool-capable models. I used llama3.1 8b and started with English.

curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "llama3.1:latest",
    "messages": [
        {
            "role": "user",
            "content": "What is the weather like in Paris today?"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current temperature for a given location.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City and country e.g. Bogotá, Colombia"
                        }
                    },
                    "required": [
                        "location"
                    ],
                    "additionalProperties": false
                },
                "strict": true
            }
        }
    ]
}' | jq .

The model returned a tool call for get_weather, which confirms that tool calling works.

{
  "id": "chatcmpl-511",
  "object": "chat.completion",
  "created": 1743850979,
  "model": "llama3.1:latest",
  "system_fingerprint": "fp_ollama",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "id": "call_4br30iig",
            "index": 0,
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\":\"Paris, France\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 175,
    "completion_tokens": 19,
    "total_tokens": 194
  }
}
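
In the response above, function.arguments is not a JSON object but a JSON document encoded as a string, so it needs a second decoding step before the location can be used. A minimal sketch with jq follows; the helper name extract_tool_call is hypothetical, and it assumes the response contains at least one tool call.

```shell
# Read a chat-completions response on stdin and print
# "<function name> <location>" for the first tool call.
# "arguments" is a JSON string, so fromjson decodes it into an object.
extract_tool_call() {
  jq -r '.choices[0].message.tool_calls[0].function
         | "\(.name) \(.arguments | fromjson | .location)"'
}
```

Piping the curl output into extract_tool_call instead of jq . would print get_weather Paris, France for the response shown above.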

Llama3.1 8b, Japanese

I also tried Japanese, but it did not always work.

curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "llama3.1:latest",
    "messages": [
        {
            "role": "user",
            "content": "今日の名古屋の天気を教えて"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "与えられた場所の天気の情報を返す",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "天気を知りたい場所の県や市の名前 例）愛知県名古屋市"
                        }
                    },
                    "required": [
                        "location"
                    ],
                    "additionalProperties": false
                },
                "strict": true
            }
        }
    ]
}' | jq .

This is an example that worked.

{
  "id": "chatcmpl-944",
  "object": "chat.completion",
  "created": 1743851356,
  "model": "llama3.1:latest",
  "system_fingerprint": "fp_ollama",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "id": "call_zi0l8qp9",
            "index": 0,
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\":\"愛知県名古屋市\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 193,
    "completion_tokens": 22,
    "total_tokens": 215
  }
}

This is a result where location is meaningless. My impression is that the call succeeds only about one time in three.

{
  "id": "chatcmpl-157",
  "object": "chat.completion",
  "created": 1743851263,
  "model": "llama3.1:latest",
  "system_fingerprint": "fp_ollama",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "id": "call_3yodxphi",
            "index": 0,
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\":\"对測東市\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 193,
    "completion_tokens": 32,
    "total_tokens": 225
  }
}
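
The "about one in three" figure is an impression, and one way to put a number on it is to repeat the request and count how often the expected location comes back. The sketch below assumes the request body above has been saved to a file named payload.json (a hypothetical name), and the helpers sample_locations and count_matches are made up for illustration.

```shell
# Run the same request N times ($1) and print one location per line.
# Runs without a tool call print "(none)".
sample_locations() {
  i=0
  while [ "$i" -lt "$1" ]; do
    curl -s http://localhost:11434/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d @payload.json |
      jq -r '(.choices[0].message.tool_calls[0].function.arguments // "{}")
             | fromjson | .location // "(none)"'
    i=$((i + 1))
  done
}

# Count the lines on stdin that exactly match the expected location ($1).
count_matches() {
  awk -v want="$1" '$0 == want { n++ } END { print n + 0 }'
}
```

For example, sample_locations 10 | count_matches "愛知県名古屋市" would print how many of ten runs returned the expected argument. An exact string match is a strict criterion; a looser check (for instance, accepting 名古屋市 alone) may be more appropriate depending on what the tool accepts.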

Vision

Next is image analysis. Models that support Vision can be found in the list of vision-capable models.

I ran a curl command based on OpenAI Images and Vision, but referencing the image by URL did not work. In my tests, the OpenAI-compatible API endpoint worked with neither a URL nor a Base64-embedded image.

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:12b",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'| jq .

When a URL was specified, the following error was returned.

{
  "error": {
    "message": "invalid image input",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

With the /api/chat endpoint and a base64-embedded image, it worked. I followed the Ollama API reference.

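On macOS, one way to fill images is to copy the base64-encoded image to the clipboard with cat {image-file}|base64|pbcopy and paste it in. The pasted string gets very large, so I use a shell variable instead.

Base64-encode the image file and store it in IMGBASE64.

IMGBASE64=$(cat {image-file} | base64)

The variable is then passed to curl.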

curl http://localhost:11434/api/chat -d '{
  "model": "gemma3:12b",
  "messages": [
    {
      "role": "user",
      "content": "how many people in this image?",
      "images": ["'${IMGBASE64}'"]
    }
  ],
  "stream": false
}'| jq .

Here is the result. The image is one I found on Unsplash and downloaded at size S. It is a photo of three doctors in white coats.

{
  "model": "gemma3:12b",
  "created_at": "2025-04-05T12:42:39.892683Z",
  "message": {
    "role": "assistant",
    "content": "Based on the image, there are **three** people visible. \n\n*   One person appears to be a doctor or medical professional, standing and pointing at the X-ray images.\n*   There are two other people seated, facing the X-ray display."
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 74066393500,
  "load_duration": 74944459,
  "prompt_eval_count": 276,
  "prompt_eval_duration": 66311338583,
  "eval_count": 56,
  "eval_duration": 7676580959
}
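
The duration fields in the /api/chat response are in nanoseconds, which makes them hard to read at a glance. A small sketch for converting them follows; the helper names ns_to_sec and tokens_per_sec are made up for illustration.

```shell
# Convert a nanosecond duration ($1) to seconds with two decimals.
ns_to_sec() {
  awk -v ns="$1" 'BEGIN { printf "%.2f\n", ns / 1e9 }'
}

# Tokens generated per second, from eval_count ($1) and eval_duration in ns ($2).
tokens_per_sec() {
  awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.2f\n", n / (ns / 1e9) }'
}
```

Applied to the response above, ns_to_sec 74066393500 gives 74.07 and ns_to_sec 66311338583 gives 66.31, so roughly 66 of the 74 seconds went into evaluating the prompt and image. tokens_per_sec 56 7676580959 gives 7.29, that is, about 7 tokens per second once generation started.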

Next, I counted the people in an image of pedestrians crossing an intersection. The model seems unable to count the people in the upper part of the image.

{
  "model": "gemma3:12b",
  "created_at": "2025-04-05T13:02:19.847515Z",
  "message": {
    "role": "assistant",
    "content": "Based on the image, I can identify **8 people**. \n\nHere's a breakdown:\n\n*   **Group 1:** 3 people crossing the crosswalk in the foreground.\n*   **Group 2:** 2 people crossing the crosswalk slightly further away.\n*   **Group 3:** 3 people walking on the sidewalk."
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 83711237084,
  "load_duration": 7771317417,
  "prompt_eval_count": 276,
  "prompt_eval_duration": 65624979750,
  "eval_count": 75,
  "eval_duration": 10310694125
}
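
The /api/chat request in this section embeds the base64 string directly in the curl command line. Two things can go wrong when reusing it elsewhere: GNU base64 (the usual one on Linux) wraps its output at 76 columns by default, which breaks the JSON string, and a large image can exceed the shell's argument length limit. The sketch below works around both by stripping newlines and sending the body from a file. The helper names and the file names photo.jpg and payload.json are hypothetical, and the prompt is assumed to need no JSON escaping.

```shell
# Base64-encode a file ($1) as a single line, regardless of platform defaults.
image_to_base64() {
  base64 < "$1" | tr -d '\n'
}

# Print an /api/chat request body.
# Arguments: $1 = model, $2 = prompt, $3 = base64-encoded image.
build_vision_payload() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s", "images": ["%s"]}], "stream": false}\n' \
    "$1" "$2" "$3"
}

# Usage (not executed here):
#   build_vision_payload gemma3:12b "how many people in this image?" \
#     "$(image_to_base64 photo.jpg)" > payload.json
#   curl -s http://localhost:11434/api/chat -d @payload.json | jq .
```

Writing the body to a file also makes it easy to inspect or resend the exact request.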

Embedding

Finally, I tried an embedding model through the OpenAI-compatible API endpoint. Models that support embedding can be found in the list of embedding models. This time I used nomic-embed-text.

curl http://localhost:11434/v1/embeddings -d '{
    "input": "Your text string goes here",
    "model": "nomic-embed-text"
  }'|jq .

nomic-embed-text appears to produce 768 dimensions by default. The output runs to 768 lines, so it is abbreviated here.

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        -0.00182828,
## (omitted) ##
        -0.020011207
      ],
      "index": 0
    }
  ],
  "model": "nomic-embed-text",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
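
Rather than counting output lines, the dimension of the returned vector can be checked directly with jq. This is a minimal sketch; the helper name embedding_dims is hypothetical, and it assumes the response has the shape shown above.

```shell
# Read an embeddings response on stdin and print the length
# of the first embedding vector.
embedding_dims() {
  jq '.data[0].embedding | length'
}
```

Piping the curl command above into embedding_dims instead of jq . should print a single number, which I would expect to be 768 for nomic-embed-text given the output described here.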

Reference information

With the default settings, Ollama does not accept access from remote hosts. It needs to be configured as follows.

macOS

Run the following command in a terminal.

launchctl setenv OLLAMA_HOST "0.0.0.0:11434"

Then restart the Ollama app. After the restart, it accepts access from remote hosts.
