Trying Text Generation, Multimodal Inference, and Embeddings with Local LLMs on ollama

Last updated at 2025-01-30
Posted at 2024-04-05

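ollama is an open-source tool for running open large language models (LLMs) locally. It is said to make it easy to run a wide range of text generation, multimodal, and embedding models on your own machine, so I tried it to see how easy it really is and whether it is usable in practice.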

Test environment

This article was verified with the following setup.

Local environment
- Machine
  - Mac
- ollama app
  - ollama 0.1.32
- Other
  - curl
  - Node.js v20.11.1

0. Setup

Download ollama from the official ollama page and place it in the Applications folder.

When you open the app, a cute llama icon pops up in the menu bar, and the ollama command becomes available.

% ollama --version
ollama version is 0.1.32

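1. Usage

1-1. Text generation models

Representative text generation models

DeepSeek R1
- An LLM from the Chinese AI startup DeepSeek, said to rival OpenAI o1
gemma2
- An LLM released by Google, built on the same research and technology as Gemini
llama3.3
- An LLM released by Meta that aims for high performance at a comparatively small size
phi4
- A small language model (SLM), rather than a large language model (LLM), from Microsoft
Command R+
- A model from the Canadian AI startup Cohere, optimized for long-context tasks such as RAG and tool use
mistral
- A model from the French AI startup Mistral AI, focused on performance and ease of use
mixtral
- A mixture-of-experts model, also from Mistral AI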

Using text generation models

Example: using gemma2

CLI

Specify the model you want and pull it. The model files are downloaded and the model is ready to use.

% ollama pull gemma2    
pulling manifest 
pulling 456402914e83... 100% ▕████████████████▏ 5.2 GB                         
pulling 097a36493f71... 100% ▕████████████████▏ 8.4 KB                         
pulling 109037bec39c... 100% ▕████████████████▏  136 B                         
pulling 22a838ceb7fb... 100% ▕████████████████▏   84 B                         
pulling a443857c4317... 100% ▕████████████████▏  483 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success

Starting the LLM with the run command enters interactive mode:

% ollama run gemma2                              
>>> Send a message (/? for help)

From here you can send messages.

>>> こんにちは
こんにちは！元気ですか？

今日はどのようなことでしょうか？何か教えてください。

>>> Send a message (/? for help)

Each message you send gets a reply, so you can converse in a chat format.

% ollama run gemma2
>>> こんにちは
こんにちは！元気ですか？

今日はどのようなことでしょうか？何か教えてください。

>>> /bye
%

/bye exits interactive mode.

By passing a message as an argument to run, you can also perform a single round-trip chat with one command.

% ollama run gemma2 こんにちは
こんにちは。元気ですか？

今日はどのようなことにしますか？何か教えてください。

%

Also, you do not need to pull beforehand. If you simply run, the model files appear to be pulled automatically when needed.

% ollama run gemma2 こんにちは
pulling manifest 
pulling 456402914e83... 100% ▕████████████████▏ 5.2 GB                         
pulling 097a36493f71... 100% ▕████████████████▏ 8.4 KB                         
pulling 109037bec39c... 100% ▕████████████████▏  136 B                         
pulling 22a838ceb7fb... 100% ▕████████████████▏   84 B                         
pulling a443857c4317... 100% ▕████████████████▏  483 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success 
こんにちは！私はこれからあなたの質問に答えていきます。何がいしますか？

%

API

Once ollama is running, its API listens on port 11434, so you can also call it with curl.

% curl -X POST http://localhost:11434/api/generate -d '{
  "model": "gemma2",
  "prompt":"こんにちは"
 }'
{"model":"gemma2","created_at":"2024-04-05T03:20:32.451571Z","response":"こんにちは","done":false}
{"model":"gemma2","created_at":"2024-04-05T03:20:32.677702Z","response":"！","done":false}
{"model":"gemma2","created_at":"2024-04-05T03:20:32.902438Z","response":"私も","done":false}
{"model":"gemma2","created_at":"2024-04-05T03:20:33.128141Z","response":"元気","done":false}
{"model":"gemma2","created_at":"2024-04-05T03:20:33.354655Z","response":"です","done":false}
{"model":"gemma2","created_at":"2024-04-05T03:20:33.593319Z","response":"か","done":false}
{"model":"gemma2","created_at":"2024-04-05T03:20:33.816519Z","response":"？","done":false}
{"model":"gemma2","created_at":"2024-04-05T03:20:34.038756Z","response":"\n\n","done":false}
{"model":"gemma2","created_at":"2024-04-05T03:20:34.262049Z","response":"今日は","done":false}
{"model":"gemma2","created_at":"2024-04-05T03:20:34.48455Z","response":"どんな","done":false}
{"model":"gemma2","created_at":"2024-04-05T03:20:34.771552Z","response":"こと","done":false}
{"model":"gemma2","created_at":"2024-04-05T03:20:35.018624Z","response":"します","done":false}
{"model":"gemma2","created_at":"2024-04-05T03:20:35.24343Z","response":"か","done":false}
{"model":"gemma2","created_at":"2024-04-05T03:20:35.464974Z","response":"？","done":false}
{"model":"gemma2","created_at":"2024-04-05T03:20:35.687634Z","response":"何か","done":false}
{"model":"gemma2","created_at":"2024-04-05T03:20:35.909764Z","response":"教えて","done":false}
{"model":"gemma2","created_at":"2024-04-05T03:20:36.130682Z","response":"ください","done":false}
{"model":"gemma2","created_at":"2024-04-05T03:20:36.360726Z","response":"。","done":false}
{"model":"gemma2","created_at":"2024-04-05T03:20:36.586798Z","response":"","done":true,"context":[106,1645,108,32789,107,108,106,2516,108,32789,235482,59901,88113,36310,235544,109,40574,42021,4302,9178,235443,235544,50837,68383,13319,235362,107,108],"total_duration":4366374131,"load_duration":573619,"prompt_eval_duration":229813000,"eval_count":19,"eval_duration":4134878000}
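The response above arrives as newline-delimited JSON, one object per generated fragment, with statistics in the final object. As a minimal sketch, the fragments can be joined back into a single string and the statistics turned into a rough generation speed. The function names joinStreamedResponse and tokensPerSecond are my own and not part of ollama, the sample is abbreviated from the output above, and the calculation assumes the duration fields are in nanoseconds, as the ollama API documentation describes.

```javascript
// Join the "response" fragments of an NDJSON body returned by /api/generate.
function joinStreamedResponse(ndjson) {
  return ndjson
    .split("\n")
    .filter((line) => line.trim() !== "") // skip blank lines
    .map((line) => JSON.parse(line))
    .map((chunk) => chunk.response ?? "")
    .join("");
}

// Rough generation speed from the final chunk (assumes durations in nanoseconds).
function tokensPerSecond(finalChunk) {
  return finalChunk.eval_count / (finalChunk.eval_duration / 1e9);
}

// Abbreviated sample based on the output shown above.
const sample = [
  '{"model":"gemma2","response":"こんにちは","done":false}',
  '{"model":"gemma2","response":"！","done":false}',
  '{"model":"gemma2","response":"","done":true,"eval_count":19,"eval_duration":4134878000}',
].join("\n");

console.log(joinStreamedResponse(sample)); // → こんにちは！
console.log(tokensPerSecond({ eval_count: 19, eval_duration: 4134878000 }).toFixed(2)); // → 4.60
```

If you only need the final text, the API also accepts "stream": false in the request body, which returns a single JSON object instead of a stream.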

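1-2. Multimodal models

These models can generate text based on an image.

Representative multimodal models

llava
- A model developed by a research team including Microsoft and the University of Wisconsin that can respond based on images
bakllava
- A multimodal LLM based on Mistral 7B, developed by SkunkworksAI

Using multimodal models

CLI

Example: prepare the following image as a file named image.jpg

Image source: https://www.pexels.com/photo/cat-and-dog-together-16395151/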

% ollama run bakllava "Explain this picture ./image.jpg"
Added image './image.jpg'

In the image, a black and white dog is standing on top of a table, attentively looking at a spotted cat that is sitting on the floor 
below. The dog appears to be larger than the cat, which is perched closer to the lower part of the table. 

There are two balls located in between the dog and the cat, likely toys for them to play with. One ball can be found near the middle 
of the table, while the other ball is situated closer to the edge of the table on the same side as the cat. The scene captures an 
interesting interaction between the two animals.

API

When calling with curl, send the image binary base64-encoded.

% curl http://localhost:11434/api/generate -d '{
  "model": "bakllava",
  "prompt":"What is in this picture?",
  "images": ["iVBORw0KGgoAAAANSUhEUgAAAG0...AElFTkSuQmCC"]
}'
{"model":"bakllava","created_at":"2024-04-05T05:08:37.125346Z","response":" In","done":false}
{"model":"bakllava","created_at":"2024-04-05T05:08:37.350192Z","response":" this","done":false}
{"model":"bakllava","created_at":"2024-04-05T05:08:37.587829Z","response":" picture","done":false}
...
{"model":"bakllava","created_at":"2024-04-05T05:08:45.997211Z","response":" wave","done":false}
{"model":"bakllava","created_at":"2024-04-05T05:08:46.226788Z","response":" pose","done":false}
{"model":"bakllava","created_at":"2024-04-05T05:08:46.440163Z","response":".","done":false}
{"model":"bakllava","created_at":"2024-04-05T05:08:46.671716Z","response":"","done":true,"context":[28705,13,11123,28747,1824,349,297,456,5754,28804,13,4816,3055,920,12738,28747,28705,560,456,5754,28725,736,349,264,1741,7548,4973,8527,28725,8189,264,18958,442,264,1326,1499,28725,395,2032,2282,304,264,4610,5782,28723,415,7548,4973,3233,8045,298,347,275,1652,442,2548,272,8333,15596,28723],"total_duration":50819836584,"load_duration":448401,"prompt_eval_count":1,"prompt_eval_duration":41268467000,"eval_count":44,"eval_duration":9545981000}
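Since this article targets Node.js, here is a minimal sketch of the same request built in code. buildImageRequest and describeImage are hypothetical helper names of my own. The sketch assumes Node.js 18 or later (for the global fetch) and a running ollama server on the default port with bakllava available.

```javascript
// Build a request body for POST /api/generate with one attached image.
// `imageBytes` is a Buffer or Uint8Array holding the raw image file.
function buildImageRequest(model, prompt, imageBytes) {
  return {
    model,
    prompt,
    // Each image is sent as a plain base64 string.
    images: [Buffer.from(imageBytes).toString("base64")],
    stream: false, // ask for a single JSON object instead of a stream
  };
}

// Hypothetical helper: read a local file and send it to a running ollama server.
async function describeImage(path, baseUrl = "http://localhost:11434") {
  const { readFile } = await import("node:fs/promises");
  const body = buildImageRequest("bakllava", "What is in this picture?", await readFile(path));
  const res = await fetch(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`ollama returned HTTP ${res.status}`);
  return (await res.json()).response;
}

// The first four bytes of a PNG file encode to the familiar "iVBORw" prefix.
const pngMagic = Uint8Array.from([0x89, 0x50, 0x4e, 0x47]);
console.log(buildImageRequest("bakllava", "hi", pngMagic).images[0]); // → iVBORw==
```

With the server running, describeImage("./image.jpg") would resolve to the generated description. It is not called here so that the snippet runs without a server.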

1-3. Embedding models

Text vectorization (embedding) is also supported.

Representative embedding models

snowflake-arctic-embed
- An open-source embedding model from Snowflake
nomic-embed-text
- An open-source embedding model from Nomic AI
mxbai-embed-large
- An embedding model from Mixedbread AI, reported to outperform OpenAI's text-embedding-3-large

How to call it

API

% curl http://localhost:11434/api/embeddings -d '{
  "model": "mxbai-embed-large",
  "prompt": "Represent this sentence for searching relevant passages: The sky is blue because of Rayleigh scattering"
}'
{"embedding":[-0.5319108366966248,0.0873393714427948,0.46782824397087097,...(1024)]}
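The returned vector is mostly useful when compared with other vectors. One common way to do that, which the steps above do not cover, is cosine similarity. The following is a minimal sketch using short made-up vectors in place of real 1024-dimensional embeddings. cosineSimilarity is a name I chose, not an ollama function.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Close to 1 means the vectors point in the same direction; 0 means orthogonal.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for embeddings (made-up values for illustration).
console.log(cosineSimilarity([1, 0], [1, 0])); // → 1
console.log(cosineSimilarity([1, 0], [0, 1])); // → 0
```

In practice you would pass in the embedding arrays returned for two different texts and treat a higher score as a closer match.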

2. Using it in projects

This section mainly covers usage in Node.js projects.

2-1. ollama library

This is the first choice when using ollama on its own.

Installation

% npm install ollama

Usage

Message generation

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'gemma2',
  messages: [{ role: 'user', content: 'こんにちは' }],
})
console.log(response.message.content)

Streaming responses

import ollama from 'ollama'

const message = { role: 'user', content: 'こんにちは' }
const response = await ollama.chat({ model: 'gemma2', messages: [message], stream: true })
for await (const part of response) {
  process.stdout.write(part.message.content)
}

Embedding

import ollama from 'ollama'

const response = await ollama.embeddings({
  model: 'mxbai-embed-large',
  prompt: 'こんにちは',
})
console.log(response.embedding)

2-2. ollama library for LangChain

If you are using LangChain, this also looks convenient.

Installation

% npm install @langchain/community

Usage

Message generation

import { Ollama } from "@langchain/community/llms/ollama";

const ollama = new Ollama({
  baseUrl: "http://localhost:11434",
  model: "gemma2",
});

const stream = await ollama.stream(
  `こんにちは`
);

const chunks = [];
for await (const chunk of stream) {
  chunks.push(chunk);
}

console.log(chunks.join(""));

2-3. OpenAI library

Remarkably, ollama is compatible with the OpenAI API, so the OpenAI library can be used as well.

OpenAI library - Installation

% npm install openai

OpenAI library - Usage

Message generation

import OpenAI from 'openai'

const openai = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama', // not actually used, so a dummy value is fine
})

const completion = await openai.chat.completions.create({
  model: 'gemma2',
  messages: [{ role: 'user', content: 'こんにちは' }],
})

console.log(completion.choices[0].message.content)

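Closing thoughts

I had the impression that running an LLM locally was a high hurdle, requiring a high-performance CPU, GPU, and plenty of memory, but with ollama I was able to set up a local LLM on my everyday PC with surprising ease. Performance will likely vary a lot with machine specs, but given how easy the setup is, I think it is good enough. When you call an LLM heavily during development, verification, or batch jobs, context window limits and token usage fees tend to become a concern, so it should be especially handy there. In conclusion: very convenient!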
