Running SakanaAI (TinySwallow-1.5B) locally

Posted at 2025-06-22

Environment

Windsurf: 1.10.5
Python: 3.11.5
pip: 25.1.1

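Main topic

Hi, this is たちゅみ.
In this post I walk through the steps to run SakanaAI (TinySwallow-1.5B-Instruct-Q5_K_M.gguf) locally.
Note: setting up a virtual environment is not covered here.
Note: installation can apparently fail with an outdated pip, so update pip first if necessary.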

1. Download the model

First, install huggingface_hub, which is used to download the model.
pip install -U huggingface_hub

Next, download the SakanaAI model.
Note: this article uses the model file "TinySwallow-1.5B-Instruct-Q5_K_M". Change it to suit your use case.
huggingface-cli download tensorblock/TinySwallow-1.5B-Instruct-GGUF TinySwallow-1.5B-Instruct-Q5_K_M.gguf --local-dir ./models --resume-download
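
After the download finishes, it can be useful to confirm that the file in ./models really is a GGUF model before moving on. The following is a minimal sketch, not part of the original procedure: the helper name looks_like_gguf is hypothetical, and it only checks that the file exists and starts with the 4-byte GGUF magic, not that the download is complete.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # GGUF files begin with these four bytes


def looks_like_gguf(path: str) -> bool:
    """Return True if the file exists and starts with the GGUF magic bytes.

    Hypothetical helper for illustration; it does not verify file size or checksums.
    """
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC
```

For example, looks_like_gguf("./models/TinySwallow-1.5B-Instruct-Q5_K_M.gguf") should return True once the command above has completed successfully.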

2. Install llama-cpp-python

Install llama-cpp-python with one of the commands below.
Note that the command differs depending on whether you run on the CPU or a GPU.

CPU

pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu

GPU (cu121, i.e. the CUDA 12.1 wheels)

pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121

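Note: if pip ends up building the package from source and CMake or a C/C++ build environment is missing, the following error appears.
  ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
  In that case you need to set up a build environment, which I plan to cover in a separate article.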
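
If you hit the build error above, a quick check of whether the build tools are visible on PATH can narrow down the cause. This is a rough sketch under stated assumptions: the function name missing_build_tools and the list of compiler executable names are hypothetical choices for illustration, and on Windows cl is usually only on PATH inside a Visual Studio developer prompt.

```python
import shutil

# Assumed compiler executable names; adjust for your platform and toolchain.
COMPILER_CANDIDATES = ("cl", "gcc", "clang", "cc")


def missing_build_tools() -> list[str]:
    """Return the build prerequisites that cannot be found on PATH."""
    missing = []
    if shutil.which("cmake") is None:
        missing.append("cmake")
    # One usable C/C++ compiler is enough for this check.
    if not any(shutil.which(name) for name in COMPILER_CANDIDATES):
        missing.append("C/C++ compiler")
    return missing
```

An empty list means both CMake and at least one of the candidate compilers were found. It does not guarantee that the build will succeed.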

3. Try it out

Save the code below to a file and run it.
Note: change the value of n_gpu_layers depending on whether you run on the CPU or a GPU.

from llama_cpp import Llama

MODEL = "./models/TinySwallow-1.5B-Instruct-Q5_K_M.gguf"
llm = Llama(
    model_path=MODEL,
    chat_format="qwen",   # TinySwallow is compatible with the Qwen2 chat template
    n_ctx=4096,           # maximum context length in tokens (adjust as needed)
    n_gpu_layers=0,       # 0 = CPU only; -1 = offload all layers to the GPU
    verbose=False,
)

system_prompt = "あなたは親切な日本語アシスタントです。"
history = [{"role": "system", "content": system_prompt}]

print(">>> TinySwallow chat (type exit to quit)")

while True:
    user_msg = input("🧑‍💻> ")
    if user_msg.lower() in {"exit", "quit", "q"}:
        break

    history.append({"role": "user", "content": user_msg})

    # Send the conversation so far to the model
    response = llm.create_chat_completion(
        messages=history,
        max_tokens=512,
        temperature=0.7,
        top_p=0.95,
    )

    # Extract the reply and print it
    bot_msg = response["choices"][0]["message"]["content"].strip()
    print(f"🤖> {bot_msg}\n")

    history.append({"role": "assistant", "content": bot_msg})

You can now chat with SakanaAI!

That's all!
For a model running locally, responses are fast even on the CPU, and the answer quality is not bad either.
Give it a try!
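
One thing to be aware of: the loop above appends every turn to history, so a long session will eventually exceed n_ctx. A simple way to limit this is to keep the system prompt and only the most recent messages. The following is a minimal sketch of that idea. The function name trim_history and the default of 20 messages are hypothetical, and it counts messages rather than tokens, so it is only an approximation.

```python
def trim_history(history, max_messages=20):
    """Keep the leading system message (if any) plus the newest messages.

    Hypothetical helper: it limits by message count, not by token count.
    """
    system = history[:1] if history and history[0]["role"] == "system" else []
    rest = history[len(system):]
    recent = rest[-max_messages:] if max_messages > 0 else []
    # Drop a leading assistant reply so the kept window starts with a user turn.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return system + recent
```

In the chat loop, this could be used by passing messages=trim_history(history) to create_chat_completion instead of messages=history.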
