Windows11 で Bonsai-8Bを使う

Last updated at 2026-05-11Posted at 2026-04-29

前提

Cudaをインストールしている。※GPUを使う場合。CPUでも余裕で動きます。
私の環境では13.1でした。

>nvidia-smi
Wed Apr 29 15:34:04 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 591.86                 Driver Version: 591.86         CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060      WDDM  |   00000000:01:00.0  On |                  N/A |
|  0%   49C    P8             17W /  170W |    1854MiB /  12288MiB |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

フォルダ構成を作る

C:\Bonsai\
  model\
  bin\

モデルをDLする

https://huggingface.co/prism-ml/Bonsai-8B-gguf
から Bonsai-8B-Q1_0.gguf をDLして model フォルダに入れる。

PrismML フォークの llama.cpp バイナリを取得

https://github.com/PrismML-Eng/llama.cpp/releases
から、cudart-llama-bin-win-cuda-12.4-x64.zip
llama-prism-b1-d104cf1-bin-win-cuda-12.4-x64.zip をDLして bin フォルダに展開する。

動作確認

コマンドプロンプトで C:\Bonsai に移動して、以下を実行。

bin\llama-cli.exe ^
  -m model\Bonsai-8B-Q1_0.gguf ^
  -n 256 ^
  --temp 0.5 ^
  --top-p 0.85 ^
  --top-k 20 ^
  -ngl 99

以下の画面が表示されれば起動できています。

ggml_cuda_init: found 1 CUDA devices (Total VRAM: 12287 MiB):
  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, VRAM: 12287 MiB

Loading model...


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b1-d104cf1
model      : Bonsai-8B-Q1_0.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read <file>        add a text file
  /glob <pattern>     add text files using globbing pattern


>

日本語を投げてみます。

> 日本語であいさつしてください。

こんにちは！私はBonsaiと申します。日本語で会話できます。何かお手伝いできますか？

[ Prompt: 158.6 t/s | Generation: 76.0 t/s ]

>

コーディングエージェントとして使う

サーバとして使う場合。OpenAI互換APIもあるので、コーディングエージェントとして使うこともできます。

bin\llama-server.exe ^
  -m model\Bonsai-8B-Q1_0.gguf ^
  --host 0.0.0.0 ^
  --port 18888 ^
  -ngl 99 ^
  --ctx-size 65536

Zed Editor の場合、LLM Providers から +Add Provider をクリックして、OpenAIを選びます。
ProviderName: Bonsai-8B
API URL: http://localhost:18888/
API Key: dummy
Model Name: Bonsai-8B-Q1_0

で、Save Provider すれば使えると思います。

公式サイト https://prismml.com/news/bonsai-8b によれば、Bonsai-8B のベースは Qwen3 8B らしいので、相応のコーディング性能もありますね。
Qwen3 8B をGPUに展開するとVARMが16GB必要になりますが、Bonsai-8B は 1.15GB のようです。

私の持っているRTX3060ではVRANの容量が足りないので、これはとてもありがたいです。
これとは別に Fedora 42 / Core i3 10100T でも Bonsai-8B が動いています。

これが発展するとVRAM番長は必要なくなりますね。
凄いですね。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up