RaspberryPiでローカルLLM（onnxruntime-genai & Phi4-mini）

Last updated at 2025-03-09Posted at 2025-03-09

LLMを動かすフレームワークとしてonnxruntime-genaiがあります。
RaspberryPiでも利用できるのですが、バイナリは提供されていなくソースコードからビルドする必要があります。
今回はビルドの手順と最新のモデル（Phi4-min）を使ったサンプルの紹介をします。

使用するハードウェア

・RaspberryPi5 8GB
・SSD（なるべく速いストレージ）

SDでも大丈夫ですが、時間がかかるのでなるべくSSDを使う方がいいと思います。

ソフトウェア

・RaspberryPi OS 64bit
・onnxruntime 最新版
・CMake 最新版

パッケージ版のCMakeは少し古くてビルドに失敗するので、最新版をソースコードからインストールしてください。
こちらで手順を説明しています。

ビルド

任意のディレクトリにonnxruntime-genaiのレポジトリをクローンします。

$ git clone https://github.com/microsoft/onnxruntime-genai
$ cd onnxruntime-genai

onnxruntimeも必要なので最新版のバイナリをダウンロードします。
ダウンロード後解凍してからortディレクトリに移動させておきます。

$ curl -L https://github.com/microsoft/onnxruntime/releases/download/v1.21.0/onnxruntime-linux-aarch64-1.21.0.tgz -o onnxruntime-linux-aarch64-1.21.0.tgz
$ tar zxvf onnxruntime-linux-aarch64-1.21.0.tgz
$ mkdir ort
$ mv onnxruntime-linux-aarch64-1.21.0 ort

次にビルドスクリプトを実行します。

$ python build.py --config Release

ビルドが完了すると onnxruntime-genai/build/Linux/Release/wheel/ にwhlファイルができています。

それではonnxruntime-genaiを使って最新のPhi-4-miniを動作させてみましょう。
pythonの仮想環境を作成します。Phi-4-miniとします。

$ python -m venv phi-4-mini
$ cd phi-4-mini
$ source ./bin/activate

onnxruntime-genaiをインストールします。

$ pip install ../onnxruntime-genai/build/Linux/Release/wheel/onnxruntime_genai-0.7.0.dev0-cp311-cp311-linux_aarch64.whl

続いてHugginfaceのコマンドラインインターフェイスをインストールします。

$ pip install huggingface-hub[cli]

次はHuggingfaceからモデルをダウンロードします。

$ huggingface-cli download microsoft/Phi-4-mini-instruct-onnx --include cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/* --local-dir .

サンプルプログラムをダウンロードして実行します。

$ curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py -o phi3-qa.py
$ python phi3-qa.py -m cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4 -e cpu

プロンプトが表示されたら質問をしてみましょう。日本語での質問も可能です。

Input: Tell me a joke about generative AI

Output: Sure, here's a joke about generative AI:

Why did the generative AI refuse to attend the birthday party?

Because it couldn't generate enough excitement to keep up the conversation and it didn't want to repeat the same old jokes! Plus, it got scared of the birthday candles burning itself up!

Input: 生成AIのジョークを教えて

Output: もちろん、こちらは生成AIのジョークです:

なぜAIはパーティーに行かなかったのですか？

それは、彼らが生成されたから、すべてのジョークが既知のパターンに従っていたからです。さらに、彼らは「AIがAIを言うのはAIだ」と言うのが嫌だったからです！

（注: このジョークはAIが生成するものであり、AIがAIを言うのはAIだという概念を冗談として使っています。）

Input:

RaspberryPi5でもそこそこのスピードで回答してくれます。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up