Ollama with iGPU on the Minisforum N5 Pro AI NAS (gpt-oss:20b & 120b)

Posted at 2025-08-14

Overview

I tried running a local LLM on a NAS with an AMD APU.
I confirmed that the gpt-oss models run, with the following performance:
20b : 10.4 tokens/s (runs on the GPU only)
120b : 5.7 tokens/s (also uses a fair amount of CPU)

System configuration

Minisforum N5 Pro AI NAS
https://www.minisforum.jp/products/minisforum-n5-pro-nas

CPU: AMD Ryzen™ AI 9 HX PRO 370
RAM: 96GB (48GB allocated as VRAM)
Disk: SSD 1TB

OS: Ubuntu 24.04 server

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04.2 LTS
Release:        24.04
Codename:       noble
$ uname -r
6.8.0-64-generic

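Installing ROCm/AMDGPU

Install following the official instructions.

Installing amdgpu-install_6.4.60401-1_all.deb with a relative path gave a permission error, so I changed only that step to use the full path.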

# install ROCm
wget https://repo.radeon.com/amdgpu-install/6.4.1/ubuntu/noble/amdgpu-install_6.4.60401-1_all.deb
# ----------------
# sudo apt install ./amdgpu-install_6.4.60401-1_all.deb <- this gave an error, so I switched to the full path
sudo apt install /home/<path to dir>/amdgpu-install_6.4.60401-1_all.deb
# ----------------
sudo apt update
sudo apt install python3-setuptools python3-wheel
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
sudo apt install rocm

# install AMDGPU
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo apt install amdgpu-dkms

When that finishes, reboot.

sudo reboot
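After the reboot, it may be worth confirming that ROCm sees the iGPU and that the group change from `usermod` took effect. The following is a minimal sketch, not part of the original procedure: the helper names `list_gfx_targets` and `has_gpu_groups` are hypothetical, and it assumes `rocminfo` (installed with ROCm) reports targets as `gfx…` names.

```shell
# Extract the unique gfx target names from `rocminfo` output read on stdin.
list_gfx_targets() {
  grep -oE 'gfx[0-9a-f]+' | sort -u
}

# Succeed only if the space-separated group list in $1 contains render and video.
has_gpu_groups() {
  case " $1 " in *" render "*) ;; *) return 1 ;; esac
  case " $1 " in *" video "*) ;; *) return 1 ;; esac
}

# Run the ROCm check only where rocminfo is actually installed.
if command -v rocminfo >/dev/null 2>&1; then
  rocminfo | list_gfx_targets
fi

# Check the groups of the current login session.
if has_gpu_groups "$(id -nG)"; then
  echo "user is in render and video"
else
  echo "user is missing render and/or video (log in again after usermod)"
fi
```

On this machine the first check would be expected to list gfx1150, the ID mentioned in the Ollama configuration section.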

Installing Ollama

Install following the official instructions.
https://ollama.com/download/linux
Updating apparently only requires running the same command again. (The old version is removed automatically.)

curl -fsSL https://ollama.com/install.sh | sh

The installer apparently detects usable devices and sets things up accordingly.
I forgot to save the log, but it prints a message to the effect that the GPU is usable.

Configuring Ollama

The integrated GPU (890M) of the AMD Ryzen™ AI 9 HX PRO 370 has the GPU ID gfx1150, but set HSA_OVERRIDE_GFX_VERSION to 11.0.0. (Setting it to 11.5 causes a segfault, as of 2025-07-22.)
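To make the relationship between the gfx ID and the override value concrete, here is a small sketch. It assumes the common convention that the last two digits of the target name are the minor version and stepping, and everything before them is the major version; the function name `gfx_to_override` is hypothetical, and targets with a hexadecimal stepping (such as gfx90a) are deliberately not handled.

```shell
# Convert a gfx target name to the dotted form used by HSA_OVERRIDE_GFX_VERSION.
# Prints nothing if the name is not "gfx" followed by at least three decimal digits.
gfx_to_override() {
  printf '%s\n' "$1" | sed -nE 's/^gfx([0-9]+)([0-9])([0-9])$/\1.\2.\3/p'
}

gfx_to_override gfx1150   # native ID of the 890M      -> 11.5.0
gfx_to_override gfx1100   # target used in this article -> 11.0.0
```

In other words, 11.0.0 asks ROCm to treat the iGPU as a gfx1100 device instead of its native gfx1150.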

sudo systemctl edit ollama.service

### Editing /etc/systemd/system/ollama.service.d/override.conf
### Anything between here and the comment below will become the contents of the drop-in file

[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
Environment="LD_LIBRARY_PATH=/opt/rocm/lib"
Environment="OLLAMA_HOST=0.0.0.0"

### Edits below this comment will be discarded

Environment="OLLAMA_HOST=0.0.0.0" allows access from hosts other than localhost, so include it only if you want that.
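The drop-in only takes effect once the service is restarted. The sketch below shows one way to apply and check it; the function names are hypothetical, it assumes Ollama's default port 11434 and its `/api/version` endpoint, and 192.0.2.10 in the usage note is a placeholder address, not the real NAS.

```shell
# Build the URL of Ollama's version endpoint; defaults to localhost:11434.
ollama_version_url() {
  printf 'http://%s:%s/api/version\n' "${1:-localhost}" "${2:-11434}"
}

# Restart the service so the drop-in applies, then print the environment
# systemd passes to it. Only defined here; call it by hand, since it needs sudo.
apply_ollama_override() {
  sudo systemctl daemon-reload
  sudo systemctl restart ollama.service
  systemctl show ollama.service --property=Environment
}

# Probe the API; pass a host (and optionally a port) to test from another machine.
check_ollama() {
  curl -fsS --max-time 5 "$(ollama_version_url "$@")"
}
```

For example, run `apply_ollama_override` on the NAS, then `check_ollama 192.0.2.10` from another machine (with the NAS address substituted) to see whether OLLAMA_HOST=0.0.0.0 made the API reachable.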

Trying it out

I tried the popular gpt-oss models and confirmed that both 20b (GPU only) and 120b (GPU + CPU) run.
gpt-oss:20b: 10.4 tokens/s
gpt-oss:120b: 5.7 tokens/s (but it also uses a fair amount of CPU)

gpt-oss:20b

Resource usage while running:
GPU pinned at 99%
CPU: one core (thread) fully occupied

$ ollama run gpt-oss:20b --verbose
>>> 空はなぜ青いのですか？
Thinking...
The user asks in Japanese: "空はなぜ青いのですか？" meaning "Why is the sky blue?" We should answer in Japanese,
explaining Rayleigh scattering, the shorter wavelengths scattering more, etc. Maybe mention also at sunrise/sunset

< --- omitted --- >

total duration:       1m57.311512004s
load duration:        42.410712ms
prompt eval count:    77 token(s)
prompt eval duration: 2.19813595s
prompt eval rate:     35.03 tokens/s
eval count:           1201 token(s)
eval duration:        1m55.070071236s
eval rate:            10.44 tokens/s

========================================= ROCm System Management Interface =========================================
=================================================== Concise Info ===================================================
Device  Node  IDs              Temp    Power     Partitions          SCLK  MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
              (DID,     GUID)  (Edge)  (Socket)  (Mem, Compute, ID)
====================================================================================================================
0       1     0x150e,   18837  51.0°C  43.035W   N/A, N/A, 0         N/A   2800Mhz  0%   auto  N/A     30%    99%
====================================================================================================================
=============================================== End of ROCm SMI Log ================================================

 %CPU  %MEM     TIME+ COMMAND
 106.6   3.8   0:27.82 ollama
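To log GPU load over a whole run instead of taking a single snapshot, the GPU% column can be pulled out of the concise `rocm-smi` table. This is a rough sketch with a hypothetical helper name, and it assumes the layout shown above, where device rows start with a numeric index and GPU% is the last column; other rocm-smi versions may format the table differently.

```shell
# Print the last column (GPU%) of every device row in `rocm-smi` output on stdin.
# Device rows are the ones whose first field is a bare number.
gpu_busy_percent() {
  awk '$1 ~ /^[0-9]+$/ { print $NF }'
}

# Take one sample, only where rocm-smi is installed.
if command -v rocm-smi >/dev/null 2>&1; then
  rocm-smi | gpu_busy_percent
fi
```

Wrapping the last command in a loop with `sleep 1` while `ollama run` is active in another terminal gives a simple usage trace.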

gpt-oss:120b

In addition to the GPU, this uses about 5 to 8 CPU threads.
GPU load never maxes out, so if about 70GB could be allocated as VRAM, it might run on the iGPU alone. Or if the NPU could be used...

>>> 空はなぜ青いのですか？
< --- omitted --- >
total duration:       3m1.011234749s
load duration:        53.811736ms
prompt eval count:    77 token(s)
prompt eval duration: 1.449320539s
prompt eval rate:     53.13 tokens/s
eval count:           975 token(s)
eval duration:        2m50.248872405s
eval rate:            5.73 tokens/s

======================================== ROCm System Management Interface ========================================
================================================== Concise Info ==================================================
Device  Node  IDs              Temp    Power     Partitions          SCLK  MCLK  Fan  Perf  PwrCap  VRAM%  GPU%
              (DID,     GUID)  (Edge)  (Socket)  (Mem, Compute, ID)
==================================================================================================================
0       1     0x150e,   18837  50.0°C  36.057W   N/A, N/A, 0         N/A   N/A   0%   auto  N/A     94%    61%
==================================================================================================================
============================================== End of ROCm SMI Log ===============================================

 %CPU  %MEM     TIME+ COMMAND
 527.2  41.4   4:03.88 ollama
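The eval rate that `--verbose` reports is the eval count divided by the eval duration. As a quick check of the figures in both logs, here is a minimal sketch; `eval_rate` is a hypothetical helper, and the durations are converted to seconds by hand (1m55.070071236s = 115.070071236 s, 2m50.248872405s = 170.248872405 s).

```shell
# Tokens per second: $1 = eval count (tokens), $2 = eval duration (seconds).
eval_rate() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.2f\n", n / s }'
}

eval_rate 1201 115.070071236   # gpt-oss:20b  -> 10.44
eval_rate 975 170.248872405    # gpt-oss:120b -> 5.73
```

Both results match the eval rate lines in the logs, so the 120b model generates at a little over half the speed of the 20b model on this machine.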
