Hello! I'm Yamapan, an architect who gets lost in a sea of model names every time I pick a local LLM 👋
Follow-up comments, questions, likes, and shares are all very welcome 🥺!
If I got something wrong, please point it out gently!
TL;DR
- whichllm is a CLI that automatically ranks the local LLMs that will actually run usefully on your own GPU / CPU / RAM, drawing from the latest candidates on HuggingFace.
- The point is that it does not just pick "the largest model that fits in VRAM": it evaluates benchmark score, estimated tok/s, quant (quantization format), and fit type (Full GPU / Partial / CPU) together.
- On my RTX 4060 Ti 16GB / Ryzen 7 8700G / 61.6GB RAM machine, the realistic Full GPU picks were Qwen3.6-27B at Q3_K_M for general use and Qwen3-Coder-30B-A3B for coding.
- A 70B-class model needs about 44.2GB at Q4_K_M, so consumer 24GB / 32GB cards fall short, and whichllm showed the 48GB class (e.g. L40S) as the minimum.
- On Windows + PowerShell, switching to UTF-8 beforehand (PYTHONUTF8=1 and friends) keeps Rich's box-drawing output from garbling.
⚠️ The tok/s and score values shown are only whichllm's estimates and ranking values, not measurements. With that in mind, I recommend using them to get a head start on purchase decisions and testing.
ℹ️ The model names in this article (Qwen3.6-27B, Qwen3-Coder 30B-A3B, and so on) are the ones that came out on top in my environment among the HuggingFace candidates whichllm aggregates. The name whichllm displays does not always match the HuggingFace repository name exactly, so when you actually download a model it is safer to search from the organization name, such as Qwen/....
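Why whichllm? Knocking out the "so which one actually runs?" problem with one command
Local LLMs are fun, but doesn't it go like this every time?
- You find a new model on HuggingFace and get excited
- You punch numbers into a VRAM calculator
- You agonize for 10 minutes over GGUF Q3_K_M versus Q4_K_M (these are quantization formats: the smaller the number, the lighter the VRAM use and the lower the quality)
- "So what actually runs comfortably on my machine again...?"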
whichllm (by @Andyyyy64) is an OSS CLI that handles all of this in one shot: automatic hardware detection → model ranking → required VRAM estimate.
Three things I liked about it:
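- Benchmark score and estimated speed come out together: merely fitting is not enough. Models that fit but are too slow are dropped automatically.
- --gpu lets you simulate before you buy: "What runs if I buy an RTX 5090?" or "What about 2x RTX 4090?" can be tried without the hardware.
- The plan / upgrade subcommands are handy: they answer "How many GB do I need to run Llama 3 70B?" or "What changes going from a 4090 to a 5090?" right away.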
According to the official README, on an RTX 4090, for example, Qwen3.6-27B is recommended as #1 even though Qwen3-32B fits. whichllm uses benchmark scores to drive home that "the biggest model that fits" is not the same as "the best model."
Setup: 1 minute with uv (Windows / RTX 4060 Ti 16GB)
I simply cloned the repository under my workspace and created a virtual environment with uv, the fast package manager from Astral. The steps are the same on Mac / Linux (the garbled-text issue is Windows-only, so feel free to skip the next chapter).
# In any folder
git clone https://github.com/Andyyyy64/whichllm.git
cd whichllm
# Set up .venv with uv (dev dependencies included)
uv sync --dev
# Sanity check (run whichllm inside .venv)
uv run whichllm --help
⚠️ The README uses uv sync --dev. Newer versions of uv also accept uv sync --group dev (see [dependency-groups] in pyproject.toml). Either one is fine.
💡 For a quick one-shot trial, uvx whichllm@latest also works. It needs no install and always runs the latest version, so it is the easier option when you just want to give it a try while considering an upgrade.
The detected environment looked like this.
| Item | Value |
|---|---|
| GPU | NVIDIA GeForce RTX 4060 Ti 16GB |
| iGPU | AMD Radeon 780M Graphics |
| CPU | AMD Ryzen 7 8700G |
| RAM | approx. 61.6GB |
| OS | Windows 11 |
| CUDA | 13.1 |
Two pitfalls that are easy to hit on Windows
This is the part I most wanted to get on record. It does not come up in Mac / Linux write-ups, but on Windows + PowerShell you will almost certainly hit it.
① Rich raises UnicodeEncodeError under cp932
whichllm uses Rich for its output.
With cp932, PowerShell's default encoding, it can stop with a UnicodeEncodeError on box-drawing characters and symbols.
Running just the following before you start is enough.
$env:PYTHONUTF8 = '1'
$env:PYTHONIOENCODING = 'utf-8'
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
If you want this fixed per terminal, put it in terminal.integrated.env.windows in .vscode/settings.json and it will apply every time.
{
"python.defaultInterpreterPath": "${workspaceFolder}\\.venv\\Scripts\\python.exe",
"terminal.integrated.env.windows": {
"PYTHONUTF8": "1",
"PYTHONIOENCODING": "utf-8"
}
}
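To make the encoding issue concrete, here is a minimal Python sketch that checks whether a string can be encoded under cp932 versus UTF-8. The sample strings are my own picks for illustration and are not taken from what Rich or whichllm actually prints.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Illustrative samples only (an assumption, not real whichllm output).
samples = ["plain ascii", "日本語", "✓ done", "🚀"]

for s in samples:
    # ascii() keeps this loop printable even on a cp932 console.
    print(f"{ascii(s)}: cp932={can_encode(s, 'cp932')} utf-8={can_encode(s, 'utf-8')}")
```

Any character that cp932 cannot represent triggers the same kind of error when Python writes it to a cp932 console, which is why forcing UTF-8 avoids the crash.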
② Three tests that use symlinks fail with WinError 1314
When I ran uv run pytest, Path.symlink_to failed around tests/test_asahi_detection.py with WinError 1314 ("A required privilege is not held by the client").
This is not a dependency problem but a Windows symlink permission issue. Either
- turn on Windows Developer Mode, or
- run from a shell with administrator rights
and the tests pass. It does not affect how the CLI itself behaves, so I just leave Developer Mode on.
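If you want to check ahead of time whether your shell can create symlinks at all, a small probe like the following works. This is my own sketch, not part of the whichllm test suite, and the function name is hypothetical.

```python
import tempfile
from pathlib import Path


def can_create_symlink() -> bool:
    """Try to create a symlink in a temp directory and report whether it worked."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "target.txt"
        target.write_text("x", encoding="utf-8")
        link = Path(tmp) / "link.txt"
        try:
            link.symlink_to(target)
        except (OSError, NotImplementedError) as exc:
            # On Windows without Developer Mode or admin rights this is
            # typically an OSError carrying winerror 1314.
            print(f"symlink not available: {exc}")
            return False
        return link.is_symlink()


if can_create_symlink():
    print("symlinks OK")
else:
    print("enable Developer Mode or use an administrator shell")
```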
Results ①: general use and coding show "different faces"
Now for the main event. Run whichllm with no arguments and it lists the top candidates for the machine it detected. On my RTX 4060 Ti 16GB, restricting to Full GPU and a speed of usable or better gave the following.
# General use
uv run whichllm --fit full-gpu --speed usable --top 8
# Coding-focused (narrow down with --profile coding)
uv run whichllm --profile coding --fit full-gpu --speed usable --top 5
The top pick for general use was Qwen/Qwen3.6-27B at Q3_K_M, and the top pick for coding was Qwen/Qwen3-Coder-30B-A3B-Instruct at Q3_K_M. Side by side:
| Use case | Top candidate | quant | Required VRAM | Estimated speed | score |
|---|---|---|---|---|---|
| General | Qwen/Qwen3.6-27B | Q3_K_M | approx. 14.7GB | approx. 11.8 tok/s | 85.6 |
| coding | Qwen/Qwen3-Coder-30B-A3B-Instruct | Q3_K_M | approx. 13.8GB | approx. 109.7 tok/s | 79.2 |
ℹ️ The general-use values match the flow / map figures. The coding row's score of 79.2 and the -Instruct suffix are taken from actual whichllm output on my machine, which the figures do not show. You can reproduce them locally by adding --profile coding.
Two things stood out.
- Even looking only at what fits in 16GB of VRAM, 27B-class models for general use and 30B (MoE)-class models for coding are now realistic. A year ago I would never have thought of "27B on Full GPU with 16GB," so this really brought home how far quantization has come.
- At first glance you might ask, "Is the coding model really 10x faster?" The reason is that Qwen3-Coder-30B-A3B is a MoE (Mixture of Experts) model with few active params. The whichllm documentation states that speed is judged on active params and quality on total params, so the numbers are pleasingly consistent.
📎 Reference: the "See it" section of the whichllm README
Note #3: a MoE model at 102 t/s - speed is ranked on active params, quality on total.
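The active-params effect can be checked with back-of-the-envelope arithmetic. The sketch below assumes token generation is limited by memory bandwidth, so the ceiling on tok/s is bandwidth divided by the bytes of weights read per token. The 288 GB/s bandwidth (the RTX 4060 Ti spec-sheet value), the roughly 3.9 bits per weight for Q3_K_M, and treating the 27B model as dense are all my assumptions. This is not whichllm's actual formula.

```python
def weights_read_per_token_gb(active_params_b: float, bits_per_weight: float) -> float:
    """Approximate GB of weights read to generate one token (params given in billions)."""
    return active_params_b * bits_per_weight / 8


def bandwidth_bound_tok_s(bandwidth_gb_s: float, active_params_b: float,
                          bits_per_weight: float) -> float:
    """Upper bound on tok/s if decoding is purely memory-bandwidth bound."""
    return bandwidth_gb_s / weights_read_per_token_gb(active_params_b, bits_per_weight)


BANDWIDTH_GB_S = 288.0  # assumed: RTX 4060 Ti spec-sheet memory bandwidth
Q3_K_M_BPW = 3.9        # assumed: approximate bits per weight for Q3_K_M

dense = bandwidth_bound_tok_s(BANDWIDTH_GB_S, 27.0, Q3_K_M_BPW)  # 27B, assumed dense
moe = bandwidth_bound_tok_s(BANDWIDTH_GB_S, 3.0, Q3_K_M_BPW)     # ~3B active (the "A3B")

print(f"dense 27B ceiling:     {dense:.1f} tok/s")
print(f"MoE 3B-active ceiling: {moe:.1f} tok/s")
print(f"ratio:                 {moe / dense:.1f}x")
```

Under these assumptions the ratio is 9x by construction (27 / 3), which is in line with the roughly 9x gap between 109.7 and 11.8 tok/s in the table. The absolute ceilings are only upper bounds and will not match whichllm's estimates.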
Results ②: looking at 70B-class models and GPU upgrades with plan / upgrade
The plan subcommand answers "So where does 70B-class actually become realistic?" in a single line.
uv run whichllm plan "llama 3 70b"
The result: meta-llama/Llama-3.3-70B-Instruct at Q4_K_M needs about 44.2GB of VRAM. whichllm shows the 48GB class (e.g. L40S) as the minimum for a Full GPU fit, while the RTX 4090 (24GB) and RTX 5090 (32GB) are treated as partial offload (part of the model is pushed to CPU/RAM).
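To see roughly where a figure like 44.2GB comes from, here is a rough estimate: weight size is parameter count times bits per weight, plus some allowance for KV cache and runtime buffers. The 70.6B parameter count, the roughly 4.85 bits per weight for Q4_K_M, and the 1.5GB overhead are all assumptions on my part, and whichllm may compute this quite differently.

```python
def estimate_vram_gb(params_b: float, bits_per_weight: float, overhead_gb: float) -> float:
    """Rough VRAM need: quantized weights plus a flat overhead allowance."""
    weights_gb = params_b * bits_per_weight / 8
    return weights_gb + overhead_gb


def classify_fit(vram_gb: float, needed_gb: float) -> str:
    """Label a GPU as a full fit or a partial offload for the given requirement."""
    return "full GPU" if vram_gb >= needed_gb else "partial offload"


PARAMS_B = 70.6     # assumed: parameter count in billions
Q4_K_M_BPW = 4.85   # assumed: approximate bits per weight for Q4_K_M
OVERHEAD_GB = 1.5   # hypothetical allowance for KV cache and buffers

needed = estimate_vram_gb(PARAMS_B, Q4_K_M_BPW, OVERHEAD_GB)
print(f"estimated need: {needed:.1f} GB")

for name, vram in [("RTX 4090", 24), ("RTX 5090", 32), ("L40S", 48)]:
    print(f"{name} ({vram}GB): {classify_fit(vram, needed)}")
```

With these inputs the estimate lands in the same ballpark as the figure whichllm reported, and the 24GB / 32GB cards come out as partial offload while the 48GB class fits.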
Next, let's use upgrade to compare 4060 Ti → 4070 Ti SUPER → 4090 → 5090.
uv run whichllm upgrade "RTX 4070 Ti SUPER" "RTX 4090" "RTX 5090" --top 1
It shows side by side how quant and estimated tok/s change when running the same Qwen3.6-27B.
| GPU | VRAM | quant | Quality | Estimated tok/s | Comment |
|---|---|---|---|---|---|
| RTX 4060 Ti (Current) | 16GB | Q3_K_M | 85.6 | 12 | Baseline / still in the practical range |
| RTX 4070 Ti SUPER | 16GB | Q3_K_M | 88.3 (+2.7) | 28 (+16) | Speed improvement is the main gain |
| RTX 4090 | 24GB | Q5_K_M | 92.4 (+6.7) | 27 | Quantization quality goes up |
| RTX 5090 | 32GB | Q6_K | 94.3 (+8.7) | 40 (+28) | Clear gains in both quality and speed |
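I like how clearly the direction shows: with the 4090 the quantization steps up and quality improves, and with the 5090 both quality and speed improve. Being able to simulate what gets better relative to the model you already run, rather than "just buy the latest," is a strong aid for upgrade decisions.
⚠️ The numbers here are whichllm's estimates / simulations, not benchmarks on real hardware. The comparison assumes the same CPU / RAM. It is safest to read them as material for getting a head start on a purchase decision.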
A rough read through whichllm's scoring
I was curious about "So why does 27B rank above 32B?", so I took a peek at src/whichllm/. The folder layout looks like this.
src/whichllm/
├── cli.py # Typer-based CLI
├── constants.py
├── engine/ # scoring / ranking logic
├── hardware/ # GPU / CPU / RAM detection
├── models/ # aggregation of HuggingFace model candidates
├── data/
└── output/ # display using Rich
From a rough read, the ranking flows like this:
- hardware/ determines the VRAM / RAM / GPU count of the real machine (or of the virtual machine specified with --gpu)
- models/ enumerates candidate model × quant pairs and estimates the VRAM each pair needs
- engine/ combines fit type (Full GPU / Partial / CPU), estimated tok/s, and benchmark score into an overall score
- At display time, results are filtered by --speed and --fit
That is why "bigger size" does not mean "higher score," and why newer-generation models and higher-quality quants come out on top.
Building this calculator yourself would be a real chore, so having it done by a single CLI command is a genuine help.
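The flow described above can be sketched in a few lines of Python. To be clear, this is my own simplified illustration, not whichllm's code: the class, the function names, the 10 tok/s "usable" threshold, and the 70B row's speed are all hypothetical, and it ranks on benchmark score alone rather than a real combined score. CPU-only fits are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    quant: str
    required_vram_gb: float
    est_tok_s: float
    bench_score: float


def fit_type(c: Candidate, vram_gb: float, ram_gb: float) -> str:
    """Classify how a candidate fits on the given machine."""
    if c.required_vram_gb <= vram_gb:
        return "full-gpu"
    if c.required_vram_gb <= vram_gb + ram_gb:
        return "partial"
    return "no-fit"


def rank(candidates, vram_gb, ram_gb, min_tok_s, fit_filter=None, top=8):
    """Filter by fit and speed, then sort by benchmark score (descending)."""
    rows = []
    for c in candidates:
        fit = fit_type(c, vram_gb, ram_gb)
        if fit == "no-fit" or (fit_filter and fit != fit_filter):
            continue
        if c.est_tok_s < min_tok_s:
            continue
        rows.append((fit, c))
    rows.sort(key=lambda row: row[1].bench_score, reverse=True)
    return rows[:top]


candidates = [
    Candidate("Qwen/Qwen3-Coder-30B-A3B-Instruct", "Q3_K_M", 13.8, 109.7, 79.2),
    Candidate("Qwen/Qwen3.6-27B", "Q3_K_M", 14.7, 11.8, 85.6),
    # est_tok_s and bench_score for this row are made up for illustration.
    Candidate("meta-llama/Llama-3.3-70B-Instruct", "Q4_K_M", 44.2, 2.0, 90.0),
]

result = rank(candidates, vram_gb=16.0, ram_gb=61.6, min_tok_s=10.0, fit_filter="full-gpu")
for fit, c in result:
    print(f"{c.name} {c.quant} [{fit}] {c.est_tok_s} tok/s score={c.bench_score}")
```

Even in this toy version, the 70B candidate never reaches the ranking on a 16GB card despite its higher made-up score, because the fit filter removes it first.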
Making it easier to use from VS Code: registering local tasks
Remembering the flags in the terminal every time is a pain, so I set up a few small tasks at the workspace root.
// .vscode/tasks.json
{
"version": "2.0.0",
"tasks": [
{
"label": "whichllm: hardware",
"type": "shell",
"command": "uv",
"args": ["run", "whichllm", "hardware"],
},
{
"label": "whichllm: top models",
"type": "shell",
"command": "uv",
"args": [
"run",
"whichllm",
"--fit",
"full-gpu",
"--speed",
"usable",
"--top",
"8",
],
},
{
"label": "whichllm: coding models",
"type": "shell",
"command": "uv",
"args": [
"run",
"whichllm",
"--profile",
"coding",
"--fit",
"full-gpu",
"--speed",
"usable",
"--top",
"5",
],
},
{
"label": "whichllm: plan llama 3 70b",
"type": "shell",
"command": "uv",
"args": ["run", "whichllm", "plan", "llama 3 70b"],
},
],
}
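To apply the UTF-8 environment variables here as well, add the .vscode/settings.json from earlier alongside it, and Rich's box-drawing output will display without breaking.
💡 This .vscode/ directory is covered by the upstream repository's .gitignore, so you can keep it as a local-only setting. Being able to use it without forking is a quiet win.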
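Wrap-up after trying it
In the local LLM world, models, quants, and hardware all move fast, and if you choose based on what you knew six months ago you will simply get it wrong.
What I liked about whichllm is that a single command tells you not only whether something runs, but whether it runs comfortably.
Even within the limits of my RTX 4060 Ti 16GB, a fairly ordinary, affordable GPU:
- Up to the 27B class for general use and the 30B (MoE) class for coding fit on Full GPU
- For the 70B class, you either accept CPU offload or look at 48GB-class GPUs
Those realistic limits came through clearly.
When you are weighing an upgrade and want to ask "under these conditions, how far can local LLMs realistically go?", this gives you a big head start.
Next, I would like to pull down some 7B / 14B-class models with whichllm run and measure the gap between estimated and actual tok/s. If I do, I will write it up in a separate article!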


