
株式会社スピードリンクジャパン

[Llama2] Running It on an M1 Mac

Last updated at 2023-09-28, posted at 2023-09-28

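Introduction

Hello, I'm Shimada, an engineer in my second year.
First of all, thank you for opening this article!

I was using Python for the first time and really wanted to run Llama2 locally.
After reading many sites, trying things, and failing, I arrived at the steps below and want to share them!
As it turned out, I got it running without needing Python at all.

Please stick with me to the end 😄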

My environment

Item	Value
Computer	MacBook Pro
Chip	Apple M1
Memory	16GB
Operating system	macOS Ventura
Version	13.4.1

$ python --version
Python 3.11.5

I used Python 3, installed with Homebrew.
Please refer to this article for the installation!

Steps

Here are the steps. Let's go through them carefully, one at a time!

Installing md5sum

When downloading a model, Llama calls md5sum to check the download, so install it first.
See this page for more about md5sum.

# Command
$ brew install md5sha1sum

# Output
$ brew install md5sha1sum
==> Downloading https://ghcr.io/v2/homebrew/core/md5sha1sum/manifests/0.9.5_1-1
Already downloaded: /Users/USERNAME/Library/Caches/Homebrew/downloads/3f9eccc5f7337814f7cf70cc13cf941044c6de6fc1b23d1d3a5e3e79d4d16e60--md5sha1sum-0.9.5_1-1.bottle_manifest.json
==> Fetching md5sha1sum
==> Downloading https://ghcr.io/v2/homebrew/core/md5sha1sum/blobs/sha256:1055a4e7c14927621a28916d8847a9d07cd7c2fa3a0b7c5b9a087aa67350fbfb
Already downloaded: /Users/USERNAME/Library/Caches/Homebrew/downloads/f06ecf4b4c4b990e70c2558f14ca6d9e9a2e92afb0bb7782fcbc1268cfdf1e06--md5sha1sum--0.9.5_1.arm64_ventura.bottle.1.tar.gz
==> Pouring md5sha1sum--0.9.5_1.arm64_ventura.bottle.1.tar.gz
🍺  /opt/homebrew/Cellar/md5sha1sum/0.9.5_1: 8 files, 76.6KB
==> Running `brew cleanup md5sha1sum`...
Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).

After installing, check the version to confirm that the command is available.

# Command
$ md5sum --version

# Output
$ md5sum --version                                
Microbrew md5sum/sha1sum/ripemd160sum 0.9.5 (Wed Dec  6 12:48:56 EST 2006)
  Compiled Nov 19 2022 at 04:26:05
Written by Bulent Yilmaz

Copyright (C) 2004,2006 Microbrew Software

md5sum was installed without problems.

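Downloading the model

llama.cpp has dropped support for GGML, so models now have to be in GGUF format.
Download the following file, which is already in GGUF format (llama-2-7b.Q4_0.gguf is the file used in the rest of this article).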
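
One quick way to tell whether a downloaded file is really GGUF is to look at its first four bytes, which are the ASCII characters GGUF. The following is a minimal sketch: is_gguf is a hypothetical helper name, and the demo uses dummy files rather than a real model.

```shell
#!/bin/bash
# is_gguf FILE (hypothetical helper)
# Succeeds when FILE starts with the 4-byte magic "GGUF".
is_gguf() {
    [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Demo with dummy files; with a real model you would pass
# the path of the downloaded .gguf file instead.
workdir=$(mktemp -d)
printf 'GGUFdummy-data' > "$workdir/fake.gguf"
printf 'not-a-model'    > "$workdir/other.bin"

for f in "$workdir/fake.gguf" "$workdir/other.bin"; do
    if is_gguf "$f"; then
        echo "$(basename "$f"): GGUF"
    else
        echo "$(basename "$f"): not GGUF"
    fi
done
```

This only checks the magic bytes, so it can catch an old GGML file or a broken download page saved by mistake, but it does not prove the file is complete.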

Building llama.cpp

llama.cpp is a project that makes Llama usable on machines such as a MacBook.
Clone the llama.cpp repository anywhere you like.

# Command
$ git clone https://github.com/ggerganov/llama.cpp

# Output
$ git clone https://github.com/ggerganov/llama.cpp
Cloning into 'llama.cpp'...
remote: Enumerating objects: 9672, done.
remote: Counting objects: 100% (3799/3799), done.
remote: Compressing objects: 100% (453/453), done.
remote: Total 9672 (delta 3555), reused 3477 (delta 3346), pack-reused 5873
Receiving objects: 100% (9672/9672), 8.69 MiB | 6.40 MiB/s, done.
Resolving deltas: 100% (6698/6698), done.

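Move the model you downloaded earlier into llama.cpp/models.

# Command
$ mv {source path}/llama-2-7b.Q4_0.gguf {destination directory path}/llama.cpp/models/

Copying the downloaded model's path in Finder and pasting it into the command may make this easier!
If you are using Finder anyway, you can also move the file directly by drag and drop.

After running the mv command or moving the file in Finder, confirm that it was moved.

# Command
$ cd llama.cpp/models # move into models
$ ls # check the contents

# Output
$ cd llama.cpp/models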
$ ls                                                                                   
ggml-vocab-llama.gguf	llama-2-7b.Q4_0.gguf

If llama-2-7b.Q4_0.gguf is listed, you are as good as done!
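
The move-and-check steps above can also be wrapped into one small function. This is a sketch under assumptions: place_model is a hypothetical name, and the demo works in a temporary directory with an empty dummy file instead of the real model.

```shell
#!/bin/bash
# place_model SRC_FILE MODELS_DIR (hypothetical helper)
# Moves SRC_FILE into MODELS_DIR and confirms that it arrived.
place_model() {
    if [ ! -f "$1" ]; then
        echo "not found: $1" >&2
        return 1
    fi
    mkdir -p "$2"
    mv "$1" "$2/"
    # Same check as running ls in the models directory.
    [ -f "$2/$(basename "$1")" ]
}

# Demo in a temporary directory with an empty dummy file.
tmp=$(mktemp -d)
touch "$tmp/llama-2-7b.Q4_0.gguf"
place_model "$tmp/llama-2-7b.Q4_0.gguf" "$tmp/llama.cpp/models" \
    && ls "$tmp/llama.cpp/models"
```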

Run make in the llama.cpp root directory to build.

# Command
$ cd ../
$ make

# Output
$ cd ../
$ make
I llama.cpp build info: 
I UNAME_S:   Darwin
I UNAME_P:   arm
I UNAME_M:   arm64
I CFLAGS:    -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Wno-unused-function -pthread 
I CXXFLAGS:  -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread   
I NVCCFLAGS:  -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    -Wno-pedantic -Xcompiler " "
I LDFLAGS:   -framework Accelerate -framework Foundation -framework Metal -framework MetalKit 
I CC:        Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX:       Apple clang version 14.0.3 (clang-1403.0.22.14.1)

cc  -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Wno-unused-function -pthread    -c ggml.c -o ggml.o
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    -c llama.cpp -o llama.o
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    -c common/common.cpp -o common.o
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    -c common/console.cpp -o console.o
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    -c common/grammar-parser.cpp -o grammar-parser.o
cc -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Wno-unused-function -pthread  -c k_quants.c -o k_quants.o
cc -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Wno-unused-function -pthread  -c ggml-metal.m -o ggml-metal.o
cc  -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Wno-unused-function -pthread    -c ggml-alloc.c -o ggml-alloc.o
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    examples/main/main.cpp ggml.o llama.o common.o console.o grammar-parser.o k_quants.o ggml-metal.o ggml-alloc.o -o main -framework Accelerate -framework Foundation -framework Metal -framework MetalKit 

====  Run ./main -h for help.  ====

c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    examples/quantize/quantize.cpp ggml.o llama.o k_quants.o ggml-metal.o ggml-alloc.o -o quantize -framework Accelerate -framework Foundation -framework Metal -framework MetalKit 
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    examples/quantize-stats/quantize-stats.cpp ggml.o llama.o k_quants.o ggml-metal.o ggml-alloc.o -o quantize-stats -framework Accelerate -framework Foundation -framework Metal -framework MetalKit 
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    examples/perplexity/perplexity.cpp ggml.o llama.o common.o k_quants.o ggml-metal.o ggml-alloc.o -o perplexity -framework Accelerate -framework Foundation -framework Metal -framework MetalKit 
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    examples/embedding/embedding.cpp ggml.o llama.o common.o k_quants.o ggml-metal.o ggml-alloc.o -o embedding -framework Accelerate -framework Foundation -framework Metal -framework MetalKit 
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    pocs/vdot/vdot.cpp ggml.o k_quants.o ggml-metal.o ggml-alloc.o -o vdot -framework Accelerate -framework Foundation -framework Metal -framework MetalKit 
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    -Wno-missing-declarations -Wno-missing-prototypes examples/train-text-from-scratch/train-text-from-scratch.cpp ggml.o llama.o common.o k_quants.o ggml-metal.o ggml-alloc.o -o train-text-from-scratch -framework Accelerate -framework Foundation -framework Metal -framework MetalKit 
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp ggml.o llama.o k_quants.o ggml-metal.o ggml-alloc.o -o convert-llama2c-to-ggml -framework Accelerate -framework Foundation -framework Metal -framework MetalKit 
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    examples/simple/simple.cpp ggml.o llama.o common.o k_quants.o ggml-metal.o ggml-alloc.o -o simple -framework Accelerate -framework Foundation -framework Metal -framework MetalKit 
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    examples/save-load-state/save-load-state.cpp ggml.o llama.o common.o k_quants.o ggml-metal.o ggml-alloc.o -o save-load-state -framework Accelerate -framework Foundation -framework Metal -framework MetalKit 
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    -Iexamples/server examples/server/server.cpp ggml.o llama.o common.o grammar-parser.o k_quants.o ggml-metal.o ggml-alloc.o -o server -framework Accelerate -framework Foundation -framework Metal -framework MetalKit  
c++ --shared -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    examples/embd-input/embd-input-lib.cpp ggml.o llama.o common.o k_quants.o ggml-metal.o ggml-alloc.o -o libembdinput.so -framework Accelerate -framework Foundation -framework Metal -framework MetalKit 
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    examples/embd-input/embd-input-test.cpp ggml.o llama.o common.o k_quants.o ggml-metal.o ggml-alloc.o -o embd-input-test -framework Accelerate -framework Foundation -framework Metal -framework MetalKit  -L. -lembdinput
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    examples/gguf/gguf.cpp ggml.o llama.o k_quants.o ggml-metal.o ggml-alloc.o -o gguf -framework Accelerate -framework Foundation -framework Metal -framework MetalKit 
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    examples/llama-bench/llama-bench.cpp ggml.o llama.o common.o k_quants.o ggml-metal.o ggml-alloc.o -o llama-bench -framework Accelerate -framework Foundation -framework Metal -framework MetalKit 
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    examples/baby-llama/baby-llama.cpp ggml.o llama.o common.o k_quants.o ggml-metal.o ggml-alloc.o -o baby-llama -framework Accelerate -framework Foundation -framework Metal -framework MetalKit 
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    examples/beam-search/beam-search.cpp ggml.o llama.o common.o k_quants.o ggml-metal.o ggml-alloc.o -o beam-search -framework Accelerate -framework Foundation -framework Metal -framework MetalKit 
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    examples/speculative/speculative.cpp ggml.o llama.o common.o grammar-parser.o k_quants.o ggml-metal.o ggml-alloc.o -o speculative -framework Accelerate -framework Foundation -framework Metal -framework MetalKit 
cc -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Wno-unused-function -pthread  -c tests/test-c.c -o tests/test-c.o
c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    examples/metal/metal.cpp ggml.o k_quants.o ggml-metal.o ggml-alloc.o -o metal -framework Accelerate -framework Foundation -framework Metal -framework MetalKit

The environment is now ready!
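
If the build feels slow, make can run several jobs in parallel with -j. The following is a sketch, assuming getconf _NPROCESSORS_ONLN reports the CPU count on your system. RUN_BUILD is a hypothetical switch introduced here so that the script only prints the command by default.

```shell
#!/bin/bash
# Number of online CPUs; fall back to 1 if it cannot be determined.
ncpu=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

echo "build command: make -j${ncpu}"

# RUN_BUILD is a hypothetical opt-in switch for this sketch.
# Build only when it is set to 1 and a Makefile is present
# (for this article, that means the llama.cpp root directory).
if [ "${RUN_BUILD:-0}" = "1" ] && [ -f Makefile ]; then
    make -j"${ncpu}"
fi
```

Running it as RUN_BUILD=1 from the llama.cpp root would start the parallel build; a plain make as shown earlier produces the same binaries.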

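Running it

Now let's call Llama2 through llama.cpp.
llama.cpp/examples contains sample code, so you can confirm that it works without creating any files.

This time I will try the code in llama.cpp/examples/chat.sh.
You can follow the same flow to try files you create yourself.
The code of chat.sh is as follows.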

#!/bin/bash

#
# Temporary script - will be removed in the future
#

cd `dirname $0`
cd ..

# Important:
#
#   "--keep 48" is based on the contents of prompts/chat-with-bob.txt
#
./main -m ./models/llama-7b/ggml-model-q4_0.gguf -c 512 -b 1024 -n 256 --keep 48 \
    --repeat_penalty 1.0 --color -i \
    -r "User:" -f prompts/chat-with-bob.txt

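It calls main and passes the model and other parameters.
For details on the parameters, see the README for main.

Briefly, they are as follows.

Parameter	Description	Example
-m FNAME, --model FNAME	Specifies the path to the LLaMA model file.	`-m models/7B/ggml-model.bin`
-i, --interactive	Enables interactive mode, which accepts input in real time and generates text.
-p PROMPT, --prompt PROMPT	Specifies the prompt directly as a command-line option.
-f FNAME, --file FNAME	Specifies the path to a file containing the prompt. The prompt in the file is used.
--random-prompt	Starts text generation from a random prompt.
-n N, --n-predict N	Sets the number of tokens to predict when generating text.
-c N, --ctx-size N	Sets the size of the prompt context.
-r PROMPT, --reverse-prompt PROMPT	Specifies a reverse prompt. When the given text appears, generation pauses and control switches to interactive mode.	`-r "User:"`
--color	Enables colored output.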
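
Putting the parameters from the table together, main can also be called directly without chat.sh. The following sketch only assembles and prints the command, and runs it only when ./main exists in the current directory. The model path and the values are the ones used in this article.

```shell
#!/bin/bash
# Put the arguments into the positional parameters so that
# values containing quotes or spaces (such as "User:") stay intact.
set -- \
    -m ./models/llama-2-7b.Q4_0.gguf \
    -c 512 \
    -n 256 \
    --color \
    -i \
    -r "User:" \
    -f prompts/chat-with-bob.txt

echo "./main $*"

# Run only from the llama.cpp root, where ./main was built.
if [ -x ./main ]; then
    ./main "$@"
fi
```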

Adjust the model path written in chat.sh to match your own environment as needed.
If you have followed this article exactly,
editing the path to ./models/llama-2-7b.Q4_0.gguf is enough.
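
The path edit can also be done from the command line with sed. The following sketch works on a temporary file holding the relevant line rather than on the real chat.sh. It writes the result to a second file and copies it back, which avoids sed -i (its syntax differs between macOS and GNU sed) and keeps the permissions of the original file.

```shell
#!/bin/bash
# Demo on a temporary file containing the line from chat.sh.
script=$(mktemp)
cat > "$script" <<'EOF'
./main -m ./models/llama-7b/ggml-model-q4_0.gguf -c 512 -b 1024 -n 256 --keep 48 \
EOF

old='./models/llama-7b/ggml-model-q4_0.gguf'
new='./models/llama-2-7b.Q4_0.gguf'

# "|" is used as the sed delimiter because the paths contain "/".
# Overwriting with cat (instead of mv) keeps the original file's mode bits.
sed "s|${old}|${new}|" "$script" > "$script.new" \
    && cat "$script.new" > "$script" \
    && rm "$script.new"

cat "$script"
```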

prompts/chat-with-bob.txt is passed as the prompt.
The contents of prompts/chat-with-bob.txt are as follows.

Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User:

In short, running chat.sh starts a conversation with Bob.

Let's actually run chat.sh.

# Command
$ ./examples/chat.sh

# Output
$ ./examples/chat.sh
Log start
main: warning: changing RoPE frequency base to 0 (default 10000.0)
main: warning: scaling RoPE frequency by 0 (default 1.0)
main: build = 1270 (c091cdf)
main: built with Apple clang version 14.0.3 (clang-1403.0.22.14.1) for arm64-apple-darwin22.5.0
main: seed  = 1695524702
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from ./models/llama-2-7b.Q4_0.gguf (version GGUF V2 (latest))

..................(omitted)..................

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User:

Whatever follows User: is what I typed.
Let's continue the conversation!

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.

..................(the conversation below is the result of my input)..................

User:how about Japan
Bob: Tokyo is the capital of Japan.
User:how far is it from Russia to Japan
Bob: Tokyo is located 10.690 kilometers from Moscow.
User:how much money and time does it take to travel from Russia to Japan
Bob: The average price for a round trip flight from Russia to Tokyo is $602. The flight duration is 10 hours and 20 minutes.
User:thanks

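I confirmed that it works!
You can exit with Ctrl+C.

Bonus

I wanted to try chatting in Japanese too, so I turned Bob into a Japanese speaker.
It looks like you can specify a personality as well, which is fun.

Create chat-with-bob-jp.txt in the same directory as the chat-with-bob.txt used earlier.
I used the following contents.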

Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, and skilled in writing. He never fails to answer the User's requests immediately and with precision. Bob converses in a friendly and casual manner, using fluent Japanese.

User: 元気？
Bob: 元気だよ！今日の調子はどう？
User:

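I added the sentence "Bob converses in a friendly and casual manner, using fluent Japanese."
The aim is to have him speak in a friendly, casual way rather than in polite language.

Then, under llama.cpp/examples, I created chat-jp.sh next to chat.sh.
Its contents are as follows.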

#!/bin/bash

#
# Temporary script - will be removed in the future
#

cd `dirname $0`
cd ..

# Important:
#
#   "--keep 48" is based on the contents of prompts/chat-with-bob.txt
#
./main -m ./models/llama-2-7b.Q4_0.gguf -c 512 -b 1024 -n 256 --keep 48 \
    --repeat_penalty 1.0 --color -i \
    -r "User:" -f prompts/chat-with-bob-jp.txt
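
One step that is easy to miss: a script file created from scratch normally has no execute permission, so calling it by path may fail with "permission denied" until chmod +x is applied. The following is a sketch in a temporary directory, using a two-line stand-in script instead of the real chat-jp.sh.

```shell
#!/bin/bash
workdir=$(mktemp -d)
new_script="$workdir/chat-jp.sh"

# Stand-in for the real script body.
printf '#!/bin/bash\necho "chat-jp.sh started"\n' > "$new_script"

# A file created this way is not executable yet.
[ -x "$new_script" ] || echo "not executable yet"

# Add the execute bit, then run the script by its path.
chmod +x "$new_script"
"$new_script"
```

If chat-jp.sh was made by copying chat.sh with cp, the permission is usually carried over and this step is not needed.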

Let's run it!!

# Command
$ ./examples/chat-jp.sh

# Output
$ ./examples/chat-jp.sh
Log start
main: warning: changing RoPE frequency base to 0 (default 10000.0)
main: warning: scaling RoPE frequency by 0 (default 1.0)
main: build = 1270 (c091cdf)
main: built with Apple clang version 14.0.3 (clang-1403.0.22.14.1) for arm64-apple-darwin22.5.0
main: seed  = 1695526979
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from ./models/llama-2-7b.Q4_0.gguf (version GGUF V2 (latest))
..................(omitted)..................

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is a Japanese AI who speaks in Kansai dialect, adding a touch of humor to his responses. He's known for his friendly and witty interactions with users. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: 元気？
Bob: 元気だよ！今日の調子はどう？
User:

Let's continue the conversation!

User: 元気？
Bob: 元気だよ！今日の調子はどう？
..................(the conversation below is the result of my input)..................
User:いい感じだよ！
Bob: はいです！そして、今日は暑いわね！
User:今日は最近に比べて涼しい方だよ。気持ち良い風が吹いてる。
Bob: いい感じだよ！暑い気持ちよい風が吹いてる！
User:暑くないってば
Bob: そうです！
User:

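The quality of the conversation dropped a little...

Conversation in Japanese seems to have some room for improvement!

In closing

Thank you very much for reading this far.
I really wanted to run Llama2 and worked hard at it, but even with various articles as reference,
I stumbled over models being in GGML format and over mismatches between Python and package versions, and struggled quite a bit...
In the same environment, I think these steps will get you as far as running it. (And with these steps, Python is not needed to get that far 😅)
I tried again as a separate project on my own machine, and it worked.
I hope this is useful to you.

I am still learning myself. If you find any problems in this article, I would be grateful if you let me know in the comments!

Next, I would like to take on fluent conversation in Japanese and loading external data for training...
If you know anything about these, please let me know in the comments too!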
