More than 1 year has passed since last update.

Metaの大規模言語モデル「Llama 2」との会話をMacBook(M2)でPythonで実装してみた記録

Posted at 2023-08-03

Supershipの名畑です。今時の漫画アプリって「無料で1日1話読める」というのが多いように思いますが、私、なぜか家に単行本で全巻揃っている漫画もその1日1話無料で毎日1話ずつ読み直したりするんですよね。最近だと修羅の門とか。まとめて読めるのに、わざわざ毎日コツコツと。それがなぜかは自分でも答えが出ていません。

はじめに

Metaがリリースした大規模言語モデルLlama 2(ラマ2)。
前回の記事「Metaの大規模言語モデル「Llama 2」をMacBook(M2)にダウンロードして会話をしてみるまでの記録」ではこのLlama 2をローカルに落としてサンプルプログラムを叩いて会話をしてみました。

今回はこのLlama 2をPythonで呼び出してみます。
前回と同様にローカル環境です。

私の環境

前回の記事から変わっていません。

MacBook Pro(Apple M2 Proチップ)です。OSはmacOS 13 Venturaです。

Pythonのバージョンは3.10.12です。

$ python --version
Python 3.10.12

モデル

前回の記事で生成したggml-model-q4_0.binをそのまま使用します。
こいつを好きな場所に置いてください。

今回は ./models/7B/ に配置したと仮定して進めます。

llama-cpp-pythonのインストール

LlamaをMacBookなどから呼び出せるようにしてくれるllama.cppをさらにPythonから呼び出せるようにするBindingsとしてllama-cpp-pythonというものがあります。

今回はこちらを使わせていただきます。

まずはpipでインストールします。

$ pip install llama-cpp-python
Collecting llama-cpp-python
  Downloading llama_cpp_python-0.1.73.tar.gz (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 4.6 MB/s eta 0:00:00

略

Successfully built llama-cpp-python
Installing collected packages: typing-extensions, numpy, diskcache, llama-cpp-python
Successfully installed diskcache-5.6.1 llama-cpp-python-0.1.73 numpy-1.25.1 typing-extensions-4.7.1

Pythonで呼んでみる

まずは公式のGetting Startedのコードを参考にして「Where are you from now?」という質問をしてみます。model_pathは各自の環境に合わせてください。

非常に短いコードです。

from llama_cpp import Llama
llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")
output = llm("Q: Where are you from now ? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
print(output)

下記のような結果が出力されました。「30 minutes away from where I was born, in the suburbs of Paris. sierpni 2016」という回答です。
成否はともかくとしてそれっぽい回答ではあります。

{
    "id": "cmpl-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "object": "text_completion",
    "created": 1690084597,
    "model": "./models/7B/ggml-model-q4_0.bin",
    "choices": [
        {
            "text": "Q: Where are you from now ? A: 30 minutes away from where I was born, in the suburbs of Paris. sierpni 2016",
            "index": 0,
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 12,
        "completion_tokens": 26,
        "total_tokens": 38
    }
}

会話をしてみる

せっかくなので質問ではなく会話をしてみましょう。

API Referenceにあるcreate_chat_completionを用います。

必須な引数としてChatCompletionMessageのListがあります。

ChatCompletionMessageの型定義については私の探し方が悪いのかAPI Reference上に見つからなかったのでllama_types.pyのコードを見ました。

class ChatCompletionMessage(TypedDict):
    role: Literal["assistant", "user", "system"]
    content: str
    user: NotRequired[str]

roleとしてassistant、user、systemのいずれかを選び、メッセージをcontentとして指定すればいいようです。

会話をするために実際に書いたコードは下記です。会話の履歴を
messagesに保持してcreate_chat_completionに渡しているだけです。

from llama_cpp import Llama, ChatCompletionMessage

llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")
messages = list()  # 会話の履歴を格納する配列

while True:
    your_message = input("Your turn: ")
    if your_message == "quit":  # quitと入力されたら終了
        exit()

    chat_completion_message: ChatCompletionMessage = dict(role="user", content=your_message)
    messages.append(chat_completion_message)
    output = llm.create_chat_completion(messages=messages)

    res = output["choices"][0]["message"]
    print("Llama 2's response: " + res["content"])

    if res["role"] == "assistant":  # レスポンス確認
        chat_completion_message = dict(role="assistant", content=res["content"])
        messages.append(chat_completion_message)
    else:
        exit()

呼び出して会話をしてみた結果は下記です。
読みやすさのためにシステム系のログは削除しています。
Your turnの後ろが私の書いた内容で、Llama 2's responseの後ろがLlama 2によるレスポンスです。

Your turn: Please introduce yourself.
Llama's response: I am an assistant.
Your turn: I give a name. Your name is Tanaka Ichiro.
Llama's response: I am Tanaka Ichiro.
Your turn: Please introduce yourself with your name.
Llama's response: My name is Tanaka Ichiro.

与えた名前が次の返答で使われているため、会話が成り立っていそうです。

レスポンスはllama.cppを直接呼んだ時と比べると遅く感じました。それでも数秒以内ではありますが。

日本語で会話をしてみた

最後に、日本語での会話もしてみようと思います。

今回はmessagesに初期値を与えてみましょう。
前回の記事を踏まえた内容です。roleをsystemとして設定を記載しています。

messages = list()
messages.append(dict(role="system", content="Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at Japanese writing, and never fails to answer the User's requests immediately and with precision."))
messages.append(dict(role="user", content="こんにちは、ボブ"))
messages.append(dict(role="assistant", content="こんにちは。今日はいかがいたしましたか？"))

実際に会話をすると下記でした。

Your turn: 日本の首都はどこですか？
Llama's response: 東京です。
Your turn: 日本で最北端の都道府県はどこですか？
Llama's response: 北海道です。
Your turn: あなたの名前を教えてください
Llama's response: ボブです。

いい感じですね。

最後に

少し触ってみただけですが、かなり使いやすかったです。

他のメソッドも触ってみようかと思います。

触っているだけで楽しくなりますね。

宣伝

SupershipのQiita Organizationを合わせてご覧いただけますと嬉しいです。他のメンバーの記事も多数あります。

Supershipではプロダクト開発やサービス開発に関わる方を絶賛募集しております。
興味がある方はSupership株式会社採用サイトよりご確認ください。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up