
Qualiteg Inc. (株式会社 Qualiteg)

I asked the much-discussed small LLM "Phi-3-mini" to solve junior-high-level simultaneous equations, and it solved them correctly

Last updated at 2024-05-08. Posted at 2024-04-25.

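Overview

I tried "Phi-3-mini-128k-instruct", the new LLM that Microsoft announced the day before yesterday (2024/4/23).
It is a small model with 3.82B parameters, but it is smart!
(Because it is small, it apparently also gets called an "SLM" rather than an LLM.)
It easily solved the kind of simultaneous equations taught in the second year of junior high school (roughly eighth grade).
Without quantization, VRAM consumption is just under 8 GB.
With 4-bit quantization the model stays just as sharp, and VRAM consumption is just under 3 GB.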

Demo

The Phi-3-mini-128k-instruct model tried in this article is hosted on the Playground of our product ChatStream®.

Enter the following into the chat UI and it will solve the equations.

Solve the following simultaneous equations:
3x - y = 3
2x + y = -9

Environment

NVIDIA RTX A5000
Python 3.11.4
Model used: microsoft/Phi-3-mini-128k-instruct

The problem

Solve the following simultaneous equations.

\begin{align}
3x - y &= 3\\
2x + y &= -9
\end{align}

Input prompt

The actual input prompt is as follows.

messages = [
    {"role": "system", "content": "You are a helpful digital assistant. Please provide safe, ethical and accurate information to the user."},
    {"role": "user", "content": "Solve the following simultaneous equations:\n3x - y = 3\n2x + y = -9"},
]

Phi-3-mini's answer

Phi-3-mini produced the following output. A short recap follows it.

To solve these simultaneous equations, we can use the method of addition (also known as the elimination method). This method involves adding the two equations together in such a way that one of the variables gets eliminated.

Given equations:
1) 3x - y = 3
2) 2x + y = -9

First, let's add the two equations together. When we do this, the y terms will cancel out because -y and +y are additive inverses.

Adding equation 1) and equation 2):
(3x - y) + (2x + y) = 3 + (-9)

Combining like terms:
3x + 2x = 3 - 9
5x = -6

Now, we can solve for x by dividing both sides of the equation by 5:

5x / 5 = -6 / 5
x = -6/5
x = -1.2

Now that we have the value of x, we can substitute it back into either of the original equations to find the value of y. Let's use equation 2) for this purpose:

2x + y = -9
2(-1.2) + y = -9
-2.4 + y = -9

Now, add 2.4 to both sides of the equation to solve for y:

y = -9 + 2.4
y = -6.6

So, the solution to the simultaneous equations is:
x = -1.2
y = -6.6

(Recap)

The model uses the addition (elimination) method, which adds the two equations so that one variable is eliminated.
Adding equations 1) and 2) cancels the y terms, because -y and +y are additive inverses, and leaves 5x = -6.
Dividing both sides by 5 gives x = -6/5 = -1.2.
Substituting x into equation 2) gives -2.4 + y = -9, and adding 2.4 to both sides gives y = -6.6.

Not only is the result correct, the explanation of each step also makes sense!
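The model's answer can also be checked independently. The following is a minimal sketch, not part of the original experiment: the helper name `solve_2x2` is made up for illustration, and it uses Cramer's rule with exact fractions instead of the elimination steps the model describes.

```python
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 exactly with Cramer's rule."""
    det = Fraction(a1 * b2 - a2 * b1)
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = Fraction(c1 * b2 - c2 * b1) / det
    y = Fraction(a1 * c2 - a2 * c1) / det
    return x, y


# 3x - y = 3 and 2x + y = -9, the system from this article
x, y = solve_2x2(3, -1, 3, 2, 1, -9)
print(x, y)                # -6/5 -33/5
print(float(x), float(y))  # -1.2 -6.6
```

Both values match the model's output, and substituting them back into either equation reproduces the right-hand side.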

Source code

The source code is as follows.
This version is the plain code, with no quantization applied.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from datetime import datetime


def cout_memory_availability(label):
    # Get the current time and format it as a Japanese-style timestamp
    now = datetime.now()
    formatted_now = now.strftime("%Y年%m月%d日%H時%M分%S秒")

    total_memory = torch.cuda.get_device_properties(0).total_memory / (1024 ** 2)  # total GPU memory in MB
    reserved_memory = torch.cuda.memory_reserved(0) / (1024 ** 2)  # memory reserved by PyTorch in MB
    available_memory = total_memory - reserved_memory  # memory not reserved by PyTorch

    # Printed labels, in order: total GPU memory, reserved memory, available memory
    print(f"{formatted_now} - {label}: GPUの総メモリ: {total_memory:.2f} MB")
    print(f"{formatted_now} - {label}: 予約済みメモリ: {reserved_memory:.2f} MB")
    print(f"{formatted_now} - {label}: 利用可能なメモリ: {available_memory:.2f} MB")


torch.random.manual_seed(0)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

cout_memory_availability('モデル読み込み後')  # label means "after model load"

messages = [
    {"role": "system", "content": "You are a helpful digital assistant. Please provide safe, ethical and accurate information to the user."},
    {"role": "user", "content": "Solve the following simultaneous equations:\n3x - y = 3\n2x + y = -9"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])

cout_memory_availability('推論後')  # label means "after inference"

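Memory consumption

Just under 8 GB. In the log below, 予約済みメモリ is the memory reserved by PyTorch, measured after inference (推論後).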

2024年04月25日23時46分58秒 - 推論後: GPUの総メモリ: 24563.50 MB
2024年04月25日23時46分58秒 - 推論後: 予約済みメモリ: 7760.00 MB
2024年04月25日23時46分58秒 - 推論後: 利用可能なメモリ: 16803.50 MB
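As a rough sanity check on that figure, the memory needed for the weights alone can be estimated from the parameter count. This is only a back-of-envelope sketch: it assumes that `torch_dtype="auto"` resolves to a 16-bit type for this model, and the helper name `weight_memory_mib` is made up for illustration.

```python
def weight_memory_mib(num_params, bits_per_param):
    """Estimate the memory taken by the weights alone, in MiB."""
    return num_params * bits_per_param / 8 / (1024 ** 2)


NUM_PARAMS = 3.82e9  # parameter count quoted in the overview

fp16_mib = weight_memory_mib(NUM_PARAMS, 16)  # assumed 16-bit weights
int4_mib = weight_memory_mib(NUM_PARAMS, 4)   # idealized 4-bit weights

print(f"16-bit weights: {fp16_mib:.0f} MiB")
print(f"4-bit weights:  {int4_mib:.0f} MiB")
```

The 16-bit estimate comes to roughly 7.3 thousand MiB, which is in line with the 7760.00 MB reserved after inference; the remainder is plausibly activations, the KV cache, and allocator overhead. The 4-bit figure is a lower bound, since quantization libraries typically add per-block metadata and may leave some layers unquantized.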

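Does it stay sharp with 4-bit quantization?

Next, let's apply 4-bit quantization and have it solve the same problem.

The following is the source code with 4-bit quantization included.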

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig
from datetime import datetime


def cout_memory_availability(label):
    # Get the current time and format it as a Japanese-style timestamp
    now = datetime.now()
    formatted_now = now.strftime("%Y年%m月%d日%H時%M分%S秒")

    total_memory = torch.cuda.get_device_properties(0).total_memory / (1024 ** 2)  # total GPU memory in MB
    reserved_memory = torch.cuda.memory_reserved(0) / (1024 ** 2)  # memory reserved by PyTorch in MB
    available_memory = total_memory - reserved_memory  # memory not reserved by PyTorch

    # Printed labels, in order: total GPU memory, reserved memory, available memory
    print(f"{formatted_now} - {label}: GPUの総メモリ: {total_memory:.2f} MB")
    print(f"{formatted_now} - {label}: 予約済みメモリ: {reserved_memory:.2f} MB")
    print(f"{formatted_now} - {label}: 利用可能なメモリ: {available_memory:.2f} MB")


torch.random.manual_seed(0)

# Create the 4-bit quantization config
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

# Load the model with 4-bit quantization
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-128k-instruct",
                                             quantization_config=quantization_config,
                                             device_map="auto",
                                             trust_remote_code=True,
                                             )

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

cout_memory_availability('モデル読み込み後')  # label means "after model load"

messages = [
    {"role": "system", "content": "You are a helpful digital assistant. Please provide safe, ethical and accurate information to the user."},
    {"role": "user", "content": "Solve the following simultaneous equations:\n3x - y = 3\n2x + y = -9"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])

cout_memory_availability('推論後')  # label means "after inference"
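To give some intuition for what 4-bit quantization does to the weights, here is a toy sketch in pure Python. It is not the scheme that bitsandbytes applies (its 4-bit formats work block-wise and are more sophisticated); it is a simple absmax quantizer with one shared scale, and the function names and sample weights are made up for illustration.

```python
def quantize_4bit(weights):
    """Map floats to integer codes in -7..7 (which fit in 4 bits) with one shared scale."""
    largest = max(abs(w) for w in weights)
    scale = largest / 7 if largest > 0 else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize_4bit(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [code * scale for code in codes]


weights = [0.12, -0.53, 0.91, -0.07, 0.33]  # made-up example values
codes, scale = quantize_4bit(weights)
restored = dequantize_4bit(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(codes)
print(f"max error: {max_error:.4f} (bound: {scale / 2:.4f})")
```

Each restored value differs from the original by at most half the scale, which is the kind of small perturbation the model has to tolerate when it runs in 4 bits.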

Output with 4-bit quantization

The output was as follows. Here too, both the explanation of the steps and the final result were correct.

Alright, to solve these simultaneous equations, we can use the method of addition (also known as the elimination method). 

First, let's look at the two equations:

1) 3x - y = 3
2) 2x + y = -9

If we add these two equations together, the y terms will cancel each other out because -y and +y are additive inverses. 

So, let's add them:

(3x - y) + (2x + y) = 3 + (-9)

This simplifies to:

5x = -6

Now, to solve for x, we divide both sides of the equation by 5:

x = -6 / 5
x = -1.2

Now that we have the value of x, we can substitute it into the second equation to solve for y:

2(-1.2) + y = -9
-2.4 + y = -9

Adding 2.4 to both sides of the equation to isolate y gives us:

y = -9 + 2.4
y = -6.6

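Memory consumption with 4-bit quantization

GPU memory consumption is just under 3 GB.

Even with 4-bit quantization it takes only about 15 seconds to produce the answer, so it may be faster at solving simultaneous equations than I am ^^;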

2024年04月25日23時58分10秒 - モデル読み込み後: GPUの総メモリ: 24563.50 MB
2024年04月25日23時58分10秒 - モデル読み込み後: 予約済みメモリ: 2358.00 MB
2024年04月25日23時58分10秒 - モデル読み込み後: 利用可能なメモリ: 22205.50 MB

2024年04月25日23時58分25秒 - 推論後: GPUの総メモリ: 24563.50 MB
2024年04月25日23時58分25秒 - 推論後: 予約済みメモリ: 2842.00 MB
2024年04月25日23時58分25秒 - 推論後: 利用可能なメモリ: 21721.50 MB
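The elapsed time and the memory growth during inference can be read straight off these log lines. The following is a small sketch that parses the reserved-memory lines (予約済みメモリ) from the log above; the helper name `parse_line` is made up, and it assumes the exact line format printed by `cout_memory_availability`.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y年%m月%d日%H時%M分%S秒"  # same format string as in the script


def parse_line(line):
    """Split one log line into (timestamp, label, megabytes)."""
    stamp, rest = line.split(" - ", 1)
    label, message = rest.split(": ", 1)
    megabytes = float(message.rsplit(": ", 1)[1].split()[0])
    return datetime.strptime(stamp, LOG_TIME_FORMAT), label, megabytes


after_load = parse_line("2024年04月25日23時58分10秒 - モデル読み込み後: 予約済みメモリ: 2358.00 MB")
after_infer = parse_line("2024年04月25日23時58分25秒 - 推論後: 予約済みメモリ: 2842.00 MB")

elapsed_seconds = (after_infer[0] - after_load[0]).total_seconds()
growth_mb = after_infer[2] - after_load[2]
print(elapsed_seconds, growth_mb)  # 15.0 484.0
```

Note that the 15 seconds cover everything between the two log calls, including pipeline construction, not generation alone.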

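Summary

The much-watched LLM Phi-3-mini also runs with 4-bit quantization, and it solved the simultaneous linear equations without trouble while using only about 3 GB of memory.
Some have said it surpasses the small version of Llama 3. Lately the rise of small LLMs (SLMs) in particular is striking.
I would like to try a variety of harder tasks next.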
