Running internlm-chat-7b on a CPU, Part 2

Posted at 2023-08-24

V1.1 has been released, so this post tests its inference speed against the original release.

Summary

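Inference has become faster even with the original (non-V1.1) release, so running the model on a CPU is now practical.
RAM usage peaks at nearly 30 GB while the model loads, so you need more memory than that.
I asked several times, in Japanese, for code that adds up the numbers from 1 to 10, but V1.1 instead produced code that prints 1 to 10. V1.1 therefore seems to have become somewhat worse at handling Japanese.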
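The nearly 30 GB loading peak is roughly what the weights alone would take. This sketch assumes about 7 billion parameters, taken from the "7b" in the model name rather than an exact count. It also assumes they are stored as 32-bit floats, which is what `from_pretrained` used by default in the transformers releases of that period when no `torch_dtype` is given. Loading can briefly need more than the final footprint, so treat the result as a lower bound rather than a prediction.

```python
# Rough lower bound on the RAM needed just to hold the model weights.
# Assumption: ~7e9 parameters (from the "7b" in the model name, not an exact count).
BYTES_PER_PARAM = {
    "float32": 4,   # assumed default when from_pretrained() is called without torch_dtype
    "bfloat16": 2,
    "float16": 2,
}


def weight_memory_gb(n_params: float, dtype: str = "float32") -> float:
    """Return the size of the weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: {weight_memory_gb(7e9, dtype):.1f} GB")
```

Passing `torch_dtype=torch.bfloat16` to `from_pretrained` would in principle halve the weight footprint. That option was not tested in this post.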

Inference test

Inference code (V1.1)

import time
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("internlm/internlm-chat-7b-V1_1", trust_remote_code=True)
# .cpu() keeps the weights on the CPU, so a separate .to("cpu") is not needed
model = AutoModelForCausalLM.from_pretrained("internlm/internlm-chat-7b-V1_1", trust_remote_code=True).cpu()
model = model.eval()  # inference mode
history = []  # conversation context carried over between turns

while True:
    txt = input("msg:")
    start_time = time.perf_counter()
    # chat() comes from the model's remote code (trust_remote_code=True)
    response, history = model.chat(tokenizer, txt, history=history)
    end_time = time.perf_counter()  # stop the clock before printing
    elapsed_time = end_time - start_time
    print(response)
    print(f"worktime:{elapsed_time}")
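`worktime` covers the whole `chat()` call, so a long answer takes longer regardless of how fast the CPU is. Dividing the token count of the response by the elapsed time gives a rough generation rate that can be compared across prompts. This is a minimal sketch. The function name `tokens_per_second` and the `WhitespaceTokenizer` stand-in are hypothetical and exist only so the example runs without downloading the model. With the real model you would pass the Hugging Face `tokenizer`, whose `encode(text, add_special_tokens=False)` returns token IDs. The elapsed time also includes prompt processing, so the rate is approximate.

```python
from typing import List


def tokens_per_second(tokenizer, text: str, seconds: float) -> float:
    """Approximate generation rate: tokens in `text` divided by wall-clock seconds."""
    n_tokens = len(tokenizer.encode(text, add_special_tokens=False))
    return n_tokens / seconds


class WhitespaceTokenizer:
    """Hypothetical stand-in that treats each whitespace-separated word as one token."""

    def encode(self, text: str, add_special_tokens: bool = False) -> List[int]:
        return list(range(len(text.split())))


# 3.5 seconds is an arbitrary illustrative value, not a measurement from this post.
print(tokens_per_second(WhitespaceTokenizer(), "Hello! How can I assist you today?", 3.5))
```

In the loop above this would be called as `tokens_per_second(tokenizer, response, elapsed_time)`.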

Inference results

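msg:おはよう (Good morning)
こんにちは (Hello)

msg:hello
Hello! How can I assist you today?

msg:你好 (Hello)
您好！有什么可以帮您的吗？ (Hello! Is there anything I can help you with?)

msg:pythonで1から10まで足し合わせるコードを書いて (Write Python code that adds up the numbers from 1 to 10)
以下は、Pythonで1から10までの整数を足し合わせるコードを書いたものです。 (Here is Python code that adds up the integers from 1 to 10.)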

```python
for i in range(1, 11):
    print(i)
```

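このコードは、for文を使って1から10までの整数をループで表示します。print文は、1から10までの整数を 出力します。 (This code uses a for statement to loop over and display the integers from 1 to 10. The print statement outputs the integers from 1 to 10.)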

msg:Write a code that adds up from 1 to 10 in python
Here's a simple code snippet that adds up the numbers from 1 to 10 in Python:

```python
total = 0
for i in range(1, 11):
    total += i
print(total)
```

This code initializes a variable `total` to 0, then uses a `for` loop to iterate over the numbers from 1 to 10 (inclusive) and adds each number to `total`. Finally, it prints the value of `total`, which should be 55 (1 + 2 + 3 +... + 10).
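For reference, here is a minimal version of what the Japanese prompt asked for. The V1.1 answer to that prompt printed each number instead of accumulating them. The second part cross-checks the 55 quoted in the English answer against the closed form n(n + 1)/2.

```python
# What the Japanese prompt asked for: 1 + 2 + ... + 10.
total = sum(range(1, 11))  # range() excludes its end value, so 11 is needed to include 10
print(total)

# Cross-check with the closed form n(n + 1) / 2, using integer division.
n = 10
assert total == n * (n + 1) // 2
```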

Inference speed test

Inference speed test code (original release)

import time
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("internlm/internlm-chat-7b", trust_remote_code=True)
# .cpu() keeps the weights on the CPU, so a separate .to("cpu") is not needed
model = AutoModelForCausalLM.from_pretrained("internlm/internlm-chat-7b", trust_remote_code=True).cpu()
model = model.eval()  # inference mode

testdata = ["おはよう", "hello", "你好",
            "pythonで1から10まで足し合わせるコードを書いて",  # "Write Python code that adds up 1 to 10"
            "Write a code that adds up from 1 to 10 in python"]

for txt in testdata:
    start_time = time.perf_counter()
    # history=[] so each prompt is answered without earlier context
    response, _ = model.chat(tokenizer, txt, history=[])
    end_time = time.perf_counter()  # stop the clock before printing
    elapsed_time = end_time - start_time
    print(f"in:{txt}\nout:{response}")
    print(f"worktime:{elapsed_time}")
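The script above times each prompt once. The first call may also include one-off costs, and response lengths can vary between runs. A common way to make such numbers steadier is to run an untimed warm-up call and then report the median of several timed runs. This is a generic sketch: the `benchmark` helper is hypothetical and not part of transformers, and the demo times a cheap stand-in workload rather than the model.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(fn: Callable[[], object], repeats: int = 3, warmup: int = 1) -> Dict[str, float]:
    """Call fn() `warmup` times untimed, then `repeats` times timed; return stats in seconds."""
    for _ in range(warmup):
        fn()
    times: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return {"median": statistics.median(times), "min": min(times), "max": max(times)}


# Stand-in workload so the sketch runs on its own.
stats = benchmark(lambda: sum(range(100_000)), repeats=5)
print(stats)
```

With the model loaded as above, a prompt would be measured as `benchmark(lambda: model.chat(tokenizer, txt, history=[]), repeats=3)`. On a CPU each repeat can take minutes, so keep `repeats` small.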

Inference speed test code (V1.1)

import time
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("internlm/internlm-chat-7b-V1_1", trust_remote_code=True)
# .cpu() keeps the weights on the CPU, so a separate .to("cpu") is not needed
model = AutoModelForCausalLM.from_pretrained("internlm/internlm-chat-7b-V1_1", trust_remote_code=True).cpu()
model = model.eval()  # inference mode

testdata = ["おはよう", "hello", "你好",
            "pythonで1から10まで足し合わせるコードを書いて",  # "Write Python code that adds up 1 to 10"
            "Write a code that adds up from 1 to 10 in python"]

for txt in testdata:
    start_time = time.perf_counter()
    # history=[] so each prompt is answered without earlier context
    response, _ = model.chat(tokenizer, txt, history=[])
    end_time = time.perf_counter()  # stop the clock before printing
    elapsed_time = end_time - start_time
    print(f"in:{txt}\nout:{response}")
    print(f"worktime:{elapsed_time}")
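The two speed-test scripts differ only in the model ID. One way to avoid keeping two copies is to pick the model on the command line. This is a sketch under assumptions. The short keys `original` and `v1.1` and the `--model` flag are hypothetical names. `model.chat()` is the same remote-code method the scripts here call. The transformers import sits inside `run()` so that the argument parsing can be exercised without downloading anything.

```python
import argparse
import time

# Model IDs used in this post; the short keys are hypothetical labels.
MODELS = {
    "original": "internlm/internlm-chat-7b",
    "v1.1": "internlm/internlm-chat-7b-V1_1",
}

PROMPTS = [
    "おはよう",  # "Good morning"
    "hello",
    "你好",  # "Hello" in Chinese
    "pythonで1から10まで足し合わせるコードを書いて",  # "Write Python code that adds up 1 to 10"
    "Write a code that adds up from 1 to 10 in python",
]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Time internlm-chat-7b responses on a CPU")
    parser.add_argument("--model", choices=sorted(MODELS), default="original")
    return parser.parse_args(argv)


def run(model_id, prompts):
    # Imported here so parse_args() works even without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).cpu().eval()
    for txt in prompts:
        start = time.perf_counter()
        response, _ = model.chat(tokenizer, txt, history=[])  # fresh context per prompt
        elapsed = time.perf_counter() - start
        print(f"in:{txt}\nout:{response}\nworktime:{elapsed:.2f}")


def main(argv=None):
    args = parse_args(argv)
    run(MODELS[args.model], PROMPTS)
```

Calling `main(["--model", "v1.1"])` would repeat the V1.1 measurement. You can also put `main()` under an `if __name__ == "__main__":` guard. The model is downloaded on first use.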

Test environment A
CPU: Xeon E5-2623 v3 × 2
Cinebench R23: 4700 pts
RAM: 104 GB

Test environment B
CPU: Core i9-12900F
Cinebench R23: 15691 pts
RAM: 128 GB

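Wall-clock time (worktime) per prompt:

| Input | Input tokens | Original, env A | Original, env B | V1.1, env A | V1.1, env B |
|---|---|---|---|---|---|
| おはよう (Good morning) | 3 | 12.17 s | 5.87 s | 10.21 s | 10.39 s |
| hello | 1 | 11.68 s | 9.26 s | 13.93 s | 11.08 s |
| 你好 (Hello) | 1 | 10.53 s | 9.28 s | 13.90 s | 11.12 s |
| pythonで1から10まで足し合わせるコードを書いて (Write Python code that adds up 1 to 10) | 16 | 54.65 s | 50.05 s | 96.54 s | 71.40 s |
| Write a code that adds up from 1 to 10 in python | 14 | 146.55 s | 94.19 s | 86.91 s | 114.46 s |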
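To compare the two machines, you can divide each environment A time by the matching environment B time and set that next to the ratio of their Cinebench R23 scores. This is a small worked calculation on the numbers in the table only. Each cell is a single run, and each time covers the whole response, so the ratios also reflect how long each generated answer happened to be.

```python
# Wall-clock seconds copied from the table (one run per cell, prompts in table order).
CINEBENCH = {"A": 4700, "B": 15691}
TIMES = {
    "original": {"A": [12.17, 11.68, 10.53, 54.65, 146.55],
                 "B": [5.87, 9.26, 9.28, 50.05, 94.19]},
    "v1.1": {"A": [10.21, 13.93, 13.90, 96.54, 86.91],
             "B": [10.39, 11.08, 11.12, 71.40, 114.46]},
}


def speedups(version):
    """Env A time divided by env B time per prompt (> 1 means env B was faster)."""
    t = TIMES[version]
    return [a / b for a, b in zip(t["A"], t["B"])]


print(f"Cinebench R23 ratio B/A: {CINEBENCH['B'] / CINEBENCH['A']:.2f}")
for version in TIMES:
    print(version, [round(s, 2) for s in speedups(version)])
```

The Cinebench ratio is about 3.3. For the original release the measured ratios stay between roughly 1.1 and 2.1, and for V1.1 two prompts were slower on environment B.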
