
Running internlm-chat-7b on a CPU

Last updated at 2023-08-24, posted at 2023-07-11

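Additional information

While testing the newly added V1.1, I found that inference had become faster,
so I ran the measurements again. See the follow-up article for the new results.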

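Summary

The model does run on a CPU, but inference time becomes impractical as the number of tokens grows.
RAM usage peaks at close to 38 GB while the model is loading, so the machine needs more memory than that.
The model was trained on English and Chinese, but it could still write simple Python code when prompted in Japanese.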
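
The memory figure is roughly in line with a back-of-the-envelope estimate. The sketch below assumes about 7 billion parameters (inferred from the "7b" in the model name, not an exact count) and that the weights are held as 32-bit floats, which is the usual default when no dtype is passed to `from_pretrained`. The function name `estimate_weight_memory_gb` is hypothetical.

```python
def estimate_weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Return the approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Assumption: roughly 7e9 parameters, inferred from the "7b" in the model name.
N_PARAMS = 7e9

fp32_gb = estimate_weight_memory_gb(N_PARAMS, 4)  # 32-bit floats
fp16_gb = estimate_weight_memory_gb(N_PARAMS, 2)  # 16-bit floats

print(f"float32 weights: ~{fp32_gb:.0f} GB")
print(f"float16 weights: ~{fp16_gb:.0f} GB")
```

Under these assumptions the weights alone take about 28 GB in float32. The gap up to the observed peak of nearly 38 GB would then come from temporary buffers during loading and the rest of the Python process, although the article does not break this down.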

Inference test

Inference code

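import time

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "internlm/internlm-chat-7b"

# trust_remote_code=True lets transformers run the modeling code shipped in the
# model repository, which also provides the chat() method used below.
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = model.to("cpu")  # explicit; the model is loaded onto the CPU by default
model = model.eval()     # switch to inference mode
history = []

# Simple chat loop: read a message, generate a reply, and report the elapsed time.
while True:
    txt = input("msg:")
    start_time = time.perf_counter()
    response, history = model.chat(tokenizer, txt, history=history)
    elapsed_time = time.perf_counter() - start_time  # stop the clock before printing
    print(response)
    print(f"worktime:{elapsed_time}")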

Inference results

msg:おはよう
はい、何事が欲しいですか？

msg:hello
Hello! How can I assist you today?

msg:你好
你好，有什么我可以帮助你的吗？

msg:pythonで1から10まで足し合わせるコードを書いて
以下は、Pythonで1から10までの和を計算するコードです。

```python
sum = 0
for i in range(1, 11):
    sum += i
print(sum)
```

msg:Write a code that adds up from 1 to 10 in python
Sure, here's a simple code snippet that adds up the numbers from 1 to 10 in Python:

```
sum = 0
for i in range(1, 11):
    sum += i
print(sum)
```

This code initializes a variable `sum` to 0, then uses a `for` loop to iterate through the numbers 1 to 10 (inclusive) and add each number to the `sum` variable. Finally, it prints the value of `sum`, which should be 55 (1 + 2 + 3 +... + 10).

Inference speed test

Code for the inference speed test

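import time

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "internlm/internlm-chat-7b"

# trust_remote_code=True lets transformers run the modeling code shipped in the
# model repository, which also provides the chat() method used below.
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = model.to("cpu")  # explicit; the model is loaded onto the CPU by default
model = model.eval()     # switch to inference mode

# Prompts in Japanese, English, and Chinese, plus two code-generation requests.
testdata = ["おはよう", "hello", "你好",
            "pythonで1から10まで足し合わせるコードを書いて",
            "Write a code that adds up from 1 to 10 in python"]

for txt in testdata:
    start_time = time.perf_counter()
    # Start every prompt with an empty history so the runs are independent.
    response, _ = model.chat(tokenizer, txt, history=[])
    elapsed_time = time.perf_counter() - start_time  # stop the clock before printing
    print(f"in:{txt}\nout:{response}")
    print(f"worktime:{elapsed_time}")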
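
The start/stop timing pattern used in both scripts can be factored into a small helper. This is only a sketch: `timed` and `fake_chat` are hypothetical names, and `fake_chat` is a stand-in workload so that the example runs without downloading the model.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in for the model call; it only sleeps briefly and echoes the prompt.
def fake_chat(prompt: str) -> str:
    time.sleep(0.01)
    return prompt.upper()


response, seconds = timed(fake_chat, "hello")
print(f"out:{response}\nworktime:{seconds:.3f}")
```

With the real model, the call would look like `(response, _), seconds = timed(model.chat, tokenizer, txt, history=[])`, which keeps printing outside the measured interval.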

Test environment A
CPU: E5-2623V3 x 2
Cinebench R23: 4700 pts
RAM: 104 GB

Test environment B
CPU: i9-12900F
Cinebench R23: 15691 pts
RAM: 128 GB

Input	Tokens	Environment A	Environment B
おはよう	3	14.90s	9.75s
hello	1	14.09s	11.43s
你好	1	13.07s	10.38s
pythonで1から10まで足し合わせるコードを書いて	16	1359.67s	1041.29s
Write a code that adds up from 1 to 10 in python	14	1289.46s	1029.31s
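
One way to read the table is to compare the per-prompt speedup of environment B over environment A with the ratio of the two Cinebench R23 scores. The sketch below only does arithmetic on the numbers reported above; the variable names are illustrative.

```python
# (environment A seconds, environment B seconds), copied from the table above.
results = {
    "おはよう": (14.90, 9.75),
    "hello": (14.09, 11.43),
    "你好": (13.07, 10.38),
    "pythonで1から10まで足し合わせるコードを書いて": (1359.67, 1041.29),
    "Write a code that adds up from 1 to 10 in python": (1289.46, 1029.31),
}

# Cinebench R23 scores from the test environment descriptions.
cinebench_a, cinebench_b = 4700, 15691

# Speedup of B over A for each prompt (values above 1 mean B is faster).
speedups = {prompt: a / b for prompt, (a, b) in results.items()}
for prompt, ratio in speedups.items():
    print(f"{ratio:.2f}x faster on B: {prompt}")

print(f"Cinebench ratio (B/A): {cinebench_b / cinebench_a:.2f}x")
```

In these measurements environment B is roughly 1.2 to 1.5 times faster, although its Cinebench score is more than three times higher, so the benchmark score alone does not predict the inference time here.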
