Trying FlexGen on Ubuntu 22.04.1 LTS
1. Installing the GPU driver
Install the driver by following NVIDIA's documentation.
https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html
$ sudo apt-get install linux-headers-$(uname -r)
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
$ wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
$ sudo dpkg -i cuda-keyring_1.0-1_all.deb
$ sudo apt-get update
$ sudo apt-get -y install cuda-drivers
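The `$distribution` variable above simply concatenates the OS ID and version from /etc/os-release with the dot stripped, which selects the right repository path in the wget URL. A minimal Python sketch of the same logic, with the values hard-coded for this machine:

```python
# Mirror the shell pipeline: $ID$VERSION_ID with the "." removed.
# os_id and version_id are hard-coded here from this machine's /etc/os-release.
os_id = "ubuntu"
version_id = "22.04"
distribution = os_id + version_id.replace(".", "")
print(distribution)  # ubuntu2204 -> .../cuda/repos/ubuntu2204/x86_64/...
```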
Reboot once the installation finishes.
After the reboot, confirm that the GPU is recognized.
$ nvidia-smi
Sat Mar 4 14:35:24 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-PCIE-32GB On | 00000000:00:04.0 Off | 0 |
| N/A 29C P0 27W / 250W| 0MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
2. Installing pip
$ sudo apt install python3-pip
$ python3 -m pip install --upgrade pip
3. Setting up FlexGen
$ git clone https://github.com/FMInference/FlexGen.git
$ cd FlexGen
$ git checkout 9d888e5e3e6d78d6d4e1fdda7c8af508b889aeae
$ pip install -e .
4. Running FlexGen
The model is downloaded automatically on the first run.
$ python3 flexgen/apps/chatbot.py --model facebook/opt-6.7b
Initialize...
Downloading (…)okenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 685/685 [00:00<00:00, 384kB/s]
Downloading (…)lve/main/config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 651/651 [00:00<00:00, 327kB/s]
Downloading (…)olve/main/vocab.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 899k/899k [00:00<00:00, 942kB/s]
Downloading (…)olve/main/merges.txt: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 456k/456k [00:00<00:00, 479kB/s]
Downloading (…)cial_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 221/221 [00:00<00:00, 99.8kB/s]
Load the pre-trained pytorch weights of opt-6.7b from huggingface. The downloading and cpu loading can take dozens of minutes. If it seems to get stuck, you can monitor the progress by checking the memory usage of this process.
Downloading (…)00002-of-00002.bin";: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.36G/3.36G [00:54<00:00, 61.1MB/s]
Downloading (…)00001-of-00002.bin";: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.96G/9.96G [01:41<00:00, 98.5MB/s]
Fetching 2 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:42<00:00, 51.30s/it]
Convert format: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:41<00:00, 20.80s/it]
A chat between a curious human and a knowledgeable artificial intelligence assistant.
Human: Hello! What can you do?
Assistant: As an AI assistant, I can answer questions and chat with you.
Human: What is the name of the tallest mountain in the world?
Assistant: Everest.
Human:
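Incidentally, the sizes of the two weight shards downloaded above line up with what fp16 weights should occupy: about 2 bytes per parameter. A quick back-of-the-envelope check (the parameter count is approximate):

```python
# fp16 stores each parameter in 2 bytes, so opt-6.7b's weights take roughly:
params = 6.7e9                 # approximate parameter count of opt-6.7b
size_gb = params * 2 / 1e9     # 2 bytes per fp16 parameter
print(f"{size_gb:.1f} GB")     # 13.4 GB, close to the 3.36G + 9.96G shards above
```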
Asking for its name; apparently it has none.
$ python3 flexgen/apps/chatbot.py --model facebook/opt-6.7b
Initialize...
A chat between a curious human and a knowledgeable artificial intelligence assistant.
Human: Hello! What can you do?
Assistant: As an AI assistant, I can answer questions and chat with you.
Human: What is the name of the tallest mountain in the world?
Assistant: Everest.
Human: Hello
Assistant: Hi, what can I do for you?
Human: I would like to talk with you.
Assistant: You can talk with me.
Human: What is your name?
Assistant: As an AI assistant, I have no name.
Human:
Asking about the features of Ansible; the answer seems about right.
$ python3 flexgen/apps/chatbot.py --model facebook/opt-6.7b
Initialize...
A chat between a curious human and a knowledgeable artificial intelligence assistant.
Human: Hello! What can you do?
Assistant: As an AI assistant, I can answer questions and chat with you.
Human: What is the name of the tallest mountain in the world?
Assistant: Everest.
Human: What are the features of Ansible?
Assistant: Ansible is a configuration management software. Here are some of the features.
Human:
Now that I have confirmed it works, I would like to try other models and explore ways to put it to use.
Addendum
opt-66b
Fails to run due to insufficient GPU memory.
$ python3 flexgen/apps/chatbot.py --model facebook/opt-66b
Initialize...
Traceback (most recent call last):
  File "/home/ubuntu/FlexGen/flexgen/apps/chatbot.py", line 106, in <module>
    main(args)
  File "/home/ubuntu/FlexGen/flexgen/apps/chatbot.py", line 35, in main
    model = OptLM(args.model, env, args.path, policy)
  File "/home/ubuntu/FlexGen/flexgen/flex_opt.py", line 637, in __init__
    self.init_all_weights()
  File "/home/ubuntu/FlexGen/flexgen/flex_opt.py", line 799, in init_all_weights
    self.init_weight(j)
  File "/home/ubuntu/FlexGen/flexgen/flex_opt.py", line 651, in init_weight
    self.layers[j].init_weight(self.weight_home[j], expanded_path)
  File "/home/ubuntu/FlexGen/flexgen/flex_opt.py", line 306, in init_weight
    weights = init_weight_list(weight_specs, self.policy, self.env)
  File "/home/ubuntu/FlexGen/flexgen/flex_opt.py", line 112, in init_weight_list
    weight = home.allocate(shape, dtype, pin_memory=pin_memory)
  File "/home/ubuntu/FlexGen/flexgen/pytorch_backend.py", line 190, in allocate
    data = torch.empty(shape, dtype=dtype, pin_memory=pin_memory, device=self.dev)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 162.00 MiB (GPU 0; 31.74 GiB total capacity; 31.28 GiB already allocated; 17.12 MiB free; 31.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
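The failure is expected: even in fp16, opt-66b's weights alone far exceed the V100's 32 GiB, while opt-13b's fit with room to spare. A rough estimate (parameter counts approximate, 2 bytes per fp16 parameter):

```python
# Rough fp16 weight footprint vs. the V100's total capacity from the error above.
GIB = 2**30
gpu_gib = 31.74  # total capacity reported by the CUDA OOM error

for name, params in [("opt-13b", 13e9), ("opt-66b", 66e9)]:
    weights_gib = params * 2 / GIB          # 2 bytes per fp16 parameter
    fits = weights_gib < gpu_gib
    print(f"{name}: ~{weights_gib:.0f} GiB of weights, fits on GPU: {fits}")
# opt-13b: ~24 GiB of weights, fits on GPU: True
# opt-66b: ~123 GiB of weights, fits on GPU: False
```

If I remember correctly, the FlexGen README describes an offloading policy that places a fraction of the weights and KV cache in CPU RAM or on disk (e.g. a `--percent` flag on `flexgen.flex_opt`); whether `apps/chatbot.py` exposes the same options at this commit is something I would check before retrying opt-66b.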
opt-13b
It takes a while, but it works.
$ python3 flexgen/apps/chatbot.py --model facebook/opt-13b
Initialize...
A chat between a curious human and a knowledgeable artificial intelligence assistant.
Human: Hello! What can you do?
Assistant: As an AI assistant, I can answer questions and chat with you.
Human: What is the name of the tallest mountain in the world?
Assistant: Everest.
Human: