Trying FlexGen on Ubuntu 22.04.1 LTS
1. Installing the GPU driver
Install the driver by following NVIDIA's documentation.
https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html
$ sudo apt-get install linux-headers-$(uname -r)
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
$ wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
$ sudo dpkg -i cuda-keyring_1.0-1_all.deb
$ sudo apt-get update
$ sudo apt-get -y install cuda-drivers
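The `$distribution` variable above simply concatenates the OS ID and version from /etc/os-release with the dot stripped, which selects the right repository path in the wget URL. A minimal Python sketch of the same logic, with the values hard-coded for this machine:

```python
# Mirror the shell pipeline: $ID$VERSION_ID with the "." removed.
# os_id and version_id are hard-coded here from this machine's /etc/os-release.
os_id = "ubuntu"
version_id = "22.04"
distribution = os_id + version_id.replace(".", "")
print(distribution)  # ubuntu2204 -> .../cuda/repos/ubuntu2204/x86_64/...
```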
Reboot once the installation finishes.
After the reboot, confirm that the GPU is recognized.
$ nvidia-smi
Sat Mar 4 14:35:24 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-PCIE-32GB On | 00000000:00:04.0 Off | 0 |
| N/A 29C P0 27W / 250W| 0MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
2. Installing pip
$ sudo apt install python3-pip
$ python3 -m pip install --upgrade pip
3. Setting up FlexGen
$ git clone https://github.com/FMInference/FlexGen.git
$ cd FlexGen
$ git checkout 9d888e5e3e6d78d6d4e1fdda7c8af508b889aeae
$ pip install -e .
4. Running FlexGen
The model is downloaded automatically on the first run.
$ python3 flexgen/apps/chatbot.py --model facebook/opt-6.7b
Initialize...
Downloading (…)okenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 685/685 [00:00<00:00, 384kB/s]
Downloading (…)lve/main/config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 651/651 [00:00<00:00, 327kB/s]
Downloading (…)olve/main/vocab.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 899k/899k [00:00<00:00, 942kB/s]
Downloading (…)olve/main/merges.txt: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 456k/456k [00:00<00:00, 479kB/s]
Downloading (…)cial_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 221/221 [00:00<00:00, 99.8kB/s]
Load the pre-trained pytorch weights of opt-6.7b from huggingface. The downloading and cpu loading can take dozens of minutes. If it seems to get stuck, you can monitor the progress by checking the memory usage of this process.
Downloading (…)00002-of-00002.bin";: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.36G/3.36G [00:54<00:00, 61.1MB/s]
Downloading (…)00001-of-00002.bin";: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.96G/9.96G [01:41<00:00, 98.5MB/s]
Fetching 2 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:42<00:00, 51.30s/it]
Convert format: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:41<00:00, 20.80s/it]
A chat between a curious human and a knowledgeable artificial intelligence assistant.
Human: Hello! What can you do?
Assistant: As an AI assistant, I can answer questions and chat with you.
Human: What is the name of the tallest mountain in the world?
Assistant: Everest.
Human:
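Incidentally, the sizes of the two weight shards downloaded above line up with what fp16 weights should occupy: about 2 bytes per parameter. A quick back-of-the-envelope check (the parameter count is approximate):

```python
# fp16 stores each parameter in 2 bytes, so opt-6.7b's weights take roughly:
params = 6.7e9                 # approximate parameter count of opt-6.7b
size_gb = params * 2 / 1e9     # 2 bytes per fp16 parameter
print(f"{size_gb:.1f} GB")     # 13.4 GB, close to the 3.36G + 9.96G shards above
```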
Asking for its name; apparently it has none.
$ python3 flexgen/apps/chatbot.py --model facebook/opt-6.7b
Initialize...
A chat between a curious human and a knowledgeable artificial intelligence assistant.
Human: Hello! What can you do?
Assistant: As an AI assistant, I can answer questions and chat with you.
Human: What is the name of the tallest mountain in the world?
Assistant: Everest.
Human: Hello
Assistant: Hi, what can I do for you?
Human: I would like to talk with you.
Assistant: You can talk with me.
Human: What is your name?
Assistant: As an AI assistant, I have no name.
Human:
Asking about the features of Ansible; the answer seems about right.
$ python3 flexgen/apps/chatbot.py --model facebook/opt-6.7b
Initialize...
A chat between a curious human and a knowledgeable artificial intelligence assistant.
Human: Hello! What can you do?
Assistant: As an AI assistant, I can answer questions and chat with you.
Human: What is the name of the tallest mountain in the world?
Assistant: Everest.
Human: What are the features of Ansible?
Assistant: Ansible is a configuration management software. Here are some of the features.
Human:
Now that I have confirmed it works, I would like to try other models and explore ways to put it to use.
Addendum
opt-66b
Fails to run due to insufficient GPU memory.
$ python3 flexgen/apps/chatbot.py --model facebook/opt-66b
Initialize...
Traceback (most recent call last):
  File "/home/ubuntu/FlexGen/flexgen/apps/chatbot.py", line 106, in <module>
    main(args)
  File "/home/ubuntu/FlexGen/flexgen/apps/chatbot.py", line 35, in main
    model = OptLM(args.model, env, args.path, policy)
  File "/home/ubuntu/FlexGen/flexgen/flex_opt.py", line 637, in __init__
    self.init_all_weights()
  File "/home/ubuntu/FlexGen/flexgen/flex_opt.py", line 799, in init_all_weights
    self.init_weight(j)
  File "/home/ubuntu/FlexGen/flexgen/flex_opt.py", line 651, in init_weight
    self.layers[j].init_weight(self.weight_home[j], expanded_path)
  File "/home/ubuntu/FlexGen/flexgen/flex_opt.py", line 306, in init_weight
    weights = init_weight_list(weight_specs, self.policy, self.env)
  File "/home/ubuntu/FlexGen/flexgen/flex_opt.py", line 112, in init_weight_list
    weight = home.allocate(shape, dtype, pin_memory=pin_memory)
  File "/home/ubuntu/FlexGen/flexgen/pytorch_backend.py", line 190, in allocate
    data = torch.empty(shape, dtype=dtype, pin_memory=pin_memory, device=self.dev)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 162.00 MiB (GPU 0; 31.74 GiB total capacity; 31.28 GiB already allocated; 17.12 MiB free; 31.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
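The failure is expected: even in fp16, opt-66b's weights alone far exceed the V100's 32 GiB, while opt-13b's fit with room to spare. A rough estimate (parameter counts approximate, 2 bytes per fp16 parameter):

```python
# Rough fp16 weight footprint vs. the V100's total capacity from the error above.
GIB = 2**30
gpu_gib = 31.74  # total capacity reported by the CUDA OOM error

for name, params in [("opt-13b", 13e9), ("opt-66b", 66e9)]:
    weights_gib = params * 2 / GIB          # 2 bytes per fp16 parameter
    fits = weights_gib < gpu_gib
    print(f"{name}: ~{weights_gib:.0f} GiB of weights, fits on GPU: {fits}")
# opt-13b: ~24 GiB of weights, fits on GPU: True
# opt-66b: ~123 GiB of weights, fits on GPU: False
```

If I remember correctly, the FlexGen README describes an offloading policy that places a fraction of the weights and KV cache in CPU RAM or on disk (e.g. a `--percent` flag on `flexgen.flex_opt`); whether `apps/chatbot.py` exposes the same options at this commit is something I would check before retrying opt-66b.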
opt-13b
It takes a while, but it works.
$ python3 flexgen/apps/chatbot.py --model facebook/opt-13b
Initialize...
A chat between a curious human and a knowledgeable artificial intelligence assistant.
Human: Hello! What can you do?
Assistant: As an AI assistant, I can answer questions and chat with you.
Human: What is the name of the tallest mountain in the world?
Assistant: Everest.
Human: