NVIDIA Multi-Instance GPU (MIG) configuration example

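In watsonx.ai, foundation models that support NVIDIA Multi-Instance GPU (MIG) can be deployed to a GPU that has MIG configured. This lets you install several small models on a single GPU and use GPU resources more efficiently.
This article walks through an example of configuring MIG and then disabling it again.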

Initial state (MIG not configured)
$ oc exec -n nvidia-gpu-operator -it nvidia-driver-daemonset-416.94.202412100237-0-fst8j -- nvidia-smi

Thu Jun  5 05:35:22 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 NVL                On  |   00000000:08:00.0 Off |                    0 |
| N/A   28C    P0             57W /  400W |       1MiB /  95830MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
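
The driver pod name used above (nvidia-driver-daemonset-416.94.202412100237-0-fst8j) is specific to my cluster and changes whenever the pod is recreated. As a minimal sketch, the pod can be looked up by node instead. The function names are hypothetical, and the label selector openshift.driver-toolkit=true is an assumption about how the GPU Operator labels driver pods on OpenShift; adjust it if your pods are labeled differently.

```shell
# Sketch: find the NVIDIA driver pod running on a given node.
# Assumption: driver pods carry the label openshift.driver-toolkit=true.
find_driver_pod() {
  node="$1"
  oc get pod -n nvidia-gpu-operator \
    -l openshift.driver-toolkit=true \
    --field-selector "spec.nodeName=${node}" \
    -o jsonpath='{.items[0].metadata.name}'
}

# Run nvidia-smi (with optional extra arguments) inside that driver pod.
gpu_smi() {
  node="$1"; shift
  pod="$(find_driver_pod "$node")" || return 1
  [ -n "$pod" ] || { echo "no driver pod found on ${node}" >&2; return 1; }
  oc exec -n nvidia-gpu-operator "$pod" -- nvidia-smi "$@"
}
```

With these defined, `gpu_smi "${NODE_NAME}"` would stand in for the full `oc exec` command, and `gpu_smi "${NODE_NAME}" -L` would list the devices.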

Configuring MIG

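  1. Set the MIG advertisement strategy to single.
    Specify the host name, strategy, and configuration label as environment variables.

    • STRATEGY: single

      IBM® Software Hub validated the NVIDIA MIG single strategy and added support for it in version 5.1.0. With the single strategy, a single GPU is split into partitions of one fixed size.

    • MIG_CONFIGURATION: all-3g.47gb, which I selected from the list of available MIG configurations as the appropriate one for this GPU.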
    NODE_NAME=mynode
    STRATEGY=single
    MIG_CONFIGURATION=all-3g.47gb
    
  2. Apply the desired MIG partitioning profile.

    oc label node/${NODE_NAME} nvidia.com/mig.config=${MIG_CONFIGURATION} --overwrite
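
Step 1 defines STRATEGY, but the command that writes it into the GPU Operator configuration does not appear in the steps above. A minimal sketch of that command follows. The function name is hypothetical, and the ClusterPolicy name gpu-cluster-policy is an assumption; check the actual name with `oc get clusterpolicy` first.

```shell
# Sketch: write the MIG strategy into the GPU Operator ClusterPolicy.
# Assumption: the ClusterPolicy is named gpu-cluster-policy; pass another
# name as the second argument if yours differs.
set_mig_strategy() {
  strategy="$1"
  policy="${2:-gpu-cluster-policy}"
  oc patch "clusterpolicy/${policy}" --type=json \
    -p "[{\"op\": \"replace\", \"path\": \"/spec/mig/strategy\", \"value\": \"${strategy}\"}]"
}
```

It would be called as `set_mig_strategy "${STRATEGY}"` before the node is labeled.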
    

Verifying the MIG configuration (all-3g.47gb)

$ oc get node/${NODE_NAME} -o json | jq '.metadata.labels'| grep mig
  "nvidia.com/gpu.deploy.mig-manager": "true",
  "nvidia.com/mig.config": "all-3g.47gb",
  "nvidia.com/mig.config.state": "success"

$ oc exec -n nvidia-gpu-operator -it nvidia-driver-daemonset-416.94.202412100237-0-fst8j -- nvidia-smi -L
GPU 0: NVIDIA H100 NVL (UUID: GPU-f279a52b-e802-c262-e776-a4197f09a5f7)
  MIG 3g.47gb     Device  0: (UUID: MIG-16600d9a-ed18-5d1b-915c-472019110d02)
  MIG 3g.47gb     Device  1: (UUID: MIG-4fedd3e9-d1bc-582e-b01f-bc24dd98e15c)
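
The MIG manager applies the profile asynchronously, so the state label may not read success right after the node is labeled. The two checks above can be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and I assume the state label reads failed when the MIG manager gives up, with any other value treated as still in progress.

```shell
# Sketch: poll the nvidia.com/mig.config.state label until it is "success".
# Assumption: "failed" is the terminal error state; other values mean "wait".
wait_for_mig_state() {
  node="$1"
  attempts="${2:-60}"   # how many times to poll
  interval="${3:-10}"   # seconds between polls
  i=0
  while [ "$i" -lt "$attempts" ]; do
    state="$(oc get "node/${node}" \
      -o jsonpath='{.metadata.labels.nvidia\.com/mig\.config\.state}')"
    case "$state" in
      success) echo "MIG configuration applied on ${node}"; return 0 ;;
      failed)  echo "MIG configuration failed on ${node}" >&2; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for MIG configuration on ${node}" >&2
  return 1
}

# Sketch: count MIG devices of one profile in `nvidia-smi -L` output (stdin).
# Device lines look like: "  MIG 3g.47gb     Device  0: (UUID: MIG-...)"
count_mig_devices() {
  profile="$1"
  awk -v p="$profile" '$1 == "MIG" && $2 == p { n++ } END { print n + 0 }'
}
```

For example, `wait_for_mig_state "${NODE_NAME}"` could run right after the `oc label` command, and the `nvidia-smi -L` output could be piped into `count_mig_devices 3g.47gb` to confirm that both partitions exist.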

$ oc exec -n nvidia-gpu-operator -it nvidia-driver-daemonset-416.94.202412100237-0-fst8j -- nvidia-smi
Fri Jun  6 07:08:40 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 NVL                On  |   00000000:08:00.0 Off |                   On |
| N/A   28C    P0             57W /  400W |      76MiB /  95830MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| MIG devices:                                                                            |
+------------------+----------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                     Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                  |        ECC|                       |
|==================+==================================+===========+=======================|
|  0    1   0   0  |              38MiB / 47488MiB    | 60      0 |  3   0    3    0    3 |
|                  |                 0MiB / 65535MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  0    2   0   1  |              38MiB / 47488MiB    | 60      0 |  3   0    3    0    3 |
|                  |                 0MiB / 65535MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
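
With the single strategy, each MIG device is advertised under the regular nvidia.com/gpu resource, so the node's allocatable GPU count is another quick check. A minimal sketch, with a hypothetical function name:

```shell
# Sketch: print how many nvidia.com/gpu units a node advertises as allocatable.
allocatable_gpus() {
  oc get "node/$1" -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
}
```

Running `allocatable_gpus "${NODE_NAME}"` on this node should report one unit per MIG device. I would expect 2 given the two 3g.47gb devices shown above, although I did not capture that output for this article.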

Disabling MIG

MIG_CONFIGURATION=all-disabled && \
  oc label node/$NODE_NAME nvidia.com/mig.config=$MIG_CONFIGURATION --overwrite

After disabling MIG
$ oc exec -n nvidia-gpu-operator -it nvidia-driver-daemonset-416.94.202412100237-0-fst8j -- nvidia-smi
Fri Jun  6 08:39:20 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 NVL                On  |   00000000:08:00.0 Off |                    0 |
| N/A   28C    P0             60W /  400W |       1MiB /  95830MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
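
Rather than reading the MIG M. column by eye, the mode can be queried in CSV form and checked in a script. A minimal sketch, assuming the query is run without a TTY (drop -it from `oc exec`) so the output contains no carriage returns; the function name is hypothetical.

```shell
# Sketch: succeed only if every GPU reports MIG mode "Disabled".
# Reads the output of
#   nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader
# on stdin, one "index, mode" line per GPU (for example "0, Disabled").
all_mig_disabled() {
  awk -F', *' '
    NF >= 2 { seen = 1; if ($2 != "Disabled") bad = 1 }
    END { exit (seen && !bad) ? 0 : 1 }
  '
}
```

Piping the query output from the driver pod into `all_mig_disabled` returns exit status 0 once MIG is off on every GPU, and a non-zero status if any GPU still has MIG enabled or if no GPU lines were read.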

Environment

  • GPU: NVIDIA H100 NVL 94GB
  • OCP 4.16
  • NVIDIA GPU Operator 24.6.2
  • IBM Software Hub 5.1.3