Configuring GPU Instances as an RDMA Cluster on OCI

Last updated at 2025-07-20, posted at 2025-07-09

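Introduction

This article shows how to build an RDMA (Remote Direct Memory Access) capable GPU cluster on OCI (Oracle Cloud Infrastructure). Machine learning and HPC workloads often need fast communication between multiple GPU nodes, so the low latency and high throughput that RDMA provides matter a great deal.

The procedure uses the HPC and GPU Cluster stack from OCI Marketplace and covers the whole lifecycle: deploying the GPU cluster, verifying it, monitoring it, and deleting it. It is written step by step so that readers building a GPU cluster for the first time can follow along.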

Architecture

Architecture diagram

What can you do with this environment?

Once the environment is configured by following this article, you can do the following.

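Build an RDMA-capable multi-node GPU cluster
- Supports cluster configurations with recent GPUs such as A100 and H100
Verify NCCL communication between cluster nodes
- Submit and control jobs with the Slurm job scheduler
- Submit and run MPI/NCCL-based distributed jobs
Use the GPU environment through Docker + NVIDIA Container Toolkit
- Use a container-based CUDA environment for testing and development right away
Visualize GPU resources with Grafana
- Real-time monitoring of GPU temperature, utilization, power consumption, and more
Automate infrastructure provisioning and deletion
- Deploy and delete the cluster with button clicks using OCI Resource Manager (a managed Terraform service)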

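Scope of this article

The scope of this procedure is as follows.

In scope

Deploying the GPU cluster
Verifying the GPU cluster
Accessing monitoring
Deleting the cluster

Out of scope

Deploying applications or machine learning code
Operational design for the deployed environment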

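Procedure

What you need before starting

An OCI account (with Administrator privileges or equivalent)
A Service Limit for an RDMA-capable GPU shape
- The limit in the table below must be 16 or more (assuming two or more nodes)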

Shape	GPU Type	# of GPUs	Service Limit Name
BM.GPU4.8	A100 (40GB Mem)	8	gpu4-count
BM.GPU.A100-v2.8	A100 (80GB Mem)	8	gpu-a100-v2-count
BM.GPU.H100.8	H100	8	gpu-h100-count
BM.GPU.H200.8	H200	8	gpu-h200-count
BM.GPU.B200.8	B200	8	gpu-b200-count
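
The "16 or more" figure follows from simple arithmetic: each shape in the table has 8 GPUs, so two nodes need a limit of 16. The sketch below generalizes this to other node counts. It assumes the Service Limit is counted in GPUs rather than in nodes, which is what the note above implies; the function name `required_service_limit` is hypothetical and used only for illustration.

```python
from typing import Tuple

# Service limit name and GPUs per node for each shape, copied from the table above.
SHAPES = {
    "BM.GPU4.8": ("gpu4-count", 8),
    "BM.GPU.A100-v2.8": ("gpu-a100-v2-count", 8),
    "BM.GPU.H100.8": ("gpu-h100-count", 8),
    "BM.GPU.H200.8": ("gpu-h200-count", 8),
    "BM.GPU.B200.8": ("gpu-b200-count", 8),
}


def required_service_limit(shape: str, nodes: int) -> Tuple[str, int]:
    """Return (limit name, minimum limit value) for a cluster of `nodes` nodes.

    Assumption: the limit is counted in GPUs, so the minimum is
    GPUs per node multiplied by the number of nodes.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    limit_name, gpus_per_node = SHAPES[shape]
    return limit_name, gpus_per_node * nodes


print(required_service_limit("BM.GPU4.8", 2))  # ('gpu4-count', 16)
```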

Preliminary setup

Policies

Add the following policies. They are required for the deployment steps that follow.

allow service compute_management to use tag-namespace in tenancy
allow service compute_management to manage compute-management-family in tenancy
allow service compute_management to read app-catalog-listing in tenancy

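Deploy

Log in to the OCI console and select All Applications under Marketplace from the menu.

Search for HPC in the search box, then select HPC and GPU Cluster.

For Version, select v2.11.1.1; for Compartment, select the compartment to deploy into; then select Launch Stack.

Select Next.

Fill in each item. Items marked in red are required, and items marked in blue are optional.
This example deploys a simple cluster with the settings shown in the following image.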

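①: Set the SSH public key used for SSH login
②: Specify a custom cluster name (a random string by default)
③: Specify the cluster name prefix
④: Select the AD (choose one where the Service Limit has been granted)
⑤: As needed
⑥: As needed
⑦: Select the AD (choose one where the Service Limit has been granted)
⑧: Select the target GPU shape
⑨: Specify the number of nodes to deploy (within the Service Limit)
⑩: Select the OS image used for the GPU instances
⑪: Enable Grafana monitoring
⑫: Add an NFS filesystem
⑬: Select File Storage Service
⑭: Also use File Storage Service for /home
⑮: Uncheck this because it is an LA feature
After configuring the above, select Next at the bottom of the screen.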

Select Next.

Deployment starts with the configured values.

Depending on the number of nodes, it completes in one to two hours.

Check the IP address used to log in to the Bastion over SSH.

Verification

Login check

Log in to the Bastion over SSH.

(base) kazuito@kazuito-mac ~ % ssh opc@<Public IP>
The authenticity of host '<Public IP> (<Public IP>)' can't be established.
ED25519 key fingerprint is SHA256:arFFpD70Me++Pb7JSXDnc3RVqb0oMvxPpnEFQWFlzqo.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '<Public IP>' (ED25519) to the list of known hosts.
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Tue Jul  8 14:31:47 2025 from 172.16.0.41
[opc@gpu-cluster-controller ~]$

Use the Slurm sinfo command to check the host names of the GPU nodes.

[opc@gpu-cluster-controller ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up   infinite      2   idle GPU-[88,937]
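
sinfo prints the nodes in Slurm's compressed form (`GPU-[88,937]`). On the cluster itself, `scontrol show hostnames` expands such a list. The following is a minimal sketch of the same expansion for use in your own tooling. It only handles a single bracket group with comma-separated values and simple `a-b` ranges, and the function name `expand_nodelist` is hypothetical.

```python
import re
from typing import List


def expand_nodelist(nodelist: str) -> List[str]:
    """Expand a simple Slurm node list such as 'GPU-[88,937]' into host names.

    Limitation: supports one bracket group only, containing comma-separated
    values and 'start-end' ranges. It is not a full Slurm hostlist parser.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]", nodelist)
    if match is None:
        return [nodelist]  # a single host name without brackets
    prefix, body = match.groups()
    hosts = []
    for part in body.split(","):
        if "-" in part:
            start, end = part.split("-")
            width = len(start)  # preserve zero padding such as 01-03
            for number in range(int(start), int(end) + 1):
                hosts.append(f"{prefix}{number:0{width}d}")
        else:
            hosts.append(prefix + part)
    return hosts


print(expand_nodelist("GPU-[88,937]"))  # ['GPU-88', 'GPU-937']
```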

Log in to each node over SSH.

[opc@gpu-cluster-controller ~]$ ssh gpu-88
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Tue Jul  8 14:32:21 2025 from 172.16.0.41
[opc@GPU-88 ~]$
[opc@GPU-88 ~]$
[opc@GPU-88 ~]$ exit
logout
[opc@gpu-cluster-controller ~]$ ssh gpu-937
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Tue Jul  8 14:32:21 2025 from 172.16.0.41
[opc@GPU-937 ~]$
[opc@GPU-937 ~]$
[opc@GPU-937 ~]$ exit
logout
[opc@gpu-cluster-controller ~]$

Checking the shared file system mounts

On the Bastion, run df -h to check that File Storage Service is mounted.

[opc@gpu-cluster-controller ~]$ df -h
Filesystem                       Size  Used Avail Use% Mounted on
devtmpfs                          32G     0   32G   0% /dev
tmpfs                             32G     0   32G   0% /dev/shm
tmpfs                             32G   33M   32G   1% /run
tmpfs                             32G     0   32G   0% /sys/fs/cgroup
/dev/mapper/ocivolume-root      1013G   27G  987G   3% /
/dev/sda2                       1014M  440M  575M  44% /boot
/dev/sda1                        100M  6.0M   94M   6% /boot/efi
/dev/mapper/ocivolume-oled        10G  190M  9.9G   2% /var/oled
fss-GPU.gpu-cluster.local:/home  8.0E     0  8.0E   0% /home
172.16.0.41:/export/cluster     1013G   27G  987G   3% /nfs/cluster
fss-GPU.gpu-cluster.local:/fss   8.0E     0  8.0E   0% /fss
tmpfs                            6.3G     0  6.3G   0% /run/user/1000

Check the GPU nodes as well.

[opc@gpu-cluster-controller ~]$ ssh gpu-88
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Tue Jul  8 14:32:21 2025 from 172.16.0.41
[opc@GPU-88 ~]$
[opc@GPU-88 ~]$
[opc@GPU-88 ~]$ df -h
Filesystem                       Size  Used Avail Use% Mounted on
devtmpfs                        1008G     0 1008G   0% /dev
tmpfs                           1008G     0 1008G   0% /dev/shm
tmpfs                           1008G   44M 1008G   1% /run
tmpfs                           1008G     0 1008G   0% /sys/fs/cgroup
/dev/mapper/ocivolume-root       245G   33G  213G  14% /
/dev/sda2                       1014M  451M  564M  45% /boot
/dev/mapper/ocivolume-oled        10G  535M  9.5G   6% /var/oled
/dev/sda1                        100M  6.0M   94M   6% /boot/efi
tmpfs                            202G     0  202G   0% /run/user/0
/dev/md0                          13T   89G   13T   1% /mnt/localdisk
tmpfs                            202G     0  202G   0% /run/user/986
fss-GPU.gpu-cluster.local:/home  8.0E     0  8.0E   0% /home
172.16.0.41:/export/cluster     1013G   27G  987G   3% /nfs/cluster
fss-GPU.gpu-cluster.local:/fss   8.0E     0  8.0E   0% /fss
tmpfs                            202G     0  202G   0% /run/user/1000
[opc@GPU-88 ~]$ exit
logout
[opc@gpu-cluster-controller ~]$ ssh gpu-937
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Tue Jul  8 14:32:21 2025 from 172.16.0.41
[opc@GPU-937 ~]$
[opc@GPU-937 ~]$
[opc@GPU-937 ~]$ df -h
Filesystem                       Size  Used Avail Use% Mounted on
devtmpfs                        1008G     0 1008G   0% /dev
tmpfs                           1008G     0 1008G   0% /dev/shm
tmpfs                           1008G   44M 1008G   1% /run
tmpfs                           1008G     0 1008G   0% /sys/fs/cgroup
/dev/mapper/ocivolume-root       245G   33G  213G  14% /
/dev/sda2                       1014M  436M  579M  43% /boot
/dev/mapper/ocivolume-oled        10G  516M  9.5G   6% /var/oled
/dev/sda1                        100M  6.0M   94M   6% /boot/efi
tmpfs                            202G     0  202G   0% /run/user/0
/dev/md0                          13T   89G   13T   1% /mnt/localdisk
tmpfs                            202G     0  202G   0% /run/user/986
fss-GPU.gpu-cluster.local:/home  8.0E     0  8.0E   0% /home
172.16.0.41:/export/cluster     1013G   27G  987G   3% /nfs/cluster
fss-GPU.gpu-cluster.local:/fss   8.0E     0  8.0E   0% /fss
tmpfs                            202G     0  202G   0% /run/user/1000
[opc@GPU-937 ~]$ exit
logout
[opc@gpu-cluster-controller ~]$
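
Reading three df listings by eye gets tedious as the node count grows. The sketch below shows one way to check a captured `df -h` output for the shared mount points seen above (`/home`, `/fss`, `/nfs/cluster`). The function name `missing_mounts` is hypothetical, and the expected mount points are an assumption based on this particular deployment's output.

```python
from typing import Iterable, Set

# Shared mount points observed in the df output above (specific to this deployment).
EXPECTED_MOUNTS = ("/home", "/fss", "/nfs/cluster")


def missing_mounts(df_output: str, expected: Iterable[str] = EXPECTED_MOUNTS) -> Set[str]:
    """Return the expected mount points that do not appear in `df` output."""
    mounted = set()
    for line in df_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if fields:
            mounted.add(fields[-1])  # last column is "Mounted on"
    return set(expected) - mounted


# A few lines taken from the Bastion output above.
sample = """Filesystem                       Size  Used Avail Use% Mounted on
/dev/sda1                        100M  6.0M   94M   6% /boot/efi
fss-GPU.gpu-cluster.local:/home  8.0E     0  8.0E   0% /home
172.16.0.41:/export/cluster     1013G   27G  987G   3% /nfs/cluster
fss-GPU.gpu-cluster.local:/fss   8.0E     0  8.0E   0% /fss
"""

print(missing_mounts(sample))  # set()
```

On a node, the input could come from `subprocess.run(["df", "-h"], capture_output=True, text=True).stdout`; an empty result means all expected mounts are present.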

GPU recognition and operation check

Run nvidia-smi to confirm that the GPUs work.

[opc@GPU-937 ~]$ nvidia-smi
Tue Jul  8 14:44:00 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-40GB          On  |   00000000:0F:00.0 Off |                    0 |
| N/A   42C    P0            102W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-SXM4-40GB          On  |   00000000:15:00.0 Off |                    0 |
| N/A   39C    P0            103W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100-SXM4-40GB          On  |   00000000:51:00.0 Off |                    0 |
| N/A   38C    P0            107W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A100-SXM4-40GB          On  |   00000000:54:00.0 Off |                    0 |
| N/A   43C    P0            112W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA A100-SXM4-40GB          On  |   00000000:8D:00.0 Off |                    0 |
| N/A   42C    P0            110W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA A100-SXM4-40GB          On  |   00000000:92:00.0 Off |                    0 |
| N/A   40C    P0            108W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA A100-SXM4-40GB          On  |   00000000:D6:00.0 Off |                    0 |
| N/A   39C    P0            107W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA A100-SXM4-40GB          On  |   00000000:DA:00.0 Off |                    0 |
| N/A   42C    P0            108W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

NVIDIA Container Toolkit is preinstalled, so the GPUs are also available from Docker.

[opc@GPU-937 ~]$ docker run --rm  --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Unable to find image 'nvidia/cuda:11.6.2-base-ubuntu20.04' locally
11.6.2-base-ubuntu20.04: Pulling from nvidia/cuda
96d54c3075c9: Pull complete
a3d20efe6db8: Pull complete
bfdf8ce43b67: Pull complete
ad14f66bfcf9: Pull complete
1056ff735c59: Pull complete
Digest: sha256:a0dd581afdbf82ea9887dd077aebf9723aba58b51ae89acb4c58b8705b74179b
Status: Downloaded newer image for nvidia/cuda:11.6.2-base-ubuntu20.04
Tue Jul  8 14:45:21 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-40GB          On  |   00000000:0F:00.0 Off |                    0 |
| N/A   42C    P0            103W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-SXM4-40GB          On  |   00000000:15:00.0 Off |                    0 |
| N/A   39C    P0            103W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100-SXM4-40GB          On  |   00000000:51:00.0 Off |                    0 |
| N/A   38C    P0            107W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A100-SXM4-40GB          On  |   00000000:54:00.0 Off |                    0 |
| N/A   43C    P0            112W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA A100-SXM4-40GB          On  |   00000000:8D:00.0 Off |                    0 |
| N/A   42C    P0            110W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA A100-SXM4-40GB          On  |   00000000:92:00.0 Off |                    0 |
| N/A   40C    P0            109W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA A100-SXM4-40GB          On  |   00000000:D6:00.0 Off |                    0 |
| N/A   39C    P0            107W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA A100-SXM4-40GB          On  |   00000000:DA:00.0 Off |                    0 |
| N/A   42C    P0            108W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
[opc@GPU-937 ~]$
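
The table output above is meant for people. For an automated check that all 8 GPUs are visible, nvidia-smi also offers a CSV query mode (`nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader`). The sketch below parses that kind of output. The sample input is illustrative, built from the GPU names and memory sizes in the table above rather than captured from the cluster, and `parse_gpu_query` is a hypothetical helper name.

```python
import csv
import io
from typing import Dict, List


def parse_gpu_query(csv_text: str) -> List[Dict[str, object]]:
    """Parse `nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader` output."""
    gpus = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != 3:
            continue  # ignore blank or unexpected lines
        index, name, memory = row
        gpus.append({"index": int(index), "name": name.strip(), "memory": memory.strip()})
    return gpus


# Illustrative input (assumed format), mirroring the 8 x A100 40GB node above.
sample = "\n".join(f"{i}, NVIDIA A100-SXM4-40GB, 40960 MiB" for i in range(8))

gpus = parse_gpu_query(sample)
print(len(gpus), {gpu["name"] for gpu in gpus})
```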

RoCEv2 operation check

Verify that the interconnect works.

On the Bastion, copy the files needed to run the script.

[opc@gpu-cluster-controller ~]$ cp /opt/oci-hpc/samples/gpu/nccl_run_allreduce.sh .
[opc@gpu-cluster-controller ~]$ cp /opt/oci-hpc/playbooks/roles/rack-aware/files/node_ordering_by_rack.py .
[opc@gpu-cluster-controller ~]$ cat << EOS > hostfile
GPU-88
GPU-937
EOS
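
The hostfile lists one node per line. When the test script runs, it prints a rank-to-slot mapping with one rank per GPU, in hostfile order. The sketch below shows how such a mapping can be derived from a host list, assuming 8 GPUs per node. It is not the actual logic of `nccl_run_allreduce.sh` or `node_ordering_by_rack.py` (which may reorder nodes by rack), and `build_rankfile` is a hypothetical name.

```python
from typing import List


def build_rankfile(hosts: List[str], gpus_per_node: int = 8) -> List[str]:
    """Assign consecutive ranks to each host, one rank per GPU slot."""
    lines = []
    for node_index, host in enumerate(hosts):
        for slot in range(gpus_per_node):
            rank = node_index * gpus_per_node + slot
            lines.append(f"rank {rank}={host} slot={slot}")
    return lines


# Hosts in the same order as the hostfile created above.
for line in build_rankfile(["GPU-88", "GPU-937"]):
    print(line)
```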

Log in to one of the GPU nodes and run the script.

[opc@gpu-cluster-controller ~]$ ssh gpu-937
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Tue Jul  8 14:48:35 2025 from 172.16.0.41
[opc@GPU-937 ~]$
[opc@GPU-937 ~]$ vi hostfile
[opc@GPU-937 ~]$
[opc@GPU-937 ~]$ sh ./nccl_run_allreduce.sh 1 ./hostfile
INPUTFILE
GPU-88
GPU-937
ORDEREDMACHINEFILE
GPU-88
GPU-937
ORDEREDRANKMACHINEFILE
rank 0=GPU-88 slot=0
rank 1=GPU-88 slot=1
rank 2=GPU-88 slot=2
rank 3=GPU-88 slot=3
rank 4=GPU-88 slot=4
rank 5=GPU-88 slot=5
rank 6=GPU-88 slot=6
rank 7=GPU-88 slot=7
rank 8=GPU-937 slot=0
rank 9=GPU-937 slot=1
rank 10=GPU-937 slot=2
rank 11=GPU-937 slot=3
rank 12=GPU-937 slot=4
rank 13=GPU-937 slot=5
rank 14=GPU-937 slot=6
rank 15=GPU-937 slot=7
1
1
Tue Jul  8 14:51:23 GMT 2025
# nThread 1 nGpus 1 minBytes 1073741824 maxBytes 10737418240 step: 9663676416(bytes) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid 239240 on     GPU-88 device  0 [0x0f] NVIDIA A100-SXM4-40GB
#  Rank  1 Group  0 Pid 239241 on     GPU-88 device  1 [0x15] NVIDIA A100-SXM4-40GB
#  Rank  2 Group  0 Pid 239242 on     GPU-88 device  2 [0x51] NVIDIA A100-SXM4-40GB
#  Rank  3 Group  0 Pid 239243 on     GPU-88 device  3 [0x54] NVIDIA A100-SXM4-40GB
#  Rank  4 Group  0 Pid 239244 on     GPU-88 device  4 [0x8d] NVIDIA A100-SXM4-40GB
#  Rank  5 Group  0 Pid 239245 on     GPU-88 device  5 [0x92] NVIDIA A100-SXM4-40GB
#  Rank  6 Group  0 Pid 239246 on     GPU-88 device  6 [0xd6] NVIDIA A100-SXM4-40GB
#  Rank  7 Group  0 Pid 239247 on     GPU-88 device  7 [0xda] NVIDIA A100-SXM4-40GB
#  Rank  8 Group  0 Pid 233561 on    GPU-937 device  0 [0x0f] NVIDIA A100-SXM4-40GB
#  Rank  9 Group  0 Pid 233565 on    GPU-937 device  1 [0x15] NVIDIA A100-SXM4-40GB
#  Rank 10 Group  0 Pid 233566 on    GPU-937 device  2 [0x51] NVIDIA A100-SXM4-40GB
#  Rank 11 Group  0 Pid 233568 on    GPU-937 device  3 [0x54] NVIDIA A100-SXM4-40GB
#  Rank 12 Group  0 Pid 233570 on    GPU-937 device  4 [0x8d] NVIDIA A100-SXM4-40GB
#  Rank 13 Group  0 Pid 233571 on    GPU-937 device  5 [0x92] NVIDIA A100-SXM4-40GB
#  Rank 14 Group  0 Pid 233573 on    GPU-937 device  6 [0xd6] NVIDIA A100-SXM4-40GB
#  Rank 15 Group  0 Pid 233574 on    GPU-937 device  7 [0xda] NVIDIA A100-SXM4-40GB
NCCL version 2.25.1+cuda12.4
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
  1073741824     268435456     float     sum      -1    10933   98.21  184.14      0    11025   97.40  182.62      0
 10737418240    2684354560     float     sum      -1   105736  101.55  190.41      0   105418  101.86  190.98      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 187.036
#
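
To read the result table: `algbw` is the message size divided by the elapsed time, and for all-reduce nccl-tests derives `busbw` by multiplying `algbw` by 2(n-1)/n, where n is the number of ranks (16 here). The sketch below reproduces the first row under those definitions. The function names are hypothetical, and small differences from the reported values are expected because the table prints rounded times.

```python
def algorithm_bandwidth(size_bytes: int, time_us: float) -> float:
    """Algorithm bandwidth in GB/s: bytes divided by seconds, with 1 GB = 1e9 bytes."""
    return size_bytes / (time_us * 1e-6) / 1e9


def allreduce_bus_bandwidth(algbw_gbs: float, ranks: int) -> float:
    """Bus bandwidth for all-reduce: algbw scaled by 2 * (n - 1) / n."""
    return algbw_gbs * 2 * (ranks - 1) / ranks


# First row of the output above: 1073741824 bytes in 10933 us across 16 ranks.
print(round(algorithm_bandwidth(1073741824, 10933), 2))
print(round(allreduce_bus_bandwidth(98.21, 16), 2))
```

With the reported algbw of 98.21 GB/s, the second call gives a value consistent with the 184.14 GB/s shown in the busbw column.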

You can of course run the same check as a Slurm job.

[opc@gpu-cluster-controller ~]$ cp /opt/oci-hpc/samples/gpu/nccl_run_allreduce.sbatch .
[opc@gpu-cluster-controller ~]$ sbatch nccl_run_allreduce.sbatch
Submitted batch job 2
[opc@gpu-cluster-controller ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 2   compute nccl-all      opc  R       0:02      2 GPU-[88,937]

[opc@gpu-cluster-controller ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

[opc@gpu-cluster-controller ~]$ cat slurm-2.out
/var/spool/slurmd/job00002/slurm_script: line 10: cd: /nfs/scratch: No such file or directory
MACHINEFILE
GPU-88
GPU-937
ORDEREDMACHINEFILE
GPU-88
GPU-937
ORDEREDRANKMACHINEFILE
rank 0=GPU-88 slot=0
rank 1=GPU-88 slot=1
rank 2=GPU-88 slot=2
rank 3=GPU-88 slot=3
rank 4=GPU-88 slot=4
rank 5=GPU-88 slot=5
rank 6=GPU-88 slot=6
rank 7=GPU-88 slot=7
rank 8=GPU-937 slot=0
rank 9=GPU-937 slot=1
rank 10=GPU-937 slot=2
rank 11=GPU-937 slot=3
rank 12=GPU-937 slot=4
rank 13=GPU-937 slot=5
rank 14=GPU-937 slot=6
rank 15=GPU-937 slot=7
# nThread 1 nGpus 1 minBytes 1073741824 maxBytes 10737418240 step: 9663676416(bytes) warmup iters: 5 iters: 100 agg iters: 1 validation: 1 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid 245867 on     GPU-88 device  0 [0x0f] NVIDIA A100-SXM4-40GB
#  Rank  1 Group  0 Pid 245868 on     GPU-88 device  1 [0x15] NVIDIA A100-SXM4-40GB
#  Rank  2 Group  0 Pid 245869 on     GPU-88 device  2 [0x51] NVIDIA A100-SXM4-40GB
#  Rank  3 Group  0 Pid 245870 on     GPU-88 device  3 [0x54] NVIDIA A100-SXM4-40GB
#  Rank  4 Group  0 Pid 245871 on     GPU-88 device  4 [0x8d] NVIDIA A100-SXM4-40GB
#  Rank  5 Group  0 Pid 245872 on     GPU-88 device  5 [0x92] NVIDIA A100-SXM4-40GB
#  Rank  6 Group  0 Pid 245873 on     GPU-88 device  6 [0xd6] NVIDIA A100-SXM4-40GB
#  Rank  7 Group  0 Pid 245874 on     GPU-88 device  7 [0xda] NVIDIA A100-SXM4-40GB
#  Rank  8 Group  0 Pid 239874 on    GPU-937 device  0 [0x0f] NVIDIA A100-SXM4-40GB
#  Rank  9 Group  0 Pid 239875 on    GPU-937 device  1 [0x15] NVIDIA A100-SXM4-40GB
#  Rank 10 Group  0 Pid 239876 on    GPU-937 device  2 [0x51] NVIDIA A100-SXM4-40GB
#  Rank 11 Group  0 Pid 239877 on    GPU-937 device  3 [0x54] NVIDIA A100-SXM4-40GB
#  Rank 12 Group  0 Pid 239878 on    GPU-937 device  4 [0x8d] NVIDIA A100-SXM4-40GB
#  Rank 13 Group  0 Pid 239879 on    GPU-937 device  5 [0x92] NVIDIA A100-SXM4-40GB
#  Rank 14 Group  0 Pid 239880 on    GPU-937 device  6 [0xd6] NVIDIA A100-SXM4-40GB
#  Rank 15 Group  0 Pid 239881 on    GPU-937 device  7 [0xda] NVIDIA A100-SXM4-40GB
NCCL version 2.25.1+cuda12.4
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
  1073741824     268435456     float     sum      -1    10940   98.15  184.03      0    10876   98.72  185.11      0
 10737418240    2684354560     float     sum      -1   106048  101.25  189.85      0   106109  101.19  189.73      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 187.179
#
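
To compare runs automatically, for example the interactive run against the Slurm job, the result rows can be pulled out of the saved output. The sketch below averages the out-of-place and in-place `busbw` columns, which for the Slurm output above comes out close to the reported `Avg bus bandwidth`. It assumes the 13-column row layout shown in this output, and `average_bus_bandwidth` is a hypothetical helper name.

```python
def average_bus_bandwidth(output: str) -> float:
    """Average the busbw columns (out-of-place and in-place) of nccl-tests result rows."""
    values = []
    for line in output.splitlines():
        fields = line.split()
        # Result rows are not comment lines and have 13 columns in this layout.
        if line.lstrip().startswith("#") or len(fields) != 13:
            continue
        values.append(float(fields[7]))   # out-of-place busbw (GB/s)
        values.append(float(fields[11]))  # in-place busbw (GB/s)
    if not values:
        raise ValueError("no result rows found")
    return sum(values) / len(values)


# The two result rows from slurm-2.out above.
sample = """#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
  1073741824     268435456     float     sum      -1    10940   98.15  184.03      0    10876   98.72  185.11      0
 10737418240    2684354560     float     sum      -1   106048  101.25  189.85      0   106109  101.19  189.73      0
# Avg bus bandwidth    : 187.179
"""

print(round(average_bus_bandwidth(sample), 2))
```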