GPU containers on MicroK8s (1)

Posted at 2025-01-31

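This post outlines how to use a GPU on MicroK8s.
It is based on internal notes, so be aware that some of the information is slightly out of date.
Note that SR-IOV (Single Root I/O Virtualization) is only available on some high-end GPUs, so in general a machine's GPU can be used by only one container at a time (exclusive access).
The documentation suggests this also works with AMD GPUs, but I have not verified it.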

Installation

First, install cuda, cuda-drivers, and nvidia-container-toolkit-base.
Then run the following:

# Version check (apt-get has no "info" subcommand; apt-cache policy shows the installed version)
apt-cache policy cuda
nvidia-ctk --version

#setup
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
microk8s enable gpu
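
Before running `microk8s enable gpu`, it can help to confirm that the prerequisites are actually in place. The following is a minimal sketch, not part of the original procedure: `preflight` is a hypothetical helper that only checks whether `nvidia-ctk` and `microk8s` are on `PATH` and whether the CDI spec written by the `nvidia-ctk cdi generate` command above exists. It does not validate the driver or the contents of the spec.

```python
import shutil
from pathlib import Path

# Path taken from the `nvidia-ctk cdi generate --output=...` command above.
CDI_SPEC = Path("/etc/cdi/nvidia.yaml")


def preflight(cdi_spec: Path = CDI_SPEC, which=shutil.which) -> dict:
    """Report which prerequisites appear to be in place (hypothetical helper)."""
    return {
        "nvidia-ctk on PATH": which("nvidia-ctk") is not None,
        "microk8s on PATH": which("microk8s") is not None,
        "CDI spec exists": cdi_spec.is_file(),
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

The `which` parameter exists only so the check can be exercised without the tools installed.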

Test example (Pod)

apiVersion: v1
kind: Pod
metadata:
  name: gpu-hello-world
spec:
  containers:
    - name: cuda
      # Use a tag that matches the CUDA version on the host.
      # https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/supported-tags.md
      image: nvidia/cuda:12.1.1-base-ubuntu20.04
      tty: true
      resources:
        limits:
          nvidia.com/gpu: 1
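
If the image tag has to change per host, the same manifest can be generated rather than edited by hand. This is a sketch under assumptions: `gpu_pod_manifest` is a hypothetical helper, and it emits JSON, which kubectl accepts in place of YAML. The field values mirror the manifest above.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest that requests `gpus` NVIDIA GPUs (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "cuda",
                    "image": image,
                    "tty": True,
                    # Same resource limit as in the YAML manifest.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ]
        },
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest("gpu-hello-world", "nvidia/cuda:12.1.1-base-ubuntu20.04")
    print(json.dumps(manifest, indent=2))
```

Saved under a name of your choice (for example `make_pod.py`, a hypothetical file name), the output can be piped to `microk8s kubectl apply -f -`.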

$ microk8s kubectl get pod
NAME                           READY   STATUS      RESTARTS   AGE
git-test72-gsxxf               0/1     Completed   0          100d
code-server-644bc8dd6d-mw7tk   1/1     Running     0          52m
gpu-hello-world                1/1     Running     0          14s
$ microk8s kubectl exec gpu-hello-world -- nvidia-smi
Wed Aug 23 10:40:40 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A5000               Off | 00000000:01:00.0 Off |                  Off |
| 30%   33C    P8              17W / 230W |    646MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
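
The header line of this output is worth reading carefully: "CUDA Version" is the highest CUDA version the installed driver supports, not the toolkit version inside the container, which is why it shows 12.2 while the image is 12.1.1. To compare these values in a script, the header can be parsed with a regular expression. This is a rough sketch: `parse_nvidia_smi_header` is a hypothetical helper, and the pattern is written against the header layout shown above, which may differ in other driver releases.

```python
import re

# Matches the header line printed by nvidia-smi in the output above.
HEADER_RE = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>\S+)\s+"
    r"Driver Version:\s+(?P<driver>\S+)\s+"
    r"CUDA Version:\s+(?P<cuda>\S+)"
)


def parse_nvidia_smi_header(text: str) -> dict:
    """Extract the nvidia-smi, driver, and CUDA versions (hypothetical helper)."""
    match = HEADER_RE.search(text)
    if match is None:
        raise ValueError("nvidia-smi header line not found")
    return match.groupdict()


# Header line copied from the output above.
SAMPLE = "| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |"

if __name__ == "__main__":
    print(parse_nvidia_smi_header(SAMPLE))
```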

Test example (PyTorch)

import torch

# Print the PyTorch version, then whether a CUDA device is visible.
print(torch.__version__)
print(torch.cuda.is_available())

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
python ./untitled.py
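
A slightly more defensive variant of the same check is sketched below. It is an illustration rather than the original script: `gpu_report` is a hypothetical helper that also works when torch is not installed, and it lists device names when CUDA is available.

```python
def gpu_report() -> dict:
    """Summarize what PyTorch can see; safe to call without torch installed."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    available = torch.cuda.is_available()
    devices = []
    if available:
        # One entry per visible GPU, for example the A5000 shown by nvidia-smi.
        devices = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_available": available,
        "devices": devices,
    }


if __name__ == "__main__":
    print(gpu_report())
```

If `cuda_available` is false inside the Pod, the `nvidia.com/gpu` limit and the `nvidia-smi` output are the first things to recheck.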

References
microk8sでGPUコンテナーを使う環境を整える - 仮想化通信 (in Japanese: setting up an environment for GPU containers on microk8s)

MicroK8s - Add-on: gpu | MicroK8s

Installation Guide — container-toolkit 1.13.5 documentation

Customizing User Resources

Creating custom Docker images with CUDA — Sarus 1.6.0 documentation

Using your own image

See the article below, which I wrote earlier.
There is some additional MicroK8s-specific information, which I note here.

How to use MicroK8s and Kubernetes, with Fedora troubleshooting, ssh secret, and DNS settings

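Over ssh on a cluster node (for example the master), write hosts.toml and then restart MicroK8s.
This completes the registry configuration (replace {ip} with your registry's IP address).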

/var/snap/microk8s/current/args/certs.d/{ip}:32000/hosts.toml

server = "http://{ip}:32000"

[host."http://{ip}:32000"]
capabilities = ["pull", "resolve"]
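
Substituting {ip} by hand in both the directory name and the file body is easy to get wrong, so the file can be rendered from a single value. This is a sketch under assumptions: `render_hosts_toml` is a hypothetical helper, and `192.0.2.10` is a placeholder documentation address, not a real registry. It only builds the path and the text; writing the file (which needs root) and restarting MicroK8s are left to you.

```python
from pathlib import Path

# Directory taken from the hosts.toml path shown above.
CERTS_DIR = Path("/var/snap/microk8s/current/args/certs.d")


def render_hosts_toml(ip: str, port: int = 32000) -> tuple[Path, str]:
    """Return (path, content) for the registry's hosts.toml (hypothetical helper)."""
    endpoint = f"http://{ip}:{port}"
    path = CERTS_DIR / f"{ip}:{port}" / "hosts.toml"
    content = (
        f'server = "{endpoint}"\n'
        "\n"
        f'[host."{endpoint}"]\n'
        'capabilities = ["pull", "resolve"]\n'
    )
    return path, content


if __name__ == "__main__":
    path, content = render_hosts_toml("192.0.2.10")  # placeholder IP
    print(path)
    print(content, end="")
```

After writing the file, one way to restart is `microk8s stop` followed by `microk8s start`.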

You can check which images are available on the cluster:

microk8s ctr image ls
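
The full listing can be long, so it may be convenient to filter it down to images that came from the private registry. This is a minimal sketch: `images_from_registry` is a hypothetical helper, it assumes the listing was produced with the `-q` (quiet) flag so that each line is a single image reference, and the sample listing is made up for illustration.

```python
def images_from_registry(quiet_listing: str, registry: str) -> list[str]:
    """Return the sorted, de-duplicated refs that belong to `registry`."""
    prefix = registry.rstrip("/") + "/"
    refs = {line.strip() for line in quiet_listing.splitlines()}
    return sorted(ref for ref in refs if ref.startswith(prefix))


# Hypothetical output of `microk8s ctr image ls -q` (placeholder IP and image names).
SAMPLE_LISTING = """\
192.0.2.10:32000/myapp:latest
docker.io/nvidia/cuda:12.1.1-base-ubuntu20.04
192.0.2.10:32000/myapp:latest
"""

if __name__ == "__main__":
    for ref in images_from_registry(SAMPLE_LISTING, "192.0.2.10:32000"):
        print(ref)
```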

Reference
How to work with a private registry
