What is Kubernetes on NVIDIA GPUs?

Posted at 2018-07-25, last updated at 2018-11-21

A product announced in the GTC 2018 keynote.

Kubernetes on NVIDIA GPUs:
https://developer.nvidia.com/kubernetes-gpu

The name is confusing: it does not accelerate K8s itself with GPUs. It is NVIDIA's K8s package with support for scheduling GPU resources.

Main contents

Kubernetes(+Device Plugin)
NVIDIA device plugin for Kubernetes
nvidia-docker2
Prometheus + Grafana

So for anyone who has already been using GPUs with K8s, there is actually not much here that is new.

Additions

Scheduling by Compute Capability and GPU memory
GPU monitoring with Prometheus and Grafana
Support for multiple container runtimes (Docker, CRI-O)
Official NVIDIA support for DGX systems

Scheduling by Compute Capability and GPU memory

With stock K8s plus the Device Plugin, pods can only be scheduled by the number of GPUs. NVIDIA's package extends both K8s and the device plugin to allow finer-grained scheduling.

NVIDIA Kubernetes: https://github.com/NVIDIA/kubernetes
Based on v1.9.6, with extensions to the Device Plugin handling and the scheduler.
NVIDIA device plugin for Kubernetes(nvidiak8s/v1.9 branch): https://github.com/NVIDIA/k8s-device-plugin/tree/nvidiak8s/v1.9
Extended to pass various GPU information to K8s, to match NVIDIA K8s: https://github.com/NVIDIA/k8s-device-plugin/commit/fefb0372a0073d5d4ccf3182ac6c4f42a2960f0c#diff-4146cb100ecaa8a375818674a99281c6R32

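With these changes, the following properties of the GPUs on each node are collected automatically and can be used as scheduling conditions:

GPU memory capacity
Whether GPU memory ECC is enabled
Compute Capability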

gpu-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-base
      command: ["sleep"]
      args: ["100000"]
      extendedResourceRequests: ["nvidia-gpu"]
  extendedResources:
    - name: "nvidia-gpu"
      resources:
        limits:
          nvidia.com/gpu: 1
      affinity:
        required:
          - key: "nvidia.com/gpu-memory"
            operator: "Gt"
            values: ["8000"] # change value to appropriate mem for GPU

Requesting only nvidia.com/gpu schedules the pod onto any GPU without distinction, so this may be handy in clusters where several types of GPU are mixed.
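
To make the matching rule concrete, here is a minimal Python sketch of how a `Gt` requirement on `nvidia.com/gpu-memory` could filter nodes. It is not NVIDIA's scheduler code: the node names, the attribute dictionary, and the `matches` and `eligible_nodes` helpers are hypothetical, and only the attribute key and the `Gt` operator are taken from gpu-pod.yaml.

```python
from typing import Dict, List

# Hypothetical per-node GPU attributes, roughly what a device plugin might report.
NODES: Dict[str, Dict[str, str]] = {
    "node-a": {"nvidia.com/gpu-memory": "16000"},
    "node-b": {"nvidia.com/gpu-memory": "8000"},
    "node-c": {"nvidia.com/gpu-memory": "4000"},
}

# Same shape as the `affinity.required` entry in gpu-pod.yaml.
REQUIRED = [{"key": "nvidia.com/gpu-memory", "operator": "Gt", "values": ["8000"]}]


def matches(attrs: Dict[str, str], requirement: dict) -> bool:
    """Check one requirement against a node's attributes (Gt, Lt, In only)."""
    value = attrs.get(requirement["key"])
    if value is None:
        return False  # the node does not report this attribute
    operator = requirement["operator"]
    if operator == "Gt":
        return int(value) > int(requirement["values"][0])  # strict comparison
    if operator == "Lt":
        return int(value) < int(requirement["values"][0])
    if operator == "In":
        return value in requirement["values"]
    raise ValueError(f"unsupported operator: {operator}")


def eligible_nodes(nodes: Dict[str, Dict[str, str]], requirements: List[dict]) -> List[str]:
    """Return the sorted names of nodes that satisfy every requirement."""
    return sorted(
        name
        for name, attrs in nodes.items()
        if all(matches(attrs, req) for req in requirements)
    )


print(eligible_nodes(NODES, REQUIRED))  # ['node-a']
```

Because `Gt` is a strict comparison in this sketch, a node reporting exactly 8000 is excluded, which is worth keeping in mind when picking the threshold.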

GPU monitoring with Prometheus and Grafana

Based on Prometheus Operator ( https://github.com/coreos/prometheus-operator ), extended to monitor GPU information using DCGM. This can also be deployed on ordinary K8s.

node-exporter-daemonset.yaml

$ diff -u  manifests/node-exporter/node-exporter-daemonset.yaml /etc/kubeadm/dcgm/node-exporter-daemonset.yaml
--- manifests/node-exporter/node-exporter-daemonset.yaml	2018-04-09 13:25:28.000000000 +0000
+++ /etc/kubeadm/dcgm/node-exporter-daemonset.yaml	2018-06-15 22:44:00.000000000 +0000
@@ -14,18 +14,31 @@
       name: node-exporter
     spec:
       serviceAccountName: node-exporter
-      securityContext:
-        runAsNonRoot: true
-        runAsUser: 65534
       hostNetwork: true
       hostPID: true
+      nodeSelector:
+        hardware-type: NVIDIAGPU
+      initContainers:
+      - image: nvcr.io/nvidia/k8s/dcgm-exporter:1.4.3
+        name: nvidia-dcgm-exporter-hook
+        command: ["cp"]
+        args:
+        - "/work/dcgm.json"
+        - "/hook/dcgm.json"
+        volumeMounts:
+        - name: dcgm-docker-hook
+          mountPath: /hook
       containers:
       - image: quay.io/prometheus/node-exporter:v0.15.2
         args:
         - "--web.listen-address=127.0.0.1:9101"
         - "--path.procfs=/host/proc"
         - "--path.sysfs=/host/sys"
+        - "--collector.textfile.directory=/run/prometheus"
         name: node-exporter
+        securityContext:
+          runAsNonRoot: true
+          runAsUser: 65534
         resources:
           requests:
             memory: 30Mi
@@ -40,6 +53,17 @@
         - name: sys
           readOnly: true
           mountPath: /host/sys
+        - name: collector-textfiles
+          readOnly: true
+          mountPath: /run/prometheus
+      - image: nvcr.io/nvidia/k8s/dcgm-exporter:1.4.3
+        name: nvidia-dcgm-exporter
+        securityContext:
+          runAsNonRoot: false
+          runAsUser: 0
+        volumeMounts:
+        - name: collector-textfiles
+          mountPath: /run/prometheus
       - name: kube-rbac-proxy
         image: quay.io/brancz/kube-rbac-proxy:v0.2.0
         args:
@@ -66,4 +90,9 @@
       - name: sys
         hostPath:
           path: /sys
-
+      - name: collector-textfiles
+        emptyDir:
+          medium: Memory
+      - name: dcgm-docker-hook
+        hostPath:
+          path: /usr/share/containers/docker/hooks.d
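
The diff connects the two containers through a shared in-memory volume: the dcgm-exporter container writes into /run/prometheus, and node-exporter reads the same directory through --collector.textfile.directory. Below is a minimal Python sketch of the writer side of that pattern. It is not the dcgm-exporter implementation: the metric name `example_gpu_temp_celsius`, the file name, and the `render_metrics` and `write_textfile` helpers are hypothetical. The textfile collector only picks up files ending in `.prom`, and writing to a temporary file and renaming it keeps node-exporter from reading a half-written file.

```python
import os
import tempfile
from typing import Dict


def render_metrics(temps_by_gpu: Dict[str, int]) -> str:
    """Render readings in the Prometheus text exposition format."""
    lines = [
        "# HELP example_gpu_temp_celsius GPU temperature (hypothetical metric).",
        "# TYPE example_gpu_temp_celsius gauge",
    ]
    for gpu, temp in sorted(temps_by_gpu.items()):
        lines.append(f'example_gpu_temp_celsius{{gpu="{gpu}"}} {temp}')
    return "\n".join(lines) + "\n"


def write_textfile(directory: str, name: str, body: str) -> str:
    """Write body to <directory>/<name>.prom via a temp file and rename."""
    final_path = os.path.join(directory, name + ".prom")
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(body)
    # mkstemp creates the file as 0600; widen it so a reader running as a
    # different user (node-exporter runs as 65534 in the diff) can open it.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, final_path)  # atomic within the same filesystem
    return final_path


out_dir = tempfile.mkdtemp()  # stand-in for /run/prometheus
path = write_textfile(out_dir, "gpu", render_metrics({"0": 41, "1": 39}))
with open(path) as f:
    print(f.read(), end="")
```

In the actual deployment the writer runs as root and the reader as an unprivileged user, which is why the diff moves the securityContext from the pod level down to the individual containers.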

Support for multiple container runtimes (Docker, CRI-O)

CRI-O support in nvidia-container-runtime, which nvidia-docker2 also uses. It can be used independently of NVIDIA K8s.
https://github.com/NVIDIA/nvidia-container-runtime
