
Playing with Kubernetes v1.23.6 and Nvidia GPUs, Part 3

Last updated at 2022-05-14 / Posted at 2022-05-14

This is a continuation of the previous article.

Procedure

The overall steps are as follows.

Preparing the working environment
Nvidia CUDA Toolkit, docker, Nvidia Container Toolkit
Building the cluster with kubeadm ← this article
helm and GPU Operator
Istio and MetalLB
Jupyterhub

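3. Building the cluster with kubeadm (including Calico)

3.1 Why kubeadm?

I chose to install with kubeadm because it seemed to have the most information available.
- The Kubernetes manual introduces kubeadm first (the other methods listed are installing Kubernetes on AWS with kops, and installing Kubernetes on on-premises/cloud providers with kubespray).
- Nvidia's manual, under Install Kubernetes, presents Option 1: Installing Kubernetes Using DeepOps and then Option 2: Installing Kubernetes Using Kubeadm, in that order.
- This is not a cloud environment, and DeepOps seemed to have little information available, so I settled on kubeadm.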

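3.2 Preparing for kubeadm init

kubeadm is installed by following the k8s manual; the outline is below. Nvidia's manual also walks through installing k8s, but I could not find anything Nvidia-specific in it, so it is safer to follow the k8s manual.
- In fact, Nvidia's manual does not explain kubeadm-flags.env or daemon.json (both discussed later), so I expect its commands to fail and require troubleshooting (they did in my environment).
- Steps ① to ⑥ are common to both k8s01 and k8s02; step ⑦ is run on k8s01 and step ⑧ on k8s02.
- Steps ④, ⑥, ⑦, and ⑧ need explanation and are described in detail below.
  - ① Turn swap off
  - ② Let iptables see bridged traffic
  - ③ Make sure iptables does not use the nftables backend
  - ④ Install a container runtime (Docker)
  - ⑤ Install kubeadm, kubelet, and kubectl
  - ⑥ Configure the cgroup driver used by the kubelet on the control-plane node
  - ⑦ Build the master node with kubeadm init (including installing Calico)
  - ⑧ Add the worker node with kubeadm join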
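
Steps ① and ② can be checked before kubeadm init is run. The following is a minimal sketch, not part of the original procedure: the helper names swap_is_off and sysctl_enabled_in are hypothetical, and the file-path parameters exist only so the checks can be tried against copies of the real files.

```shell
#!/bin/sh
# Sketch: pre-flight checks for steps 1 and 2 (swap off, bridge sysctls).
# Both function names are hypothetical helpers, not kubeadm features.

# Succeeds when the given /proc/swaps-style file lists no active swap areas,
# i.e. it contains nothing after the header line.
swap_is_off() {
    swaps_file="${1:-/proc/swaps}"
    active=$(awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$swaps_file")
    [ "$active" -eq 0 ]
}

# Succeeds when a sysctl.d-style file sets the given key to 1.
sysctl_enabled_in() {
    conf_file="$1"
    key="$2"
    awk -F '=' -v k="$key" '
        {
            gsub(/[[:space:]]/, "", $1)
            gsub(/[[:space:]]/, "", $2)
            if ($1 == k && $2 == "1") found = 1
        }
        END { exit !found }
    ' "$conf_file"
}
```

For example, `swap_is_off && sysctl_enabled_in /etc/sysctl.d/k8s.conf net.bridge.bridge-nf-call-iptables && echo "ok"` would print ok only when both conditions hold. Note that the second check inspects the configuration file, not the value currently loaded in the kernel.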
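I chose Docker as the container runtime because it is well known and seemed to have plenty of information available. As described in the previous article, the Nvidia Container Toolkit was already installed into the Docker environment, so switching to another container runtime at this point was not really an option.
- The container runtimes whose configuration is described in the k8s CRI (Container Runtime Interface) manual are Docker, containerd, and CRI-O.
- The Supported container runtimes in the Nvidia Container Toolkit manual are Docker, containerd, and podman. podman is supported on RHEL/CentOS 8, but this environment runs Ubuntu, so I did not consider it.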

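④ Install a container runtime (Docker)

The bottom line is that /etc/docker/daemon.json should contain the following. Docker itself was already installed in the earlier work, but without this configuration kubeadm init fails, or the GPU Operator Pods installed in a later step do not start properly.
- It combines two settings: one for when systemd is used as the init system, and one for the Nvidia Container Toolkit. Both are explained below.
- The kubeadm init failure and its workaround are described in this Issue.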

/etc/docker/daemon.json

root@k8s01:~# cat /etc/docker/daemon.json
{
   "exec-opts": ["native.cgroupdriver=systemd"],
   "log-driver": "json-file",
   "log-opts": {
   "max-size": "100m"
   },
   "default-runtime": "nvidia",
   "runtimes": {
      "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
      }
   }
}
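
To confirm that a daemon.json actually carries both settings, a small check like the one below can help. This is only a sketch: check_daemon_json is a hypothetical helper, and it greps for the expected text; it does not parse the JSON.

```shell
#!/bin/sh
# Sketch: confirm that a daemon.json carries the two settings discussed here.
# check_daemon_json is a hypothetical helper; it uses grep, not a JSON parser,
# so it only looks for the expected key/value text.
check_daemon_json() {
    file="${1:-/etc/docker/daemon.json}"
    rc=0
    if ! grep -q 'native\.cgroupdriver=systemd' "$file"; then
        echo "missing: native.cgroupdriver=systemd"
        rc=1
    fi
    if ! grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$file"; then
        echo "missing: default-runtime nvidia"
        rc=1
    fi
    return "$rc"
}
```

After editing the file and restarting Docker (systemctl restart docker), `docker info --format '{{.CgroupDriver}}'` should report the driver the daemon is really using, which is the more reliable confirmation.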

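The Docker installation procedure in the k8s manual (installing a CRI) explains that mixing two cgroup managers, systemd and cgroupfs, can make a node unstable under resource pressure, and that to avoid this you should set native.cgroupdriver=systemd in /etc/docker/daemon.json.
- Specifically, the manual says: "When systemd is chosen as the init system for a Linux distribution, the init process generates and consumes a root control group (cgroup) and acts as a cgroup manager. systemd is tightly integrated with cgroups and allocates a cgroup per process. You can configure your container runtime and the kubelet to use cgroupfs. Using cgroupfs alongside systemd means that there are two different cgroup managers. Control groups are used to constrain the resources allocated to processes. A single cgroup manager simplifies the view of what resources are being allocated and by default gives a more consistent view of the available and in-use resources. With two managers, you end up with two views of those resources. Nodes that are configured to use cgroupfs for the kubelet and Docker, and systemd for the rest of the processes running on the node, can become unstable under resource pressure. Changing the settings so that the container runtime and the kubelet use systemd as the cgroup driver stabilizes the system. Note the native.cgroupdriver=systemd option in the Docker configuration below." It then instructs you to apply the following to /etc/docker/daemon.json. This is the same fix as in the Issue mentioned above.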

Writing /etc/docker/daemon.json

cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF

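The daemon.json I actually used (shown at the top of this step) does not include "storage-driver": "overlay2". With that option present, adding the Nvidia Container Toolkit options made kubeadm init fail; through trial and error I found that removing it made kubeadm init succeed. Since it fails otherwise, I had no choice but to remove it.
- Incidentally, this Issue reports that keeping "storage-driver": "overlay2" "works fine".
- Here is the Docker storage drivers manual, and here is the description of daemon.json. In the end I still do not know why overlay2 interferes, but things are working, so I am leaving it at that.
- The following command confirms that systemd is the init system on the Ubuntu in this environment. I referred to this article.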

Checking the init system

root@k8s01:~# ls -l /sbin/init
lrwxrwxrwx 1 root root 20 Mar 23 22:29 /sbin/init -> /lib/systemd/systemd
root@k8s01:~#
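
The same check can be scripted. A minimal sketch, assuming /sbin/init is a symlink as in the output above; init_system is a hypothetical helper and its path argument exists only so it can be tried on a test symlink.

```shell
#!/bin/sh
# Sketch: report the init system from the /sbin/init symlink target.
# init_system is a hypothetical helper, not a standard command.
init_system() {
    init_path="${1:-/sbin/init}"
    # readlink prints nothing (and fails) when the path is not a symlink.
    target=$(readlink "$init_path" 2>/dev/null || true)
    case "$target" in
        *systemd*) echo "systemd" ;;
        "")        echo "unknown (not a symlink)" ;;
        *)         echo "other: $target" ;;
    esac
}
```

On distributions where /sbin/init is not a symlink, `ps -p 1 -o comm=` (the name of PID 1) is another way to get at the same information.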

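The Nvidia Container Toolkit manual, in turn, instructs you to add the following to daemon.json in order to set up the Container Toolkit, with a note to the effect of "if you are using nvidia-docker2, the runtime is already registered, so skip this step". Combining this with the settings above gives the daemon.json shown at the top of this step.
- I installed nvidia-docker2 as the Container Toolkit, so this should be unnecessary, but the GPU Operator Pods do not start properly without the settings below, so there is no way around it.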

/etc/docker/daemon.json

{
   "default-runtime": "nvidia",
   "runtimes": {
      "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
      }
   }
}

⑥ Configure the cgroup driver used by the kubelet on the control-plane node

Write KUBELET_EXTRA_ARGS=--cgroup-driver=systemd in /etc/default/kubelet (/etc/sysconfig/kubelet on CentOS, RHEL, and Fedora) (!?)
- In this environment kubeadm init succeeded without doing this, and I was able to confirm that the kubelet uses systemd as its cgroup driver.
  - The kubeadm manual states (loosely translated): "If you are using a different CRI, you need to change the cgroup-driver value in the /etc/default/kubelet file (/etc/sysconfig/kubelet on CentOS, RHEL, and Fedora), like this: KUBELET_EXTRA_ARGS=--cgroup-driver="value"."
  - The kubelet's default cgroup driver is cgroupfs, while the container runtime (Docker) uses systemd (as specified in the daemon.json above), so it looks like a change is needed.
    - The kubelet manual says: "cgroupDriver is the driver kubelet uses to manipulate CGroups on the host (cgroupfs or systemd). Default: "cgroupfs"".
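
If the setting does turn out to be needed, the edit described in the manual could be scripted roughly as follows. This is a sketch only: set_kubelet_cgroup_driver is a hypothetical helper, and it overwrites any existing KUBELET_EXTRA_ARGS line instead of merging with it.

```shell
#!/bin/sh
# Sketch: write the cgroup-driver flag into a kubelet defaults file.
# set_kubelet_cgroup_driver is a hypothetical helper. It replaces any existing
# KUBELET_EXTRA_ARGS line, so merge other extra args by hand if you have them.
set_kubelet_cgroup_driver() {
    file="$1"      # e.g. /etc/default/kubelet on Ubuntu
    driver="$2"    # systemd or cgroupfs
    line="KUBELET_EXTRA_ARGS=--cgroup-driver=${driver}"
    if [ -f "$file" ] && grep -q '^KUBELET_EXTRA_ARGS=' "$file"; then
        tmp=$(mktemp)
        # Rewrite the existing line, then copy back to keep the file's permissions.
        sed "s|^KUBELET_EXTRA_ARGS=.*|${line}|" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        echo "$line" >> "$file"
    fi
}
```

Usage would be something like `set_kubelet_cgroup_driver /etc/default/kubelet systemd`, followed by `systemctl daemon-reload` and `systemctl restart kubelet` so that the kubelet picks up the change.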

⑦ Build the master node with kubeadm init (including installing Calico)

The kubeadm init command is shown below.
- The master node is not made redundant.
- The network CIDR assigned to Pods is 192.168.0.0/16 (the same as in the manual).
kubeadm init

kubeadm init --pod-network-cidr=192.168.0.0/16

Calico, the CNI plugin, is installed by following the quick start in its manual.

Calico

kubectl create -f https://projectcalico.docs.tigera.io/manifests/tigera-operator.yaml
kubectl create -f https://projectcalico.docs.tigera.io/manifests/custom-resources.yaml
watch kubectl get pods -n calico-system
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl get nodes -o wide
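
Instead of watching the watch output by eye, the pod table can be checked in a script. A minimal sketch, assuming the default column layout of kubectl get pods for a single namespace (NAME, READY, STATUS, ...); all_pods_running is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: succeed only when every pod in `kubectl get pods` table output is Running.
# all_pods_running is a hypothetical helper that reads the table from stdin.
# It assumes STATUS is the third column, which holds for `-n <namespace>` output
# but not for `-A` output (where NAMESPACE comes first).
all_pods_running() {
    awk '
        NR > 1 && NF > 0 { total++; if ($3 != "Running") bad++ }
        END { if (total > 0 && bad == 0) exit 0; exit 1 }
    '
}
```

For example, `kubectl get pods -n calico-system | all_pods_running && echo "calico is up"`. The check only looks at STATUS; it does not inspect the READY column.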

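⑧ Add the worker node with kubeadm join

When kubeadm init succeeds, it prints the full kubeadm join command (including a unique hash value). Following those instructions, run it on the OS that is to become the worker node (k8s02).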

kubeadm join

kubeadm join 172.31.35.131:6443 --token gmagdj.763lucvtt6bsil6i \
        --discovery-token-ca-cert-hash sha256:0f76e7ebb24ad3a1180488e53a7337855ac026971fd72ea2f82ffda427598396
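
If the kubeadm init output was saved to a file, the join command can be pulled out of it without copying by hand. The following is a sketch under that assumption (the original procedure does not save the output); extract_join_command is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: pull the multi-line `kubeadm join ...` command out of saved
# `kubeadm init` output and print it on a single line.
# extract_join_command is a hypothetical helper; the log file is assumed to
# have been captured by the reader, e.g. with `kubeadm init ... | tee init.log`.
extract_join_command() {
    awk '
        /^[[:space:]]*kubeadm join / { capture = 1 }
        capture {
            line = $0
            cont = sub(/[[:space:]]*\\$/, "", line)   # strip a trailing backslash
            gsub(/^[[:space:]]+/, "", line)           # strip leading indentation
            out = (out == "" ? line : out " " line)
            if (!cont) { print out; exit }
        }
    ' "$1"
}
```

The token printed by kubeadm init is time-limited, so if it has expired, running `kubeadm token create --print-join-command` on the control-plane node should produce a fresh join command.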

To be able to access the k8s cluster built by kubeadm, add the following to .bashrc.

Addition to .bashrc

export KUBECONFIG=/etc/kubernetes/admin.conf

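3.3 Execution results

- Below is the log of steps ① to ⑥.
- The installation of Docker itself in step ④ is not included, because it had already been done.
- For how to turn swap off, I referred to this page.

Installing kubeadm (common to k8s01/02)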

Welcome to Ubuntu 20.04.4 LTS (GNU/Linux 5.13.0-40-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

7 updates can be applied immediately.
6 of these updates are standard security updates.
To see these additional updates run: apt list --upgradable

Your Hardware Enablement Stack (HWE) is supported until April 2025.


Last login: Fri Apr 29 22:54:08 2022 from 172.31.35.113
root@k8s02:~# // Comment out the swapfile entry in fstab so that swap stays off after an OS reboot
root@k8s02:~# vi /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/sda5 during installation
UUID=c0146e46-b6e3-4bd5-ab9d-f076ee073f4f /               ext4    errors=remount-ro 0       1
# /boot/efi was on /dev/sda1 during installation
UUID=1DBB-4BFF  /boot/efi       vfat    umask=0077      0       1
# /swapfile                                 none            swap    sw              0       0
~
~
"/etc/fstab" 12 lines, 667 characters written
root@k8s02:~# // Put the settings for systemd as the init system and for the nvidia container toolkit into daemon.json
root@k8s02:~# vi /etc/docker/daemon.json
{
   "debug": true,
   "exec-opts": ["native.cgroupdriver=systemd"],
   "log-driver": "json-file",
   "log-opts": {
   "max-size": "100m"
   },
   "default-runtime": "nvidia",
   "runtimes": {
      "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
      }
   }
}
~
"/etc/docker/daemon.json" 15 lines, 313 characters written
root@k8s02:~# // Check the current swap status
root@k8s02:~# swapon --show
NAME      TYPE SIZE USED PRIO
/swapfile file 1.8G   0B   -2
root@k8s02:~# // Only fstab has been changed so far, so swap is still in use at this point. Turn swap off.
root@k8s02:~# swapoff -a
root@k8s02:~# // Confirm that swap is no longer in use.
root@k8s02:~# swapon --show
root@k8s02:~# systemctl --type swap
  UNIT LOAD ACTIVE SUB DESCRIPTION
0 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
root@k8s02:~# // Let iptables see bridged traffic
root@k8s02:~# cat <<EOF > /etc/sysctl.d/k8s.conf
> net.bridge.bridge-nf-call-ip6tables = 1
> net.bridge.bridge-nf-call-iptables = 1
> EOF
root@k8s02:~#
root@k8s02:~# sysctl --system
* Applying /etc/sysctl.d/10-console-messages.conf ...
kernel.printk = 4 4 1 7
----------------------(omitted)----------------------
* Applying /etc/sysctl.conf ...
root@k8s02:~# # Make sure the legacy binaries are installed
root@k8s02:~# sudo apt-get install -y iptables arptables ebtables
Reading package lists... Done
----------------------(omitted)----------------------
Processing triggers for man-db (2.9.1-1) ...
root@k8s02:~# // Make sure iptables does not use the nftables backend
root@k8s02:~# sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
root@k8s02:~# sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
root@k8s02:~# sudo update-alternatives --set arptables /usr/sbin/arptables-legacy
update-alternatives: using /usr/sbin/arptables-legacy to provide /usr/sbin/arptables (arptables) in manual mode
root@k8s02:~# sudo update-alternatives --set ebtables /usr/sbin/ebtables-legacy
update-alternatives: using /usr/sbin/ebtables-legacy to provide /usr/sbin/ebtables (ebtables) in manual mode
root@k8s02:~# // Install kubeadm, kubelet, and kubectl
root@k8s02:~# sudo apt-get update && sudo apt-get install -y apt-transport-https curl
Hit:1 http://jp.archive.ubuntu.com/ubuntu focal InRelease
----------------------(omitted)----------------------
  linux-modules-extra-5.13.0-35-generic linux-modules-extra-5.13.0-37-generic
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 7 not upgraded.
root@k8s02:~# 
root@k8s02:~# curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
OK
root@k8s02:~# cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
> deb https://apt.kubernetes.io/ kubernetes-xenial main
> EOF
deb https://apt.kubernetes.io/ kubernetes-xenial main
root@k8s02:~# 
root@k8s02:~# sudo apt-get update
Hit:1 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  InRelease
---------------------(omitted)----------------------
Hit:8 https://download.docker.com/linux/ubuntu focal InRelease
Get:6 https://packages.cloud.google.com/apt kubernetes-xenial InRelease [9,383 B]
Get:9 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 Packages [55.3 kB]
Fetched 64.7 kB in 2s (28.1 kB/s)
Reading package lists... Done
root@k8s02:~# sudo apt-get install -y kubelet kubeadm kubectl
Reading package lists... Done
Building dependency tree
----------------------(omitted)----------------------
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /lib/systemd/system/kubelet.service.
Setting up kubeadm (1.23.6-00) ...
Processing triggers for man-db (2.9.1-1) ...
root@k8s02:~# sudo apt-mark hold kubelet kubeadm kubectl
kubelet set on hold.
kubeadm set on hold.
kubectl set on hold.
root@k8s02:~#
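
The fstab edit done by hand with vi in the log above can also be expressed as a small script. A minimal sketch: comment_out_swap is a hypothetical helper, and it is worth trying on a copy of the file first, since a broken /etc/fstab can prevent the machine from booting.

```shell
#!/bin/sh
# Sketch: comment out active swap entries in an fstab-style file.
# comment_out_swap is a hypothetical helper, not a standard tool.
comment_out_swap() {
    fstab="$1"
    tmp=$(mktemp)
    # Prefix "# " to uncommented lines whose third field (the type) is "swap";
    # every other line, including already commented ones, is printed unchanged.
    awk '!/^[[:space:]]*#/ && $3 == "swap" { print "# " $0; next } { print }' \
        "$fstab" > "$tmp" && cat "$tmp" > "$fstab"
    rm -f "$tmp"
}
```

As in the log, this only affects the next boot; `swapoff -a` is still needed to release swap in the running system.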

Running kubeadm init and installing Calico on k8s01, then running kubeadm join on k8s02

kubeadm init, Calico, kubeadm join

root@k8s01:~# // Check the contents of daemon.json
root@k8s01:~# cat /etc/docker/daemon.json
{
   "exec-opts": ["native.cgroupdriver=systemd"],
   "log-driver": "json-file",
   "log-opts": {
   "max-size": "100m"
   },
   "default-runtime": "nvidia",
   "runtimes": {
      "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
      }
   }
}
root@k8s01:~# // Check the swap status
root@k8s01:~# free -h
              total        used        free      shared  buff/cache   available
Mem:           15Gi       814Mi        13Gi       5.0Mi       1.1Gi        14Gi
Swap:            0B          0B          0B
root@k8s01:~# // Build the master node
root@k8s01:~# kubeadm init --pod-network-cidr=192.168.0.0/16
[init] Using Kubernetes version: v1.23.6
[preflight] Running pre-flight checks
----------------------(omitted)----------------------
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 172.31.35.131:6443 --token gmagdj.763lucvtt6bsil6i \
        --discovery-token-ca-cert-hash sha256:0f76e7ebb24ad3a1180488e53a7337855ac026971fd72ea2f82ffda427598396

root@k8s01:~# // Check the pod status. The coredns pods are Pending because there is no CNI plugin yet.
root@k8s01:~# export KUBECONFIG=/etc/kubernetes/admin.conf
root@k8s01:~# kubectl get pod -A
NAMESPACE     NAME                                          READY   STATUS    RESTARTS   AGE
kube-system   coredns-64897985d-n49tt                       0/1     Pending   0          5m
kube-system   coredns-64897985d-schzf                       0/1     Pending   0          5m
kube-system   etcd-k8s01.dcws.dell.com                      1/1     Running   0          5m13s
kube-system   kube-apiserver-k8s01.dcws.dell.com            1/1     Running   0          5m12s
kube-system   kube-controller-manager-k8s01.dcws.dell.com   1/1     Running   0          5m12s
kube-system   kube-proxy-xqwbd                              1/1     Running   0          5m1s
kube-system   kube-scheduler-k8s01.dcws.dell.com            1/1     Running   0          5m12s
root@k8s01:~# // Install Calico as the CNI plugin
root@k8s01:~# kubectl create -f https://projectcalico.docs.tigera.io/manifests/tigera-operator.yaml
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
----------------------(omitted)----------------------
clusterrolebinding.rbac.authorization.k8s.io/tigera-operator created
deployment.apps/tigera-operator created
root@k8s01:~# 
root@k8s01:~# kubectl create -f https://projectcalico.docs.tigera.io/manifests/custom-resources.yaml
installation.operator.tigera.io/default created
apiserver.operator.tigera.io/default created
root@k8s01:~# // Confirm that the Calico-related Pods are running normally
root@k8s01:~# watch kubectl get pods -n calico-system
root@k8s01:~# kubectl get pods -n calico-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-557cb7fd8b-bw2sr   1/1     Running   0          82s
calico-node-lrx66                          1/1     Running   0          82s
calico-typha-5d4d7646fb-wv4vx              1/1     Running   0          82s
root@k8s01:~# // Remove the taint so that arbitrary pods can also run on the master node.
root@k8s01:~# kubectl taint nodes --all node-role.kubernetes.io/master-
node/k8s01.dcws.dell.com untainted
root@k8s01:~# // Check the master node status
root@k8s01:~# kubectl get nodes -o wide
NAME                  STATUS   ROLES                  AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k8s01.dcws.dell.com   Ready    control-plane,master   11m   v1.23.6   172.31.35.131   <none>        Ubuntu 20.04.4 LTS   5.13.0-40-generic   docker://20.10.14
root@k8s01:~# // Connect to k8s02 over ssh in order to run kubeadm join there.
root@k8s01:~# ssh k8s02
root@k8s02's password:
Welcome to Ubuntu 20.04.4 LTS (GNU/Linux 5.13.0-40-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

1 update can be applied immediately.
To see these additional updates run: apt list --upgradable

20 updates could not be installed automatically. For more details,
see /var/log/unattended-upgrades/unattended-upgrades.log
Your Hardware Enablement Stack (HWE) is supported until April 2025.

Last login: Tue May  3 22:16:51 2022 from 172.31.35.131
root@k8s02:~# // Run the kubeadm join command (including the unique hash value) printed by kubeadm init on k8s01
root@k8s02:~# kubeadm join 172.31.35.131:6443 --token gmagdj.763lucvtt6bsil6i \
>         --discovery-token-ca-cert-hash sha256:0f76e7ebb24ad3a1180488e53a7337855ac026971fd72ea2f82ffda427598396
[preflight] Running pre-flight checks
----------------------(omitted)----------------------
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

root@k8s02:~#
root@k8s02:~# logout
Connection to k8s02 closed.
root@k8s01:~# // Check the status of k8s01/02. k8s02 is NotReady because kubeadm join has only just been run.
root@k8s01:~# kubectl get node -o wide
NAME                  STATUS     ROLES                  AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k8s01.dcws.dell.com   Ready      control-plane,master   16m   v1.23.6   172.31.35.131   <none>        Ubuntu 20.04.4 LTS   5.13.0-40-generic   docker://20.10.14
k8s02.dcws.dell.com   NotReady   <none>                 47s   v1.23.6   172.31.35.132   <none>        Ubuntu 20.04.4 LTS   5.13.0-40-generic   docker://20.10.14
root@k8s01:~# // A few minutes later, k8s02 has become Ready.
root@k8s01:~# kubectl get node -o wide
NAME                  STATUS   ROLES                  AGE    VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k8s01.dcws.dell.com   Ready    control-plane,master   18m    v1.23.6   172.31.35.131   <none>        Ubuntu 20.04.4 LTS   5.13.0-40-generic   docker://20.10.14
k8s02.dcws.dell.com   Ready    <none>                 2m8s   v1.23.6   172.31.35.132   <none>        Ubuntu 20.04.4 LTS   5.13.0-40-generic   docker://20.10.14
root@k8s01:~#
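
The NotReady to Ready transition seen in the log can be checked from a script as well. A minimal sketch, assuming the default kubectl get nodes column layout (NAME, STATUS, ...); not_ready_nodes is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print the names of nodes whose STATUS column is not exactly "Ready".
# not_ready_nodes is a hypothetical helper that reads `kubectl get nodes`
# table output from stdin. A cordoned node ("Ready,SchedulingDisabled") is
# also reported, because the comparison is an exact match.
not_ready_nodes() {
    awk 'NR > 1 && NF > 0 && $2 != "Ready" { print $1 }'
}
```

For example, `kubectl get nodes | not_ready_nodes` prints nothing once every node has become Ready.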

3.4 Miscellaneous

Under "Configure the cgroup driver used by the kubelet on the control-plane node", the k8s manual states: "When using Docker, kubeadm automatically detects the cgroup driver for the kubelet and sets it in the /var/lib/kubelet/kubeadm-flags.env file at runtime." I checked kubeadm-flags.env, but found no cgroup driver entry. Still, the k8s cluster is working, so I have not dug into it.

cgroup driver

root@k8s01:~# cat /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.6"
root@k8s01:~#

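I also tried running kubelet directly and got the message "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"". In other words, the kubelet's cgroup driver is cgroupfs while Docker's is systemd, and the kubelet cannot run because of this misconfiguration.
- Yet when I check the kubelet ConfigMap, it says cgroupDriver: systemd. I do not know what else would need to be changed.
  - Here is the ConfigMap manual.
  - The troubleshooting section of the manual has guidance, but adding KUBELET_EXTRA_ARGS to /etc/default/kubelet and retrying made no difference. Reinstalling Docker is not realistic after the effort it took to get the Container Toolkit in, and the k8s cluster is working anyway, so I have left it as is.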
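
One possible explanation, offered here as an assumption and not as something verified in this environment: the kubelet service started by systemd is normally launched with the configuration file that kubeadm writes to /var/lib/kubelet/config.yaml, whereas a bare `kubelet` typed at the prompt gets no such file and falls back to built-in defaults (the log below reports "Standalone mode"). The sketch below reads the driver from that file; kubelet_cgroup_driver is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print the cgroupDriver value from a KubeletConfiguration file.
# kubelet_cgroup_driver is a hypothetical helper; /var/lib/kubelet/config.yaml
# is assumed to be where kubeadm wrote the kubelet configuration.
kubelet_cgroup_driver() {
    config="${1:-/var/lib/kubelet/config.yaml}"
    # Match only the top-level key; print "unset" when the field is absent.
    awk -F': *' '
        $1 == "cgroupDriver" { print $2; found = 1 }
        END { if (!found) print "unset" }
    ' "$config"
}
```

If this prints systemd, the running service and Docker agree, and the mismatch would be limited to the manual invocation. `systemctl cat kubelet` shows the flags the service is really started with.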

kubelet output

root@k8s01:~# kubelet
I0510 10:50:40.107109    3388 server.go:446] "Kubelet version" kubeletVersion="v1.23.6"
I0510 10:50:40.107860    3388 server.go:606] "Standalone mode, no API client"
I0510 10:50:40.108162    3388 server.go:662] "Failed to get the kubelet's cgroup. Kubelet system container metrics may be missing." err="cpu and memory cgroup hierarchy not unified.  cpu: /user.slice, memory: /user.slice/user-0.slice/session-3.scope"
I0510 10:50:40.241379    3388 server.go:494] "No api server defined - no events will be sent to API server"
I0510 10:50:40.241425    3388 server.go:693] "--cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /"
I0510 10:50:40.241880    3388 container_manager_linux.go:281] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
I0510 10:50:40.242042    3388 container_manager_linux.go:286] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerPolicyOptions:map[] ExperimentalTopologyManagerScope:container ExperimentalCPUManagerReconcilePeriod:10s ExperimentalMemoryManagerPolicy:None ExperimentalMemoryManagerReservedMemory:[] ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none}
I0510 10:50:40.242107    3388 topology_manager.go:133] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
I0510 10:50:40.242146    3388 container_manager_linux.go:321] "Creating device plugin manager" devicePluginEnabled=true
I0510 10:50:40.242211    3388 state_mem.go:36] "Initialized new in-memory state store"
I0510 10:50:40.242352    3388 kubelet.go:313] "Using dockershim is deprecated, please consider using a full-fledged CRI implementation"
I0510 10:50:40.242444    3388 client.go:80] "Connecting to docker on the dockerEndpoint" endpoint="unix:///var/run/docker.sock"
I0510 10:50:40.242475    3388 client.go:99] "Start docker client with request timeout" timeout="2m0s"
I0510 10:50:40.262772    3388 docker_service.go:571] "Hairpin mode is set but kubenet is not enabled, falling back to HairpinVeth" hairpinMode=promiscuous-bridge
I0510 10:50:40.262832    3388 docker_service.go:243] "Hairpin mode is set" hairpinMode=hairpin-veth
I0510 10:50:40.263127    3388 cni.go:240] "Unable to update cni config" err="no networks found in /etc/cni/net.d"
I0510 10:50:40.271191    3388 docker_service.go:258] "Docker cri networking managed by the network plugin" networkPluginName="kubernetes.io/no-op"
I0510 10:50:40.293018    3388 docker_service.go:264] "Docker Info" dockerInfo=&{ID:GGZ5:A5GU:QN33:2SL7:SUSX:WAIC:TMFA:6AIH:CVWT:V7DY:ZCLB:X7CF Containers:0 ContainersRunning:0 ContainersPaused:0 ContainersStopped:0 Images:1 Driver:overlay2 DriverStatus:[[Backing Filesystem extfs] [Supports d_type true] [Native Overlay Diff true] [userxattr false]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host ipvlan macvlan null overlay] Authorization:[] Log:[awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} MemoryLimit:true SwapLimit:true KernelMemory:true KernelMemoryTCP:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true PidsLimit:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:true NFd:25 OomKillDisable:true NGoroutines:34 SystemTime:2022-05-10T10:50:40.272829415+09:00 LoggingDriver:json-file CgroupDriver:systemd CgroupVersion:1 NEventsListener:0 KernelVersion:5.13.0-40-generic OperatingSystem:Ubuntu 20.04.4 LTS OSVersion:20.04 OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0xc000ad3f10 NCPU:4 MemTotal:16778690560 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:k8s01.dcws.dell.com Labels:[] ExperimentalBuild:false ServerVersion:20.10.14 ClusterStore: ClusterAdvertise: Runtimes:map[io.containerd.runc.v2:{Path:runc Args:[] Shim:<nil>} io.containerd.runtime.v1.linux:{Path:runc Args:[] Shim:<nil>} nvidia:{Path:/usr/bin/nvidia-container-runtime Args:[] Shim:<nil>} runc:{Path:runc Args:[] Shim:<nil>}] DefaultRuntime:nvidia Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:<nil> Warnings:[]} LiveRestoreEnabled:false Isolation: InitBinary:docker-init ContainerdCommit:{ID:3df54a852345ae127d1fa3092b95168e4a88e2f8 Expected:3df54a852345ae127d1fa3092b95168e4a88e2f8} RuncCommit:{ID:v1.0.3-0-gf46b6ba Expected:v1.0.3-0-gf46b6ba} 
InitCommit:{ID:de40ad0 Expected:de40ad0} SecurityOptions:[name=apparmor name=seccomp,profile=default] ProductLicense: DefaultAddressPools:[] Warnings:[]}
E0510 10:50:40.293105    3388 server.go:302] "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"cgroupfs\" is different from docker cgroup driver: \"systemd\""

kubelet ConfigMap

root@k8s01:~# kubectl edit cm -n kube-system kubelet-config-1.23
Edit cancelled, no changes made.


# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  kubelet: |
    apiVersion: kubelet.config.k8s.io/v1beta1
    authentication:
      anonymous:
        enabled: false
      webhook:
        cacheTTL: 0s
        enabled: true
      x509:
        clientCAFile: /etc/kubernetes/pki/ca.crt
    authorization:
      mode: Webhook
      webhook:
        cacheAuthorizedTTL: 0s
        cacheUnauthorizedTTL: 0s
    cgroupDriver: systemd
    clusterDNS:
    - 10.96.0.10
    clusterDomain: cluster.local
    cpuManagerReconcilePeriod: 0s
    evictionPressureTransitionPeriod: 0s
    fileCheckFrequency: 0s
    healthzBindAddress: 127.0.0.1
    healthzPort: 10248
    httpCheckFrequency: 0s
    imageMinimumGCAge: 0s
    kind: KubeletConfiguration
    logging:
      flushFrequency: 0
      options:
        json:
          infoBufferSize: "0"
      verbosity: 0
    memorySwap: {}
    nodeStatusReportFrequency: 0s
    nodeStatusUpdateFrequency: 0s
    resolvConf: /run/systemd/resolve/resolv.conf
    rotateCertificates: true
    runtimeRequestTimeout: 0s
    shutdownGracePeriod: 0s
    shutdownGracePeriodCriticalPods: 0s
    staticPodPath: /etc/kubernetes/manifests
    streamingConnectionIdleTimeout: 0s
    syncFrequency: 0s
    volumeStatsAggPeriod: 0s

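I have not added the cgroup driver setting (KUBELET_EXTRA_ARGS=--cgroup-driver=systemd) to /etc/default/kubelet; I forgot. However, as shown above, systemd is specified in the kubelet ConfigMap and k8s is working for now, so I have left it as is.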

/etc/default/kubelet

root@k8s01:~# cat /etc/default/kubelet
cat: /etc/default/kubelet: No such file or directory
root@k8s01:~#

Continued in the next article.
