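Note: This is the second article in a three-part series. The first and third parts are:
- Easy Kubernetes Deployment with Kubespray+Vagrant
- Automating K8s v1.24 non-graceful node shutdown with kube-fencing
Introduction
In the previous article we built a K8s v1.24.6 cluster with Kubespray+Vagrant, so let's try out the non-graceful node shutdown feature that we enabled there.
This feature goes some way toward solving the problem where StatefulSet Pods with PVs on a failed node are not automatically moved to another node (that is, the Pods are left in the Terminating status). See Mouri-san's article for details.
Preparation
We pick up where the previous article left off.
First, let's create a namespace.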
yosshy@nuc2:~$ cd ~/csi-driver-nfs
yosshy@nuc2:~/csi-driver-nfs$ kubectl create ns kep2268
namespace/kep2268 created
Next, prepare the StatefulSet used for the test. A sample manifest is available at deploy/example/statefulset.yaml, so we use it with the replica count raised from 1 to 8.
diff --git a/deploy/example/statefulset.yaml b/deploy/example/statefulset.yaml
index 0a7116d3..0effbb7d 100644
--- a/deploy/example/statefulset.yaml
+++ b/deploy/example/statefulset.yaml
@@ -7,7 +7,7 @@ metadata:
     app: nginx
 spec:
   serviceName: statefulset-nfs
-  replicas: 1
+  replicas: 8
   template:
     metadata:
       labels:
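If you would rather not edit the checked-out manifest by hand, the same replica change can be scripted. The following is only a minimal sketch: set_replicas is a hypothetical helper name, and it assumes the manifest contains a single replicas: key.

```shell
# Hypothetical helper: print a manifest with the "replicas:" value replaced.
# Assumption: the file contains exactly one "replicas:" key.
set_replicas() {
  # $1 = manifest path, $2 = desired replica count
  sed -E "s/^([[:space:]]*replicas:).*/\1 $2/" "$1"
}

# Usage (writes to a copy so the checked-out file stays untouched):
#   set_replicas deploy/example/statefulset.yaml 8 > /tmp/statefulset-8.yaml
#   kubectl -n kep2268 apply -f /tmp/statefulset-8.yaml
```

Writing to a copy keeps the git working tree clean, at the cost of having one more file to keep track of.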
Apply the manifest.
yosshy@nuc2:~/csi-driver-nfs$ kubectl -n kep2268 apply -f deploy/example/statefulset.yaml
statefulset.apps/statefulset-nfs created
yosshy@nuc2:~/csi-driver-nfs$ kubectl -n kep2268 get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
statefulset-nfs-0 1/1 Running 0 102s 10.233.66.4 k8s-3 <none> <none>
statefulset-nfs-1 1/1 Running 0 95s 10.233.65.6 k8s-2 <none> <none>
statefulset-nfs-2 1/1 Running 0 88s 10.233.64.4 k8s-1 <none> <none>
statefulset-nfs-3 1/1 Running 0 67s 10.233.66.5 k8s-3 <none> <none>
statefulset-nfs-4 1/1 Running 0 59s 10.233.65.7 k8s-2 <none> <none>
statefulset-nfs-5 1/1 Running 0 54s 10.233.66.6 k8s-3 <none> <none>
statefulset-nfs-6 1/1 Running 0 48s 10.233.65.8 k8s-2 <none> <none>
statefulset-nfs-7 1/1 Running 0 41s 10.233.64.5 k8s-1 <none> <none>
yosshy@nuc2:~/csi-driver-nfs$ kubectl -n kep2268 get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-109775b8-183c-4948-a2ad-53c41a9d012b 10Gi RWX Delete Bound kep2268/persistent-storage-statefulset-nfs-0 nfs-csi 2m16s
pvc-4746d76c-d6f9-45e3-ad02-a0894d2842f6 10Gi RWX Delete Bound kep2268/persistent-storage-statefulset-nfs-2 nfs-csi 2m4s
pvc-508141df-f789-4e32-9939-ef3f12ec715f 10Gi RWX Delete Bound kep2268/persistent-storage-statefulset-nfs-5 nfs-csi 90s
pvc-666b6ab5-b27c-4432-8c9e-2cf4fad83bbf 10Gi RWX Delete Bound kep2268/persistent-storage-statefulset-nfs-1 nfs-csi 2m10s
pvc-99e0cb83-f0a7-49d4-b4ba-10e4c41c7c42 10Gi RWX Delete Bound kep2268/persistent-storage-statefulset-nfs-7 nfs-csi 76s
pvc-abfbf62c-bebf-43b6-a6e1-c335043d8695 10Gi RWX Delete Bound kep2268/persistent-storage-statefulset-nfs-4 nfs-csi 95s
pvc-d77c6baa-b6b3-407c-acf6-68f79754d772 10Gi RWX Delete Bound kep2268/persistent-storage-statefulset-nfs-6 nfs-csi 84s
pvc-f9269c79-f7db-4c08-b7dc-dfae3e97a7f4 10Gi RWX Delete Bound kep2268/persistent-storage-statefulset-nfs-3 nfs-csi 102s
yosshy@nuc2:~/csi-driver-nfs$
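Simulating a node failure
Now let's simulate a failure on k8s-2, one of the nodes running the most Pods. There are several ways to do this:
- Stop the node VM (power off or pause)
- Turn off the node VM's NIC
- Stop kubelet
This time we pause the node VM. List the VMs with the vboxmanage command, then pause the VM that corresponds to the k8s-2 node.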
yosshy@nuc2:~/csi-driver-nfs$ vboxmanage list vms
"generic-ubuntu1804-virtualbox_1655701800174_67684" {7e7d8758-7362-41f7-9c04-21d09c879c44}
"kubespray_k8s-1_1665804234747_39746" {a3512cf1-2cd1-492b-9ec9-5b5bf517b583}
"kubespray_k8s-2_1665804309774_65270" {28725bfd-7387-4489-9b09-57c5b8522f10}
"kubespray_k8s-3_1665804382129_19421" {a9f75c2a-33b7-4a32-9b5e-f3a996ad3f2c}
yosshy@nuc2:~/csi-driver-nfs$ vboxmanage controlvm kubespray_k8s-2_1665804309774_65270 pause
yosshy@nuc2:~/csi-driver-nfs$ kubectl get nodes -w
NAME STATUS ROLES AGE VERSION
k8s-1 Ready control-plane 10h v1.24.6
k8s-2 Ready <none> 10h v1.24.6
k8s-3 Ready <none> 10h v1.24.6
(omitted)
k8s-2 NotReady <none> 10h v1.24.6
k8s-2 NotReady <none> 10h v1.24.6
(interrupted with ^C)
The k8s-2 node is now NotReady.
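Picking the VM name out of the vboxmanage list vms output can also be scripted. This is a sketch only: vm_for_node is a hypothetical helper, and it assumes the VM names follow the project_machine_timestamp_random pattern visible in the listing above.

```shell
# Hypothetical helper: read `vboxmanage list vms` output on stdin and print
# the name of the VM that Vagrant created for the given node.
# Assumption: VM names look like "<project>_<machine>_<timestamp>_<random>"
# and the project name itself contains no underscore.
vm_for_node() {
  # $1 = node name, e.g. k8s-2
  awk -F'"' -v node="$1" '
    {
      n = split($2, parts, "_")
      # parts[2] is the machine name for Vagrant-style VM names
      if (n >= 4 && parts[2] == node) { print $2; exit }
    }'
}

# Usage:
#   vm=$(vboxmanage list vms | vm_for_node k8s-2)
#   vboxmanage controlvm "$vm" pause
```

Matching on the machine-name field rather than grepping for the node name avoids accidental hits on VMs whose names merely contain the same substring.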
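Let's watch how the status of the affected StatefulSet Pods changes. The change takes more than five minutes, because by default Pods tolerate the node.kubernetes.io/not-ready and node.kubernetes.io/unreachable taints for 300 seconds before they are evicted.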
yosshy@nuc2:~/csi-driver-nfs$ kubectl get pods -n kep2268 -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
statefulset-nfs-0 1/1 Running 0 7m53s 10.233.66.4 k8s-3 <none> <none>
statefulset-nfs-1 1/1 Running 0 7m46s 10.233.65.6 k8s-2 <none> <none>
statefulset-nfs-2 1/1 Running 0 7m39s 10.233.64.4 k8s-1 <none> <none>
statefulset-nfs-3 1/1 Running 0 7m18s 10.233.66.5 k8s-3 <none> <none>
statefulset-nfs-4 1/1 Running 0 7m10s 10.233.65.7 k8s-2 <none> <none>
statefulset-nfs-5 1/1 Running 0 7m5s 10.233.66.6 k8s-3 <none> <none>
statefulset-nfs-6 1/1 Running 0 6m59s 10.233.65.8 k8s-2 <none> <none>
statefulset-nfs-7 1/1 Running 0 6m52s 10.233.64.5 k8s-1 <none> <none>
statefulset-nfs-6 1/1 Terminating 0 11m 10.233.65.8 k8s-2 <none> <none>
statefulset-nfs-1 1/1 Terminating 0 12m 10.233.65.6 k8s-2 <none> <none>
statefulset-nfs-4 1/1 Terminating 0 11m 10.233.65.7 k8s-2 <none> <none>
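(interrupted with ^C)
These Pods stay in Terminating until someone force-deletes them.
This problem occurs only for Pods with CSI PVs. Part of the PV detach processing performed at Pod deletion (the node-side part) cannot run on the failed node, so the rest of the Pod deletion never proceeds.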
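For reference, the manual workaround looks roughly like the sketch below: find the Pods on the failed node that already carry a deletion timestamp, then force-delete each one. terminating_on_node is a hypothetical helper, and the three-column NAME/NODE/DELETED layout is an assumption made here so that the output is easy to parse.

```shell
# Hypothetical helper: read a NAME/NODE/DELETED table (with header) on stdin
# and print the names of Pods on the given node that already have a
# deletionTimestamp (kubectl displays these as "Terminating").
terminating_on_node() {
  # $1 = node name
  awk -v node="$1" 'NR > 1 && $2 == node && $3 != "<none>" { print $1 }'
}

# Usage (manual workaround; one forced delete per stuck Pod):
#   kubectl -n kep2268 get pods \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,DELETED:.metadata.deletionTimestamp \
#     | terminating_on_node k8s-2 \
#     | xargs -r kubectl -n kep2268 delete pod --force --grace-period=0
```

A forced delete removes the Pod object without waiting for confirmation from the node, so it should only be used once you are sure the node is really down.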
Using non-graceful node shutdown
Now let's add the node.kubernetes.io/out-of-service taint to the failed node so that the Pods above are force-deleted automatically.
yosshy@nuc2:~/csi-driver-nfs$ kubectl taint nodes k8s-2 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
node/k8s-2 tainted
yosshy@nuc2:~/csi-driver-nfs$ kubectl get pods -n kep2268 -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
statefulset-nfs-0 1/1 Running 0 13m 10.233.66.4 k8s-3 <none> <none>
statefulset-nfs-1 1/1 Terminating 0 13m 10.233.65.6 k8s-2 <none> <none>
statefulset-nfs-2 1/1 Running 0 13m 10.233.64.4 k8s-1 <none> <none>
statefulset-nfs-3 1/1 Running 0 13m 10.233.66.5 k8s-3 <none> <none>
statefulset-nfs-4 1/1 Terminating 0 13m 10.233.65.7 k8s-2 <none> <none>
statefulset-nfs-5 1/1 Running 0 12m 10.233.66.6 k8s-3 <none> <none>
statefulset-nfs-6 1/1 Terminating 0 12m 10.233.65.8 k8s-2 <none> <none>
statefulset-nfs-7 1/1 Running 0 12m 10.233.64.5 k8s-1 <none> <none>
statefulset-nfs-1 1/1 Terminating 0 13m 10.233.65.6 k8s-2 <none> <none>
statefulset-nfs-6 1/1 Terminating 0 12m 10.233.65.8 k8s-2 <none> <none>
statefulset-nfs-4 1/1 Terminating 0 13m 10.233.65.7 k8s-2 <none> <none>
statefulset-nfs-4 1/1 Terminating 0 13m 10.233.65.7 k8s-2 <none> <none>
statefulset-nfs-6 1/1 Terminating 0 12m 10.233.65.8 k8s-2 <none> <none>
statefulset-nfs-1 1/1 Terminating 0 13m 10.233.65.6 k8s-2 <none> <none>
statefulset-nfs-1 0/1 Pending 0 0s <none> <none> <none> <none>
statefulset-nfs-1 0/1 Pending 0 0s <none> k8s-1 <none> <none>
statefulset-nfs-1 0/1 ContainerCreating 0 0s <none> k8s-1 <none> <none>
statefulset-nfs-1 1/1 Running 0 2s 10.233.64.6 k8s-1 <none> <none>
statefulset-nfs-4 0/1 Pending 0 0s <none> <none> <none> <none>
statefulset-nfs-4 0/1 Pending 0 0s <none> k8s-3 <none> <none>
statefulset-nfs-4 0/1 ContainerCreating 0 0s <none> k8s-3 <none> <none>
statefulset-nfs-4 1/1 Running 0 4s 10.233.66.8 k8s-3 <none> <none>
statefulset-nfs-6 0/1 Pending 0 0s <none> <none> <none> <none>
statefulset-nfs-6 0/1 Pending 0 0s <none> k8s-1 <none> <none>
statefulset-nfs-6 0/1 ContainerCreating 0 0s <none> k8s-1 <none> <none>
statefulset-nfs-6 1/1 Running 0 2s 10.233.64.7 k8s-1 <none> <none>
(interrupted with ^C)
yosshy@nuc2:~/csi-driver-nfs$ kubectl get pods -n kep2268 -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
statefulset-nfs-0 1/1 Running 0 14m 10.233.66.4 k8s-3 <none> <none>
statefulset-nfs-1 1/1 Running 0 57s 10.233.64.6 k8s-1 <none> <none>
statefulset-nfs-2 1/1 Running 0 14m 10.233.64.4 k8s-1 <none> <none>
statefulset-nfs-3 1/1 Running 0 14m 10.233.66.5 k8s-3 <none> <none>
statefulset-nfs-4 1/1 Running 0 55s 10.233.66.8 k8s-3 <none> <none>
statefulset-nfs-5 1/1 Running 0 13m 10.233.66.6 k8s-3 <none> <none>
statefulset-nfs-6 1/1 Running 0 51s 10.233.64.7 k8s-1 <none> <none>
statefulset-nfs-7 1/1 Running 0 13m 10.233.64.5 k8s-1 <none> <none>
yosshy@nuc2:~/csi-driver-nfs$
Because these Pods belong to a StatefulSet, they were automatically recreated on other nodes.
Now resume k8s-2 and let it rejoin the cluster.
yosshy@nuc2:~/csi-driver-nfs$ vboxmanage controlvm kubespray_k8s-2_1665804309774_65270 resume
yosshy@nuc2:~/csi-driver-nfs$ kubectl get nodes -w
NAME STATUS ROLES AGE VERSION
k8s-1 Ready control-plane 28h v1.24.6
k8s-2 NotReady <none> 28h v1.24.6
k8s-3 Ready <none> 28h v1.24.6
(omitted)
k8s-2 Ready <none> 28h v1.24.6
k8s-2 Ready <none> 28h v1.24.6
(interrupted with ^C)
yosshy@nuc2:~/csi-driver-nfs$ kubectl taint nodes k8s-2 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-
node/k8s-2 untainted
yosshy@nuc2:~/csi-driver-nfs$
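The taint and recovery steps shown so far can be wrapped in small functions. This is a hedged sketch, not a hardened tool: run, mark_out_of_service, recover_node and the DRY_RUN variable are names invented here, and the 300s timeout is an arbitrary choice.

```shell
TAINT="node.kubernetes.io/out-of-service=nodeshutdown:NoExecute"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Taint a node that is confirmed down so its stuck Pods get force-deleted.
# $1 = node name
mark_out_of_service() {
  run kubectl taint nodes "$1" "$TAINT"
}

# Resume the paused VM, wait until the node reports Ready, then remove the taint.
# $1 = node name, $2 = VirtualBox VM name
recover_node() {
  run vboxmanage controlvm "$2" resume &&
  run kubectl wait --for=condition=Ready "node/$1" --timeout=300s &&
  run kubectl taint nodes "$1" "${TAINT}-"
}

# Usage:
#   DRY_RUN=1 recover_node k8s-2 kubespray_k8s-2_1665804309774_65270
```

Using kubectl wait instead of watching kubectl get nodes -w by hand makes the untaint step run only after the node has actually returned to Ready.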
Repeating the test
This time we pause k8s-3.
yosshy@nuc2:~/csi-driver-nfs$ vboxmanage list vms
"generic-ubuntu1804-virtualbox_1655701800174_67684" {7e7d8758-7362-41f7-9c04-21d09c879c44}
"kubespray_k8s-1_1665804234747_39746" {a3512cf1-2cd1-492b-9ec9-5b5bf517b583}
"kubespray_k8s-2_1665804309774_65270" {28725bfd-7387-4489-9b09-57c5b8522f10}
"kubespray_k8s-3_1665804382129_19421" {a9f75c2a-33b7-4a32-9b5e-f3a996ad3f2c}
yosshy@nuc2:~/csi-driver-nfs$ vboxmanage controlvm kubespray_k8s-3_1665804382129_19421 pause
yosshy@nuc2:~/csi-driver-nfs$ kubectl get nodes -w
NAME STATUS ROLES AGE VERSION
k8s-1 Ready control-plane 26h v1.24.6
k8s-2 Ready <none> 26h v1.24.6
k8s-3 Ready <none> 26h v1.24.6
(omitted)
k8s-3 NotReady <none> 26h v1.24.6
k8s-3 NotReady <none> 26h v1.24.6
(interrupted with ^C)
yosshy@nuc2:~/csi-driver-nfs$ kubectl get pods -n kep2268 -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
statefulset-nfs-0 1/1 Running 0 19m 10.233.66.4 k8s-3 <none> <none>
statefulset-nfs-1 1/1 Running 0 5m42s 10.233.64.6 k8s-1 <none> <none>
statefulset-nfs-2 1/1 Running 0 19m 10.233.64.4 k8s-1 <none> <none>
statefulset-nfs-3 1/1 Running 0 18m 10.233.66.5 k8s-3 <none> <none>
statefulset-nfs-4 1/1 Running 0 5m40s 10.233.66.8 k8s-3 <none> <none>
statefulset-nfs-5 1/1 Running 0 18m 10.233.66.6 k8s-3 <none> <none>
statefulset-nfs-6 1/1 Running 0 5m36s 10.233.64.7 k8s-1 <none> <none>
statefulset-nfs-7 1/1 Running 0 18m 10.233.64.5 k8s-1 <none> <none>
statefulset-nfs-0 1/1 Terminating 0 24m 10.233.66.4 k8s-3 <none> <none>
statefulset-nfs-5 1/1 Terminating 0 23m 10.233.66.6 k8s-3 <none> <none>
statefulset-nfs-3 1/1 Terminating 0 23m 10.233.66.5 k8s-3 <none> <none>
statefulset-nfs-4 1/1 Terminating 0 10m 10.233.66.8 k8s-3 <none> <none>
(interrupted with ^C)
As before, add the node.kubernetes.io/out-of-service taint to the k8s-3 node to move the affected Pods.
yosshy@nuc2:~/csi-driver-nfs$ kubectl taint nodes k8s-3 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
node/k8s-3 tainted
yosshy@nuc2:~/csi-driver-nfs$ kubectl get pods -n kep2268 -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
statefulset-nfs-0 0/1 ContainerCreating 0 2s <none> k8s-2 <none> <none>
statefulset-nfs-1 1/1 Running 0 12m 10.233.64.6 k8s-1 <none> <none>
statefulset-nfs-2 1/1 Running 0 25m 10.233.64.4 k8s-1 <none> <none>
statefulset-nfs-6 1/1 Running 0 11m 10.233.64.7 k8s-1 <none> <none>
statefulset-nfs-7 1/1 Running 0 24m 10.233.64.5 k8s-1 <none> <none>
statefulset-nfs-0 1/1 Running 0 2s 10.233.65.10 k8s-2 <none> <none>
statefulset-nfs-3 0/1 Pending 0 0s <none> <none> <none> <none>
statefulset-nfs-3 0/1 Pending 0 0s <none> k8s-2 <none> <none>
statefulset-nfs-3 0/1 ContainerCreating 0 0s <none> k8s-2 <none> <none>
statefulset-nfs-3 1/1 Running 0 3s 10.233.65.11 k8s-2 <none> <none>
statefulset-nfs-4 0/1 Pending 0 0s <none> <none> <none> <none>
statefulset-nfs-4 0/1 Pending 0 0s <none> k8s-2 <none> <none>
statefulset-nfs-4 0/1 ContainerCreating 0 0s <none> k8s-2 <none> <none>
statefulset-nfs-4 1/1 Running 0 2s 10.233.65.12 k8s-2 <none> <none>
statefulset-nfs-5 0/1 Pending 0 0s <none> <none> <none> <none>
statefulset-nfs-5 0/1 Pending 0 0s <none> k8s-2 <none> <none>
statefulset-nfs-5 0/1 ContainerCreating 0 0s <none> k8s-2 <none> <none>
statefulset-nfs-5 1/1 Running 0 1s 10.233.65.13 k8s-2 <none> <none>
(interrupted with ^C)
yosshy@nuc2:~/csi-driver-nfs$ kubectl get pods -n kep2268 -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
statefulset-nfs-0 1/1 Running 0 56s 10.233.65.10 k8s-2 <none> <none>
statefulset-nfs-1 1/1 Running 0 12m 10.233.64.6 k8s-1 <none> <none>
statefulset-nfs-2 1/1 Running 0 26m 10.233.64.4 k8s-1 <none> <none>
statefulset-nfs-3 1/1 Running 0 54s 10.233.65.11 k8s-2 <none> <none>
statefulset-nfs-4 1/1 Running 0 51s 10.233.65.12 k8s-2 <none> <none>
statefulset-nfs-5 1/1 Running 0 48s 10.233.65.13 k8s-2 <none> <none>
statefulset-nfs-6 1/1 Running 0 12m 10.233.64.7 k8s-1 <none> <none>
statefulset-nfs-7 1/1 Running 0 25m 10.233.64.5 k8s-1 <none> <none>
yosshy@nuc2:~/csi-driver-nfs$
As before, the StatefulSet Pods moved to another node.
Finally, let's clean up.
yosshy@nuc2:~/csi-driver-nfs$ vboxmanage controlvm kubespray_k8s-3_1665804382129_19421 resume
yosshy@nuc2:~/csi-driver-nfs$ kubectl get nodes -w
NAME STATUS ROLES AGE VERSION
k8s-1 Ready control-plane 29h v1.24.6
k8s-2 Ready <none> 28h v1.24.6
k8s-3 NotReady <none> 28h v1.24.6
(omitted)
k8s-3 Ready <none> 28h v1.24.6
k8s-3 Ready <none> 28h v1.24.6
(interrupted with ^C)
yosshy@nuc2:~/csi-driver-nfs$ kubectl taint nodes k8s-3 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-
node/k8s-3 untainted
yosshy@nuc2:~/csi-driver-nfs$ kubectl delete ns kep2268
namespace "kep2268" deleted
yosshy@nuc2:~/csi-driver-nfs$
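Summary
On Kubernetes v1.24, we confirmed that adding the node.kubernetes.io/out-of-service taint deletes the Pods with PVs on a failed node. Previously you had to run a force-delete command once for every Pod with a PV on the failed node. Needing only a single taint command instead is a significant operational improvement.
That's all for this time.
The series continues in the final part, "Automating K8s v1.24 non-graceful node shutdown with kube-fencing".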