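Continuation of the article below
[IBMCloud] IKS(Kubernetes) Checking behavior during a WorkerNode version upgrade ①
Test environment
A VSI for VPC Linux server is placed in each zone of the Tokyo region
- VSI for VPC(CentOS) TOK01 x 1
- VSI for VPC(CentOS) TOK02 x 1
- VSI for VPC(CentOS) TOK03 x 1
On the server side, Nginx Pods are placed in each zone on IKS(Kubernetes), with an IngressALB in front of them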
- Application load balancer for VPC x 1(TOK02/TOK03)
- IBM Cloud Kubernetes Service x 1Cluster(TOK02/TOK03)
Test details
In the previous article, the ClusterMaster was upgraded.
This time, the WorkerNodes are upgraded.
Current state
ClusterMaster:1.29.6_1545
% ibmcloud ks cluster ls
OK
Name ID State Created Workers Location Version Resource Group Name Provider
acs-paascluster-jp-tok co9j451t0t7uthsxxxxx normal 2 months ago 2 Tokyo 1.29.6_1545 acs-paas vpc-gen2
WorkerNode:1.28.11_1565
% ibmcloud ks worker ls --cluster co9j451t0t7uthsxxxxx
OK
ID Primary IP Flavor State Status Zone Version
kube-co9j451t0t7uthsxxxxx-acspaasclus-lowspec-000006c4 10.244.128.21 cx2.2x4 normal Ready jp-tok-3 1.28.11_1565*
kube-co9j451t0t7uthsxxxxx-acspaasclus-lowspec-0000074c 10.244.64.28 cx2.2x4 normal Ready jp-tok-2 1.28.11_1565*
* To update to version 1.29.6_1546, run 'ibmcloud ks worker replace'. Before updating, review and make any required version changes: 'https://ibm.biz/upworker'
Pod status: one Nginx Pod is running on each WorkerNode
% kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
iks-nginx-7b789b9b4-gkmwj 1/1 Running 0 18s 172.17.4.113 10.244.64.28 <none> <none>
iks-nginx-7b789b9b4-t8r5d 1/1 Running 0 18s 172.17.22.251 10.244.128.21 <none> <none>
Version upgrade (Node1)
For the upgrade procedure, see the document below
Updating VPC worker nodes
Replace the worker node to bring it up to the same version as the master (1.29.6)
% ibmcloud ks worker replace --cluster co9j451t0t7uthsxxxxx --worker kube-co9j451t0t7uthsxxxxx-acspaasclus-lowspec-000006c4 --update
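The replacement worker node is created in the same zone with the same flavor, but gets a new public or private IP address. During the replacement, all pods might be rescheduled onto other worker nodes, and data that is not stored outside the pod is deleted. To avoid downtime, make sure that you have enough worker nodes to handle your workload while the selected worker node is being replaced.
Replace worker node kube-co9j451t0t7uthsxxxxx-acspaasclus-lowspec-000006c4? [y/N]> y
Deleting worker node kube-co9j451t0t7uthsxxxxx-acspaasclus-lowspec-000006c4 and creating a new worker node in cluster co9j451t0t7uthsxxxxx...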
OK
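Connectivity check
State after the Pod was drained and both Pods were running on one node
As soon as the update started, the Pod was drained and a new Pod was created on Node2
Drained Pod: t8r5d
Newly created Pod: 4c5ll
Pod that kept running on the other node: gkmwj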
sv1 -> IngressALB -> Node -> Pod
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:45
Hostname: iks-nginx-7b789b9b4-t8r5d Current Time: 2024-06-27 02:49:46
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:47
Hostname: iks-nginx-7b789b9b4-t8r5d Current Time: 2024-06-27 02:49:48
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:49
Hostname: iks-nginx-7b789b9b4-t8r5d Current Time: 2024-06-27 02:49:50
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:51
Hostname: iks-nginx-7b789b9b4-t8r5d Current Time: 2024-06-27 02:49:52
Hostname: iks-nginx-7b789b9b4-t8r5d Current Time: 2024-06-27 02:49:53
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:54
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:55
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:56
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:49:57
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:58
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:49:59
sv2 -> IngressALB -> Node -> Pod
Hostname: iks-nginx-7b789b9b4-t8r5d Current Time: 2024-06-27 02:49:46
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:47
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:48
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:49
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:50
Hostname: iks-nginx-7b789b9b4-t8r5d Current Time: 2024-06-27 02:49:51
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:52
Hostname: iks-nginx-7b789b9b4-t8r5d Current Time: 2024-06-27 02:49:53
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:54
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:55
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:56
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:57
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:49:58
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:59
sv3 -> IngressALB -> Node -> Pod
Hostname: iks-nginx-7b789b9b4-t8r5d Current Time: 2024-06-27 02:49:48
Hostname: iks-nginx-7b789b9b4-t8r5d Current Time: 2024-06-27 02:49:49
Hostname: iks-nginx-7b789b9b4-t8r5d Current Time: 2024-06-27 02:49:50
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:51
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:52
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:53
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:54
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:55
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:56
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:57
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:49:59
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:50:00
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:50:00
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:50:02
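The original post does not show the script that produced these per-second lines from sv1 to sv3. A probe loop along the following lines would give similar output. This is only a sketch: the function names, the 1-second timeout, and the idea that the response body already contains the `Hostname: ... Current Time: ...` text are all assumptions.

```shell
#!/bin/sh
# Sketch of a once-per-second HTTP probe (not the author's actual script).

# Send one request; print the response body on success, or the failure
# message seen in the logs when the request errors out or exceeds 1 second.
probe_once() {
  # -s: silent, -f: treat HTTP errors as failures, --max-time: overall timeout
  if body=$(curl -sf --max-time 1 "$1" 2>/dev/null); then
    printf '%s\n' "$body"
  else
    echo "Request timed out or failed."
  fi
}

# Probe the given URL a fixed number of times, pausing 1 second between tries.
probe_loop() {
  url=$1
  count=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    probe_once "$url"
    sleep 1
    i=$((i + 1))
  done
}
```

For example, `probe_loop "http://<IngressALB hostname>/" 600` would probe for about ten minutes; the hostname is a placeholder to fill in for the environment.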
Checking the nodes shows that only one node remains
% kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
10.244.64.28 Ready <none> 20h v1.28.11+IKS 10.244.64.28 10.244.64.28 Ubuntu 20.04.6 LTS 5.4.0-186-generic containerd://1.7.18
After a while, requests to the Pods started failing intermittently
sv1 -> IngressALB -> Node -> Pod
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:51:30
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:51:34
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:51:36
Request timed out or failed.
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:51:42
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:51:49
sv2 -> IngressALB -> Node -> Pod
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:51:29
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:51:31
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:51:34
Request timed out or failed.
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:51:39
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:51:43
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:51:46
Request timed out or failed.
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:51:51
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:51:52
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:51:53
sv3 -> IngressALB -> Node -> Pod
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:51:29
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:51:31
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 02:51:34
Request timed out or failed.
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:51:39
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:51:43
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:51:46
Request timed out or failed.
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:51:51
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:51:52
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 02:51:53
A new node was created in about 7 minutes
The new node is on 1.29, the same as the master
% kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
10.244.128.23 Ready <none> 5m16s v1.29.6+IKS 10.244.128.23 10.244.128.23 Ubuntu 20.04.6 LTS 5.4.0-186-generic containerd://1.7.18
10.244.64.28 Ready <none> 20h v1.28.11+IKS 10.244.64.28 10.244.64.28 Ubuntu 20.04.6 LTS 5.4.0-186-generic containerd://1.7.18
Both Pods are still running on a single node (the old one)
% kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
iks-nginx-7b789b9b4-4c5ll 1/1 Running 0 16m 172.17.4.114 10.244.64.28 <none> <none>
iks-nginx-7b789b9b4-gkmwj 1/1 Running 0 33m 172.17.4.113 10.244.64.28 <none> <none>
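Before replacing the next node, it is worth checking how the Pods are spread across the nodes, because a node that holds every Pod takes the whole service down when it is drained. The following is a minimal sketch that counts Running Pods per node from `kubectl get pod -o wide` output. The helper name `pods_per_node` is hypothetical, and it assumes the NODE value is the 7th column, which holds when RESTARTS is a plain number as in the listings here.

```shell
#!/bin/sh
# Count Running Pods per node from `kubectl get pod -o wide` output on stdin.
pods_per_node() {
  # Skip the header row; column 3 is STATUS and column 7 is NODE.
  awk 'NR > 1 && $3 == "Running" { count[$7]++ }
       END { for (n in count) print n, count[n] }' | sort
}
```

Usage: `kubectl get pod -o wide | pods_per_node`. For the listing above it prints a single line, `10.244.64.28 2`, which shows that both Pods sit on the node that is replaced next.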
Version upgrade (Node2)
Upgrade the other node
10.244.64.28 is still on 1.28.11
% kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
10.244.128.23 Ready <none> 12m v1.29.6+IKS 10.244.128.23 10.244.128.23 Ubuntu 20.04.6 LTS 5.4.0-186-generic containerd://1.7.18
10.244.64.28 Ready <none> 20h v1.28.11+IKS 10.244.64.28 10.244.64.28 Ubuntu 20.04.6 LTS 5.4.0-186-generic containerd://1.7.18
Running the upgrade
% ibmcloud ks worker replace --cluster co9j451t0t7uthsxxxxx --worker kube-co9j451t0t7uthsxxxxx-acspaasclus-lowspec-0000074c --update
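The replacement worker node is created in the same zone with the same flavor, but gets a new public or private IP address. During the replacement, all pods might be rescheduled onto other worker nodes, and data that is not stored outside the pod is deleted. To avoid downtime, make sure that you have enough worker nodes to handle your workload while the selected worker node is being replaced.
Replace worker node kube-co9j451t0t7uthsxxxxx-acspaasclus-lowspec-0000074c? [y/N]> y
Deleting worker node kube-co9j451t0t7uthsxxxxx-acspaasclus-lowspec-0000074c and creating a new worker node in cluster co9j451t0t7uthsxxxxx...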
OK
State after the node was cordoned
% kubectl get node
NAME STATUS ROLES AGE VERSION
10.244.128.23 Ready <none> 17m v1.29.6+IKS
10.244.64.28 Ready,SchedulingDisabled <none> 20h v1.28.11+IKS
Next, the Pods are being drained
% kubectl get pod
NAME READY STATUS RESTARTS AGE
iks-nginx-7b789b9b4-4c5ll 1/1 Terminating 0 22m
iks-nginx-7b789b9b4-gkmwj 1/1 Terminating 0 39m
iks-nginx-7b789b9b4-jxwj2 0/1 ContainerCreating 0 15s
iks-nginx-7b789b9b4-lcg6m 0/1 ContainerCreating 0 15s
State after the new Pods were created
% kubectl get pod
NAME READY STATUS RESTARTS AGE
iks-nginx-7b789b9b4-jxwj2 1/1 Running 0 58s
iks-nginx-7b789b9b4-lcg6m 1/1 Running 0 58s
State after the node being updated was deleted
% kubectl get node
NAME STATUS ROLES AGE VERSION
10.244.128.23 Ready <none> 19m v1.29.6+IKS
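Connectivity check
State after the Pods were drained and both Pods were running on one node
As soon as the update started, the Pods were drained and new Pods were created on Node1
Drained Pods: 4c5ll / gkmwj
Newly created Pods: lcg6m / jxwj2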
sv1 -> IngressALB -> Node -> Pod
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 03:13:05
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 03:13:06
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-lcg6m Current Time: 2024-06-27 03:13:38
Hostname: iks-nginx-7b789b9b4-jxwj2 Current Time: 2024-06-27 03:13:40
sv2 -> IngressALB -> Node -> Pod
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 03:13:05
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 03:13:06
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-lcg6m Current Time: 2024-06-27 03:13:38
Hostname: iks-nginx-7b789b9b4-lcg6m Current Time: 2024-06-27 03:13:38
sv3 -> IngressALB -> Node -> Pod
Hostname: iks-nginx-7b789b9b4-gkmwj Current Time: 2024-06-27 03:13:05
Hostname: iks-nginx-7b789b9b4-4c5ll Current Time: 2024-06-27 03:13:06
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-jxwj2 Current Time: 2024-06-27 03:13:38
Hostname: iks-nginx-7b789b9b4-jxwj2 Current Time: 2024-06-27 03:13:40
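The length of an outage like this one can be read from the timestamps of the last successful response before the failures and the first one after them. The following is a minimal sketch, assuming the probe output was saved to a file (the file name `sv1.log` is hypothetical) and that every timestamp falls on the same day. The helper name `longest_gap` is also an assumption.

```shell
#!/bin/sh
# Print the longest gap, in seconds, between consecutive successful responses.
# Reads probe output on stdin; lines without "Current Time:" are ignored.
longest_gap() {
  awk '
    /Current Time:/ {
      split($NF, t, ":")                  # last field is HH:MM:SS
      now = t[1] * 3600 + t[2] * 60 + t[3]
      if (seen && now - prev > max) max = now - prev
      prev = now
      seen = 1
    }
    END { print max + 0 }                 # 0 when fewer than two timestamps
  '
}
```

Usage: `longest_gap < sv1.log`. For the sv1 log above, the last success before the failures is at 03:13:06 and the next success is at 03:13:38, so the result is 32. Probes normally run about one second apart, so this corresponds to roughly 30 seconds of failed requests.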
After a while, requests started failing intermittently again
sv1 -> IngressALB -> Node -> Pod
Hostname: iks-nginx-7b789b9b4-lcg6m Current Time: 2024-06-27 03:14:12
Hostname: iks-nginx-7b789b9b4-jxwj2 Current Time: 2024-06-27 03:14:13
Hostname: iks-nginx-7b789b9b4-jxwj2 Current Time: 2024-06-27 03:14:14
Hostname: iks-nginx-7b789b9b4-jxwj2 Current Time: 2024-06-27 03:14:15
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-lcg6m Current Time: 2024-06-27 03:14:18
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-lcg6m Current Time: 2024-06-27 03:14:25
Request timed out or failed.
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-lcg6m Current Time: 2024-06-27 03:14:30
Hostname: iks-nginx-7b789b9b4-jxwj2 Current Time: 2024-06-27 03:14:31
sv2 -> IngressALB -> Node -> Pod
Hostname: iks-nginx-7b789b9b4-jxwj2 Current Time: 2024-06-27 03:14:11
Hostname: iks-nginx-7b789b9b4-jxwj2 Current Time: 2024-06-27 03:14:12
Request timed out or failed.
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-jxwj2 Current Time: 2024-06-27 03:14:17
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-jxwj2 Current Time: 2024-06-27 03:14:20
Hostname: iks-nginx-7b789b9b4-lcg6m Current Time: 2024-06-27 03:14:21
Request timed out or failed.
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-jxwj2 Current Time: 2024-06-27 03:14:26
Request timed out or failed.
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-lcg6m Current Time: 2024-06-27 03:14:31
Hostname: iks-nginx-7b789b9b4-lcg6m Current Time: 2024-06-27 03:14:32
sv3 -> IngressALB -> Node -> Pod
Hostname: iks-nginx-7b789b9b4-jxwj2 Current Time: 2024-06-27 03:14:12
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-jxwj2 Current Time: 2024-06-27 03:14:15
Request timed out or failed.
Request timed out or failed.
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-lcg6m Current Time: 2024-06-27 03:14:22
Request timed out or failed.
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-lcg6m Current Time: 2024-06-27 03:14:27
Hostname: iks-nginx-7b789b9b4-lcg6m Current Time: 2024-06-27 03:14:28
Hostname: iks-nginx-7b789b9b4-jxwj2 Current Time: 2024-06-27 03:14:29
Request timed out or failed.
Hostname: iks-nginx-7b789b9b4-lcg6m Current Time: 2024-06-27 03:14:32
Hostname: iks-nginx-7b789b9b4-lcg6m Current Time: 2024-06-27 03:14:33
The version upgrade completed in about 7 minutes
% kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
10.244.128.23 Ready <none> 23m v1.29.6+IKS 10.244.128.23 10.244.128.23 Ubuntu 20.04.6 LTS 5.4.0-186-generic containerd://1.7.18
10.244.64.30 Ready <none> 43s v1.29.6+IKS 10.244.64.30 10.244.64.30 Ubuntu 20.04.6 LTS 5.4.0-186-generic containerd://1.7.18
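Instead of reading the VERSION column by eye, the check that every node has reached the target version can be scripted. The following is a minimal sketch that lists nodes whose version does not start with a given prefix. The helper name `nodes_not_at` is hypothetical, and it assumes VERSION is the 5th column of `kubectl get node` output, as in the listings here.

```shell
#!/bin/sh
# List nodes whose VERSION does not start with the given prefix.
# Reads `kubectl get node` (or `-o wide`) output on stdin.
nodes_not_at() {
  want=$1
  # Skip the header row; index(...) == 1 means the version starts with $want.
  awk -v want="$want" 'NR > 1 && index($5, want) != 1 { print $1, $5 }'
}
```

Usage: `kubectl get node | nodes_not_at v1.29.` prints nothing once every worker is on 1.29. Run against the listing taken before the Node2 replacement, it would print `10.244.64.28 v1.28.11+IKS`.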
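In this test, replacing Node2 caused an outage of about 30 seconds (no successful responses between 03:13:06 and 03:13:38), most likely because both Pods were running on the node being drained. In addition, requests failed intermittently for roughly 15 to 20 seconds during each replacement (around 02:51:30 and 03:14:15).
It seems best to perform the version upgrade itself during a maintenance window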