
Notes from testing Kubernetes drain

Posted at 2019-01-11

These are my notes from testing how drain behaves in Kubernetes.

Safely Drain a Node while Respecting Application SLOs

Environment

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-28T20:03:09Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.5-eks-6bad6d", GitCommit:"6bad6d9c768dc0864dab48a11653aa53b5a47043", GitTreeState:"clean", BuildDate:"2018-12-06T23:13:14Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

What is drain?

Maintenance on a Node

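When a Node needs maintenance (such as a reboot), running kubectl drain gracefully terminates the Pods running on that Node. If the Pods are managed by a ReplicaSet, replacement Pods are started automatically on other Nodes.
After the maintenance is complete, running kubectl uncordon makes the Node available for Pod scheduling again.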
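
The drain, maintain, uncordon sequence described above can be sketched as a small shell function. This is only an illustrative sketch: maintain_node is a hypothetical name, and the maintenance step is whatever command the caller passes in. The --ignore-daemonsets flag is included because, as seen later in these notes, a plain drain aborts when DaemonSet-managed Pods are present.

```shell
# Hypothetical helper (not from the original notes): drain a Node, run a
# caller-supplied maintenance command, then uncordon the Node.
maintain_node() {
  local node="$1"
  shift
  # Cordon the Node and evict its Pods; stop here if the drain fails.
  kubectl drain "$node" --ignore-daemonsets || return 1
  # Run the maintenance step passed as the remaining arguments.
  "$@"
  local status=$?
  # Make the Node schedulable again, even if the maintenance step failed.
  kubectl uncordon "$node" || return 1
  return "$status"
}

# Example call (the maintenance script name is a placeholder):
#   maintain_node ip-172-31-23-75.ap-northeast-1.compute.internal ./do-maintenance.sh
```

Note that if the drain itself fails, this sketch returns early and leaves the Node cordoned, so that case needs a manual follow-up.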

Trying it out

The ReplicaSet is configured with 3 replicas.
First, check the current state.

# There are 3 Nodes
$ kubectl get nodes
NAME                                              STATUS    ROLES     AGE       VERSION
ip-172-31-0-56.ap-northeast-1.compute.internal    Ready     <none>    21d       v1.11.5
ip-172-31-19-51.ap-northeast-1.compute.internal   Ready     <none>    21d       v1.11.5
ip-172-31-23-75.ap-northeast-1.compute.internal   Ready     <none>    21d       v1.11.5

# The ReplicaSet creates 3 Pods
$ kubectl get replicaset
NAME                           DESIRED   CURRENT   READY     AGE
sample-deployment-86d576464c   3         3         3         13d

# As expected, 3 Pods are running
$ kubectl get pods | grep sample
sample-deployment-86d576464c-wkhjd   1/1       Running     0          13d
sample-deployment-86d576464c-x75hx   1/1       Running     0          13d
sample-deployment-86d576464c-xbsmw   1/1       Running     0     

# Each Pod is running on a different Node
$ kubectl describe pods sample-deployment-86d576464c-wkhjd | grep Node
Node:               ip-172-31-19-51.ap-northeast-1.compute.internal/172.31.19.51
Node-Selectors:  <none>

$ kubectl describe pods sample-deployment-86d576464c-x75hx | grep Node
Node:               ip-172-31-0-56.ap-northeast-1.compute.internal/172.31.0.56
Node-Selectors:  <none>

$ kubectl describe pods sample-deployment-86d576464c-xbsmw | grep Node
Node:               ip-172-31-23-75.ap-northeast-1.compute.internal/172.31.23.75
Node-Selectors:  <none>
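
Instead of describing each Pod one at a time, the Pod-to-Node mapping can also be listed in a single call. The following is a minimal sketch; pods_with_nodes is a hypothetical helper name, and it filters by name prefix in the same way the grep above does.

```shell
# Hypothetical helper: list each Pod with the Node it runs on,
# keeping only Pods whose name starts with the given prefix.
pods_with_nodes() {
  local prefix="$1"
  # custom-columns prints only the Pod name and its assigned Node.
  kubectl get pods --no-headers \
    -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName \
    | grep "^${prefix}"
}

# Example call:
#   pods_with_nodes sample
```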

From this state, assume that one Node needs maintenance and drain it.

# The cordon succeeded, but the drain failed
$ kubectl drain ip-172-31-23-75.ap-northeast-1.compute.internal
node "ip-172-31-23-75.ap-northeast-1.compute.internal" cordoned
error: unable to drain node "ip-172-31-23-75.ap-northeast-1.compute.internal", aborting command...

There are pending nodes to be drained:
 ip-172-31-23-75.ap-northeast-1.compute.internal
error: DaemonSet-managed pods (use --ignore-daemonsets to ignore): aws-node-h9dmt, kube-proxy-jvnsz

# Because the cordon succeeded, the Node is now SchedulingDisabled
$ kubectl get nodes
NAME                                              STATUS                     ROLES     AGE       VERSION
ip-172-31-0-56.ap-northeast-1.compute.internal    Ready                      <none>    21d       v1.11.5
ip-172-31-19-51.ap-northeast-1.compute.internal   Ready                      <none>    21d       v1.11.5
ip-172-31-23-75.ap-northeast-1.compute.internal   Ready,SchedulingDisabled   <none>    21d       v1.11.5

The drain failed because DaemonSet-managed Pods exist on the Node.
They can be ignored by specifying the --ignore-daemonsets option.
Run the command again with that option.

$ kubectl drain ip-172-31-23-75.ap-northeast-1.compute.internal --ignore-daemonsets
node "ip-172-31-23-75.ap-northeast-1.compute.internal" already cordoned
WARNING: Ignoring DaemonSet-managed pods: aws-node-h9dmt, kube-proxy-jvnsz
pod "sample-deployment-86d576464c-xbsmw" evicted
node "ip-172-31-23-75.ap-northeast-1.compute.internal" drained

It succeeded.
The output shows that sample-deployment-86d576464c-xbsmw, the Pod managed by the ReplicaSet, was evicted.

Check the current state.

# The Node status is still SchedulingDisabled
$ kubectl get nodes
NAME                                              STATUS                     ROLES     AGE       VERSION
ip-172-31-0-56.ap-northeast-1.compute.internal    Ready                      <none>    21d       v1.11.5
ip-172-31-19-51.ap-northeast-1.compute.internal   Ready                      <none>    21d       v1.11.5
ip-172-31-23-75.ap-northeast-1.compute.internal   Ready,SchedulingDisabled   <none>    21d       v1.11.5

# Only one Pod has an AGE of 2m
$ kubectl get pods | grep sample
sample-deployment-86d576464c-df68k   1/1       Running     0          2m
sample-deployment-86d576464c-wkhjd   1/1       Running     0          13d
sample-deployment-86d576464c-x75hx   1/1       Running     0          13d

# The new Pod started on a different Node
$ kubectl describe pod sample-deployment-86d576464c-df68k | grep Node
Node:               ip-172-31-19-51.ap-northeast-1.compute.internal/172.31.19.51
Node-Selectors:  <none>

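As expected, because the Pods are managed by a ReplicaSet, a replacement Pod was started on another Node to maintain the desired count.
Next, assume the maintenance on the Node is finished and make it schedulable again (plain Ready, without SchedulingDisabled).
This is done with kubectl uncordon.

# Uncordon the Node
$ kubectl uncordon ip-172-31-23-75.ap-northeast-1.compute.internal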
node "ip-172-31-23-75.ap-northeast-1.compute.internal" uncordoned

# The status is back to Ready
$ kubectl get node
NAME                                              STATUS    ROLES     AGE       VERSION
ip-172-31-0-56.ap-northeast-1.compute.internal    Ready     <none>    21d       v1.11.5
ip-172-31-19-51.ap-northeast-1.compute.internal   Ready     <none>    21d       v1.11.5
ip-172-31-23-75.ap-northeast-1.compute.internal   Ready     <none>    21d       v1.11.5

With this, Pods can be scheduled onto the target Node again.
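
The SchedulingDisabled status seen earlier corresponds to the Node's spec.unschedulable field, so a script can confirm whether a Node is still cordoned without parsing the table output. A minimal sketch, with is_cordoned as a hypothetical helper name:

```shell
# Hypothetical helper: succeed (exit 0) if the Node is cordoned.
is_cordoned() {
  local node="$1"
  local flag
  # spec.unschedulable is "true" on a cordoned Node; otherwise the
  # field is unset or false.
  flag="$(kubectl get node "$node" -o jsonpath='{.spec.unschedulable}')"
  [ "$flag" = "true" ]
}

# Example call:
#   if is_cordoned ip-172-31-23-75.ap-northeast-1.compute.internal; then
#     echo "still cordoned"
#   fi
```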

As a test, delete the Pod that was started earlier.

# Delete the Pod that had been started on the other Node
$ kubectl delete pod sample-deployment-86d576464c-df68k
pod "sample-deployment-86d576464c-df68k" deleted

# One new Pod has started
$ kubectl get pods | grep sample
sample-deployment-86d576464c-kk8xr   1/1       Running     0          1m
sample-deployment-86d576464c-wkhjd   1/1       Running     0          13d
sample-deployment-86d576464c-x75hx   1/1       Running     0          13d

# The new Pod is running on the Node that was under maintenance
$ kubectl describe pod sample-deployment-86d576464c-kk8xr | grep Node
Node:               ip-172-31-23-75.ap-northeast-1.compute.internal/172.31.23.75
Node-Selectors:  <none>

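The new Pod started on the Node that had been temporarily taken out for maintenance.
The documentation contains the passage below, which suggests that Pods are spread across Nodes by default (I could not find a statement that says so directly).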

Assigning Pods to Nodes

Generally such constraints are unnecessary, as the scheduler will automatically do a reasonable placement (e.g. spread your pods across nodes, not place the pod on a node with insufficient free resources, etc.)
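
To see how Pods are actually spread, the number of Pods per Node can be counted from a Pod-to-Node listing. This is only a sketch for observing the placement, not a statement about how the scheduler decides; count_pods_per_node is a hypothetical helper that reads "POD NODE" lines on standard input.

```shell
# Hypothetical helper: read "POD NODE" lines on stdin and print
# "NODE COUNT" pairs, sorted by Node name.
count_pods_per_node() {
  awk 'NF >= 2 { count[$2]++ } END { for (n in count) print n, count[n] }' | sort
}

# Example call, fed from kubectl:
#   kubectl get pods --no-headers \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName \
#     | grep '^sample' | count_pods_per_node
```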
