Notes from checking Pod scheduling in k8s (Node Affinity)

Posted at 2020-04-19

Notes.
This is a continuation of the following article.

Notes from checking Pod scheduling in k8s (up to nodeSelector)

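Summary

Compared with nodeSelector, Node Affinity/Node Anti-Affinity allows more flexible settings
Both required and preferred conditions can be specified
If a required condition is not satisfied, the Pod stays Pending; it is not scheduled and does not start
If a preferred condition is not satisfied, the Pod is still scheduled and started as long as another Node has enough resources
There are several kinds of operators, which is what makes flexible settings possible
Required and preferred conditions can be specified together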

Environment

Same as in the following article.

Notes from checking Pod scheduling in k8s (up to nodeSelector)

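Affinity and Anti-Affinity

nodeSelector is a very simple scheduling method, but Affinity/Anti-Affinity allows more flexible settings.

Conditions can be expressed in various ways (not limited to "AND" conditions)
A condition can be a preference rather than a requirement, so the Pod can still be scheduled when the condition is not satisfied
Conditions can be placed on the labels of other Pods running on a Node (or another topological domain) rather than on the labels of the Node itself, so a Pod can be run in the same domain as those Pods or in a different one

The third point is a benefit of Inter-Pod Affinity/Anti-Affinity and is not covered in this article.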

Node Affinity

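Functionally, it restricts which Node a Pod is scheduled to based on the Node's labels, the same as nodeSelector.
However, more flexible settings are possible.

There are the following two types.

requiredDuringSchedulingIgnoredDuringExecution: as "requiredDuringScheduling" says, a required condition at scheduling time
preferredDuringSchedulingIgnoredDuringExecution: as "preferredDuringScheduling" says, a preferred condition at scheduling time

As "IgnoredDuringExecution" says, the condition is ignored during execution. This is the same behavior as nodeSelector: even if a Node's labels are changed and the Pod no longer satisfies the condition, the Pod keeps running on that Node.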
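
As one illustration of the flexibility mentioned above, here is a minimal sketch of a condition that nodeSelector alone cannot express. It relies on the documented rule that multiple entries under nodeSelectorTerms are ORed, while multiple matchExpressions inside one entry are ANDed. The Pod name is hypothetical, and the labels are the ones that appear on the Nodes used in the verification in this article; this manifest was not part of the original verification.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-or-example   # hypothetical name, for illustration only
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Term 1: zone is ap-northeast-1c AND disktype is ssd (expressions are ANDed)
        - matchExpressions:
          - key: failure-domain.beta.kubernetes.io/zone
            operator: In
            values:
            - ap-northeast-1c
          - key: disktype
            operator: In
            values:
            - ssd
        # Term 2: zone is ap-northeast-1d (terms are ORed with each other)
        - matchExpressions:
          - key: failure-domain.beta.kubernetes.io/zone
            operator: In
            values:
            - ap-northeast-1d
  containers:
  - name: node-affinity-or-example
    image: k8s.gcr.io/pause:2.0
```

A Node qualifies if it satisfies either term, so this reads as "(1c and ssd) or 1d".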

Verification

Nodes used for verification

An environment with two Nodes is used, as shown below.
The labels attached to each Node are as follows.

$k get nodes -o json|jq ".items[]|.metadata.labels"
{
  "alpha.eksctl.io/cluster-name": "test-cluster",
  "alpha.eksctl.io/instance-id": "i-0b26392bc266a5999",
  "alpha.eksctl.io/nodegroup-name": "standard-workers",
  "beta.kubernetes.io/arch": "amd64",
  "beta.kubernetes.io/instance-type": "t3.medium",
  "beta.kubernetes.io/os": "linux",
  "disktype": "ssd",
  "failure-domain.beta.kubernetes.io/region": "ap-northeast-1",
  "failure-domain.beta.kubernetes.io/zone": "ap-northeast-1c",
  "kubernetes.io/arch": "amd64",
  "kubernetes.io/hostname": "ip-192-168-44-10.ap-northeast-1.compute.internal",
  "kubernetes.io/os": "linux"
}
{
  "alpha.eksctl.io/cluster-name": "test-cluster",
  "alpha.eksctl.io/instance-id": "i-02b72bf1fd5120455",
  "alpha.eksctl.io/nodegroup-name": "standard-workers",
  "beta.kubernetes.io/arch": "amd64",
  "beta.kubernetes.io/instance-type": "t3.medium",
  "beta.kubernetes.io/os": "linux",
  "failure-domain.beta.kubernetes.io/region": "ap-northeast-1",
  "failure-domain.beta.kubernetes.io/zone": "ap-northeast-1d",
  "kubernetes.io/arch": "amd64",
  "kubernetes.io/hostname": "ip-192-168-65-145.ap-northeast-1.compute.internal",
  "kubernetes.io/os": "linux"
}

As shown above, the Nodes are running in ap-northeast-1c and ap-northeast-1d.

requiredDuringSchedulingIgnoredDuringExecution

Basic usage

Write a manifest like the following to make placement on a Node in specific AZs a required condition.

node-affinity.yaml

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: failure-domain.beta.kubernetes.io/zone
            operator: In
            values:
            - ap-northeast-1a
            - ap-northeast-1c
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0

$k apply -f node-affinity.yaml

# Started on the Node in ap-northeast-1c
$k describe pods |grep Node
Node:         ip-192-168-44-10.ap-northeast-1.compute.internal/192.168.44.10
Node-Selectors:  <none>

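It behaved as expected.

Behavior when the condition is not satisfied

Next, look at the behavior when a required condition is not satisfied.
Reuse the previous manifest, but drain the target Node beforehand so that nothing can be scheduled on it.

# DaemonSets are running, so drain with --ignore-daemonsets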
$k drain ip-192-168-44-10.ap-northeast-1.compute.internal --ignore-daemonsets
node/ip-192-168-44-10.ap-northeast-1.compute.internal already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/aws-node-4c74w, kube-system/kube-proxy-msjts
node/ip-192-168-44-10.ap-northeast-1.compute.internal drained

# Check
$k get nodes
NAME                                                STATUS                     ROLES    AGE   VERSION
ip-192-168-44-10.ap-northeast-1.compute.internal    Ready,SchedulingDisabled   <none>   13d   v1.14.9-eks-1f0ca9
ip-192-168-65-145.ap-northeast-1.compute.internal   Ready                      <none>   13d   v1.14.9-eks-1f0ca9

# Start the Pod
$k apply -f node-affinity.yaml
pod/with-node-affinity created

# It is in the Pending state
$k get pods
NAME                 READY   STATUS    RESTARTS   AGE
with-node-affinity   0/1     Pending   0          43s

# Check the details
$k describe pods with-node-affinity
・・・

Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  54s   default-scheduler  0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) were unschedulable.

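There are two Nodes, but one does not match the node selector and the other is unschedulable, so the scheduler cannot place the Pod.

In other words, a required condition is mandatory, as the name says: even when Nodes exist, the Pod is not placed unless a schedulable Node satisfies the condition.

Return the Node that was drained earlier to Ready.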

$k uncordon ip-192-168-44-10.ap-northeast-1.compute.internal
node/ip-192-168-44-10.ap-northeast-1.compute.internal uncordoned

$k get nodes
NAME                                                STATUS   ROLES    AGE   VERSION
ip-192-168-44-10.ap-northeast-1.compute.internal    Ready    <none>   13d   v1.14.9-eks-1f0ca9
ip-192-168-65-145.ap-northeast-1.compute.internal   Ready    <none>   13d   v1.14.9-eks-1f0ca9

preferredDuringSchedulingIgnoredDuringExecution

Basic usage

Verify with a slightly modified version of the previous manifest.
The condition is changed from required to preferred.

node-affinity.yaml

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      -  weight: 1
         preference:
           matchExpressions:
             - key: failure-domain.beta.kubernetes.io/zone
               operator: In
               values:
                 - ap-northeast-1a
                 - ap-northeast-1c
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0

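The main changes are as follows.

Changed "requiredDuringSchedulingIgnoredDuringExecution" to "preferredDuringSchedulingIgnoredDuringExecution"
preferredDuringSchedulingIgnoredDuringExecution takes a list, and each entry has a preference
weight is a required field (an integer from 1 to 100), so it was added

Verify with this manifest.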

$k apply -f node-affinity.yaml
pod/with-node-affinity created

$k get pods
NAME                 READY   STATUS    RESTARTS   AGE
with-node-affinity   1/1     Running   0          2m58s

# Confirm that it started in ap-northeast-1c, which was specified as the preferred condition
$k describe pods |grep Node
Node:         ip-192-168-44-10.ap-northeast-1.compute.internal/192.168.44.10
Node-Selectors:  <none>

Behavior when the condition is not satisfied

In the same way, check the behavior when the preferred condition cannot be satisfied.

# Drain the Node
$k drain ip-192-168-44-10.ap-northeast-1.compute.internal --ignore-daemonsets
node/ip-192-168-44-10.ap-northeast-1.compute.internal already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/aws-node-4c74w, kube-system/kube-proxy-msjts
node/ip-192-168-44-10.ap-northeast-1.compute.internal drained

# Confirm that it is SchedulingDisabled
$k get nodes
NAME                                                STATUS                     ROLES    AGE   VERSION
ip-192-168-44-10.ap-northeast-1.compute.internal    Ready,SchedulingDisabled   <none>   13d   v1.14.9-eks-1f0ca9
ip-192-168-65-145.ap-northeast-1.compute.internal   Ready                      <none>   13d   v1.14.9-eks-1f0ca9

# Create the Pod
$k apply -f node-affinity.yaml
pod/with-node-affinity created

# The Pod is Running
$k get pods
NAME                 READY   STATUS    RESTARTS   AGE
with-node-affinity   1/1     Running   0          28s

# Check the details
$k describe pods with-node-affinity
・・・
Events:
  Type    Reason     Age   From                                                        Message
  ----    ------     ----  ----                                                        -------
  Normal  Scheduled  64s   default-scheduler                                           Successfully assigned default/with-node-affinity to ip-192-168-65-145.ap-northeast-1.compute.internal
  Normal  Pulling    64s   kubelet, ip-192-168-65-145.ap-northeast-1.compute.internal  Pulling image "k8s.gcr.io/pause:2.0"
  Normal  Pulled     61s   kubelet, ip-192-168-65-145.ap-northeast-1.compute.internal  Successfully pulled image "k8s.gcr.io/pause:2.0"
  Normal  Created    61s   kubelet, ip-192-168-65-145.ap-northeast-1.compute.internal  Created container with-node-affinity
  Normal  Started    61s   kubelet, ip-192-168-65-145.ap-northeast-1.compute.internal  Started container with-node-affinity

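The Events contain no particular detail, but they confirm that the scheduler was able to assign the Pod.
Since this is a preference rather than a requirement, it appears that even when the condition cannot be satisfied, the scheduler still starts the Pod as long as another Node has spare resources.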
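
For reference, weight becomes meaningful when several preferences are listed. A minimal sketch follows, assuming the same two Nodes and labels as above; the Pod name and the weight values 80 and 20 are hypothetical and were not part of the original verification. For each candidate Node, the scheduler adds up the weights of the preferences that the Node satisfies and combines that sum with its other scoring, so Nodes with a higher total tend to be chosen.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-weights   # hypothetical name, for illustration only
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      # Stronger preference: Nodes labeled disktype=ssd
      - weight: 80
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      # Weaker preference: Nodes in ap-northeast-1d
      - weight: 20
        preference:
          matchExpressions:
          - key: failure-domain.beta.kubernetes.io/zone
            operator: In
            values:
            - ap-northeast-1d
  containers:
  - name: node-affinity-weights
    image: k8s.gcr.io/pause:2.0
```

With the labels shown earlier, the ap-northeast-1c Node (disktype=ssd) would collect 80 and the ap-northeast-1d Node would collect 20, so the former should be favored if everything else is equal. Node affinity is only one input to scoring, so the outcome is not guaranteed.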

Other notes

Operators (operator)

Only In was used in this verification, but the following are also available.

In
NotIn
Exists
DoesNotExist
Gt
Lt

Node Anti-Affinity is achieved by using operators such as NotIn and DoesNotExist (for conditions such as "start the Pod on a Node that does not match xxx").
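
The anti-affinity style described above can be sketched as follows. The Pod name is hypothetical and this manifest was not part of the original verification; the labels are the ones on the verification Nodes. Note that DoesNotExist (like Exists) takes no values field.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-anti-affinity-example   # hypothetical name, for illustration only
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          # Exclude Nodes in ap-northeast-1c
          - key: failure-domain.beta.kubernetes.io/zone
            operator: NotIn
            values:
            - ap-northeast-1c
          # Exclude Nodes that carry a disktype label at all
          - key: disktype
            operator: DoesNotExist
  containers:
  - name: node-anti-affinity-example
    image: k8s.gcr.io/pause:2.0
```

Against the two verification Nodes, only the ap-northeast-1d Node satisfies both expressions, so the Pod should be placed there or stay Pending if that Node is unschedulable.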

Using requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution together

The required condition "requiredDuringSchedulingIgnoredDuringExecution" and the preferred condition "preferredDuringSchedulingIgnoredDuringExecution" can also be used together.

This makes requirements like the following possible.

Require the OS to be Linux (beta.kubernetes.io/os = linux). In addition, place the Pod in ap-northeast-1c whenever possible (failure-domain.beta.kubernetes.io/zone = ap-northeast-1c).

Compared with nodeSelector, more complex conditions can be specified.
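
The requirement above could be written roughly as follows. This is a sketch that was not run as part of the original verification; the Pod name and the weight value are hypothetical, while the label keys and values are the ones given in the requirement.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-combined   # hypothetical name, for illustration only
spec:
  affinity:
    nodeAffinity:
      # Required: the Node's OS must be Linux
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: beta.kubernetes.io/os
            operator: In
            values:
            - linux
      # Preferred: place in ap-northeast-1c when possible
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: failure-domain.beta.kubernetes.io/zone
            operator: In
            values:
            - ap-northeast-1c
  containers:
  - name: node-affinity-combined
    image: k8s.gcr.io/pause:2.0
```

The required block filters the candidate Nodes first, and the preferred block only influences the ranking among the Nodes that pass it.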
