
Investigating why the Kubernetes ClusterAutoscaler does not scale in

Posted at 2021-08-11, last updated at 2021-08-11

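While using ClusterAutoscaler (CA below), I noticed that it never scaled in, even though the Nodes had plenty of spare CPU and memory.
It turns out that CA has a number of conditions under which it will not scale in, so I looked into them.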

What are the conditions that prevent scale-in?

They are listed in the CA FAQ.
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node

A Node that meets any of the following conditions is excluded from scale-in.

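A Pod on the Node cannot be evicted because of a restrictive PodDisruptionBudget (PDB below)
A Pod in the kube-system namespace is running on the Node, and
- the Pod is not one that runs on the Node by default
- the Pod has no PDB set, or its PDB is too restrictive
A Pod that is not managed by a controller object such as a Deployment or StatefulSet is running on the Node
A Pod with local storage is running on the Node
A Pod on the Node cannot be moved elsewhere, for example because of insufficient resources
A Pod with the following annotation is running on the Node
- "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"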
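
As an illustration of the last condition, here is a minimal sketch of how the annotation might be set on a Pod template. The Deployment name and labels (test-no-evict) are hypothetical and are not taken from the experiments in this article; only the annotation key and value come from the FAQ.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-no-evict          # hypothetical name, for illustration only
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-no-evict
  template:
    metadata:
      labels:
        app: test-no-evict
      annotations:
        # Asks CA not to evict this Pod, which keeps its Node out of scale-in.
        # The value must be a quoted string, not a YAML boolean.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      containers:
      - name: nginx
        image: nginx:latest
```

Note that the annotation goes on the Pod template (spec.template.metadata), not on the Deployment's own metadata, because CA inspects Pods.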

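Checking whether scale-in is really blocked

The cases where a PDB blocks eviction or where resources are insufficient are easy to imagine, so I will leave them aside.
I wanted to see how CA behaves for the other causes, so I tried them out.

The preconditions for the tests are as follows.

The environment is EKS
The Kubernetes version is v1.21.0
The ClusterAutoscaler options are as follows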

- ./cluster-autoscaler
- --v=4
- --stderrthreshold=info
- --cloud-provider=aws
- --expander=least-waste
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME>

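Pods in the kube-system namespace

I deployed the following Pods to kube-system and checked whether scale-in still happens.
I wanted the same Pod to run on every Node, so I set topologySpreadConstraints (with whenUnsatisfiable: ScheduleAnyway this is a best-effort spread, not a guarantee).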

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: test
  replicas: 3
  template:
    metadata:
      labels:
        app: test
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: test
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

After waiting a while, the following message appeared in the CA log.

Fast evaluation: node ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal cannot be removed: non-daemonset, non-mirrored, non-pdb-assigned kube-system pod present: test-deployment-69f99ff6f7-fj7hs

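So a Pod placed in kube-system that is not a DaemonSet Pod or mirror Pod and has no PDB assigned is not evicted, and the Node it runs on is excluded from scale-in.
If that is a problem, add --skip-nodes-with-system-pods=false to the CA options.
With this option, Pods in kube-system also become eviction candidates.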
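
Since the log message calls the Pod "non-pdb-assigned", and the FAQ condition only covers kube-system Pods without a usable PDB, another option may be to give the Pods a PDB instead of changing the CA flag. The following is a sketch under that assumption and was not verified in this article. The name test-pdb is hypothetical; the selector matches the app: test label of the Deployment above, and policy/v1 is available from Kubernetes v1.21.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: test-pdb               # hypothetical name, for illustration only
  namespace: kube-system
spec:
  # Allow one Pod at a time to be evicted, so CA can drain a Node
  maxUnavailable: 1
  selector:
    matchLabels:
      app: test
```

This keeps the default protection for other kube-system Pods while letting CA move only the Pods that the PDB selects.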

Pods not managed by a controller object

Prepare the following Pod.

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx:latest
    ports:
    - containerPort: 80

After waiting a while, the following message appeared in the CA log.

Fast evaluation: node ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal cannot be removed: default/test-pod is not replicated

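So a Pod that is started on its own is not evicted, and the Node it runs on is excluded from scale-in.

To avoid this, start Pods through a Deployment or another controller rather than creating them directly.
Also, when you start a Pod for debugging, make sure it does not stay behind after you are done with it.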
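
If a standalone Pod really is needed, the CA FAQ describes a "cluster-autoscaler.kubernetes.io/safe-to-evict": "true" annotation that marks a Pod as safe to evict. The sketch below applies it to the test Pod from this section. It was not verified in this article, and a bare Pod that is evicted is not recreated by anything, so this only fits disposable Pods.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: default
  annotations:
    # Per the CA FAQ: tells CA this Pod may be evicted during scale-in.
    # Nothing recreates a bare Pod after eviction.
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
spec:
  containers:
  - name: nginx
    image: nginx:latest
    ports:
    - containerPort: 80
```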

Pods with local storage

This means Pods that use emptyDir or hostPath volumes.
I tried this case as well.

Prepare the following Deployment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  namespace: default
spec:
  selector:
    matchLabels:
      app: test
  replicas: 3
  template:
    metadata:
      labels:
        app: test
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: test
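      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        # Assumed emptyDir volume (name and mount path are illustrative);
        # this is what makes the Pod a "pod with local storage"
        volumeMounts:
        - name: cache
          mountPath: /cache
      volumes:
      - name: cache
        emptyDir: {}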

After waiting a while, the following message appeared in the CA log.

Fast evaluation: node ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal cannot be removed: pod with local storage present: test-654ccc4768-8tt7p

So a Pod with local storage is not evicted, and the Node it runs on is excluded from scale-in.
If that is a problem, add --skip-nodes-with-local-storage=false to the CA options.
With this option, Pods with local storage also become eviction candidates.
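
Putting the two flags from this article together, the CA options listed in the preconditions might look like the fragment below. This is a sketch that assumes the options are passed through the command field of the cluster-autoscaler container; adjust it to however your CA Deployment is defined.

```yaml
command:
- ./cluster-autoscaler
- --v=4
- --stderrthreshold=info
- --cloud-provider=aws
- --expander=least-waste
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME>
# Allow scale-in of Nodes that run non-DaemonSet kube-system Pods
- --skip-nodes-with-system-pods=false
# Allow scale-in of Nodes that run Pods with emptyDir or hostPath volumes
- --skip-nodes-with-local-storage=false
```

Keep in mind that data in an emptyDir volume is lost when the Pod is evicted, so the second flag is only appropriate when that data is disposable.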
