
Investigating the behavior of the ClusterAutoscaler priority-expander

Posted at 2023-03-03

ClusterAutoscaler (CA below) has a feature called priority-expander.
It lets you prioritize which NodeGroup is selected when nodes need to be added.
To use it, prepare a ConfigMap like the one below and add --expander=priority to the CA options.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    10: 
      - .*t2\.large.*
      - .*t3\.large.*
    50: 
      - .*m4\.4xlarge.*

A larger number means a higher priority.
In the example above, NodeGroups whose names contain m4.4xlarge are selected first.
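The selection rule described above can be sketched in Python as follows. This is only an illustrative model, not CA's actual implementation: the function name best_groups and the node group names are hypothetical, and the patterns are assumed to be matched as unanchored regular expressions against the group name. It returns every group in the winning tier and does not model how CA picks one among them.

```python
import re

# Priorities copied from the ConfigMap example above: priority -> list of regexes.
PRIORITIES = {
    10: [r".*t2\.large.*", r".*t3\.large.*"],
    50: [r".*m4\.4xlarge.*"],
}


def best_groups(priorities, group_names):
    """Return (priority, names) for the highest-priority tier that matches any group."""
    # Walk the tiers from the largest number (highest priority) down.
    for priority in sorted(priorities, reverse=True):
        patterns = [re.compile(p) for p in priorities[priority]]
        matched = [n for n in group_names if any(p.search(n) for p in patterns)]
        if matched:
            return priority, matched
    # No tier matched any of the given groups.
    return None, []


# Hypothetical node group names, for illustration only.
groups = ["workers-t2.large-a", "workers-m4.4xlarge-a", "workers-t3.large-b"]
print(best_groups(PRIORITIES, groups))  # (50, ['workers-m4.4xlarge-a'])
```

If the m4.4xlarge group is removed from the list, the same call falls through to the tier with priority 10 and returns both large groups.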

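I wanted to know what happens when some of the NodeGroups with the highest priority can no longer scale out, so I investigated.

What I wanted to investigate

Suppose the priorities are configured as follows.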

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    50: 
      - .*spot.*
    10: 
      - .*ondemand.*

This configuration makes CA prefer NodeGroups whose names contain spot when scaling out.
Suppose there are several NodeGroups whose names contain the string spot.

test-spot-a
test-spot-c
test-spot-d

Suppose test-spot-a is the NodeGroup selected for a scale-out, but for some reason it cannot scale out. In that case:

Does CA scale out c or d instead?
Or does CA scale out a NodeGroup whose name contains the string ondemand?

I wanted to confirm which of these happens.

Conclusion

CA appears to first try the group of NodeGroups that priority-expander ranks highest, and only when none of them can scale out does it try the NodeGroups of the next priority.

Verification

Creating the ManagedNodeGroups for testing

I created three ManagedNodeGroups.
Each of them creates a corresponding Auto Scaling Group (ASG below).

Two NodeGroups contain the string spot,
and one NodeGroup contains the string ondemand.

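Checking the behavior

As a test, I set the max of the test-spot-a ASG to 1 so that it could not scale out.
Then I watched what happened when a scale-out became necessary in this state.

The following is the CA log from that moment.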

I0303 06:29:03.475797       1 scale_up.go:411] Skipping node group eks-test-spot-a-a8c352c8-4b51-e420-bd9a-4031039477d3 - max size reached
I0303 06:29:03.476295       1 priority.go:163] priority expander: eks-test-spot-c-10c3530b-67c0-f605-60d7-54632022a600 chosen as the highest available
I0303 06:29:03.476313       1 scale_up.go:477] Best option to resize: eks-test-spot-c-10c3530b-67c0-f605-60d7-54632022a600
I0303 06:29:03.476329       1 scale_up.go:481] Estimated 1 nodes needed in eks-test-spot-c-10c3530b-67c0-f605-60d7-54632022a600
I0303 06:29:03.476449       1 scale_up.go:601] Final scale-up plan: [{eks-test-spot-c-10c3530b-67c0-f605-60d7-54632022a600 1->2 (max: 5)}]

Judging from the log:

test-spot-a is considered first, but it has reached its max size and is skipped
test-spot-c is then chosen and scaled out

That is the behavior CA shows here.
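When the log is longer, the same reading can be done mechanically. The sketch below pulls the skipped and chosen groups out of two lines copied from the log above. The function name summarize is hypothetical, and the regular expressions only assume the message wording seen in this log, which may differ in other CA versions.

```python
import re

# Two lines copied from the CA log above.
LOG = """\
I0303 06:29:03.475797       1 scale_up.go:411] Skipping node group eks-test-spot-a-a8c352c8-4b51-e420-bd9a-4031039477d3 - max size reached
I0303 06:29:03.476295       1 priority.go:163] priority expander: eks-test-spot-c-10c3530b-67c0-f605-60d7-54632022a600 chosen as the highest available
"""

# Patterns based only on the message wording that appears in this log.
SKIPPED = re.compile(r"Skipping node group (\S+) - (.+)$")
CHOSEN = re.compile(r"priority expander: (\S+) chosen as the highest available")


def summarize(log_text):
    """Return ([(skipped_group, reason), ...], [chosen_group, ...])."""
    skipped, chosen = [], []
    for line in log_text.splitlines():
        m = SKIPPED.search(line)
        if m:
            skipped.append((m.group(1), m.group(2)))
            continue
        m = CHOSEN.search(line)
        if m:
            chosen.append(m.group(1))
    return skipped, chosen


skipped, chosen = summarize(LOG)
print("skipped:", skipped)
print("chosen:", chosen)
```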

Next, I changed the max of test-spot-c so that it could not scale out any further.

Here is the log from that moment.

I0303 06:37:45.968380       1 scale_up.go:411] Skipping node group eks-test-spot-a-a8c352c8-4b51-e420-bd9a-4031039477d3 - max size reached
I0303 06:37:45.968573       1 priority.go:163] priority expander: eks-test-spot-c-10c3530b-67c0-f605-60d7-54632022a600 chosen as the highest available
I0303 06:37:45.968581       1 scale_up.go:477] Best option to resize: eks-test-spot-c-10c3530b-67c0-f605-60d7-54632022a600
I0303 06:37:45.968587       1 scale_up.go:481] Estimated 1 nodes needed in eks-test-spot-c-10c3530b-67c0-f605-60d7-54632022a600
I0303 06:37:45.968632       1 scale_up.go:601] Final scale-up plan: [{eks-test-spot-c-10c3530b-67c0-f605-60d7-54632022a600 2->3 (max: 5)}]
I0303 06:37:45.968644       1 scale_up.go:700] Scale-up: setting group eks-test-spot-c-10c3530b-67c0-f605-60d7-54632022a600 size to 3
I0303 06:37:45.968661       1 auto_scaling_groups.go:248] Setting asg eks-test-spot-c-10c3530b-67c0-f605-60d7-54632022a600 size to 3
I0303 06:37:45.969199       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"1eea6592-913c-4af0-abdc-8b7c89459eaa", APIVersion:"v1", ResourceVersion:"35507", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: setting group eks-test-spot-c-10c3530b-67c0-f605-60d7-54632022a600 size to 3 instead of 2 (max: 5)
W0303 06:37:46.130795       1 clusterstate.go:268] Disabling scale-up for node group eks-test-spot-c-10c3530b-67c0-f605-60d7-54632022a600 until 2023-03-03 06:42:45.968203272 +0000 UTC m=+904.477115083; errorClass=Other; errorCode=cloudProviderError
I0303 06:37:46.130954       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"1eea6592-913c-4af0-abdc-8b7c89459eaa", APIVersion:"v1", ResourceVersion:"35507", FieldPath:""}): type: 'Warning' reason: 'FailedToScaleUpGroup' Scale-up failed for group eks-test-spot-c-10c3530b-67c0-f605-60d7-54632022a600: ValidationError: New SetDesiredCapacity value 3 is above max value 2 for the AutoScalingGroup.
I0303 06:37:56.154936       1 scale_up.go:411] Skipping node group eks-test-spot-a-a8c352c8-4b51-e420-bd9a-4031039477d3 - max size reached
W0303 06:37:56.155047       1 scale_up.go:398] Node group eks-test-spot-c-10c3530b-67c0-f605-60d7-54632022a600 is not ready for scaleup - backoff
I0303 06:37:56.155202       1 priority.go:163] priority expander: eks-test-ondemand-a-c8c3530c-1752-ac17-5b2a-ea6bfc4b3368 chosen as the highest available
I0303 06:37:56.155216       1 scale_up.go:477] Best option to resize: eks-test-ondemand-a-c8c3530c-1752-ac17-5b2a-ea6bfc4b3368
I0303 06:37:56.155227       1 scale_up.go:481] Estimated 1 nodes needed in eks-test-ondemand-a-c8c3530c-1752-ac17-5b2a-ea6bfc4b3368
I0303 06:37:56.155310       1 scale_up.go:676] No info about pods passing predicates found for group eks-test-spot-a-a8c352c8-4b51-e420-bd9a-4031039477d3, skipping it from scale-up consideration
I0303 06:37:56.155336       1 scale_up.go:601] Final scale-up plan: [{eks-test-ondemand-a-c8c3530c-1752-ac17-5b2a-ea6bfc4b3368 1->2 (max: 5)}]
I0303 06:37:56.155368       1 scale_up.go:700] Scale-up: setting group eks-test-ondemand-a-c8c3530c-1752-ac17-5b2a-ea6bfc4b3368 size to 2
I0303 06:37:56.155410       1 auto_scaling_groups.go:248] Setting asg eks-test-ondemand-a-c8c3530c-1752-ac17-5b2a-ea6bfc4b3368 size to 2
I0303 06:37:56.155479       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"1eea6592-913c-4af0-abdc-8b7c89459eaa", APIVersion:"v1", ResourceVersion:"35556", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: setting group eks-test-ondemand-a-c8c3530c-1752-ac17-5b2a-ea6bfc4b3368 size to 2 instead of 1 (max: 5)

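Judging from the log:

test-spot-a is skipped first because it has reached its max size
test-spot-c is chosen next, but the scale-up fails with a ValidationError because the requested size of 3 is above the ASG max of 2, and CA disables scale-up for that group for a while (backoff)
In the next loop, about 10 seconds later, test-spot-a is skipped again and test-spot-c is in backoff, so test-ondemand-a is chosen and scaled out

From the behavior so far, CA appears to try every NodeGroup in the highest priority tier first, and moves on to the NodeGroups of the next priority only when none of them can scale out.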
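The observed fallback can be modeled with a small simulation. This is a sketch of the behavior seen in the logs, not CA's real code: the NodeGroup class, the pick function, and the backed_off flag are hypothetical, and the sizes are illustrative values. The assumption is that groups which cannot grow (max size reached, or in backoff after a failed scale-up) are dropped before the priorities are consulted.

```python
import re
from dataclasses import dataclass


@dataclass
class NodeGroup:
    name: str
    size: int
    max_size: int
    backed_off: bool = False  # True while scale-up is disabled after a failure


# Priorities from the ConfigMap used in this investigation.
PRIORITIES = {50: [r".*spot.*"], 10: [r".*ondemand.*"]}


def pick(groups, priorities):
    """Return the names of the available groups in the highest matching tier."""
    # Groups at max size or in backoff are removed from consideration first.
    available = [g for g in groups if g.size < g.max_size and not g.backed_off]
    for priority in sorted(priorities, reverse=True):
        matched = [
            g.name
            for g in available
            if any(re.search(p, g.name) for p in priorities[priority])
        ]
        if matched:
            return matched
    return []


# Illustrative sizes: spot-a is already at its max.
groups = [
    NodeGroup("test-spot-a", 1, 1),
    NodeGroup("test-spot-c", 2, 5),
    NodeGroup("test-ondemand-a", 1, 5),
]
print(pick(groups, PRIORITIES))  # ['test-spot-c']

# After a failed scale-up, spot-c goes into backoff and the next tier is used.
groups[1].backed_off = True
print(pick(groups, PRIORITIES))  # ['test-ondemand-a']
```

Filtering before ranking is what makes the lower tier reachable: the spot tier still matches by name, but it contributes no candidates once all of its groups are unavailable.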
