
Kubernetes 1.23: Metrics Changes and SIG Instrumentation Updates

kubernetes

Last updated at 2022-01-11, posted at 2022-01-11

Introduction

This article summarizes the metrics changes and the SIG Instrumentation work listed in the Kubernetes v1.23 CHANGELOG.

The main themes SIG Instrumentation worked on in Kubernetes v1.23 are the following.

Deprecation of klog-specific flags

To keep the source code simple, several logging options were deprecated in Kubernetes 1.23. These options will be removed in a future release.

--add-dir-header
--alsologtostderr
--log-backtrace-at
--log-dir
--log-file
--log-file-max-size
--logtostderr
--one-output
--skip-headers
--skip-log-headers
--stderrthreshold
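To check whether existing manifests or unit files still pass any of these flags, a small scan over the command-line arguments is enough. The sketch below is my own helper (findDeprecatedFlags is a hypothetical name, not part of Kubernetes). It assumes flags are written as -flag, --flag or --flag=value, and treats "_" like "-" so that spellings such as --log_dir are caught too.

```go
package main

import (
	"fmt"
	"strings"
)

// deprecatedKlogFlags holds the flags deprecated in Kubernetes 1.23
// (the list above, without the leading "--").
var deprecatedKlogFlags = map[string]bool{
	"add-dir-header":    true,
	"alsologtostderr":   true,
	"log-backtrace-at":  true,
	"log-dir":           true,
	"log-file":          true,
	"log-file-max-size": true,
	"logtostderr":       true,
	"one-output":        true,
	"skip-headers":      true,
	"skip-log-headers":  true,
	"stderrthreshold":   true,
}

// findDeprecatedFlags returns the deprecated klog flags used in args.
// Hypothetical helper: accepts -flag, --flag and --flag=value, and treats
// "_" like "-" (e.g. --log_dir is reported as log-dir).
func findDeprecatedFlags(args []string) []string {
	var found []string
	for _, arg := range args {
		if !strings.HasPrefix(arg, "-") {
			continue // flag value or positional argument
		}
		name := strings.TrimLeft(arg, "-")
		if i := strings.Index(name, "="); i >= 0 {
			name = name[:i] // drop "=value"
		}
		name = strings.ReplaceAll(name, "_", "-")
		if deprecatedKlogFlags[name] {
			found = append(found, name)
		}
	}
	return found
}

func main() {
	args := []string{"--v=2", "--logtostderr=false", "--log_dir", "/var/log/kube", "--vmodule=queue=4"}
	fmt.Println("deprecated:", findDeprecatedFlags(args))
}
```

Note that -v and --vmodule are not reported, because they are not part of the deprecated list.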

This work is tracked in the following KEP.

KEP-2845: Deprecate klog specific flags in Kubernetes Components

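klog is a fork of glog, and it contains many features that go unused in most cases. Any new feature has to stay compatible with these unused features, which has led to problems such as increasingly complex behavior and degraded performance.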

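Solving these issues at the root would require a costly rework. Even if that were done, the ongoing cost of maintaining a logging library would remain.
Some features, such as log rotation, are better delegated to other components in the first place. This KEP aims to reduce maintenance cost and improve quality by narrowing the responsibilities of the logging library.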

Structured logging graduates to Beta

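The structured logging milestone has reached Beta.
Most log messages in the kubelet and kube-scheduler components have been migrated to structured logging.
Users are encouraged to try the JSON format or the structured text command-line options and to provide feedback on the handling of multi-line strings and on solutions to open issues.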

Structured logging has been in progress since Kubernetes 1.19 under KEP 1602-structured-logging. If the collected feedback reveals no major issues, it is planned to reach Stable in v1.26.

Other

Among other SIG Instrumentation work, the following KEP, previously owned by SIG Instrumentation and now owned by SIG Security, has graduated to Stable.

KEP 1933-secret-logging-static-analysis

This KEP appears to use the static analysis tool google/go-flow-levee to reduce leaks of Secret information through logging.
As with KEP 1753-logs-sanitization, keep in mind that it is only a safety net.

Next, let's go through the Metrics Changes.
The Metrics Changes section below is my own extract of the metrics-related entries from the full list of changes.

Metrics Changes

Added

kube-apiserver

Metric replacing the deprecated apiserver_longrunning_gauge (#103799, @jyz0309)
- Gauge: apiserver_longrunning_requests
  - Labels: {"verb", "group", "version", "resource", "subresource", "scope", "component"}
  - Help: Gauge of all active long-running apiserver requests broken out by verb, group, version, resource, scope and component. Not all requests are tracked this way.

API Priority and Fairness metrics (#105873, @MikeSpreitzer)
- Histogram: apiserver_flowcontrol_priority_level_seat_count_samples
  - Help: Periodic observations of the number of requests
  - Labels: {"phase", "priority_level"}
- Histogram: apiserver_flowcontrol_priority_level_seat_count_watermarks
  - Help: Watermarks of the number of requests
  - Labels: {"phase", "priority_level"}
- Histogram: apiserver_flowcontrol_watch_count_samples
  - Help: count of watchers for mutating requests in API Priority and Fairnes
  - Labels: {"priorityLevel", "flowSchema"}

Pod Security admission metrics (#105898, @tallclair)
- Counter: pod_security_evaluations_total
  - Help: Counter of pod security evaluations
  - Labels: {"decision", "policy_level", "policy_version", "mode", "operation", "resource", "subresource"}
- Counter: pod_security_exemptions_total
  - Help: Number of exempt requests, not counting ignored or out of scope requests.
  - Labels: {"request_operation", "resource", "subresource"}
- Counter: pod_security_errors_total
  - Help: Number of errors preventing normal evaluation. Non-fatal errors may result in the latest restricted profile being used for evaluation.
  - Labels: {"fatal", "request_operation", "resource", "subresource"}

Admission webhook metrics (#103162, @rmoriar1)
- Counter: apiserver_admission_webhook_request_total
  - Help: Admission webhook request total, identified by name and broken out for each admission type (validating or mutating) and operation. Additional labels specify whether the request was rejected or not and an HTTP status code. Codes greater than 600 are truncated to 600, to keep the metrics cardinality bounded.
  - Labels: {"name", "type", "operation", "code", "rejected"}
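The truncation rule in the Help text is what keeps the code label bounded: whatever status a webhook returns, the label can only take a limited set of values. A minimal sketch of that rule, using a hypothetical codeLabel helper (not the apiserver's actual code):

```go
package main

import (
	"fmt"
	"strconv"
)

// maxTrackedCode is the cap described in the Help text of
// apiserver_admission_webhook_request_total.
const maxTrackedCode = 600

// codeLabel returns the value to use for the "code" label: any status code
// greater than 600 is recorded as 600, so unexpected codes cannot create
// an unbounded number of label values.
func codeLabel(code int) string {
	if code > maxTrackedCode {
		code = maxTrackedCode
	}
	return strconv.Itoa(code)
}

func main() {
	seen := map[string]bool{}
	for _, c := range []int{200, 403, 500, 600, 777, 999} {
		seen[codeLabel(c)] = true
	}
	// 600, 777 and 999 all collapse into the single label value "600".
	fmt.Println(len(seen), "distinct code label values")
}
```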

Metrics representing the cost of serving LIST requests (#104983, @MikeSpreitzer)
- Counter: apiserver_cache_list_total
  - Number of LIST requests served from the watch cache
  - Help: Number of LIST requests served from watch cache
  - Labels: {"resource_prefix", "index_name"}
- Counter: apiserver_cache_list_fetched_objects_total
  - Number of objects read from the watch cache while serving LIST requests
  - Help: Number of objects read from watch cache in the course of serving a LIST request
  - Labels: {"resource_prefix", "index_name"}
- Counter: apiserver_cache_list_returned_objects_total
  - Number of objects returned from the watch cache for LIST requests
  - Help: Number of objects returned for a LIST request from watch cache
  - Labels: {"resource_prefix"}
- Counter: apiserver_storage_list_total
  - Number of LIST requests served from etcd
  - Help: Number of LIST requests served from storage
  - Labels: {"resource"}
- Counter: apiserver_storage_list_fetched_objects_total
  - Number of objects read from etcd while serving LIST requests
  - Help: Number of objects read from storage in the course of serving a LIST request
  - Labels: {"resource"}
- Counter: apiserver_storage_list_evaluated_objects_total
  - Number of objects evaluated (tested) from etcd while serving LIST requests
  - Help: Number of objects tested in the course of serving a LIST request from storage
  - Labels: {"resource"}
- Counter: apiserver_storage_list_returned_objects_total
  - Number of objects returned from etcd for LIST requests
  - Help: Number of objects returned for a LIST request from storage
  - Labels: {"resource"}
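These counters become more informative when combined, for example objects read from etcd per LIST request, or objects read per object actually returned. The sketch below only does that arithmetic on counter increases over some window (in practice something like Prometheus increase() would supply them). The struct, the ratio helper and the numbers are hypothetical.

```go
package main

import "fmt"

// storageListDelta holds how much each storage LIST counter increased for one
// "resource" label value over some time window (hypothetical numbers below).
type storageListDelta struct {
	Requests  float64 // apiserver_storage_list_total
	Fetched   float64 // apiserver_storage_list_fetched_objects_total
	Evaluated float64 // apiserver_storage_list_evaluated_objects_total
	Returned  float64 // apiserver_storage_list_returned_objects_total
}

// ratio divides a by b, reporting ok=false when b is zero.
func ratio(a, b float64) (float64, bool) {
	if b == 0 {
		return 0, false
	}
	return a / b, true
}

func main() {
	d := storageListDelta{Requests: 10, Fetched: 50000, Evaluated: 50000, Returned: 500}
	if perReq, ok := ratio(d.Fetched, d.Requests); ok {
		fmt.Printf("objects read from etcd per LIST: %.0f\n", perReq)
	}
	if amp, ok := ratio(d.Fetched, d.Returned); ok {
		// A large value means most of what was read was filtered out before returning.
		fmt.Printf("objects read per object returned: %.0f\n", amp)
	}
}
```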

Metrics recording Cluster IP allocation per Service CIDR (#104119, @aojea)
- Gauge: kube_apiserver_clusterip_allocator_allocated_ips
  - Help: Number of allocated Cluster IPs per Service CIDR
  - Labels: {"cidr"}
- Gauge: kube_apiserver_clusterip_allocator_available_ips
  - Help: Number of available Cluster IPs per Service CIDR
  - Labels: {"cidr"}
- Counter: kube_apiserver_clusterip_allocator_allocation_total
  - Help: Number of Cluster IP allocations per Service CIDR
  - Labels: {"cidr"}
- Counter: kube_apiserver_clusterip_allocator_allocation_errors_total
  - Help: Number of allocation errors per Service CIDR
  - Labels: {"cidr"}
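A typical use of the allocated/available gauges is to watch how full each Service CIDR is. The sketch below computes that share. It assumes that allocated_ips + available_ips covers the whole allocatable range of a CIDR (my assumption, not something the PR states), and the readings are made up.

```go
package main

import "fmt"

// utilization returns the share of a Service CIDR that is allocated.
// Assumption: allocated_ips + available_ips covers the whole allocatable range.
func utilization(allocated, available float64) float64 {
	total := allocated + available
	if total == 0 {
		return 0
	}
	return allocated / total
}

func main() {
	// Hypothetical gauge readings for cidr="10.96.0.0/24".
	allocated, available := 200.0, 54.0
	fmt.Printf("10.96.0.0/24: %.1f%% allocated\n", 100*utilization(allocated, available))
}
```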

kube-controller-manager

Records the number of finished Pods that are fully tracked by the Job controller. This metric is recorded when the JobTrackingWithFinalizers feature gate is enabled (#105197, @alculquicondor)
- Counter: job_pod_finished_total
  - Help: The number of finished Pods that are fully tracked
  - Labels: {"completion_mode", "result"}

Changed

kube-apiserver

The StabilityLevel of the following metrics is now Stable (#106122, @rezakrimi)
- apiserver_admission_controller_admission_duration_seconds
- apiserver_admission_step_admission_duration_seconds
- apiserver_admission_webhook_admission_duration_seconds
- apiserver_current_inflight_requests
- apiserver_response_sizes

Because of cardinality problems, the namespace label that had been added to admission metrics was reverted and removed (#104033, @s-urbaniak)
- The original commit landed in 1.22, so the revert was also cherry-picked to 1.22
  - Add a namespace label to admission metrics and expand histogram range to 0-10s by voutcn · Pull Request #101208 · kubernetes/kubernetes
  - Automated cherry pick of #104033: Revert "Add a namespace label to admission metrics and expand by rphillips · Pull Request #104037 · kubernetes/kubernetes

The bucket range was changed because of high cardinality (#106306, @pawbana)
- apiserver_request_duration_seconds
  - from: [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60]
  - to: [0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.25, 1.5, 2, 3, 4, 5, 6, 8, 10, 15, 20, 30, 45, 60]
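The effect on cardinality can be estimated by counting series. For every label combination, a Prometheus histogram exposes one _bucket series per boundary, the implicit +Inf bucket, and the _sum and _count series. The sketch below counts both bucket layouts from the lists above.

```go
package main

import "fmt"

// Bucket boundaries of apiserver_request_duration_seconds before and after #106306.
var oldBuckets = []float64{0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5,
	0.6, 0.7, 0.8, 0.9, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5,
	6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60}

var newBuckets = []float64{0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.25, 1.5, 2,
	3, 4, 5, 6, 8, 10, 15, 20, 30, 45, 60}

// seriesPerLabelSet counts the series one label combination produces:
// one _bucket per boundary, the implicit +Inf bucket, plus _sum and _count.
func seriesPerLabelSet(buckets []float64) int {
	return len(buckets) + 1 + 2
}

func main() {
	before, after := seriesPerLabelSet(oldBuckets), seriesPerLabelSet(newBuckets)
	fmt.Printf("boundaries: %d -> %d\n", len(oldBuckets), len(newBuckets))
	fmt.Printf("series per label combination: %d -> %d\n", before, after)
}
```

For these two lists, each label combination goes from 40 to 24 series. The saving is therefore multiplied by the number of label combinations the metric has.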

The following metrics have been deprecated (#103793, @yan-lgtm)
- apiserver_longrunning_gauge
- apiserver_register_watchers

kube-scheduler

The StabilityLevel of the following metrics is now Stable (#105941, @rezakrimi)
- scheduler_pending_pods
- scheduler_preemption_attempts_total
- scheduler_preemption_victims
- scheduler_schedule_attempts_total
The following metric is now Stable and has been renamed (#105941, @rezakrimi)
- scheduler_e2e_scheduling_duration_seconds => scheduler_scheduling_attempt_duration_seconds

The StabilityLevel of the following metrics is now Stable (#106266, @ahg-g)
- scheduler_pod_scheduling_duration_seconds
- scheduler_pod_scheduling_attempts
- scheduler_framework_extension_point_duration_seconds
- scheduler_plugin_execution_duration_seconds
- scheduler_queue_incoming_pods_total

kubelet

The following metric has been renamed (#105885, @gnufied)
- volume_fsgroup_recursive_apply => volume_apply_access_control

The message label was removed from the following metric because long error messages were being recorded in it (#105213, @yxxhero)
- kubelet_started_pods_errors_total

Fixed

kube-apiserver

Requests to the deprecated /api/<version>/watch/... paths are now correctly recorded as WATCH in metrics (#104161, @wojtek-t)
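For reference, the deprecated form puts watch right after the version segment, whereas the current form is a normal resource path with the watch=true query parameter. A simplified sketch that recognizes only the /api/<version>/watch/... shape (isDeprecatedWatchPath is a hypothetical name; the real apiserver resolves request info in its own way):

```go
package main

import (
	"fmt"
	"strings"
)

// isDeprecatedWatchPath reports whether p uses the deprecated core-group form
// /api/<version>/watch/... . Simplified sketch: only this path shape is
// checked, not the ?watch=true query form.
func isDeprecatedWatchPath(p string) bool {
	parts := strings.Split(strings.Trim(p, "/"), "/")
	return len(parts) >= 3 && parts[0] == "api" && parts[2] == "watch"
}

func main() {
	for _, p := range []string{
		"/api/v1/watch/namespaces/default/pods", // deprecated watch path
		"/api/v1/namespaces/default/pods",       // regular resource path
	} {
		fmt.Println(p, isDeprecatedWatchPath(p))
	}
}
```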

kubelet

Generic ephemeral volumes are now reported in the following metrics (#105569, @pohly)
- kubelet_volume_stats_*

kube-scheduler

Fixed the bucket range of the following metric (#100720, @dntosas)
- scheduler_volume_scheduling_duration_seconds_bucket
- The argument passed to metrics.ExponentialBuckets was corrected from 1000 to 0.001
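To see why 1000 was wrong for a metric measured in seconds, it helps to reproduce what ExponentialBuckets(start, factor, count) generates: count upper bounds beginning at start, each multiplied by factor. The factor 2 and count 15 below are assumptions for illustration, not necessarily the values used in the scheduler.

```go
package main

import "fmt"

// exponentialBuckets follows the documented behavior of Prometheus'
// ExponentialBuckets(start, factor, count): count upper bounds, starting at
// start and multiplying by factor each time. Input validation is omitted.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

func main() {
	// factor=2 and count=15 are assumed values for illustration.
	for _, start := range []float64{1000, 0.001} {
		b := exponentialBuckets(start, 2, 15)
		fmt.Printf("start=%g: first=%g s, last=%g s\n", start, b[0], b[len(b)-1])
	}
}
```

With start 1000, the smallest bucket is already 1000 seconds, so practically every observation lands in the first bucket. With 0.001, the buckets begin at one millisecond.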

Fixed an issue where scheduler metrics were recorded as extremely small values when a quantity used a binary prefix with a decimal point, such as 2.5Gi or 1.1Ki (#103751, @y-tag)
- kube_pod_resource_request
- kube_pod_resource_limit
- It is a little concerning that this fix has not been cherry-picked to earlier releases.
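For reference, the expected magnitude of such quantities is easy to compute, because a binary suffix multiplies by a power of 1024. The parser below is a hypothetical helper that handles only Ki/Mi/Gi/Ti and plain numbers. It is not resource.Quantity and does not reproduce the scheduler's actual fix.

```go
package main

import (
	"fmt"
	"strconv"
)

// binarySuffixes maps binary (IEC) suffixes to their multipliers.
var binarySuffixes = map[string]float64{
	"Ki": 1 << 10,
	"Mi": 1 << 20,
	"Gi": 1 << 30,
	"Ti": 1 << 40,
}

// parseBinaryQuantity converts strings such as "2.5Gi" or "512" to bytes.
// Hypothetical helper: decimal suffixes (k, M, G, ...) and exponents are not handled.
func parseBinaryQuantity(s string) (float64, error) {
	orig, mult := s, 1.0
	if len(s) > 2 {
		if m, ok := binarySuffixes[s[len(s)-2:]]; ok {
			mult = m
			s = s[:len(s)-2]
		}
	}
	v, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid quantity %q: %w", orig, err)
	}
	return v * mult, nil
}

func main() {
	for _, q := range []string{"2.5Gi", "1.1Ki", "512"} {
		v, err := parseBinaryQuantity(q)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s = %.1f bytes\n", q, v)
	}
}
```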

kube-proxy

The value of the following metric was wrong. It has been fixed so that the correct number of rules is recorded (#106030, @danwinship)
- kubeproxy_sync_proxy_rules_iptables_total
- Several bugs in the iptables proxy introduced in Kubernetes v1.22 were also fixed
  - When a Service used SessionAffinity and an Endpoint became non-ready, client affinity to that Endpoint could break
  - Traffic to a Service IP is now rejected as soon as no usable Endpoints remain, instead of waiting until all terminating Endpoints have finished terminating, even when those terminating Endpoints were not being used
  - A small amount of resources (memory/time/CPU) is now saved on iptables chains for unused Endpoints

Removed

kube-scheduler

The following deprecated metric was removed (#104518, @dntosas)
- scheduler_volume_scheduling_duration_seconds

Changes other than metrics

API Changes

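The kubelet can now set the log verbosity (--v) and flush frequency (--log-flush-frequency) in its configuration file, not only through command-line options. In the help text, these options now appear in the Logs group instead of the Global or Misc group. The description of the -vmodule option was also improved (#106090, @pohly). The following is the help output of kube-controller-manager v1.23.0: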

$ docker run -it k8s.gcr.io/kube-controller-manager-amd64:v1.23.0 kube-controller-manager --help
...
Logs flags:

      --experimental-logging-sanitization
                [Experimental] When enabled prevents logging of fields tagged as sensitive (passwords, keys, tokens).
                Runtime log sanitization may introduce significant computation overhead and therefore should not be enabled in production.
      --log-flush-frequency duration
                Maximum number of seconds between log flushes (default 5s)
      --log-json-info-buffer-size quantity
                [Experimental] In JSON format with split output streams, the info messages can be buffered for a while to increase performance. The default value of zero bytes
                disables buffering. The size can be specified as number of bytes (512), multiples of 1000 (1K), multiples of 1024 (2Ki), or powers of those (3M, 4G, 5Mi, 6Gi).
      --log-json-split-stream
                [Experimental] In JSON format, write error messages to stderr and info messages to stdout. The default is to write a single stream to stdout.
      --logging-format string
                Sets the log format. Permitted formats: "json", "text".
                Non-default formats don't honor these flags: --add-dir-header, --alsologtostderr, --log-backtrace-at, --log-dir, --log-file, --log-file-max-size, --logtostderr,
                --one-output, --skip-headers, --skip-log-headers, --stderrthreshold, --vmodule.
                Non-default choices are currently alpha and subject to change without warning. (default "text")
  -v, --v Level
                number for the log level verbosity
      --vmodule pattern=N,...
                comma-separated list of pattern=N settings for file-filtered logging (only works for text log format)
...
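The --vmodule value shown above is a comma-separated list of pattern=N. The sketch below parses it and picks a level per source file. The matching rule (base file name without .go, first match wins, otherwise the global -v level) is a simplifying assumption and may differ from klog's exact behavior. The type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
	"strings"
)

// vmoduleRule is one pattern=N entry of a --vmodule value.
type vmoduleRule struct {
	Pattern string
	Level   int
}

// parseVModule parses a "pattern=N,..." value.
func parseVModule(spec string) ([]vmoduleRule, error) {
	if spec == "" {
		return nil, nil
	}
	var rules []vmoduleRule
	for _, part := range strings.Split(spec, ",") {
		pattern, level, ok := strings.Cut(part, "=")
		if !ok || pattern == "" {
			return nil, fmt.Errorf("invalid vmodule entry %q", part)
		}
		n, err := strconv.Atoi(level)
		if err != nil || n < 0 {
			return nil, fmt.Errorf("invalid level in %q", part)
		}
		rules = append(rules, vmoduleRule{Pattern: pattern, Level: n})
	}
	return rules, nil
}

// levelFor returns the verbosity for a source file. Assumption for this sketch:
// patterns match the file's base name without ".go" and the first match wins;
// otherwise the global -v level applies.
func levelFor(rules []vmoduleRule, file string, globalV int) int {
	base := strings.TrimSuffix(path.Base(file), ".go")
	for _, r := range rules {
		if ok, _ := path.Match(r.Pattern, base); ok {
			return r.Level
		}
	}
	return globalV
}

func main() {
	rules, err := parseVModule("controller*=4,queue=2")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, f := range []string{"pkg/controller/controller_utils.go", "util/queue.go", "main.go"} {
		fmt.Println(f, levelFor(rules, f, 1))
	}
}
```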

Deprecation

kube-scheduler: The --port and --address flags are scheduled for removal in v1.24. The insecure port flag --port can currently only be set to 0. The metricsBindAddress and healthzBindAddress fields in kubescheduler.config.k8s.io/v1beta1 are ignored even if set, so they are expected to be left unset. They have been removed entirely in kubescheduler.config.k8s.io/v1beta2 (#96345, @ingvagabund). Note the following points:
- (MUST) kube-scheduler must have --authorization-kubeconfig and --authentication-kubeconfig set correctly for authentication and authorization to work
- (MUST) Liveness/readiness probes for kube-scheduler must use HTTPS, and the default port has changed to 10259
- (Should) Applications that scrape metrics from kube-scheduler (such as Prometheus) should use a dedicated ServiceAccount that is allowed to access the /metrics nonResourceURL
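In practice, a scraper talks HTTPS to port 10259 and authenticates with a ServiceAccount token. The sketch below builds such a request in Go. It assumes the ServiceAccount is bound to a ClusterRole that allows get on the /metrics nonResourceURL. The host is hypothetical, and TLS setup (trusting the scheduler's serving certificate) is left out.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/url"
	"os"
	"strconv"
	"strings"
)

// schedulerSecurePort is the default HTTPS port of kube-scheduler.
const schedulerSecurePort = 10259

// newSchedulerMetricsRequest builds a GET request for kube-scheduler's /metrics,
// authenticated with a ServiceAccount bearer token.
func newSchedulerMetricsRequest(host, token string) (*http.Request, error) {
	u := url.URL{
		Scheme: "https",
		Host:   net.JoinHostPort(host, strconv.Itoa(schedulerSecurePort)),
		Path:   "/metrics",
	}
	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	// Standard location of the ServiceAccount token inside a Pod.
	tok, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/token")
	if err != nil {
		fmt.Println("no ServiceAccount token available:", err)
		return
	}
	req, err := newSchedulerMetricsRequest("127.0.0.1", strings.TrimSpace(string(tok)))
	if err != nil {
		fmt.Println(err)
		return
	}
	// Sending it needs an http.Client whose TLS config trusts the scheduler's certificate.
	fmt.Println(req.Method, req.URL)
}
```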

Feature

controller-manager now supports changing the log level dynamically (#104571, @h4ghhh)
- There is an excellent article with the details here:
  - https://qiita.com/everpeace/items/a12d378c47c3ae30602f
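As a rough sketch of how such a change could be triggered, assume the controller-manager exposes a PUT /debug/flags/v endpoint on its secure port 10257 that takes the new level as the request body. Both the path and the port are assumptions here, so please check the linked article for the actual procedure. The host and token are placeholders.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/url"
	"strconv"
	"strings"
)

// newSetVerbosityRequest builds a PUT request that sends the new verbosity
// level as the body. Path and port are assumptions for this sketch.
func newSetVerbosityRequest(host string, level int, token string) (*http.Request, error) {
	u := url.URL{
		Scheme: "https",
		Host:   net.JoinHostPort(host, "10257"), // assumed secure port of kube-controller-manager
		Path:   "/debug/flags/v",                // assumed endpoint
	}
	req, err := http.NewRequest(http.MethodPut, u.String(), strings.NewReader(strconv.Itoa(level)))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	req, err := newSetVerbosityRequest("127.0.0.1", 4, "<token>")
	if err != nil {
		fmt.Println(err)
		return
	}
	// Sending it would require an http.Client that trusts the serving certificate.
	fmt.Println(req.Method, req.URL, "body length:", req.ContentLength)
}
```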

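Constants and variables can now be used in STABLE metric names (#103654, @coffeepac)
- Because the StabilityLevel check for metrics (test/instrumentation/stability-utils.sh) expected metric names to be string literals, the package used by the tool appears to have been updated so that constants and variables can be used.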
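For context, Kubernetes metric names are usually composed from namespace, subsystem and name parts, which are joined with "_" in the same way as Prometheus' BuildFQName. The sketch reimplements that joining rule. The constants are hypothetical examples of the kind of names that can now be used for STABLE metrics.

```go
package main

import (
	"fmt"
	"strings"
)

// buildFQName joins the non-empty parts with "_", following the documented
// behavior of Prometheus' BuildFQName (an empty name yields "").
func buildFQName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

// Hypothetical constants of the kind that #103654 allows for STABLE metric names.
const (
	schedulerSubsystem = "scheduler"
	pendingPodsName    = "pending_pods"
)

func main() {
	fmt.Println(buildFQName("", schedulerSubsystem, pendingPodsName))
}
```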

Bug or Regression

kubelet: The flags printed at startup now show the logging configuration that is actually used in the end (#106520, @pohly)

When JSON is used as the log output format, the name and namespace of Kubernetes objects are now output as structured log fields (#104877, @pohly)
- e.g. {"ts":%f,"caller":"json/klog_test.go:%d","msg":"some","v":0,"pod":{"name":"pod-1","namespace":"kube-system"}}
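The shape of that output can be reproduced with plain encoding/json. In Kubernetes code, such object references are typically created with klog.KObj. The structs below are hypothetical stand-ins that only mimic the field layout of the example, with the ts and caller fields left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// objectRef mimics the {"name", "namespace"} object emitted for a Kubernetes object.
// In this sketch the namespace is omitted when empty (e.g. cluster-scoped objects).
type objectRef struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace,omitempty"`
}

// logEntry is a hypothetical, trimmed-down log line ("ts" and "caller" omitted).
type logEntry struct {
	Msg string    `json:"msg"`
	V   int       `json:"v"`
	Pod objectRef `json:"pod"`
}

// encodeEntry renders one log line as JSON; struct fields keep their declared order.
func encodeEntry(e logEntry) (string, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	line, err := encodeEntry(logEntry{Msg: "some", Pod: objectRef{Name: "pod-1", Namespace: "kube-system"}})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(line)
}
```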

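--log-flush-frequency was either not implemented or ignored in some components, and help and warning texts did not always show the correct option name for the command (for example add_dir_header instead of add-dir-header). This PR also includes a cleanup of the command-line options in component-base/logs, and some logging-related options are now shown in the Logs group rather than the Global group. Commands that need klog and the --log-flush-frequency option must now call the logs.AddFlags function explicitly. The new cli.Run helper function normalizes command-line options and displays usage and errors consistently. If option parsing fails, it prints the usage help first and then the error (#105076, @pohly)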

Cleanup or Flake

All klog options except -v and -vmodule have been deprecated. The -vmodule option is supported only when the log format is text (#105042, @pohly)
The following packages or files have been migrated to structured logging
- cmd/proxy/app, pkg/proxy/meta_proxier (#104928, @jyz0309)
- cmd/proxy/{config, healthcheck, winkernel} (#104944, @jyz0309)
- cmd/kube-scheduler/app/server.go, pkg/scheduler/framework/plugins/nodelabel/node_label.go, pkg/scheduler/framework/plugins/nodevolumelimits/csi.go, pkg/scheduler/framework/plugins/nodevolumelimits/non_csi.go (#105855, @shivanshu1333)
- pkg/proxy (#104908, @CIPHERTron)
- pkg/proxy (#104891, @shivanshu1333)
- pkg/proxy/ipvs (#104932, @shivanshu1333)
- pkg/proxy/userspace (#104931, @shivanshu1333)
- pkg/proxy/winuserspace (#105035, @shivanshu1333)
- pkg/scheduler (#99273, @yangjunmyfm192085)
- pkg/scheduler/framework/plugins/interpodaffinity/filtering.go, pkg/scheduler/framework/plugins/podtopologyspread/filtering.go, pkg/scheduler/framework/plugins/volumezone/volume_zone.go (#105931, @mengjiao-liu)
- pkg/scheduler/framework/plugins/volumebinding/assume_cache.go (#105904, @mengjiao-liu)
- pkg/scheduler/framework/preemption/preemption.go, pkg/scheduler/framework/plugins/examples/stateful/stateful.go, pkg/scheduler/framework/plugins/noderesources/resource_allocation.go (#105967, @shivanshu1333)
- cmd/kube-proxy/app (#98913, @yxxhero)
- scheduler file cache.go (#105969, @shivanshu1333)
- scheduler files comparer.go, dumper.go, node_tree.go (#105968, @shivanshu1333)

Logs related to Topology Hints were added to the EndpointSlice controller (#104741, @robscott)
