Checking Pod health with the kubectl CLI

Last updated at 2019-08-07, posted at 2018-08-02

These are my notes on how to use the kubectl CLI to check whether IBM Cloud Private (ICP), IBM's on-premises Kubernetes product, and the Pods running on it are operating normally.

Approach

kubectl get po --all-namespaces

$ kubectl get po --all-namespaces
NAMESPACE      NAME                                                           READY   STATUS             RESTARTS   AGE
cert-manager   cert-manager-ibm-cert-manager-687fbb75f4-7np8v                 1/1     Running            7          69d
kube-system    audit-logging-fluentd-ds-9tq48                                 1/1     Running            0          9d
kube-system    audit-logging-fluentd-ds-bdq64                                 1/1     Running            0          9d
kube-system    audit-logging-fluentd-ds-kz289                                 0/1     Pending            0          9d
kube-system    audit-logging-fluentd-ds-qpq4l                                 1/1     Running            3          69d
kube-system    auth-idp-pqn5x                                                 4/4     Running            0          9d
kube-system    auth-pap-r8c7s                                                 2/2     Running            0          9d
kube-system    auth-pdp-6bdlj                                                 2/2     Running            0          9d
(omitted)
kube-system    logging-elk-kibana-init-b9bgs                                  0/1     Completed          8          69d
(omitted)
kube-system    monitoring-prometheus-collectdexporter-8c479ffcd-zwn4c         1/2     CrashLoopBackOff   2745       9d
(omitted)
$

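If a Pod's STATUS is Running and the numerator and denominator of READY are equal, the Pod can be judged to be running normally, and if every Pod is healthy, ICP itself can probably be judged healthy too.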
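The rule above can be written down as a small Python function. This is only a sketch: `looks_healthy` is a hypothetical helper name, and the rows are copied by hand from the output above rather than parsed from kubectl.

```python
def looks_healthy(ready: str, status: str) -> bool:
    """Apply the rule to the READY and STATUS columns of one row."""
    # READY is printed as "<ready containers>/<total containers>".
    ready_count, total_count = (int(n) for n in ready.split("/"))
    return status == "Running" and ready_count == total_count


# Rows copied by hand from the sample output (name, READY, STATUS).
rows = [
    ("auth-idp-pqn5x", "4/4", "Running"),
    ("audit-logging-fluentd-ds-kz289", "0/1", "Pending"),
    ("monitoring-prometheus-collectdexporter-8c479ffcd-zwn4c", "1/2", "CrashLoopBackOff"),
]

for name, ready, status in rows:
    print(name, looks_healthy(ready, status))
```

Note that this rule also flags a Completed Pod such as logging-elk-kibana-init-b9bgs (0/1), which needs separate handling.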

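The command output could be processed with grep or awk, but here I specify -o json and process the resulting JSON with jq. There are also the -o jsonpath and --field-selector options, but they seem too limited for this purpose, so I use jq.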

https://stedolan.github.io/jq/manual/

Example Pod status

kubectl get po -n kube-system audit-logging-fluentd-ds-9tq48 -o json | jq '.status'

{
  "conditions": [
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2019-07-28T06:27:45Z",
      "status": "True",
      "type": "Initialized"
    },
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2019-07-28T06:29:43Z",
      "status": "True",
      "type": "Ready"
    },
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2019-07-28T06:29:43Z",
      "status": "True",
      "type": "ContainersReady"
    },
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2019-07-28T06:26:16Z",
      "status": "True",
      "type": "PodScheduled"
    }
  ],
  "containerStatuses": [
    {
      "containerID": "docker://18aa4ac7900637eb17e23846c388ae4f3f5756e6b83dce6ab532a52340d76185",
      "image": "mycluster.icp:8500/ibmcom/fluentd:v1.4.1-icp",
      "imageID": "docker-pullable://mycluster.icp:8500/ibmcom/fluentd@sha256:0ee8b30c88a870aeba2d5817c68a7a56a68188fa5622d2281096ad7180a384eb",
      "lastState": {},
      "name": "fluentd",
      "ready": true,
      "restartCount": 0,
      "state": {
        "running": {
          "startedAt": "2019-07-28T06:29:42Z"
        }
      }
    }
  ],
  "hostIP": "9.188.124.25",
  "phase": "Running",
  "podIP": "10.1.166.214",
  "qosClass": "Guaranteed",
  "startTime": "2019-07-28T06:27:45Z"
}

The following fields look useful for the check.

.status.phase
.status.conditions[]
.status.containerStatuses[]

However, the STATUS column in the kubectl output is not the same as .status.phase in the JSON, so checking only that .status.phase is Running is not sufficient.

Pod Lifecycle

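.status.phase can take only the following values, but the kubectl STATUS column shows various other values as well (Completed and CrashLoopBackOff in the output above, for example).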

Value	Description
Pending	The Pod has been accepted by the Kubernetes system, but one or more of the Container images has not been created. This includes time before being scheduled as well as time spent downloading images over the network, which could take a while.
Running	The Pod has been bound to a node, and all of the Containers have been created. At least one Container is still running, or is in the process of starting or restarting.
Succeeded	All Containers in the Pod have terminated in success, and will not be restarted.
Failed	All Containers in the Pod have terminated, and at least one Container has terminated in failure. That is, the Container either exited with non-zero status or was terminated by the system.
Unknown	For some reason the state of the Pod could not be obtained, typically due to an error in communicating with the host of the Pod.

The phase field of PodStatus

https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.15/#podstatus-v1-core
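As a rough illustration of why STATUS can differ from .status.phase, the Python sketch below approximates how a displayed status could be derived: start from the phase and let a container's waiting or terminated reason override it. This is a simplified assumption, not kubectl's actual implementation, which handles more cases (init containers and Pods being deleted, for example). `display_status` and the sample Pods are hypothetical and hand-written.

```python
def display_status(pod: dict) -> str:
    """Approximate the STATUS column from a Pod's JSON (simplified sketch)."""
    status = pod.get("status", {})
    # Start from a Pod-level reason if one is set, otherwise from the phase.
    result = status.get("reason") or status.get("phase", "Unknown")
    for cs in status.get("containerStatuses") or []:
        state = cs.get("state", {})
        waiting = state.get("waiting") or {}
        terminated = state.get("terminated") or {}
        # A container-level reason overrides the phase in this sketch.
        if waiting.get("reason"):
            result = waiting["reason"]
        elif terminated.get("reason"):
            result = terminated["reason"]
    return result


# Hand-written sample Pods modeled on the cases seen in the output above.
crash_pod = {"status": {"phase": "Running", "containerStatuses": [
    {"state": {"running": {"startedAt": "2019-07-28T06:29:42Z"}}},
    {"state": {"waiting": {"reason": "CrashLoopBackOff"}}},
]}}
completed_pod = {"status": {"phase": "Succeeded", "containerStatuses": [
    {"state": {"terminated": {"reason": "Completed", "exitCode": 0}}},
]}}
pending_pod = {"status": {"phase": "Pending"}}

for pod in (crash_pod, completed_pod, pending_pod):
    print(pod["status"]["phase"], "->", display_status(pod))
```

In this sketch a Pod whose phase is Running can still be shown as CrashLoopBackOff, which matches the mismatch observed above.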

Extract the Pods whose .status.phase is not Running.

kubectl get po --all-namespaces -o json \
  | jq -r '.items[]
    | select ( .status.phase != "Running" )
    | .metadata.namespace + "/" + .metadata.name'

$ kubectl get po --all-namespaces -o json \
>   | jq -r '.items[]
>     | select ( .status.phase != "Running" )
>     | .metadata.namespace + "/" + .metadata.name'
kube-system/audit-logging-fluentd-ds-kz289
kube-system/iam-onboarding-lz4rt
kube-system/key-management-onboarding-lx4rn
kube-system/logging-elk-elasticsearch-curator-1565047800-6whh9
kube-system/logging-elk-elasticsearch-curator-1565134200-dcmbk
kube-system/logging-elk-elasticsearch-pki-init-snrdr
kube-system/logging-elk-elasticsearch-searchguard-init-vzpmb
kube-system/logging-elk-kibana-init-b9bgs
kube-system/multicluster-hub-ibm-mcm-prod-create-redis-secret-ld6cp
kube-system/oidc-client-registration-tdl4p
kube-system/security-onboarding-mv7w7
$

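This also picks up Completed Pods. It also misses Pods that are Running but have an unhealthy container (monitoring-prometheus-collectdexporter-8c479ffcd-zwn4c, which is 1/2 CrashLoopBackOff, is missing from the list above), so this method does not look usable.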

PodCondition

https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.15/#podcondition-v1-core

.status.conditions[] is an array of PodCondition objects.
A PodCondition has several types, but a Pod can be considered healthy if it has a PodCondition whose type is Ready and whose status is True.

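A container is Ready = its readiness probe has succeeded (or no readiness probe is defined)
A Pod is Ready = all of its containers are Ready
A Pod is Ready if it has a PodCondition whose type is Ready and whose status is True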

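Note that when a Pod fails to be scheduled or initialized, a PodCondition of type Ready may not exist at all. Taking that into account, extract the Pods that do not have exactly one PodCondition whose type is Ready and whose status is True.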

kubectl get po --all-namespaces -o json \
  | jq -r '.items[]
    | select( ([ .status.conditions[] | select( .type == "Ready" and .status == "True" ) ] | length ) != 1)
    | .metadata.namespace + "/" + .metadata.name'

$ kubectl get po --all-namespaces -o json \
>   | jq -r '.items[]
>     | select( ([ .status.conditions[] | select( .type == "Ready" and .status == "True" ) ] | length ) != 1)
>     | .metadata.namespace + "/" + .metadata.name'
kube-system/audit-logging-fluentd-ds-kz289
kube-system/iam-onboarding-lz4rt
kube-system/key-management-onboarding-lx4rn
kube-system/logging-elk-elasticsearch-curator-1565047800-6whh9
kube-system/logging-elk-elasticsearch-curator-1565134200-dcmbk
kube-system/logging-elk-elasticsearch-pki-init-snrdr
kube-system/logging-elk-elasticsearch-searchguard-init-vzpmb
kube-system/logging-elk-kibana-init-b9bgs
kube-system/monitoring-prometheus-collectdexporter-8c479ffcd-zwn4c
kube-system/multicluster-hub-ibm-mcm-prod-create-redis-secret-ld6cp
kube-system/oidc-client-registration-tdl4p
kube-system/security-onboarding-mv7w7
$

Completed Pods are extracted as well, so adding a condition that excludes them (Pods whose .status.phase is Succeeded) gives the following.

kubectl get po --all-namespaces -o json \
  | jq -r '.items[]
    | select( ( [ .status.conditions[] | select( .type == "Ready" and .status == "True" ) ] | length ) != 1 )
    | select( .status.phase != "Succeeded" )
    | .metadata.namespace + "/" + .metadata.name'

$ kubectl get po --all-namespaces -o json \
>   | jq -r '.items[]
>     | select( ( [ .status.conditions[] | select( .type == "Ready" and .status == "True" ) ] | length ) != 1 )
>     | select( .status.phase != "Succeeded" )
>     | .metadata.namespace + "/" + .metadata.name'
kube-system/audit-logging-fluentd-ds-kz289
kube-system/monitoring-prometheus-collectdexporter-8c479ffcd-zwn4c

This looks good.
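For readers who prefer a script to a jq filter, the same check can be sketched in Python. The function names `is_ready` and `unready_pods` are hypothetical, and the sample list is hand-written in the shape of `kubectl get po -o json` output with most fields left out.

```python
def is_ready(pod: dict) -> bool:
    """True if the Pod has a condition with type Ready and status True."""
    conditions = pod.get("status", {}).get("conditions") or []
    return any(c.get("type") == "Ready" and c.get("status") == "True"
               for c in conditions)


def unready_pods(pod_list: dict) -> list:
    """Return namespace/name of Pods that are neither Ready nor Succeeded."""
    names = []
    for pod in pod_list.get("items", []):
        if pod.get("status", {}).get("phase") == "Succeeded":
            continue  # Completed Pods are expected to be not Ready.
        if not is_ready(pod):
            meta = pod["metadata"]
            names.append(meta["namespace"] + "/" + meta["name"])
    return names


# Hand-written sample; only the fields used by the check are included.
sample = {"items": [
    {"metadata": {"namespace": "kube-system", "name": "ready-pod"},
     "status": {"phase": "Running",
                "conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"namespace": "kube-system", "name": "unscheduled-pod"},
     "status": {"phase": "Pending",
                "conditions": [{"type": "PodScheduled", "status": "False"}]}},
    {"metadata": {"namespace": "kube-system", "name": "finished-job-pod"},
     "status": {"phase": "Succeeded",
                "conditions": [{"type": "Ready", "status": "False"}]}},
]}

print(unready_pods(sample))
```

To run it against a real cluster, the dict could be loaded with `json.load(sys.stdin)` after piping `kubectl get po --all-namespaces -o json` into the script.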

ContainerStatus

https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.15/#containerstatus-v1-core

.status.containerStatuses[] is an array of ContainerStatus objects. If the array is empty, or if it contains a container whose ready is false, the Pod as a whole can be considered unhealthy.

kubectl get po --all-namespaces -o json \
  | jq -r '.items[]
    | if .status.containerStatuses | length == 0
      then .
      else select ( ( [ .status.containerStatuses[] | select( .ready == false ) ] | length ) != 0 )
      end
    | select( .status.phase != "Succeeded" )
    | .metadata.namespace + "/" + .metadata.name'

$ kubectl get po --all-namespaces -o json \
>   | jq -r '.items[]
>     | if .status.containerStatuses | length == 0
>       then .
>       else select ( ( [ .status.containerStatuses[] | select( .ready == false ) ] | length ) != 0 )
>       end
>     | select( .status.phase != "Succeeded" )
>     | .metadata.namespace + "/" + .metadata.name'
kube-system/audit-logging-fluentd-ds-kz289
kube-system/monitoring-prometheus-collectdexporter-8c479ffcd-zwn4c
$

This also looks good.
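The container-level check can be sketched in Python in the same way. `has_unready_container` is a hypothetical name, and the sample Pods are hand-written with only the fields the check reads.

```python
def has_unready_container(pod: dict) -> bool:
    """True if containerStatuses is empty or any container is not ready."""
    statuses = pod.get("status", {}).get("containerStatuses") or []
    if not statuses:
        return True  # No container status yet, e.g. the Pod is not scheduled.
    return any(not cs.get("ready", False) for cs in statuses)


def is_unhealthy(pod: dict) -> bool:
    """Combine the container check with the exclusion of Succeeded Pods."""
    if pod.get("status", {}).get("phase") == "Succeeded":
        return False
    return has_unready_container(pod)


# Hand-written samples modeled on the cases in this article.
all_ready = {"status": {"phase": "Running",
                        "containerStatuses": [{"name": "fluentd", "ready": True}]}}
one_failing = {"status": {"phase": "Running",
                          "containerStatuses": [{"name": "a", "ready": True},
                                                {"name": "b", "ready": False}]}}
not_scheduled = {"status": {"phase": "Pending"}}
completed = {"status": {"phase": "Succeeded",
                        "containerStatuses": [{"name": "init", "ready": False}]}}

for pod in (all_ready, one_failing, not_scheduled, completed):
    print(pod["status"]["phase"], is_unhealthy(pod))
```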

Additional notes

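While containers are being created or deleted, the Pod is in the Pending phase and is not Ready, so it is detected as unhealthy. This needs to be handled somehow, for example by retrying the check a few times.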
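One possible way to handle this is to report only the Pods that stay unhealthy across several consecutive checks. The Python sketch below assumes a caller-supplied `check` function that returns the set of currently unhealthy Pod names; `persistently_unhealthy`, the number of attempts, and the interval are all illustrative choices, not values from this article.

```python
import time
from typing import Callable, Set


def persistently_unhealthy(check: Callable[[], Set[str]],
                           attempts: int = 3,
                           interval_seconds: float = 10.0) -> Set[str]:
    """Return the names reported as unhealthy by every one of the checks."""
    remaining = set(check())
    for _ in range(attempts - 1):
        if not remaining:
            break  # Everything recovered, no need to keep polling.
        time.sleep(interval_seconds)
        # Keep only the names that are still unhealthy on this attempt.
        remaining &= set(check())
    return remaining


# Simulated results of three consecutive checks (hypothetical Pod names).
results = iter([{"ns/starting", "ns/broken"}, {"ns/broken"}, {"ns/broken", "ns/new"}])
print(persistently_unhealthy(lambda: next(results), attempts=3, interval_seconds=0))
```

A Pod that was merely starting up drops out after the first retry, while one that is broken on every attempt is still reported.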

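A Pod that is never scheduled in the first place (and so never even reaches Pending) cannot be detected as unhealthy this way, so rather than monitoring Pods it seems more useful to compare the desired count with the ready count for each Deployment, StatefulSet, and DaemonSet. Static Pods are monitored individually.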

kubectl get deploy --all-namespaces -o json \
  | jq -r '.items[]
    | select ( .status.replicas != .status.readyReplicas )
    | .metadata.namespace + "/" + .metadata.name'

kubectl get sts --all-namespaces -o json \
  | jq -r '.items[]
    | select ( .status.replicas != .status.readyReplicas )
    | .metadata.namespace + "/" + .metadata.name'

kubectl get ds --all-namespaces -o json \
  | jq -r '.items[]
    | select ( .status.desiredNumberScheduled != .status.numberReady )
    | .metadata.namespace + "/" + .metadata.name'
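The workload-level comparison can also be sketched in Python. As a deliberate variation on the jq filters above, this sketch assumes the desired count for a Deployment or StatefulSet is taken from .spec.replicas rather than .status.replicas, and treats a missing readyReplicas as 0, since that field is omitted from the JSON when no replica is ready. `not_fully_ready` is a hypothetical name and the sample list is hand-written.

```python
def not_fully_ready(workload_list: dict) -> list:
    """Return namespace/name of workloads whose ready count is below desired."""
    names = []
    for item in workload_list.get("items", []):
        status = item.get("status", {})
        if "desiredNumberScheduled" in status:
            # DaemonSet: desired and ready counts both live in .status.
            desired = status.get("desiredNumberScheduled", 0)
            ready = status.get("numberReady", 0)
        else:
            # Deployment / StatefulSet: assumed desired count is .spec.replicas.
            desired = item.get("spec", {}).get("replicas", 1)
            ready = status.get("readyReplicas", 0)
        if ready != desired:
            meta = item["metadata"]
            names.append(meta["namespace"] + "/" + meta["name"])
    return names


# Hand-written sample mixing the three workload shapes.
sample = {"items": [
    {"metadata": {"namespace": "kube-system", "name": "ok-deploy"},
     "spec": {"replicas": 2}, "status": {"replicas": 2, "readyReplicas": 2}},
    {"metadata": {"namespace": "kube-system", "name": "broken-deploy"},
     "spec": {"replicas": 1}, "status": {"replicas": 1}},
    {"metadata": {"namespace": "kube-system", "name": "partial-ds"},
     "status": {"desiredNumberScheduled": 4, "numberReady": 3}},
    {"metadata": {"namespace": "kube-system", "name": "scaled-to-zero"},
     "spec": {"replicas": 0}, "status": {}},
]}

print(not_fully_ready(sample))
```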

References

kubectl get should have a way to filter for advanced pods status
