## Introduction

These are personal experiment notes. They were not written with other readers in mind and are not particularly polished, so please bear with me.
## Environment

Environment information, captured for the record.

### Node information

The version is OpenShift 4.8.26 (Kubernetes 1.21).
```
[root@bastion openshift]# oc version
Client Version: 4.8.26
Server Version: 4.8.26
Kubernetes Version: v1.21.6+bb8d50a
[root@bastion openshift]# oc get nodes
NAME STATUS ROLES AGE VERSION
ocp48-6vldl-infra-94vjm Ready infra,worker 12d v1.21.6+bb8d50a
ocp48-6vldl-infra-gjgwb Ready infra,worker 12d v1.21.6+bb8d50a
ocp48-6vldl-infra-ocs-dbjbd Ready infra,worker 5d12h v1.21.6+bb8d50a
ocp48-6vldl-infra-ocs-rdt8b Ready infra,worker 5d12h v1.21.6+bb8d50a
ocp48-6vldl-infra-ocs-xdhvn Ready infra,worker 5d12h v1.21.6+bb8d50a
ocp48-6vldl-infra-qvwvk Ready infra,worker 12d v1.21.6+bb8d50a
ocp48-6vldl-master-0 Ready master 17d v1.21.6+bb8d50a
ocp48-6vldl-master-1 Ready master 17d v1.21.6+bb8d50a
ocp48-6vldl-master-2 Ready master 17d v1.21.6+bb8d50a
ocp48-6vldl-worker-85crs Ready worker 17d v1.21.6+bb8d50a
ocp48-6vldl-worker-hdj9r Ready worker 17d v1.21.6+bb8d50a
ocp48-6vldl-worker-xp4bf Ready worker 17d v1.21.6+bb8d50a
[root@bastion openshift]#
```
- Master Node x 3
- Worker Node x 3
- Infrastructure Node x 6
### Cluster Monitoring version

Cluster Monitoring is installed by default as one of the Cluster Operators (CO) when OpenShift is installed.
```
[root@bastion openshift]# oc get co | grep monitoring
monitoring 4.8.26 True False False 16d
[root@bastion openshift]#
```
The monitoring version is 4.8.26.
### Pods deployed for Cluster Monitoring

These are the Pods in the `openshift-monitoring` namespace. They are placed on the Infrastructure Nodes via a `nodeSelector`.
```
[root@bastion openshift]# oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
alertmanager-main-0 5/5 Running 0 41m 10.128.4.21 ocp48-6vldl-infra-gjgwb <none> <none>
alertmanager-main-1 5/5 Running 0 42m 10.130.2.17 ocp48-6vldl-infra-qvwvk <none> <none>
alertmanager-main-2 5/5 Running 0 42m 10.131.2.14 ocp48-6vldl-infra-94vjm <none> <none>
cluster-monitoring-operator-95674b95b-slbjr 2/2 Running 4 16d 10.129.0.7 ocp48-6vldl-master-2 <none> <none>
grafana-5666d69fc9-d8plz 2/2 Running 0 42m 10.130.2.15 ocp48-6vldl-infra-qvwvk <none> <none>
kube-state-metrics-5f5f79ccbc-858xx 3/3 Running 0 42m 10.130.2.13 ocp48-6vldl-infra-qvwvk <none> <none>
node-exporter-4bcsq 2/2 Running 0 11d 172.18.0.43 ocp48-6vldl-infra-94vjm <none> <none>
node-exporter-68q5d 2/2 Running 0 4d13h 172.18.0.154 ocp48-6vldl-infra-ocs-xdhvn <none> <none>
node-exporter-6nqpj 2/2 Running 0 4d13h 172.18.0.113 ocp48-6vldl-infra-ocs-rdt8b <none> <none>
node-exporter-9fjj4 2/2 Running 0 4d13h 172.18.0.186 ocp48-6vldl-infra-ocs-dbjbd <none> <none>
node-exporter-btbb7 2/2 Running 0 16d 172.18.0.180 ocp48-6vldl-worker-85crs <none> <none>
node-exporter-cr6m2 2/2 Running 0 11d 172.18.0.67 ocp48-6vldl-infra-qvwvk <none> <none>
node-exporter-cwglh 2/2 Running 0 16d 172.18.0.32 ocp48-6vldl-master-2 <none> <none>
node-exporter-m6hn9 2/2 Running 0 16d 172.18.0.25 ocp48-6vldl-master-1 <none> <none>
node-exporter-mplgz 2/2 Running 0 11d 172.18.0.37 ocp48-6vldl-infra-gjgwb <none> <none>
node-exporter-qtxdl 2/2 Running 0 16d 172.18.0.124 ocp48-6vldl-master-0 <none> <none>
node-exporter-vrshn 2/2 Running 0 16d 172.18.0.126 ocp48-6vldl-worker-hdj9r <none> <none>
node-exporter-zgsmz 2/2 Running 0 16d 172.18.0.53 ocp48-6vldl-worker-xp4bf <none> <none>
openshift-state-metrics-5bbdb5896-nnx65 3/3 Running 0 42m 10.130.2.12 ocp48-6vldl-infra-qvwvk <none> <none>
prometheus-adapter-7b757d8db7-gm8v2 1/1 Running 0 42m 10.130.2.14 ocp48-6vldl-infra-qvwvk <none> <none>
prometheus-adapter-7b757d8db7-h5lt5 1/1 Running 0 42m 10.131.2.12 ocp48-6vldl-infra-94vjm <none> <none>
prometheus-k8s-0 7/7 Running 1 46s 10.131.2.15 ocp48-6vldl-infra-94vjm <none> <none>
prometheus-k8s-1 7/7 Running 1 46s 10.130.2.22 ocp48-6vldl-infra-qvwvk <none> <none>
prometheus-operator-7c8f55cc45-qjx6s 2/2 Running 0 43m 10.131.2.11 ocp48-6vldl-infra-94vjm <none> <none>
telemeter-client-844fdfd96-xzfm5 3/3 Running 0 42m 10.128.4.17 ocp48-6vldl-infra-gjgwb <none> <none>
thanos-querier-68d474b7df-bzqdv 5/5 Running 0 42m 10.130.2.16 ocp48-6vldl-infra-qvwvk <none> <none>
thanos-querier-68d474b7df-q5lmw 5/5 Running 0 42m 10.131.2.13 ocp48-6vldl-infra-94vjm <none> <none>
[root@bastion openshift]#
```
### ConfigMap created to configure Cluster Monitoring

After an OpenShift install, monitoring works with the defaults, but to create a PV for storing monitoring data and to place the Pods on the Infrastructure Nodes, you need to create a ConfigMap with the desired settings.

This YAML is configured to use the StorageClass `thin`, which is the default when installing with IPI on a VMware environment.

The ConfigMap used to customize the settings:
```
[root@bastion openshift]# cat cluster-monitoring-configmap-vm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |+
    alertmanagerMain:
      nodeSelector:                # select the infra nodes with a nodeSelector
        node-role.kubernetes.io/infra: ""
      tolerations:                 # add tolerations for the infra taints
      - key: infra
        value: reserved
        effect: NoSchedule
      - key: infra
        value: reserved
        effect: NoExecute
    prometheusK8s:
      volumeClaimTemplate:         # volumeClaimTemplate (added)
        spec:                      # added
          storageClassName: thin   # VMware in-tree StorageClass
          volumeMode: Filesystem
          resources:               # added
            requests:              # added
              storage: 40Gi        # 40Gi for now
      nodeSelector:                # select the infra nodes with a nodeSelector
        node-role.kubernetes.io/infra: ""
      tolerations:                 # add tolerations for the infra taints
      - key: infra
        value: reserved
        effect: NoSchedule
      - key: infra
        value: reserved
        effect: NoExecute
    prometheusOperator:
      nodeSelector:                # select the infra nodes with a nodeSelector
        node-role.kubernetes.io/infra: ""
      tolerations:                 # add tolerations for the infra taints
      - key: infra
        value: reserved
        effect: NoSchedule
      - key: infra
        value: reserved
        effect: NoExecute
    grafana:
      nodeSelector:                # select the infra nodes with a nodeSelector
        node-role.kubernetes.io/infra: ""
      tolerations:                 # add tolerations for the infra taints
      - key: infra
        value: reserved
        effect: NoSchedule
      - key: infra
        value: reserved
        effect: NoExecute
    k8sPrometheusAdapter:
      nodeSelector:                # select the infra nodes with a nodeSelector
        node-role.kubernetes.io/infra: ""
      tolerations:                 # add tolerations for the infra taints
      - key: infra
        value: reserved
        effect: NoSchedule
      - key: infra
        value: reserved
        effect: NoExecute
    kubeStateMetrics:
      nodeSelector:                # select the infra nodes with a nodeSelector
        node-role.kubernetes.io/infra: ""
      tolerations:                 # add tolerations for the infra taints
      - key: infra
        value: reserved
        effect: NoSchedule
      - key: infra
        value: reserved
        effect: NoExecute
    telemeterClient:
      nodeSelector:                # select the infra nodes with a nodeSelector
        node-role.kubernetes.io/infra: ""
      tolerations:                 # add tolerations for the infra taints
      - key: infra
        value: reserved
        effect: NoSchedule
      - key: infra
        value: reserved
        effect: NoExecute
    openshiftStateMetrics:
      nodeSelector:                # select the infra nodes with a nodeSelector
        node-role.kubernetes.io/infra: ""
      tolerations:                 # add tolerations for the infra taints
      - key: infra
        value: reserved
        effect: NoSchedule
      - key: infra
        value: reserved
        effect: NoExecute
    thanosQuerier:
      nodeSelector:                # select the infra nodes with a nodeSelector
        node-role.kubernetes.io/infra: ""
      tolerations:                 # add tolerations for the infra taints
      - key: infra
        value: reserved
        effect: NoSchedule
      - key: infra
        value: reserved
        effect: NoExecute
```
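A minimal sketch of applying this ConfigMap and then confirming the Pods get rescheduled onto the infra nodes. It assumes the file name shown in the listing above:

```shell
# Apply the ConfigMap and list the monitoring Pods with their nodes,
# so you can see them being recreated on the infra nodes
apply_monitoring_config() {
  oc apply -f cluster-monitoring-configmap-vm.yaml &&
    oc -n openshift-monitoring get pods -o wide
}

# On a live cluster:
# apply_monitoring_config
```

Changing the ConfigMap causes the Cluster Monitoring Operator to recreate the affected Pods, so expect the AGE values to reset after applying.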
## Requests / Limits of the main Pods

Check the Requests and Limits set on the Pods in the `openshift-monitoring` namespace using `oc (kubectl) get pod`.
### Results from the commands

Grepping for `Requests` and `Limits` and then matching them back to container names was tedious, so I worked out a way to get them with `jsonpath`. It retrieves the `.spec.containers[*].name` / `.spec.containers[*].resources` pairs and the `.spec.initContainers[*].name` / `.spec.initContainers[*].resources` pairs.
```
[root@bastion openshift]# oc get pod alertmanager-main-0 -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
alertmanager {"requests":{"cpu":"4m","memory":"40Mi"}}
config-reloader {"requests":{"cpu":"1m","memory":"10Mi"}}
alertmanager-proxy {"requests":{"cpu":"1m","memory":"20Mi"}}
kube-rbac-proxy {"requests":{"cpu":"1m","memory":"15Mi"}}
prom-label-proxy {"requests":{"cpu":"1m","memory":"20Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod cluster-monitoring-operator-95674b95b-slbjr -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
kube-rbac-proxy {"requests":{"cpu":"1m","memory":"20Mi"}}
cluster-monitoring-operator {"requests":{"cpu":"10m","memory":"75Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod grafana-5666d69fc9-d8plz -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
grafana {"requests":{"cpu":"4m","memory":"64Mi"}}
grafana-proxy {"requests":{"cpu":"1m","memory":"20Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod kube-state-metrics-5f5f79ccbc-858xx -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
kube-state-metrics {"requests":{"cpu":"2m","memory":"80Mi"}}
kube-rbac-proxy-main {"requests":{"cpu":"1m","memory":"15Mi"}}
kube-rbac-proxy-self {"requests":{"cpu":"1m","memory":"15Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod node-exporter-4bcsq -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
node-exporter {"requests":{"cpu":"8m","memory":"32Mi"}}
kube-rbac-proxy {"requests":{"cpu":"1m","memory":"15Mi"}}
init-textfile {"requests":{"cpu":"1m","memory":"1Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod openshift-state-metrics-5bbdb5896-nnx65 -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
kube-rbac-proxy-main {"requests":{"cpu":"1m","memory":"20Mi"}}
kube-rbac-proxy-self {"requests":{"cpu":"1m","memory":"20Mi"}}
openshift-state-metrics {"requests":{"cpu":"1m","memory":"32Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod prometheus-adapter-7b757d8db7-gm8v2 -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
prometheus-adapter {"requests":{"cpu":"1m","memory":"40Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod prometheus-k8s-0 -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
prometheus {"requests":{"cpu":"70m","memory":"1Gi"}}
config-reloader {"requests":{"cpu":"1m","memory":"10Mi"}}
thanos-sidecar {"requests":{"cpu":"1m","memory":"25Mi"}}
prometheus-proxy {"requests":{"cpu":"1m","memory":"20Mi"}}
kube-rbac-proxy {"requests":{"cpu":"1m","memory":"15Mi"}}
prom-label-proxy {"requests":{"cpu":"1m","memory":"15Mi"}}
kube-rbac-proxy-thanos {"requests":{"cpu":"1m","memory":"10Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod prometheus-operator-7c8f55cc45-qjx6s -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
prometheus-operator {"requests":{"cpu":"5m","memory":"150Mi"}}
kube-rbac-proxy {"requests":{"cpu":"1m","memory":"15Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod telemeter-client-844fdfd96-xzfm5 -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
telemeter-client {"requests":{"cpu":"1m","memory":"40Mi"}}
reload {"requests":{"cpu":"1m","memory":"10Mi"}}
kube-rbac-proxy {"requests":{"cpu":"1m","memory":"20Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod thanos-querier-68d474b7df-bzqdv -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
thanos-query {"requests":{"cpu":"10m","memory":"12Mi"}}
oauth-proxy {"requests":{"cpu":"1m","memory":"20Mi"}}
kube-rbac-proxy {"requests":{"cpu":"1m","memory":"15Mi"}}
prom-label-proxy {"requests":{"cpu":"1m","memory":"15Mi"}}
kube-rbac-proxy-rules {"requests":{"cpu":"1m","memory":"15Mi"}}
[root@bastion openshift]#
```
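Running the same command once per Pod gets repetitive, so the per-pod checks above can be wrapped in a small loop over the whole namespace. This is a sketch using the same `jsonpath` expression; it assumes `oc` access to the cluster:

```shell
# Dump the requests/limits of every container and init container of every
# Pod in the given namespace, using the jsonpath shown above
print_resources() {
  ns="$1"
  for pod in $(oc get pods -n "$ns" -o name); do
    echo "=== ${pod} ==="
    oc get -n "$ns" "$pod" \
      -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name}{" "}{.resources}{"\n"}{end}'
  done
}

# On a live cluster:
# print_resources openshift-monitoring
```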
### Summary of the command results

Blank cells mean no value was specified.
| Pod name | Container name | CPU (Limits) | CPU (Requests) | Memory (Limits) | Memory (Requests) |
|---|---|---|---|---|---|
| alertmanager-main-n | | | | | |
| | alertmanager | | 4m | | 40Mi |
| | config-reloader | | 1m | | 10Mi |
| | alertmanager-proxy | | 1m | | 20Mi |
| | kube-rbac-proxy | | 1m | | 15Mi |
| | prom-label-proxy | | 1m | | 20Mi |
| cluster-monitoring-operator-xxxx | | | | | |
| | kube-rbac-proxy | | 1m | | 20Mi |
| | cluster-monitoring-operator | | 10m | | 75Mi |
| grafana-xxxx | | | | | |
| | grafana | | 4m | | 64Mi |
| | grafana-proxy | | 1m | | 20Mi |
| kube-state-metrics-xxxx | | | | | |
| | kube-state-metrics | | 2m | | 80Mi |
| | kube-rbac-proxy-main | | 1m | | 15Mi |
| | kube-rbac-proxy-self | | 1m | | 15Mi |
| node-exporter-xxxx | | | | | |
| | node-exporter | | 8m | | 32Mi |
| | kube-rbac-proxy | | 1m | | 15Mi |
| | init-textfile (init container) | | 1m | | 1Mi |
| openshift-state-metrics-xxxx | | | | | |
| | kube-rbac-proxy-main | | 1m | | 20Mi |
| | kube-rbac-proxy-self | | 1m | | 20Mi |
| | openshift-state-metrics | | 1m | | 32Mi |
| prometheus-adapter-xxxx | | | | | |
| | prometheus-adapter | | 1m | | 40Mi |
| prometheus-k8s-n | | | | | |
| | prometheus | | 70m | | 1Gi |
| | config-reloader | | 1m | | 10Mi |
| | thanos-sidecar | | 1m | | 25Mi |
| | prometheus-proxy | | 1m | | 20Mi |
| | kube-rbac-proxy | | 1m | | 15Mi |
| | prom-label-proxy | | 1m | | 15Mi |
| | kube-rbac-proxy-thanos | | 1m | | 10Mi |
| prometheus-operator-xxxx | | | | | |
| | prometheus-operator | | 5m | | 150Mi |
| | kube-rbac-proxy | | 1m | | 15Mi |
| telemeter-client-xxxx | | | | | |
| | telemeter-client | | 1m | | 40Mi |
| | reload | | 1m | | 10Mi |
| | kube-rbac-proxy | | 1m | | 20Mi |
| thanos-querier-xxxx | | | | | |
| | thanos-query | | 10m | | 12Mi |
| | oauth-proxy | | 1m | | 20Mi |
| | kube-rbac-proxy | | 1m | | 15Mi |
| | prom-label-proxy | | 1m | | 15Mi |
| | kube-rbac-proxy-rules | | 1m | | 15Mi |
## Actual resource usage

### Output of `kubectl top pods`
```
[root@bastion openshift]# kubectl top pods -n openshift-monitoring --use-protocol-buffers
NAME CPU(cores) MEMORY(bytes)
alertmanager-main-0 2m 117Mi
alertmanager-main-1 3m 110Mi
alertmanager-main-2 2m 104Mi
cluster-monitoring-operator-95674b95b-slbjr 9m 116Mi
grafana-5666d69fc9-d8plz 3m 136Mi
kube-state-metrics-5f5f79ccbc-858xx 3m 117Mi
node-exporter-4bcsq 3m 46Mi
node-exporter-68q5d 5m 52Mi
node-exporter-6nqpj 6m 55Mi
node-exporter-9fjj4 3m 58Mi
node-exporter-btbb7 4m 39Mi
node-exporter-cr6m2 3m 48Mi
node-exporter-cwglh 5m 46Mi
node-exporter-m6hn9 5m 47Mi
node-exporter-mplgz 5m 48Mi
node-exporter-qtxdl 3m 39Mi
node-exporter-vrshn 4m 39Mi
node-exporter-zgsmz 2m 40Mi
openshift-state-metrics-5bbdb5896-nnx65 0m 57Mi
prometheus-adapter-7b757d8db7-gm8v2 5m 72Mi
prometheus-adapter-7b757d8db7-h5lt5 4m 74Mi
prometheus-k8s-0 1230m 2425Mi
prometheus-k8s-1 514m 2513Mi
prometheus-operator-7c8f55cc45-qjx6s 7m 140Mi
telemeter-client-844fdfd96-xzfm5 0m 73Mi
thanos-querier-68d474b7df-bzqdv 4m 121Mi
thanos-querier-68d474b7df-q5lmw 2m 123Mi
[root@bastion openshift]#
```
You can see that the memory usage of `alertmanager-main-n` and `prometheus-k8s-n`, and the CPU usage of `prometheus-k8s-n`, are well above their `requests` values.
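As a rough check of the gap, take `prometheus-k8s-0` from the figures above: the `prometheus` container requests 70m CPU and 1Gi of memory, while the whole Pod is observed using 1230m CPU and 2425Mi of memory. A quick `awk` calculation of the ratios:

```shell
# Ratio of observed usage to requests for prometheus-k8s-0
# (numbers taken from the outputs above; 1Gi = 1024Mi)
awk 'BEGIN {
  cpu_req = 70;   cpu_use = 1230    # millicores
  mem_req = 1024; mem_use = 2425    # MiB
  printf "cpu: %.1fx of request\n", cpu_use / cpu_req
  printf "mem: %.1fx of request\n", mem_use / mem_req
}'
# cpu: 17.6x of request
# mem: 2.4x of request
```

Note this compares Pod-level usage against the largest single container's request, so it slightly overstates the ratio, but the overall picture (usage far above requests) is unchanged.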