Introduction
I built a Kubernetes cluster on Raspberry Pi 4B boards, but most Raspberry Pi articles only cover the initial build; there seem to be few real-world write-ups about actually operating one. Perhaps because the architecture is ARM and container images are often simply not available for it...
So this is a study article about cluster monitoring under those circumstances.
Environment
- H/W: Raspberry Pi 4B
- S/W: see the output below
The following command conveniently shows the OS (kernel), container runtime, and Kubernetes version of every node at once:
# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
chino Ready control-plane 165d v1.20.0 10.0.0.1 <none> Debian GNU/Linux 10 (buster) 5.4.83-v8+ docker://20.10.1
chiya Ready worker 165d v1.20.0 10.0.0.5 <none> Debian GNU/Linux 10 (buster) 5.4.65-v8+ docker://20.10.1
cocoa Ready worker 158d v1.20.0 10.0.0.2 <none> Debian GNU/Linux 10 (buster) 5.4.65-v8+ docker://20.10.1
maya Ready worker 21d v1.20.1 10.0.0.7 <none> Debian GNU/Linux 10 (buster) 5.4.65-v8+ cri-o://1.20.0
megu Ready worker 57d v1.20.1 10.0.0.8 <none> Debian GNU/Linux 10 (buster) 5.4.83-v8+ cri-o://1.20.0
rize Ready worker 158d v1.20.0 10.0.0.3 <none> Debian GNU/Linux 10 (buster) 5.4.65-v8+ docker://20.10.1
syaro Ready worker 158d v1.20.0 10.0.0.4 <none> Debian GNU/Linux 10 (buster) 5.4.65-v8+ docker://20.10.1
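The wide output above packs a lot of columns. When only the node name, kubelet version, and container runtime matter, a quick awk filter works; this is a sketch against an embedded sample of the table above, and on a live cluster you would pipe `kubectl get nodes -o wide --no-headers` into the same awk instead:

```shell
# Keep just node name (col 1), kubelet version (col 5), and the container
# runtime (last column) from `kubectl get nodes -o wide` style output.
# Sample lines taken from the table above; on a real cluster, replace the
# echo with: kubectl get nodes -o wide --no-headers
nodes='chino Ready control-plane 165d v1.20.0 10.0.0.1 <none> Debian 5.4.83-v8+ docker://20.10.1
maya Ready worker 21d v1.20.1 10.0.0.7 <none> Debian 5.4.65-v8+ cri-o://1.20.0'
echo "$nodes" | awk '{print $1, $5, $NF}'
```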
Setup procedure
Installing Go
If you want to make Kubernetes work on the ARM architecture, Go is pretty much a necessity (for example, the jsonnet-bundler tool jb invoked later by make vendor is a Go program). On Raspbian it can be installed with apt, although the packaged version is somewhat old:
# apt install golang
# go version
go version go1.11.6 linux/arm
Then set the GOPATH environment variable and add $GOPATH/bin to PATH:
# cat >> ~/.bashrc << 'EOF'
> export GOPATH=$HOME/go
> export PATH=$PATH:$GOPATH/bin
> EOF
# source ~/.bashrc
# printenv | grep PATH
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/root/go/bin
GOPATH=/root/go
Deploying the cluster monitoring stack
At first I was building an ARM image of kube-state-metrics myself, but while researching I found that someone on GitHub had already packaged everything for a one-shot deployment. A godsend.
Cluster Monitoring stack for ARM / X86-64 platforms
# git clone https://github.com/carlosedp/cluster-monitoring
Cloning into 'cluster-monitoring'...
remote: Enumerating objects: 16, done.
remote: Counting objects: 100% (16/16), done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 1398 (delta 2), reused 5 (delta 1), pack-reused 1382
Receiving objects: 100% (1398/1398), 1.04 MiB | 1.16 MiB/s, done.
Resolving deltas: 100% (958/958), done.
# cd cluster-monitoring
# make vendor
rm -rf vendor
/root/go/bin/jb install
GET https://github.com/coreos/kube-prometheus/archive/17989b42aa10b1c6afa07043cb05bcd5ae492284.tar.gz 200
GET https://github.com/coreos/prometheus-operator/archive/e31c69f9b5c6555e0f4a5c1f39d0f03182dd6b41.tar.gz 200
GET https://github.com/ksonnet/ksonnet-lib/archive/0d2f82676817bbf9e4acf6495b2090205f323b9f.tar.gz 200
GET https://github.com/prometheus/node_exporter/archive/08ce3c6dd430deb51798826701a395e460620d60.tar.gz 200
GET https://github.com/prometheus/prometheus/archive/74207c04655e1fd93eea0e9a5d2f31b1cbc4d3d0.tar.gz 200
GET https://github.com/brancz/kubernetes-grafana/archive/57b4365eacda291b82e0d55ba7eec573a8198dda.tar.gz 200
GET https://github.com/coreos/etcd/archive/d8c8f903eee10b8391abaef7758c38b2cd393c55.tar.gz 200
GET https://github.com/kubernetes-monitoring/kubernetes-mixin/archive/b61c5a34051f8f57284a08fe78ad8a45b430252b.tar.gz 200
GET https://github.com/kubernetes/kube-state-metrics/archive/d667979ed55ad1c4db44d331b51d646f5b903aa7.tar.gz 200
GET https://github.com/kubernetes/kube-state-metrics/archive/d667979ed55ad1c4db44d331b51d646f5b903aa7.tar.gz 200
GET https://github.com/grafana/grafonnet-lib/archive/8fb95bd89990e493a8534205ee636bfcb8db67bd.tar.gz 200
GET https://github.com/grafana/jsonnet-libs/archive/881db2241f0c5007c3e831caf34b0c645202b4ab.tar.gz 200
# make deploy
echo "Deploying stack setup manifests..."
Deploying stack setup manifests...
kubectl apply -f ./manifests/setup/
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
echo "Will wait 10 seconds to deploy the additional manifests.."
Will wait 10 seconds to deploy the additional manifests..
sleep 10
kubectl apply -f ./manifests/
alertmanager.monitoring.coreos.com/main created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
secret/grafana-config created
secret/grafana-datasources created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-coredns-dashboard created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-kubernetes-cluster-dashboard created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-dashboard created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
Warning: extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
ingress.extensions/alertmanager-main created
ingress.extensions/grafana created
ingress.extensions/prometheus-k8s created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
service/kube-controller-manager-prometheus-discovery created
service/kube-dns-prometheus-discovery created
service/kube-scheduler-prometheus-discovery created
servicemonitor.monitoring.coreos.com/prometheus-operator created
prometheus.monitoring.coreos.com/k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-rules created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
The pods are deployed into a namespace named monitoring.
# kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 18m
grafana-784d46dcb-jjhgj 1/1 Running 0 18m
kube-state-metrics-6cb6df5d4-8w5kl 3/3 Running 0 18m
node-exporter-dksrp 2/2 Running 0 18m
node-exporter-h5cnc 2/2 Running 0 18m
node-exporter-jqdpr 2/2 Running 0 18m
node-exporter-l4vgm 2/2 Running 0 18m
node-exporter-lmdlz 2/2 Running 0 18m
node-exporter-pg4ww 2/2 Running 0 18m
node-exporter-sm4vk 2/2 Running 0 18m
prometheus-adapter-585b57857b-6tw7k 1/1 Running 0 18m
prometheus-k8s-0 3/3 Running 1 18m
prometheus-operator-67755f959-sjp7l 2/2 Running 0 19m
The services were defined as shown below.
In my case, for convenience, I decided prometheus-k8s and grafana would be better exposed via NodePort.
# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main ClusterIP 10.96.231.106 <none> 9093/TCP 7m5s
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 7m5s
grafana ClusterIP 10.96.126.125 <none> 3000/TCP 7m3s
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 7m3s
node-exporter ClusterIP None <none> 9100/TCP 7m2s
prometheus-adapter ClusterIP 10.96.10.168 <none> 443/TCP 7m2s
prometheus-k8s ClusterIP 10.96.0.198 <none> 9090/TCP 6m59s
prometheus-operated ClusterIP None <none> 9090/TCP 7m
prometheus-operator ClusterIP None <none> 8443/TCP 40d
I edited the YAML files under manifests/ that define those services and re-applied them.
I chose port 30909 for Prometheus and 30300 for Grafana.
# cat manifests/prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    nodePort: 30909
    targetPort: web
  selector:
    app: prometheus
    prometheus: k8s
  sessionAffinity: ClientIP
# kubectl apply -f manifests/prometheus-service.yaml
service/prometheus-k8s configured
# cat manifests/grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: http
    port: 3000
    nodePort: 30300
    targetPort: http
  selector:
    app: grafana
# kubectl apply -f manifests/grafana-service.yaml
service/grafana configured
# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main ClusterIP 10.96.231.106 <none> 9093/TCP 12m
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 12m
grafana NodePort 10.96.126.125 <none> 3000:30300/TCP 12m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 12m
node-exporter ClusterIP None <none> 9100/TCP 12m
prometheus-adapter ClusterIP 10.96.10.168 <none> 443/TCP 12m
prometheus-k8s NodePort 10.96.0.198 <none> 9090:30909/TCP 12m
prometheus-operated ClusterIP None <none> 9090/TCP 12m
prometheus-operator ClusterIP None <none> 8443/TCP 40d
Once the stack is deployed this far, kubectl top can show CPU and memory usage:
# kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
chino 558m 13% 1456Mi 38%
chiya 699m 17% 550Mi 15%
cocoa 282m 7% 2773Mi 73%
maya 108m 2% 820Mi 21%
megu 488m 12% 1624Mi 43%
rize 713m 17% 4648Mi 59%
syaro 431m 10% 3111Mi 40%
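To see at a glance which node is closest to memory pressure, the kubectl top output can be sorted by the MEMORY% column (column 5). This is sketched against an embedded sample of the rows above; with a live cluster, pipe `kubectl top nodes --no-headers` into the same sort instead:

```shell
# Sort nodes by MEMORY% (5th column), highest first. Sample rows are from
# the table above; on a real cluster use: kubectl top nodes --no-headers
top='chino 558m 13% 1456Mi 38%
cocoa 282m 7% 2773Mi 73%
rize 713m 17% 4648Mi 59%'
echo "$top" | sort -k5 -rn
```

sort -n reads the leading digits of each key, so the trailing "%" is ignored and the most memory-loaded node (cocoa in this sample) comes out on top.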
In my environment the Kubernetes dashboard is also deployed, and graphs now appear there too.
Prometheus/Grafana
As for how to use them: I am still studying, following articles such as 『k8sメトリックスのモニタリングとログ分析について調べたメモ』 by 高良さん, whose Kubernetes articles I always rely on, so I cannot go into detail yet. But as shown above, the stack runs fine on Raspberry Pi.
Grafana's initial login is user "admin" with password "admin". Set a new password, then import whatever dashboards you like.
Troubleshooting notes
Prometheus OOM [added 2020-06-25]
After the cluster had been running for a few weeks, I noticed that the CPU load on one of the workers (which I visualize by shipping metrics to Elasticsearch via a shell script and graphing them in Kibana) was spiking periodically.
The Prometheus UI would no longer load, so I checked the pods and found that the prometheus container of prometheus-k8s-0 was in CrashLoopBackOff. Checking the container log:
# kubectl logs pod/prometheus-k8s-0 -n monitoring -c prometheus
:
runtime: out of memory: cannot allocate byte block
fatal error: out of memory
So it had run out of memory. Searching turns up plenty of similar reports, and the most common advice is to shorten storage.tsdb.retention.time.
With the manifests used here, I figured that shortening retention in manifests/prometheus-prometheus.yaml should do the trick; after editing it as follows, everything deployed fine again.
# grep retention manifests/prometheus-prometheus.yaml
retention: 7d
# kubectl apply -f manifests/prometheus-prometheus.yaml
prometheus.monitoring.coreos.com/k8s configured
# kubectl get pods -n monitoring -o wide -w
:
(you can watch the pods transition through their states)
:
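For reference, here is a simplified sketch of how the relevant part of manifests/prometheus-prometheus.yaml might look after the edit. retention and resources are both standard fields of the Prometheus Operator's Prometheus CRD; note that the resources block is not taken from the repo's manifest. It is an extra guard against OOM that I would consider on memory-constrained Raspberry Pi nodes, with purely illustrative values:

```yaml
# Simplified excerpt of manifests/prometheus-prometheus.yaml.
# retention shortened to 7d (as above); the resources block is an optional,
# hypothetical addition to cap Prometheus memory on low-memory nodes.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  retention: 7d
  resources:
    requests:
      memory: 400Mi
    limits:
      memory: 1Gi
```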
Conclusion
For monitoring a Raspberry Pi (ARM) Kubernetes cluster, the Cluster Monitoring stack for ARM / X86-64 platforms could be used as-is.
The components it uses are nearly identical to OpenShift 4's cluster monitoring stack, so I expect it will also come in handy for studying OpenShift.