Deploying metrics-server on a Kubernetes cluster to enable kubectl top and HPA (Horizontal Pod Autoscaler)

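When you try out various features on a Kubernetes cluster you built yourself with kubeadm or a similar tool, you will find that in the initial state
kubectl top [node/pod] returns no results, and
even if you set up autoscaling with kubectl autoscale, nothing scales because the Pods' CPU usage cannot be watched.
The official HPA documentation says that from Kubernetes v1.11 onward you just need to run metrics server (before that it was Heapster, which is now deprecated),
but when I tried running it, I still could not get any metrics.
I could not find a definitive Japanese-language article in my searches and ended up solving the problem by reading GitHub issues and English blog posts, so I am writing the solution down here.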

TL;DR

Add four lines (marked with + below) to the metrics-server Deployment in metrics-server-deployment.yaml so that the command runs with two extra options, then apply it.

metrics-server-deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.2
        imagePullPolicy: Always
+       command:
+       - /metrics-server
+       - --kubelet-insecure-tls
+       - --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp

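Environment

A while ago the kubernetes.io blog published an article on building a Kubernetes environment with Vagrant, Ansible, and kubeadm.
I built my environment with code based on that article, with a few modifications. (If you use it, adjust the specs and other settings to suit your environment.)
https://github.com/chataro0/k8s_on_vagrant_by_ansible
Once the deployment finishes, the cluster looks like this: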

$ kubectl get node
NAME         STATUS   ROLES    AGE   VERSION
k8s-master   Ready    master   45d   v1.14.1
node-1       Ready    <none>   45d   v1.14.1
node-2       Ready    <none>   45d   v1.14.1

Before

First, running kubectl top in the initial state returns no values, as expected. (Note that the error message asks for http:heapster.)

$ kubectl top node
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)

$ kubectl top pod
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)
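
One way to confirm that the error simply means no metrics API is registered yet is to look for the metrics.k8s.io group in the output of kubectl api-versions. The following is a minimal sketch; the helper name has_metrics_api is made up for this article, and it only filters text given on stdin.

```shell
#!/usr/bin/env bash
# has_metrics_api (hypothetical helper): reads `kubectl api-versions`
# output on stdin and succeeds if the metrics.k8s.io group is listed.
has_metrics_api() {
  grep -q '^metrics\.k8s\.io/'
}

# Usage against a live cluster (not run here):
#   kubectl api-versions | has_metrics_api && echo registered || echo missing
```

Before metrics-server is installed this should report "missing"; after the install step below, the v1beta1.metrics.k8s.io APIService is created and the group should appear.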

Installation

Get metrics-server either by cloning the source from GitHub or by downloading an archive from the releases page.
(As an aside, the release version is v0.3.3, but the container image version was v0.3.2 for some reason.)

wget https://github.com/kubernetes-incubator/metrics-server/archive/v0.3.3.tar.gz
tar xzf v0.3.3.tar.gz

Running the following command then starts metrics-server, but with the default settings...

$ kubectl apply -f metrics-server-0.3.3/deploy/1.8+/
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.extensions/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created

...no values can be retrieved, as shown below, and the logs show that errors are occurring.

$ kubectl top node
error: metrics not available yet
$ kubectl top pod
W0528 08:32:38.400545   14926 top_pod.go:259] Metrics not available for pod default/busybox-b6f5c9d7c-kfzcr, age: 2h10m48.394664877s
error: Metrics not available for pod default/busybox-b6f5c9d7c-kfzcr, age: 2h10m48.394664877s
$ kubectl get hpa
NAME   REFERENCE        TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
web    Deployment/web   <unknown>/20%   1         10        1          15m
$ kubectl logs -n kube-system metrics-server-548456b4cd-p2w2n 
I0528 08:29:38.244495       1 serving.go:273] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
[restful] 2019/05/28 08:29:38 log.go:33: [restful/swagger] listing is available at https://:443/swaggerapi
[restful] 2019/05/28 08:29:38 log.go:33: [restful/swagger] https://:443/swaggerui/ is mapped to folder /swagger-ui/
I0528 08:29:38.879673       1 serve.go:96] Serving securely on [::]:443
E0528 08:30:19.226871       1 reststorage.go:129] unable to fetch node metrics for node "node-2": no metrics known for node
E0528 08:30:19.226889       1 reststorage.go:129] unable to fetch node metrics for node "k8s-master": no metrics known for node
E0528 08:30:19.226893       1 reststorage.go:129] unable to fetch node metrics for node "node-1": no metrics known for node
E0528 08:30:28.811230       1 reststorage.go:148] unable to fetch pod metrics for pod default/busybox-b6f5c9d7c-kfzcr: no metrics known for pod
E0528 08:30:28.811285       1 reststorage.go:148] unable to fetch pod metrics for pod default/web-7b97d49c8c-fknv6: no metrics known for pod
E0528 08:30:39.159993       1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:node-1: unable to fetch metrics from Kubelet node-1 (node-1): Get https://node-1:10250/stats/summary/: dial tcp: lookup node-1 on 10.96.0.10:53: server misbehaving, unable to fully scrape metrics from source kubelet_summary:k8s-master: unable to fetch metrics from Kubelet k8s-master (k8s-master): Get https://k8s-master:10250/stats/summary/: dial tcp: lookup k8s-master on 10.96.0.10:53: server misbehaving, unable to fully scrape metrics from source kubelet_summary:node-2: unable to fetch metrics from Kubelet node-2 (node-2): Get https://node-2:10250/stats/summary/: dial tcp: lookup node-2 on 10.96.0.10:53: server misbehaving]
E0528 08:31:38.956402       1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:k8s-master: unable to fetch metrics from Kubelet k8s-master (k8s-master): Get https://k8s-master:10250/stats/summary/: dial tcp: lookup k8s-master on 10.96.0.10:53: server misbehaving, unable to fully scrape metrics from source kubelet_summary:node-2: unable to fetch metrics from Kubelet node-2 (node-2): Get https://node-2:10250/stats/summary/: dial tcp: lookup node-2 on 10.96.0.10:53: server misbehaving, unable to fully scrape metrics from source kubelet_summary:node-1: unable to fetch metrics from Kubelet node-1 (node-1): Get https://node-1:10250/stats/summary/: dial tcp: lookup node-1 on 10.96.0.10:53: server misbehaving]
E0528 08:32:28.029247       1 reststorage.go:129] unable to fetch node metrics for node "k8s-master": no metrics known for node
E0528 08:32:28.029419       1 reststorage.go:129] unable to fetch node metrics for node "node-1": no metrics known for node
E0528 08:32:28.029488       1 reststorage.go:129] unable to fetch node metrics for node "node-2": no metrics known for node
E0528 08:32:32.997577       1 reststorage.go:148] unable to fetch pod metrics for pod default/busybox-b6f5c9d7c-kfzcr: no metrics known for pod
E0528 08:32:32.997600       1 reststorage.go:148] unable to fetch pod metrics for pod default/web-7b97d49c8c-fknv6: no metrics known for pod
E0528 08:32:39.136985       1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:node-2: unable to fetch metrics from Kubelet node-2 (node-2): Get https://node-2:10250/stats/summary/: dial tcp: lookup node-2 on 10.96.0.10:53: server misbehaving, unable to fully scrape metrics from source kubelet_summary:k8s-master: unable to fetch metrics from Kubelet k8s-master (k8s-master): Get https://k8s-master:10250/stats/summary/: dial tcp: lookup k8s-master on 10.96.0.10:53: server misbehaving, unable to fully scrape metrics from source kubelet_summary:node-1: unable to fetch metrics from Kubelet node-1 (node-1): Get https://node-1:10250/stats/summary/: dial tcp: lookup node-1 on 10.96.0.10:53: server misbehaving]
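
The scrape errors above are long and repeat every minute, so it can help to reduce them to one line per node. The following is a rough sketch that assumes the log format shown above; the function name kubelet_scrape_errors is hypothetical, and the pod name in the usage comment is the one from this article.

```shell
#!/usr/bin/env bash
# kubelet_scrape_errors (hypothetical helper): reads metrics-server logs on
# stdin and prints each distinct "node (address): Get URL: reason" entry once.
kubelet_scrape_errors() {
  grep -o 'unable to fetch metrics from Kubelet [^ ]* ([^)]*): Get [^ ]*: [^],]*' |
    sed 's/^unable to fetch metrics from Kubelet //' |
    sort -u
}

# Usage against a live cluster (not run here):
#   kubectl logs -n kube-system metrics-server-548456b4cd-p2w2n | kubelet_scrape_errors
```

For the log above this leaves three lines, one per node, each ending in "server misbehaving", which points at name resolution of the node hostnames through the cluster DNS at 10.96.0.10.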

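Investigation

Going by the error messages, I looked through the metrics-server issues and read the blog posts that turned up in searches.
The problems appeared to be TLS certificate errors and failed name resolution.
As the Dockerfile shows, the pod created by metrics-server-deployment
runs the /metrics-server command as its entrypoint with no options,
and the fix was to add the following two options to it.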

  • --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname
  • --kubelet-insecure-tls

(As the blog post noted, --kubelet-preferred-address-types=InternalIP alone also worked fine, but I list several types here.)
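
As I understand the flag, --kubelet-preferred-address-types is an ordered list, and metrics-server contacts each kubelet at the first address type in that list that the node actually reports. The sketch below imitates that selection in plain shell to show why putting InternalIP ahead of Hostname avoids the failing DNS lookup. The function name pick_address and the 192.0.2.x address are illustrative assumptions, not values from this cluster.

```shell
#!/usr/bin/env bash
# pick_address PREFERRED TYPE=ADDRESS...   (hypothetical helper)
#   PREFERRED: comma-separated address types in priority order
#   TYPE=ADDRESS: pairs as reported in a node's status.addresses
# Prints the address of the first preferred type the node has.
pick_address() {
  local preferred=$1; shift
  local type pair
  local IFS=,
  for type in $preferred; do
    for pair in "$@"; do
      if [ "${pair%%=*}" = "$type" ]; then
        printf '%s\n' "${pair#*=}"
        return 0
      fi
    done
  done
  return 1
}

# Hostname first: the node name is chosen, which cluster DNS could not resolve here.
pick_address Hostname,InternalIP Hostname=node-1 InternalIP=192.0.2.11
# InternalIP ahead of Hostname: the IP is chosen and no lookup is needed.
pick_address InternalDNS,InternalIP,Hostname Hostname=node-1 InternalIP=192.0.2.11

# To see what a real cluster reports (not run here):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .status.addresses[*]}{.type}={.address}{" "}{end}{"\n"}{end}'
```

Note that --kubelet-insecure-tls is a separate matter: it skips verification of the kubelet's serving certificate, which is convenient on a test cluster like this one but is a trade-off to weigh elsewhere.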

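Fix and apply

Concretely, as shown in the TL;DR, add deployment.spec.template.spec.containers[0].command. (The + at the start of a line marks an added line; do not include the + itself in the file.)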

metrics-server-0.3.3/deploy/1.8+/metrics-server-deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.2
        imagePullPolicy: Always
+       command:
+       - /metrics-server
+       - --kubelet-insecure-tls
+       - --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp

After editing the file, apply the directory, either as a fresh install or as an update. (This example is an update.)

$ kubectl apply -f metrics-server-0.3.3/deploy/1.8+/
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator unchanged
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader unchanged
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io unchanged
serviceaccount/metrics-server unchanged
deployment.extensions/metrics-server configured
service/metrics-server unchanged
clusterrole.rbac.authorization.k8s.io/system:metrics-server unchanged
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server unchanged
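
If metrics-server is already running, the same change could probably also be made without editing the file, using kubectl patch with a JSON Patch. This is not what the article did; it is a sketch that assumes the Deployment name and namespace from the manifest above, and the function name metrics_server_patch is hypothetical.

```shell
#!/usr/bin/env bash
# metrics_server_patch (hypothetical helper): prints a JSON Patch that sets
# the container command to /metrics-server plus the two kubelet options.
metrics_server_patch() {
  cat <<'EOF'
[
  {
    "op": "add",
    "path": "/spec/template/spec/containers/0/command",
    "value": [
      "/metrics-server",
      "--kubelet-insecure-tls",
      "--kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname"
    ]
  }
]
EOF
}

# Usage against a live cluster (not run here):
#   kubectl -n kube-system patch deployment metrics-server \
#     --type=json -p "$(metrics_server_patch)"
```

Editing the manifest keeps the file and the cluster in sync, so a patch like this is better suited to a quick experiment; re-applying the unmodified directory later would revert it.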

After

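Right after applying, no values are returned yet, perhaps because not enough metrics have been collected,
but after about a minute the various metrics become available.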
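
Instead of retrying by hand, the wait can be scripted. The following is a minimal sketch that relies only on kubectl top node exiting with a non-zero status while metrics are unavailable; the function name wait_for_metrics and its default of 12 tries at 10-second intervals are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# wait_for_metrics [TRIES] [INTERVAL_SECONDS]   (hypothetical helper)
# Polls `kubectl top node` until it succeeds or TRIES attempts have failed.
wait_for_metrics() {
  local tries=${1:-12} interval=${2:-10} i
  for ((i = 1; i <= tries; i++)); do
    if kubectl top node >/dev/null 2>&1; then
      echo "metrics available after $i attempt(s)"
      return 0
    fi
    sleep "$interval"
  done
  echo "metrics still unavailable after $tries attempts" >&2
  return 1
}

# Usage against a live cluster (not run here):
#   wait_for_metrics && kubectl top node
```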

$ kubectl top node
NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
k8s-master   415m         20%    734Mi           82%       
node-1       645m         32%    586Mi           65%       
node-2       81m          4%     587Mi           65%     
$ kubectl top pod
NAME                      CPU(cores)   MEMORY(bytes)   
busybox-b6f5c9d7c-kfzcr   1m           0Mi             
web-7b97d49c8c-7dd6z      0m           2Mi             
web-7b97d49c8c-cdtwg      0m           3Mi             
web-7b97d49c8c-fknv6      201m         2Mi             
web-7b97d49c8c-jdzxn      0m           2Mi             
web-7b97d49c8c-v4rsj      0m           2Mi  

Next, create a Deployment and an HPA. (Unless a CPU limit or request was set under resources in the Deployment, the HPA target value stayed at <unknown>.)

deploy.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: web
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      run: web
  template:
    metadata:
      labels:
        run: web
    spec:
      containers:
      - image: nginx
        name: web
        resources:
          limits:
            cpu: 200m

hpa.yml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  targetCPUUtilizationPercentage: 20

Autoscaling then worked properly, and the scale-ups also show up in the event log.

$ kubectl get hpa
NAME   REFERENCE        TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
web    Deployment/web   19%/20%   1         10        5          99m

$ kubectl get events | sort -r
LAST SEEN   TYPE     REASON              OBJECT                        MESSAGE
8m53s       Normal   SuccessfulRescale   horizontalpodautoscaler/web   New size: 2; reason: cpu resource utilization (percentage of request) above target
8m53s       Normal   ScalingReplicaSet   deployment/web                Scaled up replica set web-7b97d49c8c to 2
8m52s       Normal   SuccessfulCreate    replicaset/web-7b97d49c8c     Created pod: web-7b97d49c8c-v4rsj
8m52s       Normal   Scheduled           pod/web-7b97d49c8c-v4rsj      Successfully assigned default/web-7b97d49c8c-v4rsj to node-2
8m49s       Normal   Pulling             pod/web-7b97d49c8c-v4rsj      Pulling image "nginx"
8m46s       Normal   Started             pod/web-7b97d49c8c-v4rsj      Started container web
8m46s       Normal   Pulled              pod/web-7b97d49c8c-v4rsj      Successfully pulled image "nginx"
8m46s       Normal   Created             pod/web-7b97d49c8c-v4rsj      Created container web
5m59s       Normal   SuccessfulRescale   horizontalpodautoscaler/web   New size: 3; reason: cpu resource utilization (percentage of request) above target
5m59s       Normal   SuccessfulCreate    replicaset/web-7b97d49c8c     Created pod: web-7b97d49c8c-jdzxn
5m59s       Normal   Scheduled           pod/web-7b97d49c8c-jdzxn      Successfully assigned default/web-7b97d49c8c-jdzxn to node-2
5m59s       Normal   ScalingReplicaSet   deployment/web                Scaled up replica set web-7b97d49c8c to 3
5m56s       Normal   Pulling             pod/web-7b97d49c8c-jdzxn      Pulling image "nginx"
5m53s       Normal   Started             pod/web-7b97d49c8c-jdzxn      Started container web
5m53s       Normal   Pulled              pod/web-7b97d49c8c-jdzxn      Successfully pulled image "nginx"
5m53s       Normal   Created             pod/web-7b97d49c8c-jdzxn      Created container web
4m58s       Normal   SuccessfulRescale   horizontalpodautoscaler/web   New size: 5; reason: cpu resource utilization (percentage of request) above target
4m57s       Normal   SuccessfulCreate    replicaset/web-7b97d49c8c     Created pod: web-7b97d49c8c-cdtwg
4m57s       Normal   SuccessfulCreate    replicaset/web-7b97d49c8c     Created pod: web-7b97d49c8c-7dd6z
4m57s       Normal   Scheduled           pod/web-7b97d49c8c-cdtwg      Successfully assigned default/web-7b97d49c8c-cdtwg to node-1
4m57s       Normal   Scheduled           pod/web-7b97d49c8c-7dd6z      Successfully assigned default/web-7b97d49c8c-7dd6z to node-1
4m57s       Normal   ScalingReplicaSet   deployment/web                Scaled up replica set web-7b97d49c8c to 5
4m55s       Normal   Pulling             pod/web-7b97d49c8c-cdtwg      Pulling image "nginx"
4m54s       Normal   Pulling             pod/web-7b97d49c8c-7dd6z      Pulling image "nginx"
4m52s       Normal   Pulled              pod/web-7b97d49c8c-cdtwg      Successfully pulled image "nginx"
4m52s       Normal   Created             pod/web-7b97d49c8c-cdtwg      Created container web
4m51s       Normal   Started             pod/web-7b97d49c8c-cdtwg      Started container web
4m49s       Normal   Pulled              pod/web-7b97d49c8c-7dd6z      Successfully pulled image "nginx"
4m49s       Normal   Created             pod/web-7b97d49c8c-7dd6z      Created container web
4m48s       Normal   Started             pod/web-7b97d49c8c-7dd6z      Started container web
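
For reference, the Kubernetes documentation gives the replica count the HPA aims for as ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below does only that arithmetic with whole percentages; the function name desired_replicas is hypothetical. The real controller also applies a tolerance, the min/max bounds, and limits on how fast it scales, so the steps in the event log above need not match a single application of the formula.

```shell
#!/usr/bin/env bash
# desired_replicas CURRENT_REPLICAS CURRENT_UTILIZATION TARGET_UTILIZATION
# (hypothetical helper) Prints
#   ceil(CURRENT_REPLICAS * CURRENT_UTILIZATION / TARGET_UTILIZATION)
# using integer arithmetic; utilizations are whole percentages.
desired_replicas() {
  local replicas=$1 current=$2 target=$3
  echo $(( (replicas * current + target - 1) / target ))
}

# One pod at 100% of its CPU request against the 20% target in hpa.yml.
desired_replicas 1 100 20   # prints 5
```

This also suggests why the CPU request or limit matters: utilization is a percentage of the request, so without one there is nothing to divide by and the target stays <unknown>.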