Things to watch out for when building EKS inside an existing AWS VPC

Posted at 2020-11-23, last updated at 2020-11-23

Looking at samples like this one, doesn't building AWS EKS look easy?
Taking eksctl as an example:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: basic-cluster
  region: eu-north-1

nodeGroups:
  - name: node-group-hoge
    instanceType: m5.large
    desiredCapacity: 1
    volumeSize: 100
    ssh:
      allow: true
      publicKeyPath: ~/.ssh/ec2_id_rsa.pub
  - name: node-group-fuga
    instanceType: m5.xlarge
    desiredCapacity: 2
    volumeSize: 100
    ssh:
      allow: true
      publicKeyPath: ~/.ssh/ec2_id_rsa.pub

If awscli credentials are already configured, running the command below gives you an EKS cluster about 10 minutes later.

eksctl create cluster -f cluster.yaml

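However, the command above does not just create the EKS cluster: eksctl also creates a brand-new VPC on its own.

So how do you build an EKS cluster in an existing VPC?

Roughly speaking, you put the VPC settings into the cluster YAML, as shown below.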

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: basic-cluster
  region: eu-north-1

vpc:
  subnets:
    private:
      eu-north-1a: { id: subnet-praa }
      eu-north-1b: { id: subnet-prcc }
      eu-north-1c: { id: subnet-prdd }
    public:
      eu-north-1a: { id: subnet-puaa }
      eu-north-1b: { id: subnet-pucc }
      eu-north-1c: { id: subnet-pudd }

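In this case, however, there are quite a few traps. This article summarizes what I struggled with and the points to watch out for.
(If you read these three official documents (1, 2, 3) carefully you should be able to avoid the traps, but if you would like a short summary first, this article may be worth reading.)

Problem: nodes cannot join the Kubernetes cluster

Symptom: when creating a cluster in an existing VPC with eksctl, none of the nodes can join the cluster, and the following error is returned.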

Error: timed out (after 25m0s) waiting for at least 1 nodes to join the cluster and become ready in "node-group-hoge"

Looking at the nodes with kubectl, the node status is NotReady.

kubectl get nodes
NAME                                                STATUS     ROLES    AGE   VERSION
ip-10-113-208-246.ap-northeast-1.compute.internal   NotReady   <none>   10h   v1.17.11-eks-cfdc40

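Cause: the nodes cannot reach the internet

When I logged in to a node over ssh, I noticed that it had no internet connectivity.

The cause was that nodes placed in a public subnet were not automatically assigned a public IP address, so they had no internet access and could not reach the Kubernetes master API.

Incidentally, to also cover the case of nodes in a private subnet, the ways to give nodes internet access are:

Nodes in a public subnet: set map_public_ip_on_launch to true on the subnet
Nodes in a private subnet: make sure a NAT gateway is properly configured for the subnet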
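The two rules above can be written down as a small check. This is only a sketch: the dict layout (kind, map_public_ip_on_launch, default_route_target) is a hypothetical description invented for illustration, not the shape of any AWS API response, and the function does not talk to AWS.

```python
def has_internet_egress(subnet: dict) -> bool:
    """Return True if a node launched in this subnet should reach the internet.

    `subnet` is a hypothetical plain-dict description (an assumption):
      kind                    -- "public" or "private"
      map_public_ip_on_launch -- bool, only meaningful for public subnets
      default_route_target    -- "igw", "nat", or None (target of 0.0.0.0/0)
    """
    kind = subnet["kind"]
    route = subnet.get("default_route_target")
    if kind == "public":
        # Public subnet: the node needs its own public IP and a route to the internet gateway.
        return bool(subnet.get("map_public_ip_on_launch", False)) and route == "igw"
    if kind == "private":
        # Private subnet: the node goes out through a NAT gateway.
        return route == "nat"
    raise ValueError(f"unknown subnet kind: {kind!r}")


# The situation described in this article: a public subnet without auto-assigned public IPs.
broken = {"kind": "public", "map_public_ip_on_launch": False, "default_route_target": "igw"}
fixed = {"kind": "public", "map_public_ip_on_launch": True, "default_route_target": "igw"}
print(has_internet_egress(broken), has_internet_egress(fixed))
```

A check like this is useful as a pre-flight step before running eksctl, because the failure otherwise only shows up as a 25 minute timeout.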

Also, for high availability, I think the right EKS network layout is the following:

VPC
internet gateway
3 public subnets
- one in each Availability Zone
- route table and related settings for the internet gateway
3 NAT gateways
- each NAT gateway is placed in one of the public subnets
3 private subnets
- one in each Availability Zone
- each uses the NAT gateway in the same Availability Zone
- route table and related settings
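As a rough illustration of the layout above, the following sketch carves a VPC CIDR into one public and one private block per Availability Zone using Python's standard ipaddress module. The VPC CIDR 10.0.0.0/16 and the /20 subnet size are assumptions picked for the example, and plan_subnets is a hypothetical helper; the AZ names are the ones used in the cluster YAML earlier.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, azs: list, new_prefix: int) -> dict:
    """Assign one public and one private CIDR block to each AZ."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = list(vpc.subnets(new_prefix=new_prefix))
    needed = 2 * len(azs)  # one public + one private block per AZ
    if len(blocks) < needed:
        raise ValueError(f"{vpc_cidr} only yields {len(blocks)} /{new_prefix} blocks, need {needed}")
    plan = {"public": {}, "private": {}}
    for i, az in enumerate(azs):
        plan["public"][az] = str(blocks[i])
        plan["private"][az] = str(blocks[len(azs) + i])
    return plan


# Assumed example values: a /16 VPC split into /20 subnets.
plan = plan_subnets("10.0.0.0/16", ["eu-north-1a", "eu-north-1b", "eu-north-1c"], 20)
for kind, by_az in plan.items():
    for az, cidr in by_az.items():
        print(kind, az, cidr)
```

The sketch only plans address ranges; the internet gateway, NAT gateways, and route tables still have to be created separately.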

Tagging the VPC and subnets

According to the vpc subnet tagging document, the above is not enough: the VPC and the subnets also need tags.

Add the following tag to the VPC
- kubernetes.io/cluster/<cluster-name> => shared
Add the following tag to every subnet
- kubernetes.io/cluster/<cluster-name> => shared
Add the following tag to the public subnets
- kubernetes.io/role/elb => 1
Add the following tag to the private subnets
- kubernetes.io/role/internal-elb => 1
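The tag rules above can be captured in a small function so they are not mistyped by hand. This is a minimal sketch: required_tags and its resource names ("vpc", "public-subnet", "private-subnet") are hypothetical and only build a dict; applying the tags to real resources is left out.

```python
def required_tags(cluster_name: str, resource: str) -> dict:
    """Return the tags listed above for a VPC, public subnet, or private subnet."""
    # Every resource (VPC and all subnets) carries the shared cluster tag.
    tags = {f"kubernetes.io/cluster/{cluster_name}": "shared"}
    if resource == "public-subnet":
        tags["kubernetes.io/role/elb"] = "1"
    elif resource == "private-subnet":
        tags["kubernetes.io/role/internal-elb"] = "1"
    elif resource != "vpc":
        raise ValueError(f"unknown resource type: {resource!r}")
    return tags


# "basic-cluster" is the cluster name from the YAML earlier in this article.
for resource in ("vpc", "public-subnet", "private-subnet"):
    print(resource, required_tags("basic-cluster", resource))
```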

Problem: some nodes join the cluster, but others cannot

Symptom: when creating a cluster with eksctl, some nodes become Ready while others stay NotReady.

Error: timed out (after 25m0s) waiting for at least 2 nodes to join the cluster and become ready in "tidb"

kubectl get nodes
NAME                                                STATUS     ROLES    AGE   VERSION
ip-10-113-208-238.ap-northeast-1.compute.internal   Ready      <none>   10h   v1.17.11-eks-cfdc40  # node-group-hoge
ip-10-113-208-246.ap-northeast-1.compute.internal   NotReady   <none>   10h   v1.17.11-eks-cfdc40 # node-group-fuga
ip-10-113-208-253.ap-northeast-1.compute.internal   Ready      <none>   10h   v1.17.11-eks-cfdc40 # node-group-fuga

When I logged in to the problematic server, it looked like amazon-k8s-cni was failing.

[ec2-user@ip-10-113-208-246 ~]$ ps -ef|grep kube
root      4611     1  1 Nov10 ?        00:09:08 /usr/bin/kubelet --node-ip=10.113.208.246 --node-labels=alpha.eksctl.io/nodegroup-name=tidb,dedicated=tidb,alpha.eksctl.io/cluster-name=dig-ec-tidb-cluster,node-lifecycle=on-demand,alpha.eksctl.io/instance-id=i-02f495832c6f511a6 --max-pods=29 --register-node=true --register-with-taints=dedicated=tidb:NoSchedule --cloud-provider=aws --container-runtime=docker --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --pod-infra-container-image=602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause:3.1-eksbuild.1 --kubeconfig=/etc/eksctl/kubeconfig.yaml --config=/etc/eksctl/kubelet.yaml
root      5629  5598  0 Nov10 ?        00:00:15 kube-proxy --v=2 --config=/var/lib/kube-proxy-config/config

[ec2-user@ip-10-113-208-246 ~]$ docker ps --all
CONTAINER ID        IMAGE                                                                        COMMAND                  CREATED             STATUS                            PORTS               NAMES
e9f4a894192b        602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/amazon-k8s-cni             "/bin/sh -c /app/ent…"   3 minutes ago       Exited (137) About a minute ago                       k8s_aws-node_aws-node-c7dmc_kube-system_e7c4fe7c-8621-4b8b-9cb4-68aedfba905f_157
a128c1ebcd69        602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy             "kube-proxy --v=2 --…"   11 hours ago        Up 11 hours                                           k8s_kube-proxy_kube-proxy-b8cst_kube-system_13a4b302-0e5d-4c2b-847a-c6c1fa7e15a7_0
f71de18a5c35        602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/amazon-k8s-cni-init        "/bin/sh -c /init/in…"   11 hours ago        Exited (0) 11 hours ago                               k8s_aws-vpc-cni-init_aws-node-c7dmc_kube-system_e7c4fe7c-8621-4b8b-9cb4-68aedfba905f_0
a3ea54d25272        602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause:3.1-eksbuild.1   "/pause"                 11 hours ago        Up 11 hours                                           k8s_POD_aws-node-c7dmc_kube-system_e7c4fe7c-8621-4b8b-9cb4-68aedfba905f_0
6cd9cfff6472        602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause:3.1-eksbuild.1   "/pause"                 11 hours ago        Up 11 hours                                           k8s_POD_kube-proxy-b8cst_kube-system_13a4b302-0e5d-4c2b-847a-c6c1fa7e15a7_0

The container with status "Exited (137) About a minute ago" stands out.

[ec2-user@ip-10-113-208-246 ~]$ docker logs e9f4a894192b
{"level":"info","ts":"2020-11-11T01:16:13.871Z","caller":"entrypoint.sh","msg":"Install CNI binary.."}
{"level":"info","ts":"2020-11-11T01:16:13.889Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2020-11-11T01:16:13.897Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}

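The logs alone did not let me pin down the cause, but when I looked at the subnets in the AWS console, I noticed that "Available IPv4 addresses" had dropped to 0.

Cause: not enough IP addresses

After enabling logging for the cluster, I found the actual error message.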

eksctl utils update-cluster-logging --enable-types=all --region=ap-northeast-1 --cluster=dig-ec-tidb-cluster --approve

Error: InvalidRequestException: Provided subnets subnet-08e***df Free IPs: 2 subnet-0e***02 Free IPs: 0 , need at least 3 IPs in each subnet to be free for this operation
{
  RespMetadata: {
    StatusCode: 400,
    RequestID: "1638fedc-9599-42da-973a-ad12dae6541e"
  },
  Message_: "Provided subnets subnet-08efd8c4cc627d9df Free IPs: 2 subnet-0e51****02 Free IPs: 0 , need at least 3 IPs in each subnet to be free for this operation"
}

Because this is a VPC for a development environment, the VPC CIDR is /24 and each subnet CIDR is /28, so each subnet had only 16 IPv4 addresses.

Since there were not enough IPs, I tried using the entire /24 range of the VPC CIDR.

2 public subnets: CIDR /26, 64 IP addresses each
2 private subnets: CIDR /26, 64 IP addresses each

Together the subnets add up to 256 IP addresses, which is the whole range.
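The arithmetic above can be checked with Python's standard ipaddress module. One detail the raw counts hide is that AWS reserves 5 addresses in every subnet (the first four and the last one), so the number that nodes and pods can actually use is smaller. The CIDR 10.0.0.0/24 below is a placeholder, not the real range of this VPC.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr: str) -> int:
    """Number of addresses in a subnet that can be assigned to ENIs."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


vpc = ipaddress.ip_network("10.0.0.0/24")  # placeholder VPC CIDR (assumption)
print("total in VPC:", vpc.num_addresses)
print("usable in a /28:", usable_addresses("10.0.0.0/28"))
print("usable in a /26:", usable_addresses("10.0.0.0/26"))
print("/26 subnets in the VPC:", len(list(vpc.subnets(new_prefix=26))))
```

Under these assumptions a /28 subnet leaves 11 usable addresses and a /26 leaves 59, which helps explain why the original /28 subnets ran out so quickly.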

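With this the cluster itself could be built, but IP addresses were again being consumed in large numbers: "Available IPv4 addresses" in the private subnet was down to just 8.

I asked AWS why EKS uses so many IP addresses.
The conclusion was:

The CNI used by AWS EKS is amazon-vpc-cni-k8s
Its advantage is that pods get ordinary VPC IP addresses, so settings such as Security Groups can be applied flexibly
To assign an IP address quickly when a pod is created, it reserves a certain number of IP addresses in advance; this mechanism is called the Warm Pool
The size of the Warm Pool is adjustable
- note that if it is made too small, pods may not be served quickly enough during auto-scaling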
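To get a feel for how many addresses a node can take from a subnet, the sketch below computes two figures from the per-instance ENI limits: the pod limit used by EKS (ENIs × (IPv4 addresses per ENI − 1) + 2) and the upper bound of addresses a node can hold when every ENI is attached and full. The ENI limits in the table are values I believe AWS publishes for these instance types; treat them as assumptions and verify them against the EC2 documentation. The upper bound is a ceiling, not what the Warm Pool reserves by default.

```python
# (max ENIs, IPv4 addresses per ENI) per instance type.
# Assumed values; verify against the EC2 documentation before relying on them.
ENI_LIMITS = {
    "m5.large": (3, 10),
    "m5.xlarge": (4, 15),
}


def max_pods(instance_type: str) -> int:
    """Pod limit: one address per ENI is the ENI's own primary IP, plus 2 host-network pods."""
    enis, ips_per_eni = ENI_LIMITS[instance_type]
    return enis * (ips_per_eni - 1) + 2


def max_ips_held(instance_type: str) -> int:
    """Upper bound of subnet addresses one node can hold (all ENIs attached and full)."""
    enis, ips_per_eni = ENI_LIMITS[instance_type]
    return enis * ips_per_eni


for itype in ENI_LIMITS:
    print(itype, "max pods:", max_pods(itype), "IP ceiling:", max_ips_held(itype))
```

For m5.large the formula gives 29, which matches the --max-pods=29 flag visible in the kubelet command line earlier in this article.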

Solution

Since in our case we do not need to scale out massively at short notice, I went ahead and adjusted it.

kubectl set env ds aws-node -n kube-system AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true
kubectl set env ds aws-node -n kube-system WARM_IP_TARGET=3

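After waiting a while, the number of available IPv4 addresses recovered to a reasonable level.

Summary

When building EKS in an existing VPC, I think you should pay attention to the following:

VPC network setup
- make sure the nodes have internet connectivity
- use multiple Availability Zones (AZ)
- tag the VPC and subnets
IP address planning
- adjust the size of the Warm Pool with the WARM_IP_TARGET environment variable

TODO:
In my experiment, the endpoint the nodes used to reach the Kubernetes master was a public IP address, which bothers me.
Since there are cases where a fully private Kubernetes cluster is needed, I would like to continue verifying this with reference to the document below.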
