#ContainerCreating - What happened when we upgraded EKS and CNI
Last week, we created an EKS 1.12 cluster in advance and tested upgrading the master and worker nodes.
The Dev test cluster at that time had:
- 5 namespaces
- 3 pods
- r5.xlarge x 2
- EKS 1.12 -> 1.13
- CNI unchanged at 1.5
That went fairly smoothly, so on Friday we upgraded Staging. The result: Pods got stuck in ContainerCreating and did not run properly.
Staging cluster
- 9 namespaces
- 28 pods in each AZx2
- r5.xlarge x 2
- EKS 1.12 -> 1.13
- CNI 1.3.2 -> 1.5.0
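I don't know how other companies operate, so I can't say whether this is typical, but we use CloudFormation and EKS and run things like this:
jsonnet > compile > convert to manifest YAML > deploy
This incident prompted me to read the documentation on EKS, ENI, CNI, primary & secondary IPs, IPAMD, L-IPAM, and more.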
##Official Document - Upgrade guide
https://github.com/awslabs/amazon-eks-ami
https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html
https://docs.aws.amazon.com/eks/latest/userguide/update-stack.html
##Proposal: CNI plugin for Kubernetes networking over AWS VPC
https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/cni-proposal.md
##amazon-vpc-cni-k8s
https://github.com/aws/amazon-vpc-cni-k8s
2 components:
- CNI Plugin
- L-IPAMD
##Possible Issues
Pods stuck in ContainerCreating due to CNI Failing to Assing IP to Container Until aws-node is deleted #59
https://github.com/aws/amazon-vpc-cni-k8s/issues/59
Leaking Network Interfaces (ENI) #69
https://github.com/aws/amazon-vpc-cni-k8s/issues/69
##ENI and VPC
- Each ENI has a description set as "aws-K8S-'instance-id'"
- Can be attached to an instance in a VPC
- The primary ENI IP address is automatically assigned
- All secondary addresses remain unassigned and it's up to the host owner as to how to configure them.
- Each instance can have multiple ENIs, and each ENI can have multiple IPv4 or IPv6 addresses.
##L-IPAM (node-Local IP Address Management)
- A daemon which is responsible for:
    - maintaining a warm-pool of available IP addresses
    - assigning an IP address to a Pod
- Scenario 1: available IP addresses < min threshold
    - create a new ENI and attach it to the instance
    - allocate all available IP addresses on this new ENI
    - once these IP addresses become available, add them to the warm-pool (the instance's metadata service is used)
- Scenario 2: available IP addresses > max threshold
    - pick an ENI where all of its secondary IP addresses are in the warm-pool
    - detach the ENI and free it to the EC2-VPC ENI pool
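
The two scenarios above can be sketched as a small simulation. This is only an illustration of the threshold logic described here, not ipamD's actual implementation: the class names, the threshold values, the fake `10.0.x.y` addresses, and the figure of 14 usable IPs per ENI are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Eni:
    """One ENI and the secondary IPs it carries (hypothetical model)."""
    name: str
    free_ips: set = field(default_factory=set)      # currently in the warm pool
    assigned_ips: set = field(default_factory=set)  # currently used by Pods


class WarmPool:
    """Toy model of the L-IPAM warm pool; all numbers are illustrative.

    Assumes max_free >= min_free + ips_per_eni so that detaching one ENI
    never drops the pool back below the minimum threshold.
    """

    def __init__(self, min_free, max_free, ips_per_eni):
        self.min_free = min_free
        self.max_free = max_free
        self.ips_per_eni = ips_per_eni
        self.enis = []
        self._counter = 0

    def available(self):
        return sum(len(e.free_ips) for e in self.enis)

    def _attach_eni(self):
        # Scenario 1: attach a new ENI and add all of its IPs to the pool.
        idx = self._counter
        self._counter += 1
        ips = {f"10.0.{idx}.{host}" for host in range(1, self.ips_per_eni + 1)}
        self.enis.append(Eni(name=f"eni-{idx}", free_ips=ips))

    def reconcile(self):
        # Grow while the pool is below the minimum threshold.
        while self.available() < self.min_free:
            self._attach_eni()
        # Scenario 2: shrink while above the maximum threshold, but only
        # by detaching ENIs whose IPs are all sitting unused in the pool.
        while self.available() > self.max_free:
            idle = [e for e in self.enis if not e.assigned_ips]
            if not idle:
                break
            self.enis.remove(idle[-1])

    def assign(self):
        """Hand one warm IP to a Pod, then re-check the thresholds."""
        self.reconcile()
        for eni in self.enis:
            if eni.free_ips:
                ip = eni.free_ips.pop()
                eni.assigned_ips.add(ip)
                self.reconcile()
                return ip
        raise RuntimeError("no IP address available")

    def release(self, ip):
        """Return a Pod's IP to the pool, then re-check the thresholds."""
        for eni in self.enis:
            if ip in eni.assigned_ips:
                eni.assigned_ips.remove(ip)
                eni.free_ips.add(ip)
                self.reconcile()
                return
        raise KeyError(ip)
```

With `WarmPool(min_free=2, max_free=20, ips_per_eni=14)`, assigning 13 IPs pushes the pool below the minimum and a second ENI is attached; releasing them again pushes it above the maximum and the idle ENI is detached.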
##Pod IP address cooling period
- Prevents the CNI plugin from recycling a deleted Pod's IP address and assigning it to a new Pod before the controller has finished updating all nodes in the cluster about the deleted Pod.
- Scenario: when a Pod is deleted
    - the Pod's IP address goes into "cooling mode" for 30 seconds
    - when the cooling period expires, the IP is returned to the warm-pool (recycled)
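
The cooling period can be pictured with a minimal sketch like the one below. It is not how ipamD is written; `CoolingPool` and its methods are hypothetical names, and the clock is injectable only so the behaviour can be checked without actually waiting 30 seconds.

```python
import time


class CoolingPool:
    """Toy model: released IPs wait out a cooling period before reuse."""

    def __init__(self, cooling_seconds=30.0, clock=time.monotonic):
        self.cooling_seconds = cooling_seconds
        self.clock = clock   # injectable so the example is easy to test
        self.warm = []       # IPs ready to be assigned
        self.cooling = {}    # ip -> time at which it was released

    def release(self, ip):
        """Called when a Pod is deleted: the IP enters cooling mode."""
        self.cooling[ip] = self.clock()

    def _recycle_expired(self):
        # Move every IP whose cooling period has expired back to the warm pool.
        now = self.clock()
        expired = [ip for ip, released_at in self.cooling.items()
                   if now - released_at >= self.cooling_seconds]
        for ip in expired:
            del self.cooling[ip]
            self.warm.append(ip)

    def assign(self):
        """Return a warm IP, or None if every IP is in use or still cooling."""
        self._recycle_expired()
        return self.warm.pop(0) if self.warm else None
```

An IP released at time 0 is not handed out at 29 seconds, but becomes assignable again once 30 seconds have passed.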
##IPAMD (Internet Protocol address management)
- Allocates ENIs and secondary IP addresses from the instance subnet.
- If a subnet runs out of IP addresses:
    - ipamD will not be able to get secondary IP addresses -> Pods may get stuck in "ContainerCreating"
##ENI Allocation
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#AvailableIpPerENI
- How to calculate
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html
- A very handy site that shows Max ENIs, Max IPs, and so on.
https://www.ec2instances.info
##Log Location
/var/log/aws-routed-eni
##Troubleshooting: summary of useful commands
- https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/troubleshooting.md
- https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/cni-proposal.md
###ipamD debugging commands
collecting node level tech-support bundle for offline troubleshooting
/opt/cni/bin/aws-cni-support.sh
get enis info
curl http://localhost:61679/v1/enis | python -m json.tool
get IP assignment info
curl http://localhost:61679/v1/pods | python -m json.tool
get ipamD metrics
curl http://localhost:61678/metrics
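
If you would rather collect the same output from a script, the `curl ... | python -m json.tool` commands above could be approximated as follows. This is a rough sketch that assumes it runs on the node itself; the port is taken from the commands above, the helper names are made up, and nothing is assumed about the shape of the JSON that ipamD returns.

```python
import json
import urllib.request

# Port taken from the curl commands above; only reachable on the node itself.
IPAMD_INTROSPECTION = "http://localhost:61679"


def pretty_json(raw):
    """Re-indent a JSON document, similar to `python -m json.tool`."""
    return json.dumps(json.loads(raw), indent=4)


def dump_endpoint(path, timeout=5):
    """Fetch one introspection path (e.g. "/v1/enis") and pretty-print it."""
    url = IPAMD_INTROSPECTION + path
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return pretty_json(resp.read())
```

On a node, `print(dump_endpoint("/v1/enis"))` and `print(dump_endpoint("/v1/pods"))` would then correspond to the two commands above.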
###L-IPAM (Local IP Address Manager)
retrieve all attached ENIs
curl http://169.254.169.254/latest/meta-data/network/interfaces/macs/
retrieve all IPv4 addresses on an ENI
curl http://169.254.169.254/latest/meta-data/network/interfaces/macs/<MAC address>/local-ipv4s
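
The two metadata lookups above can be combined into one small helper that lists the IPv4 addresses of every attached ENI. This is a sketch under a few assumptions: it runs on the EC2 instance, the metadata service answers plain GET requests without a session token (instances that enforce IMDSv2 need a token header, which is not handled here), and the function names are hypothetical.

```python
import urllib.request

METADATA_MACS = "http://169.254.169.254/latest/meta-data/network/interfaces/macs/"


def http_get(url, timeout=2):
    """Plain GET; assumes the metadata service does not require a token."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode()


def ips_by_eni(fetch=http_get):
    """Map each attached ENI's MAC address to its list of IPv4 addresses."""
    result = {}
    for line in fetch(METADATA_MACS).splitlines():
        mac = line.strip().rstrip("/")  # entries are listed as "<MAC address>/"
        if mac:
            # local-ipv4s returns one address per line.
            result[mac] = fetch(f"{METADATA_MACS}{mac}/local-ipv4s").split()
    return result
```

The `fetch` parameter exists only so the parsing can be exercised off-instance with canned responses; on a node, calling `ips_by_eni()` with no arguments is enough.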
###Inside a Pod
IP address
ip addr show
routes
ip route show
###On Host side
to Pod traffic
ip route show
pod is allocated with one of the ENI's secondary IP address
ip route show table eni-1
to and from Pods
ip rule list
##Things that look useful
cni-metrics-helper
https://github.com/aws/amazon-vpc-cni-k8s/blob/master/cni-metrics-helper/README.md
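##What I learned
- Useful commands for troubleshooting on a node
- On an r5.2xlarge the limits are as follows, so we should still have had enough IPs. We decided to rebuild the environment in Dev and this time investigate by following the troubleshooting steps above.

| API Name | Memory | vCPUs | Max IPs | Max ENIs |
|-----|-----|---|---|---|
| r5.2xlarge | 64.0 GiB | 8 vCPUs | 60 | 4 |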
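
As a rough check of the numbers in the table, the per-node Pod limit can be estimated from the ENI and IP counts. The formula below is the one I understand the amazon-eks-ami `eni-max-pods.txt` list to be based on (ENIs x (IPs per ENI - 1) + 2); treat both the formula and the explanation in the comment as an assumption to verify against the official documentation, not as something confirmed by this incident.

```python
def max_pods(max_enis, ips_per_eni):
    """Estimate how many Pods one node can hold with the VPC CNI.

    Assumption: each ENI's primary IP is not handed to Pods, and the +2
    is commonly explained as covering Pods that use host networking
    (aws-node and kube-proxy).
    """
    return max_enis * (ips_per_eni - 1) + 2


# Values for r5.2xlarge from the table above.
max_enis = 4
max_ips = 60
ips_per_eni = max_ips // max_enis  # 15 addresses per ENI

print(max_pods(max_enis, ips_per_eni))  # 58
```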