The other day, while swapping the Kubernetes network plugin from calico over to flannel, I ran into a problem where coredns would no longer start, so this post is a log of the cause and the fix.
Concretely, the coredns pods were stuck with a STATUS of ContainerCreating, as shown below, and never became Running.
sho@Desktop $ kubectl get po -n kube-system                            [~/workspace/k8s]
NAME                              READY   STATUS              RESTARTS   AGE
coredns-66bff467f8-mbqc6          0/1     ContainerCreating   0          6m18s
coredns-66bff467f8-xjr6j          0/1     ContainerCreating   0          4m33s
etcd-vagrant                      1/1     Running             8          21h
kube-apiserver-vagrant            1/1     Running             8          21h
kube-controller-manager-vagrant   1/1     Running             8          21h
kube-flannel-ds-amd64-gbwbm       1/1     Running             0          5m15s
kube-proxy-8zfjf                  1/1     Running             8          21h
kube-scheduler-vagrant            1/1     Running             8          21h
Describing one of the pods shows "networkPlugin cni failed to set up".
Apparently the cni0 interface, which should be created when flannel starts up, was failing to be created.
sho@Desktop $ k describe po -n kube-system coredns-66bff467f8-mbqc6    [~/workspace/k8s]
Name:                 coredns-66bff467f8-mbqc6
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 vagrant/10.0.2.15
Start Time:           Sun, 12 Apr 2020 21:48:08 +0900
Labels:               k8s-app=kube-dns
                      pod-template-hash=66bff467f8
Annotations:          <none>
Status:               Pending
IP:
Controlled By:        ReplicaSet/coredns-66bff467f8
Containers:
  coredns:
    Container ID:
    Image:         k8s.gcr.io/coredns:1.6.7
    Image ID:
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-rrhhr (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-rrhhr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-rrhhr
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                  From               Message
  ----     ------                  ----                 ----               -------
  Normal   Scheduled               3m26s                default-scheduler  Successfully assigned kube-system/coredns-66bff467f8-mbqc6 to vagrant
  Warning  FailedCreatePodSandBox  3m24s                kubelet, vagrant   Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "414eadf5bc603455533b031b30c86a22b3b3fb8aae0b4114842d6928bc7e9a86" network for pod "coredns-66bff467f8-mbqc6": networkPlugin cni failed to set up pod "coredns-66bff467f8-mbqc6_kube-system" network: error getting ClusterInformation: connection is unauthorized: Unauthorized, failed to clean up sandbox container "414eadf5bc603455533b031b30c86a22b3b3fb8aae0b4114842d6928bc7e9a86" network for pod "coredns-66bff467f8-mbqc6": networkPlugin cni failed to teardown pod "coredns-66bff467f8-mbqc6_kube-system" network: error getting ClusterInformation: connection is unauthorized: Unauthorized]
  Normal   SandboxChanged          8s (x16 over 3m24s)  kubelet, vagrant   Pod sandbox changed, it will be killed and re-created.
sho@Desktop $
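The telling fragment in that long event message is the repeated `error getting ClusterInformation: connection is unauthorized: Unauthorized` — the CNI plugin is failing to authenticate against the API server. As a small sketch (the helper name is my own, not a kubectl feature), that fragment can be pulled out of a saved describe dump without reading the whole event:

```shell
#!/bin/sh
# Extract the root-cause fragment from a FailedCreatePodSandBox event
# message saved to a file from `kubectl describe`. The bracket expression
# stops at the ',' or ']' that delimits the message, and sort -u collapses
# the set-up and teardown occurrences into one line.
sandbox_error_cause() {
    grep -o 'error getting [^],]*' "$1" | sort -u
}
```

Running it over the describe output above prints the single line `error getting ClusterInformation: connection is unauthorized: Unauthorized`.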
Logging in to the VM and listing the network interfaces confirms that cni0 is indeed missing.
(Kubernetes is installed in a Vagrant environment, and the kubectl commands are run from a Mac.)
vagrant@vagrant:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:76:b9:09 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic eth0
       valid_lft 85863sec preferred_lft 85863sec
    inet6 fe80::a00:27ff:fe76:b909/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:0c:8f:7c brd ff:ff:ff:ff:ff:ff
    inet 192.168.33.11/24 brd 192.168.33.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe0c:8f7c/64 scope link
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:e6:58:68:1b brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
6: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether e6:89:0e:e0:86:99 brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::e489:eff:fee0:8699/64 scope link
       valid_lft forever preferred_lft forever
vagrant@vagrant:~$
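A quicker way to answer "does cni0 exist?" than scanning the whole `ip a` dump is to ask for the device directly: `ip link show <dev>` exits non-zero when the interface is absent. A minimal sketch:

```shell
#!/bin/sh
# Report whether the cni0 bridge exists on this host. `ip link show cni0`
# exits non-zero when the device does not exist, so the exit status alone
# is enough; output is discarded.
check_cni0() {
    if ip link show cni0 >/dev/null 2>&1; then
        echo "cni0 present"
    else
        echo "cni0 missing"
    fi
}
check_cni0
```

One caveat: the cni0 bridge is created on demand when the first flannel-managed pod is wired up, so a missing cni0 immediately after rolling out flannel is normal. Here the pods had been stuck for minutes, which is what made its absence suspicious.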
At first I had no idea why creating cni0 was failing, but as I investigated it turned out that the cause was the other network plugin (calico) I had previously been using: files left over from it were the source of the problem. Since the container runtime picks up the lexicographically first config file in /etc/cni/net.d, 10-calico.conflist was apparently still winning over 10-flannel.conflist.
vagrant@vagrant:~$ ll /etc/cni/net.d/
total 20
drwxr-xr-x 2 root root 4096 Apr 12 12:49 ./
drwxr-xr-x 3 root root 4096 Apr 11 14:56 ../
-rw-rw-r-- 1 root root  526 Apr 12 12:01 10-calico.conflist
-rw-r--r-- 1 root root  292 Apr 12 12:49 10-flannel.conflist
-rw------- 1 root root 2623 Apr 12 12:01 calico-kubeconfig
vagrant@vagrant:~$
After deleting 10-calico.conflist and calico-kubeconfig, the coredns pods' STATUS finally became Running.
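For reference, the cleanup can be sketched as a script: list everything in the CNI config directory that does not belong to flannel, review it, and remove it. The directory and file names are the ones from the listing above; the helper name is my own.

```shell
#!/bin/sh
# List CNI config files that do not belong to flannel, i.e. leftovers
# from a previous plugin such as calico. Takes the config directory as
# an argument so it can be tried against a scratch copy first; on a real
# node the directory is /etc/cni/net.d.
stale_cni_configs() {
    find "${1:-/etc/cni/net.d}" -maxdepth 1 -type f ! -name '*flannel*' -print
}
```

On this node it prints 10-calico.conflist and calico-kubeconfig, the two files to `sudo rm`. No pod restart was needed afterwards: as the SandboxChanged events above show, kubelet keeps retrying sandbox creation, so once the stale calico files are gone the coredns pods recover on their own.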