This is a continuation of "Experimenting with Building a k8s Environment on Ubuntu with Kubeadm (3)".
The previous post ended up being entirely about repairing a cluster that had broken, so this time I really want to add worker node 2 from scratch. I run the join command on worker node 2.
ubuntu@Worker-Node2:~$ sudo kubeadm join 10.0.11.67:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<HASH>
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 524.612332ms
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Because the necessary setup had already been done in the previous posts, the node was able to join the cluster with the newly issued join command.
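(As an aside: the token printed by 'kubeadm init' expires after 24 hours by default, so a fresh join command has to be issued on the control plane with 'kubeadm token create --print-join-command'. The output below is illustrative, using the same placeholders as above.)
ubuntu@Master-Node:~$ sudo kubeadm token create --print-join-command
kubeadm join 10.0.11.67:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<HASH>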
Let's check.
ubuntu@Worker-Node2:~$ kubectl get nodes
E1103 18:11:38.097921 66164 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.0.11.67:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")"
E1103 18:11:38.107360 66164 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.0.11.67:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")"
E1103 18:11:38.115637 66164 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.0.11.67:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")"
E1103 18:11:38.124100 66164 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.0.11.67:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")"
E1103 18:11:38.132349 66164 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.0.11.67:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")"
Unable to connect to the server: tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
Huh, that's odd.
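(This x509 error usually means that the certificate-authority-data embedded in the local ~/.kube/config does not match the CA that actually signed the API server's certificate; a config left over from the earlier experiments is a likely culprit here. A hedged way to compare the two CAs by fingerprint, which I did not actually run at the time:)
ubuntu@Worker-Node2:~$ grep certificate-authority-data ~/.kube/config | awk '{print $2}' | base64 -d | openssl x509 -noout -fingerprint
ubuntu@Master-Node:~$ sudo openssl x509 -noout -fingerprint -in /etc/kubernetes/pki/ca.crt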
Let me check from the master node.
ubuntu@Master-Node:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-node Ready control-plane 18h v1.31.2
worker-node Ready <none> 16h v1.31.2
worker-node2 NotReady <none> 4m10s v1.31.2
The STATUS is NotReady.
Looking at .kube/config on worker node 2, the client-key-data differs from the master node's.
I make it match the master node's and check again.
ubuntu@Worker-Node2:~$ kubectl get nodes
error: tls: private key does not match public key
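(This error means the client-certificate-data and client-key-data in ~/.kube/config no longer form a matching pair. A hedged way to confirm it, assuming RSA client credentials (the kubeadm default), is to compare the public moduli of the certificate and the key; the /tmp paths are just for illustration:)
ubuntu@Worker-Node2:~$ grep client-certificate-data ~/.kube/config | awk '{print $2}' | base64 -d > /tmp/client.crt
ubuntu@Worker-Node2:~$ grep client-key-data ~/.kube/config | awk '{print $2}' | base64 -d > /tmp/client.key
ubuntu@Worker-Node2:~$ openssl x509 -noout -modulus -in /tmp/client.crt | openssl md5
ubuntu@Worker-Node2:~$ openssl rsa -noout -modulus -in /tmp/client.key | openssl md5
If the two digests differ, the certificate and key do not belong together.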
Hmm. I want to remove worker-node2 from the cluster once and redo things.
On the master node, run the following command.
ubuntu@Master-Node:~$ kubectl drain worker-node2 --force --ignore-daemonsets
node/worker-node2 cordoned
Warning: ignoring DaemonSet-managed Pods: kube-system/kube-proxy-z72lc
node/worker-node2 drained
ubuntu@Master-Node:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-node Ready control-plane 19h v1.31.2
worker-node Ready <none> 17h v1.31.2
worker-node2 NotReady,SchedulingDisabled <none> 22m v1.31.2
On worker node 2, run the following command.
ubuntu@Worker-Node2:~$ sudo kubeadm reset
W1103 18:33:03.636816 68759 preflight.go:56] [reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W1103 18:33:05.753008 68759 removeetcdmember.go:106] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Deleted contents of the etcd data directory: /var/lib/etcd
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of directories: [/etc/kubernetes/manifests /var/lib/kubelet /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/super-admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
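(The tail of the reset output is effectively a to-do list of what is not cleaned up automatically. A hedged sketch of that manual cleanup on worker node 2, which I did not run at this point, so double-check before deleting anything:)
ubuntu@Worker-Node2:~$ sudo rm -rf /etc/cni/net.d                                           # CNI configuration, as the message says
ubuntu@Worker-Node2:~$ sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X   # flush rules left behind by kube-proxy
ubuntu@Worker-Node2:~$ rm -f $HOME/.kube/config                                             # the stale kubeconfig the message points at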
Delete worker-node2 from the cluster on the master node.
ubuntu@Master-Node:~$ kubectl delete node worker-node2
node "worker-node2" deleted
ubuntu@Master-Node:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-node Ready control-plane 19h v1.31.2
worker-node Ready <none> 17h v1.31.2
Running join again changes nothing.
On checking, /etc/kubernetes/admin.conf does not exist on worker node 2.
I copy the master node's over to the same path and then run this command.
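(How the file gets from the master to worker node 2 is not shown here. One hedged possibility, assuming the ubuntu user can SSH from worker node 2 to the master, is to stage a user-readable copy first, since admin.conf on the master is readable only by root:)
ubuntu@Master-Node:~$ sudo cp /etc/kubernetes/admin.conf /tmp/admin.conf && sudo chown ubuntu:ubuntu /tmp/admin.conf
ubuntu@Worker-Node2:~$ scp ubuntu@10.0.11.67:/tmp/admin.conf /tmp/admin.conf
ubuntu@Worker-Node2:~$ sudo mv /tmp/admin.conf /etc/kubernetes/admin.conf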
ubuntu@Worker-Node2:~$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
cp: overwrite '/home/ubuntu/.kube/config'? y
Check again.
ubuntu@Worker-Node2:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-node Ready control-plane 19h v1.31.2
worker-node Ready <none> 17h v1.31.2
worker-node2 NotReady <none> 9m2s v1.31.2
Why is worker-node2 showing NotReady? Let me describe the node.
ubuntu@Worker-Node2:~$ kubectl describe node worker-node2
Name: worker-node2
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=worker-node2
kubernetes.io/os=linux
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sun, 03 Nov 2024 18:55:10 +0000
Taints: node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/not-ready:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: worker-node2
AcquireTime: <unset>
RenewTime: Sun, 03 Nov 2024 19:15:44 +0000
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Sun, 03 Nov 2024 19:14:22 +0000 Sun, 03 Nov 2024 18:55:10 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Sun, 03 Nov 2024 19:14:22 +0000 Sun, 03 Nov 2024 18:55:10 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Sun, 03 Nov 2024 19:14:22 +0000 Sun, 03 Nov 2024 18:55:10 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Sun, 03 Nov 2024 19:14:22 +0000 Sun, 03 Nov 2024 18:55:10 +0000 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Addresses:
InternalIP: 10.0.12.218
Hostname: worker-node2
Capacity:
cpu: 2
ephemeral-storage: 7034376Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 936100Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 6482880911
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 833700Ki
pods: 110
System Info:
Machine ID: ec29f53375b92e24d6186a6cad778ae0
System UUID: ec29f533-75b9-2e24-d618-6a6cad778ae0
Boot ID: e39c4aa5-dda2-493d-94b3-130e61668978
Kernel Version: 6.8.0-1016-aws
OS Image: Ubuntu 24.04.1 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://2.0.0-rc.6
Kubelet Version: v1.31.2
Kube-Proxy Version: v1.31.2
PodCIDR: 192.168.4.0/24
PodCIDRs: 192.168.4.0/24
Non-terminated Pods: (1 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system kube-proxy-hk462 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 0 (0%) 0 (0%)
memory 0 (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 51m kubelet Starting kubelet.
Warning InvalidDiskCapacity 51m kubelet invalid capacity 0 on image filesystem
Normal NodeAllocatableEnforced 51m kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 51m kubelet Node worker-node2 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 51m kubelet Node worker-node2 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 51m kubelet Node worker-node2 status is now: NodeHasSufficientPID
Normal NodeNotSchedulable 44m kubelet Node worker-node2 status is now: NodeNotSchedulable
Warning InvalidDiskCapacity 37m kubelet invalid capacity 0 on image filesystem
Normal NodeHasSufficientMemory 37m (x2 over 37m) kubelet Node worker-node2 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 37m (x2 over 37m) kubelet Node worker-node2 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 37m (x2 over 37m) kubelet Node worker-node2 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 37m kubelet Updated Node Allocatable limit across pods
Normal Starting 31m kubelet Starting kubelet.
Warning InvalidDiskCapacity 31m kubelet invalid capacity 0 on image filesystem
Normal NodeAllocatableEnforced 31m kubelet Updated Node Allocatable limit across pods
Normal NodeNotSchedulable 23m kubelet Node worker-node2 status is now: NodeNotSchedulable
Normal NodeSchedulable 22m kubelet Node worker-node2 status is now: NodeSchedulable
Normal NodeHasSufficientMemory 22m (x7 over 31m) kubelet Node worker-node2 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 22m (x7 over 31m) kubelet Node worker-node2 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 22m (x7 over 31m) kubelet Node worker-node2 status is now: NodeHasSufficientPID
Warning InvalidDiskCapacity 20m kubelet invalid capacity 0 on image filesystem
Normal Starting 20m kubelet Starting kubelet.
Normal NodeHasNoDiskPressure 20m (x2 over 20m) kubelet Node worker-node2 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientMemory 20m (x2 over 20m) kubelet Node worker-node2 status is now: NodeHasSufficientMemory
Normal NodeHasSufficientPID 20m (x2 over 20m) kubelet Node worker-node2 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 20m kubelet Updated Node Allocatable limit across pods
Normal RegisteredNode 20m node-controller Node worker-node2 event: Registered Node worker-node2 in Controller
Normal Starting 11m kubelet Starting kubelet.
Warning InvalidDiskCapacity 11m kubelet invalid capacity 0 on image filesystem
Normal NodeAllocatableEnforced 11m kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 11m kubelet Node worker-node2 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 11m kubelet Node worker-node2 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 11m kubelet Node worker-node2 status is now: NodeHasSufficientPID
For reference, here is what the healthy worker-node looks like. I checked worker-node from worker-node2.
ubuntu@Worker-Node2:~$ kubectl describe node worker-node
Name: worker-node
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=worker-node
kubernetes.io/os=linux
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sun, 03 Nov 2024 01:20:35 +0000
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: worker-node
AcquireTime: <unset>
RenewTime: Sun, 03 Nov 2024 19:20:06 +0000
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Sun, 03 Nov 2024 19:16:57 +0000 Sun, 03 Nov 2024 01:20:35 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Sun, 03 Nov 2024 19:16:57 +0000 Sun, 03 Nov 2024 01:20:35 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Sun, 03 Nov 2024 19:16:57 +0000 Sun, 03 Nov 2024 01:20:35 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Sun, 03 Nov 2024 19:16:57 +0000 Sun, 03 Nov 2024 01:20:36 +0000 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 10.0.11.173
Hostname: worker-node
Capacity:
cpu: 2
ephemeral-storage: 7034376Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 936104Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 6482880911
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 833704Ki
pods: 110
System Info:
Machine ID: ec2ee19a652380319af6e6df73b52a9d
System UUID: ec2ee19a-6523-8031-9af6-e6df73b52a9d
Boot ID: 692f111f-94a0-4cd7-9bec-8f8003122da5
Kernel Version: 6.8.0-1017-aws
OS Image: Ubuntu 24.04.1 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://2.0.0-rc.6
Kubelet Version: v1.31.2
Kube-Proxy Version: v1.31.2
PodCIDR: 192.168.1.0/24
PodCIDRs: 192.168.1.0/24
Non-terminated Pods: (1 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system kube-proxy-qdpkp 0 (0%) 0 (0%) 0 (0%) 0 (0%) 17h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 0 (0%) 0 (0%)
memory 0 (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events: <none>
Various things are showing up in Events. What do I need to do to get it to Ready? What is actually wrong?
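(One clue is already in the output above: the Ready condition reports "container runtime network not ready ... cni plugin not initialized", so the network plugin on worker-node2 looks like the problem. As hedged next steps, I would check whether the cluster's CNI DaemonSet, whichever plugin this cluster uses, has a pod running on worker-node2, and whether the node has any CNI config on disk:)
ubuntu@Master-Node:~$ kubectl get pods -n kube-system -o wide | grep worker-node2
ubuntu@Worker-Node2:~$ ls /etc/cni/net.d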
(References)
https://monowar-mukul.medium.com/kubernetes-remove-worker-node-from-the-cluster-and-completely-uninstall-af41e00c1244
https://komodor.com/learn/how-to-fix-kubernetes-node-not-ready-error/