This is a continuation of "Experimenting with building a k8s environment on Ubuntu with kubeadm (2)".
When I tried to add a second worker node, kubeadm join failed with an error.
kubeadm join 10.0.11.67:6443 --token <TOKEN> \
--discovery-token-ca-cert-hash sha256:<HASH>
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: failed to request the cluster-info ConfigMap: Get "https://10.0.11.67:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
10.0.11.67 is the master node's IP address.
Has the master node broken?
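Before concluding anything, a quick reachability probe from the worker helps separate a network problem from a dead API server (a sketch; nc comes from the netcat package):
# from the worker: does anything answer on the API server port?
nc -vz 10.0.11.67 6443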
Far from being able to add a new worker node, the master node itself appears to be in a bad state.
So I'll give up on adding the new worker node for now and focus on somehow repairing the master node.
On the master node, I run sudo kubeadm init once more.
If kube-apiserver, kube-controller-manager, kube-scheduler, or etcd processes are still running, I kill them first with sudo kill -9.
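As a rough sketch, finding and stopping them looks like this (<PID> is a placeholder for whatever the listing shows; stopping the kubelet first keeps it from restarting the static pods):
# stop the kubelet so it does not restart the static pods
sudo systemctl stop kubelet
# list any leftover control-plane processes, then kill them by PID
ps -ef | grep -E 'kube-apiserver|kube-controller-manager|kube-scheduler|etcd' | grep -v grep
sudo kill -9 <PID>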
ubuntu@Master-Node:~$ sudo kubeadm init --pod-network-cidr=192.168.0.0/16
[init] Using Kubernetes version: v1.31.2
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
It might have been possible to get past the preflight checks with an option, but I didn't know what to pass to it, so I simply deleted each of the offending files.
ubuntu@Master-Node:~$ sudo rm /etc/kubernetes/manifests/kube-apiserver.yaml
ubuntu@Master-Node:~$ sudo rm /etc/kubernetes/manifests/kube-controller-manager.yaml
ubuntu@Master-Node:~$ sudo rm /etc/kubernetes/manifests/kube-scheduler.yaml
ubuntu@Master-Node:~$ sudo rm /etc/kubernetes/manifests/etcd.yaml
ubuntu@Master-Node:~$ sudo rm -rf /var/lib/etcd
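For the record, --ignore-preflight-errors takes the check names shown in brackets in the error output (or all), so skipping the checks instead would look something like the line below; deleting the stale files seemed the cleaner option here:
sudo kubeadm init --pod-network-cidr=192.168.0.0/16 \
    --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml,DirAvailable--var-lib-etcd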
I run kubeadm init once more.
ubuntu@Master-Node:~$ sudo kubeadm init --pod-network-cidr=192.168.0.0/16
[init] Using Kubernetes version: v1.31.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action beforehand using 'kubeadm config images pull'
W1102 23:24:18.467182 11919 checks.go:846] detected that the sandbox image "registry.k8s.io/pause:3.8" of the container runtime is inconsistent with that used by kubeadm.It is recommended to use "registry.k8s.io/pause:3.10" as the CRI sandbox image.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using existing ca certificate authority
[certs] Using existing apiserver certificate and key on disk
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] Using existing etcd/ca certificate authority
[certs] Using existing etcd/server certificate and key on disk
[certs] Using existing etcd/peer certificate and key on disk
[certs] Using existing etcd/healthcheck-client certificate and key on disk
[certs] Using existing apiserver-etcd-client certificate and key on disk
[certs] Using the existing "sa" key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/super-admin.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/scheduler.conf"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 501.670093ms
[api-check] Waiting for a healthy API server. This can take up to 4m0s
[api-check] The API server is healthy after 20.502683967s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node master-node as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node master-node as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: fx9wwh.8br2g9eincnjdgex
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.0.11.67:6443 --token <TOKEN> \
--discovery-token-ca-cert-hash sha256:<HASH>
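Note that the token in this join command was freshly generated by this init run, so any join command saved from the previous cluster is no longer valid. The currently valid tokens can be listed on the master:
sudo kubeadm token list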
Of the commands shown above, I run the kubeconfig copy:
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
The master seems to be more or less back:
$ kubectl get nodes
NAME          STATUS   ROLES           AGE     VERSION
master-node   Ready    control-plane   5m14s   v1.31.2
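Incidentally, if the join command printed by init ever gets lost, it can be regenerated on the master instead of re-running init:
sudo kubeadm token create --print-join-command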
Now, let's run kubeadm join on worker node 1.
ubuntu@Worker-Node:~$ sudo kubeadm join 10.0.11.67:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<HASH>
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
I delete the offending files.
ubuntu@Worker-Node:~$ sudo rm -rf /etc/kubernetes/kubelet.conf
ubuntu@Worker-Node:~$ sudo rm -rf /etc/kubernetes/pki/ca.crt
Port 10250 is used by the kubelet.
ubuntu@Worker-Node:~$ sudo systemctl stop kubelet
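To confirm what actually holds port 10250 (and that stopping the kubelet frees it), ss can show the listener:
sudo ss -lntp | grep 10250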
I run the join again.
ubuntu@Worker-Node:~$ sudo kubeadm join 10.0.11.67:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<HASH>
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 502.248124ms
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap
error execution phase kubelet-start: error uploading crisocket: Unauthorized
To see the stack trace of this error execute with --v=5 or higher
At this point, I try kubectl get nodes on the worker.
ubuntu@Worker-Node:~$ kubectl get nodes
E1102 23:57:03.708258 2443 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.0.11.67:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")"
E1102 23:57:03.714075 2443 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.0.11.67:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")"
E1102 23:57:03.718729 2443 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.0.11.67:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")"
E1102 23:57:03.723403 2443 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.0.11.67:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")"
E1102 23:57:03.728156 2443 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.0.11.67:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")"
Unable to connect to the server: tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
No good. The worker's kubectl was presumably still trusting a stale CA, so I copied admin.conf over from the master node.
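The copy from the master itself isn't shown here; assuming SSH access between the nodes, it would go roughly like this (the /tmp staging path is arbitrary, and admin.conf is root-only, hence the staged copy):
# on the master: stage a copy that the ubuntu user can read
sudo cp /etc/kubernetes/admin.conf /tmp/admin.conf
sudo chown ubuntu:ubuntu /tmp/admin.conf
# on the worker: fetch it and put it in place
scp ubuntu@10.0.11.67:/tmp/admin.conf /tmp/admin.conf
sudo cp /tmp/admin.conf /etc/kubernetes/admin.conf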
Then I run this command as well:
ubuntu@Worker-Node:~$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
ubuntu@Worker-Node:~$ sudo rm /etc/kubernetes/kubelet.conf
ubuntu@Worker-Node:~$ sudo rm /etc/kubernetes/pki/ca.crt
ubuntu@Worker-Node:~$ sudo systemctl stop kubelet
Running the join command once more produced this error:
error execution phase kubelet-start: error uploading crisocket: Unauthorized
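As the error output suggests, the verbose run is simply the same join with --v=5 appended (placeholders as before):
sudo kubeadm join 10.0.11.67:6443 --token <TOKEN> \
    --discovery-token-ca-cert-hash sha256:<HASH> --v=5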
From the verbose output, the containerd socket file seemed to be getting in the way:
I1103 00:17:53.042825 3287 initconfiguration.go:123] detected and using CRI socket: unix:///var/run/containerd/containerd.sock
I delete it and restart containerd:
ubuntu@Worker-Node:~$ sudo rm /var/run/containerd/containerd.sock
ubuntu@Worker-Node:~$ sudo systemctl restart containerd
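containerd recreates its socket on startup, so a quick check that it is back after the restart:
ls -l /var/run/containerd/containerd.sock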
Trying the join again still ended in an error.
So I run the following sequence (from the Stack Overflow answer listed in the references):
ubuntu@Worker-Node:~$ sudo swapoff -a # will turn off the swap
ubuntu@Worker-Node:~$ sudo kubeadm reset
ubuntu@Worker-Node:~$ sudo systemctl daemon-reload
ubuntu@Worker-Node:~$ sudo systemctl restart kubelet
ubuntu@Worker-Node:~$ sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
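kubeadm reset also prints a reminder that it does not clean up CNI configuration; if a join still fails after this, removing the leftover CNI config (default path shown) is a common extra step:
sudo rm -rf /etc/cni/net.d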
Now, will the join finally succeed?
ubuntu@Worker-Node:~$ sudo kubeadm join 10.0.11.67:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<HASH>
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 1.001234209s
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
It worked!
Let's check the nodes.
ubuntu@Worker-Node:~$ kubectl get nodes
NAME          STATUS   ROLES           AGE    VERSION
master-node   Ready    control-plane   116m   v1.31.2
worker-node   Ready    <none>          13s    v1.31.2
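The ROLES column shows <none> for worker nodes by default; if a role label is wanted, it can be added by hand:
kubectl label node worker-node node-role.kubernetes.io/worker=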
I had meant to add one more node, but the whole session ended up being spent fixing the existing breakage.
(Reference) Creating a cluster with kubeadm
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
(Reference) kubernetes error uploading crisocket: timed out waiting for the condition (Stack Overflow)
https://stackoverflow.com/questions/53525975/kubernetes-error-uploading-crisocket-timed-out-waiting-for-the-condition/54540512#54540512