This is a continuation of "Experimenting with building a k8s environment on Ubuntu with kubeadm (2)".
When I tried to add a second worker node, kubeadm join failed with an error.
kubeadm join 10.0.11.67:6443 --token <TOKEN> \
--discovery-token-ca-cert-hash sha256:<HASH>
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: failed to request the cluster-info ConfigMap: Get "https://10.0.11.67:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
10.0.11.67 is the master node's IP address.
Has the master node broken?
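Before concluding anything, a quick reachability probe from the worker helps separate a network problem from a dead API server (a sketch; nc comes from the netcat package):
# from the worker: does anything answer on the API server port?
nc -vz 10.0.11.67 6443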
Far from being able to add a new worker node, the master node itself appears to be in a bad state.
So I'll give up on adding the new worker node for now and focus on somehow repairing the master node.
On the master node, I run sudo kubeadm init once more.
If kube-apiserver, kube-controller-manager, kube-scheduler, or etcd processes are still running, I kill them first with sudo kill -9.
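As a rough sketch, finding and stopping them looks like this (<PID> is a placeholder for whatever the listing shows; stopping the kubelet first keeps it from restarting the static pods):
# stop the kubelet so it does not restart the static pods
sudo systemctl stop kubelet
# list any leftover control-plane processes, then kill them by PID
ps -ef | grep -E 'kube-apiserver|kube-controller-manager|kube-scheduler|etcd' | grep -v grep
sudo kill -9 <PID>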
ubuntu@Master-Node:~$ sudo kubeadm init --pod-network-cidr=192.168.0.0/16
[init] Using Kubernetes version: v1.31.2
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
It might have been possible to get past the preflight checks with an option, but I didn't know what to pass to it, so I simply deleted each of the offending files.
ubuntu@Master-Node:~$ sudo rm /etc/kubernetes/manifests/kube-apiserver.yaml
ubuntu@Master-Node:~$ sudo rm /etc/kubernetes/manifests/kube-controller-manager.yaml
ubuntu@Master-Node:~$ sudo rm /etc/kubernetes/manifests/kube-scheduler.yaml
ubuntu@Master-Node:~$ sudo rm /etc/kubernetes/manifests/etcd.yaml
ubuntu@Master-Node:~$ sudo rm -rf /var/lib/etcd
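For the record, --ignore-preflight-errors takes the check names shown in brackets in the error output (or all), so skipping the checks instead would look something like the line below; deleting the stale files seemed the cleaner option here:
sudo kubeadm init --pod-network-cidr=192.168.0.0/16 \
    --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml,DirAvailable--var-lib-etcd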
I run kubeadm init once more.
ubuntu@Master-Node:~$ sudo kubeadm init --pod-network-cidr=192.168.0.0/16
[init] Using Kubernetes version: v1.31.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action beforehand using 'kubeadm config images pull'
W1102 23:24:18.467182 11919 checks.go:846] detected that the sandbox image "registry.k8s.io/pause:3.8" of the container runtime is inconsistent with that used by kubeadm.It is recommended to use "registry.k8s.io/pause:3.10" as the CRI sandbox image.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using existing ca certificate authority
[certs] Using existing apiserver certificate and key on disk
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] Using existing etcd/ca certificate authority
[certs] Using existing etcd/server certificate and key on disk
[certs] Using existing etcd/peer certificate and key on disk
[certs] Using existing etcd/healthcheck-client certificate and key on disk
[certs] Using existing apiserver-etcd-client certificate and key on disk
[certs] Using the existing "sa" key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/super-admin.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/scheduler.conf"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 501.670093ms
[api-check] Waiting for a healthy API server. This can take up to 4m0s
[api-check] The API server is healthy after 20.502683967s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node master-node as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node master-node as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: fx9wwh.8br2g9eincnjdgex
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.0.11.67:6443 --token <TOKEN> \
--discovery-token-ca-cert-hash sha256:<HASH>
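Note that the token in this join command was freshly generated by this init run, so any join command saved from the previous cluster is no longer valid. The currently valid tokens can be listed on the master:
sudo kubeadm token list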
Of the commands shown above, I run the kubeconfig copy:
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
The master seems to be more or less back:
$ kubectl get nodes
NAME          STATUS   ROLES           AGE     VERSION
master-node   Ready    control-plane   5m14s   v1.31.2
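Incidentally, if the join command printed by init ever gets lost, it can be regenerated on the master instead of re-running init:
sudo kubeadm token create --print-join-command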
Now, let's run kubeadm join on worker node 1.
ubuntu@Worker-Node:~$ sudo kubeadm join 10.0.11.67:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<HASH>
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
I delete the offending files.
ubuntu@Worker-Node:~$ sudo rm -rf /etc/kubernetes/kubelet.conf
ubuntu@Worker-Node:~$ sudo rm -rf /etc/kubernetes/pki/ca.crt
Port 10250 is used by the kubelet.
ubuntu@Worker-Node:~$ sudo systemctl stop kubelet
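To confirm what actually holds port 10250 (and that stopping the kubelet frees it), ss can show the listener:
sudo ss -lntp | grep 10250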
I run the join again.
ubuntu@Worker-Node:~$ sudo kubeadm join 10.0.11.67:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<HASH>
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 502.248124ms
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap
error execution phase kubelet-start: error uploading crisocket: Unauthorized
To see the stack trace of this error execute with --v=5 or higher
At this point, I try kubectl get nodes on the worker.
ubuntu@Worker-Node:~$ kubectl get nodes
E1102 23:57:03.708258 2443 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.0.11.67:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")"
E1102 23:57:03.714075 2443 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.0.11.67:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")"
E1102 23:57:03.718729 2443 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.0.11.67:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")"
E1102 23:57:03.723403 2443 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.0.11.67:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")"
E1102 23:57:03.728156 2443 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.0.11.67:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")"
Unable to connect to the server: tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
No good. The worker's kubectl was presumably still trusting a stale CA, so I copied admin.conf over from the master node.
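The copy from the master itself isn't shown here; assuming SSH access between the nodes, it would go roughly like this (the /tmp staging path is arbitrary, and admin.conf is root-only, hence the staged copy):
# on the master: stage a copy that the ubuntu user can read
sudo cp /etc/kubernetes/admin.conf /tmp/admin.conf
sudo chown ubuntu:ubuntu /tmp/admin.conf
# on the worker: fetch it and put it in place
scp ubuntu@10.0.11.67:/tmp/admin.conf /tmp/admin.conf
sudo cp /tmp/admin.conf /etc/kubernetes/admin.conf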
Then I run this command as well:
ubuntu@Worker-Node:~$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
ubuntu@Worker-Node:~$ sudo rm /etc/kubernetes/kubelet.conf
ubuntu@Worker-Node:~$ sudo rm /etc/kubernetes/pki/ca.crt
ubuntu@Worker-Node:~$ sudo systemctl stop kubelet
Running the join command once more produced this error:
error execution phase kubelet-start: error uploading crisocket: Unauthorized
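As the error output suggests, the verbose run is simply the same join with --v=5 appended (placeholders as before):
sudo kubeadm join 10.0.11.67:6443 --token <TOKEN> \
    --discovery-token-ca-cert-hash sha256:<HASH> --v=5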
From the verbose output, the containerd socket file seemed to be getting in the way:
I1103 00:17:53.042825 3287 initconfiguration.go:123] detected and using CRI socket: unix:///var/run/containerd/containerd.sock
I delete it and restart containerd:
ubuntu@Worker-Node:~$ sudo rm /var/run/containerd/containerd.sock
ubuntu@Worker-Node:~$ sudo systemctl restart containerd
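containerd recreates its socket on startup, so a quick check that it is back after the restart:
ls -l /var/run/containerd/containerd.sock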
Trying the join again still ended in an error.
So I run the following sequence (from the Stack Overflow answer listed in the references):
ubuntu@Worker-Node:~$ sudo swapoff -a # will turn off the swap
ubuntu@Worker-Node:~$ sudo kubeadm reset
ubuntu@Worker-Node:~$ sudo systemctl daemon-reload
ubuntu@Worker-Node:~$ sudo systemctl restart kubelet
ubuntu@Worker-Node:~$ sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
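kubeadm reset also prints a reminder that it does not clean up CNI configuration; if a join still fails after this, removing the leftover CNI config (default path shown) is a common extra step:
sudo rm -rf /etc/cni/net.d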
Now, will the join finally succeed?
ubuntu@Worker-Node:~$ sudo kubeadm join 10.0.11.67:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<HASH>
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 1.001234209s
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
It worked!
Let's check the nodes.
ubuntu@Worker-Node:~$ kubectl get nodes
NAME          STATUS   ROLES           AGE    VERSION
master-node   Ready    control-plane   116m   v1.31.2
worker-node   Ready    <none>          13s    v1.31.2
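The ROLES column shows <none> for worker nodes by default; if a role label is wanted, it can be added by hand:
kubectl label node worker-node node-role.kubernetes.io/worker=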
I had meant to add one more node, but the whole session ended up being spent fixing the existing breakage.
(Reference) Creating a cluster with kubeadm
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
(Reference) kubernetes error uploading crisocket: timed out waiting for the condition (Stack Overflow)
https://stackoverflow.com/questions/53525975/kubernetes-error-uploading-crisocket-timed-out-waiting-for-the-condition/54540512#54540512