Setting up Kubernetes 1.21 (cert-manager, ingress-nginx, device-plugin [CUDA]) on Ubuntu Server 20.04 LTS

Posted at 2021-10-05

Installation following the official guide

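Run the commands from the official page "Installing kubeadm | Kubernetes". They assume a root shell (e.g. after sudo -i), because redirections such as cat > /etc/... are not elevated by a sudo prefix.

Note: Kubernetes is pinned to 1.21, because on the then-latest 1.22 the ingress-nginx Pod did not start.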

# Let iptables see bridged traffic
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system

# Make iptables use the legacy backend instead of nftables
sudo apt-get install -y iptables arptables ebtables

sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo update-alternatives --set arptables /usr/sbin/arptables-legacy
sudo update-alternatives --set ebtables /usr/sbin/ebtables-legacy

# Install Docker
apt-get update && apt-get install -y \
  apt-transport-https ca-certificates curl software-properties-common gnupg2

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -

add-apt-repository \
  "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) \
  stable"

apt-get update && apt-get install -y \
containerd.io=1.2.13-2 \
docker-ce=5:19.03.11~3-0~ubuntu-$(lsb_release -cs) \
docker-ce-cli=5:19.03.11~3-0~ubuntu-$(lsb_release -cs)

# Set up the Docker daemon
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF

mkdir -p /etc/systemd/system/docker.service.d

# Restart Docker
systemctl daemon-reload
systemctl restart docker
sudo systemctl enable docker

# Install kubeadm, kubelet and kubectl
sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
sudo apt-get update
apt-get install -y kubelet=1.21.3-00 kubeadm=1.21.3-00 kubectl=1.21.3-00
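Compared with the kubeadm installation guide of that period, two steps are missing above: loading the br_netfilter kernel module (the net.bridge.* sysctl keys only exist while it is loaded) and holding the pinned packages with apt-mark hold, so that a routine apt-get upgrade does not move the node off 1.21. A minimal sketch of both, assuming it runs as root; the function names and the optional root-directory argument (handy for trying the file writes against a scratch directory) are hypothetical.

```shell
# Persist the netfilter settings under $1 (hypothetical; defaults to the real /).
write_k8s_netfilter_conf() {
  local root="${1:-}"
  mkdir -p "$root/etc/modules-load.d" "$root/etc/sysctl.d"
  # Load br_netfilter at every boot so the net.bridge.* keys exist.
  printf 'br_netfilter\n' > "$root/etc/modules-load.d/k8s.conf"
  # Same settings as the k8s.conf written above.
  printf '%s\n' 'net.bridge.bridge-nf-call-ip6tables = 1' \
    'net.bridge.bridge-nf-call-iptables = 1' > "$root/etc/sysctl.d/k8s.conf"
}

# Print "pkg=version" for the three Kubernetes packages, one per line.
pinned_k8s_packages() {
  local version="$1" pkg
  for pkg in kubelet kubeadm kubectl; do
    printf '%s=%s\n' "$pkg" "$version"
  done
}

# Apply everything on the real system (needs root and network access).
setup_k8s_node() {
  local version="${1:-1.21.3-00}"
  write_k8s_netfilter_conf
  modprobe br_netfilter
  sysctl --system
  # Word splitting of the package list is intended here.
  # shellcheck disable=SC2046
  apt-get install -y $(pinned_k8s_packages "$version")
  # Keep apt-get upgrade from moving the cluster off this version.
  apt-mark hold kubelet kubeadm kubectl
}
```

Usage: source the functions and run setup_k8s_node 1.21.3-00. Generating the package list from one version string keeps kubelet, kubeadm and kubectl from drifting apart.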

Setting up a single master node

swapoff -a

/etc/fstab (comment out the swap line so swap stays off after a reboot)
#/swap.img      none    swap    sw      0       0

kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Single-node cluster: remove the master taint so regular Pods can be scheduled here
kubectl taint nodes --all node-role.kubernetes.io/master-
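swapoff -a only lasts until the next reboot, and kubelet 1.21 will not start while swap is enabled (unless --fail-swap-on=false is set), which is why /etc/fstab is edited as well. A sketch of that edit with sed; the function name is hypothetical, and a .bak copy of the original file is kept.

```shell
# Comment out every active swap entry in an fstab file.
# $1 (optional): fstab path; defaults to /etc/fstab. Writes $1.bak first.
disable_swap_in_fstab() {
  local fstab="${1:-/etc/fstab}"
  # Match uncommented lines whose third field (filesystem type) is "swap"
  # and prefix them with "#".
  sed -i.bak -E 's/^([^#[:space:]][^[:space:]]*[[:space:]]+[^[:space:]]+[[:space:]]+swap[[:space:]].*)$/#\1/' "$fstab"
}
```

Usage: sudo swapoff -a, then run disable_swap_in_fstab as root before kubeadm init.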

Installing the Pods (Calico and cert-manager)

kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
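# Note: "latest" now resolves to cert-manager releases that no longer support Kubernetes 1.21;
# to reproduce this setup, pin a release from the same period (e.g. v1.5.x) via releases/download/<version>/cert-manager.yaml
kubectl apply -f https://github.com/jetstack/cert-manager/releases/latest/download/cert-manager.yaml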

# After a while the Pods come up
kubectl get pods --all-namespaces
NAMESPACE      NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager   cert-manager-7c6f78c46d-tbkxp              1/1     Running   0          72s
cert-manager   cert-manager-cainjector-668d9c86df-dgmx7   1/1     Running   0          72s
cert-manager   cert-manager-webhook-764b556954-qmfgt      1/1     Running   0          71s
kube-system    calico-kube-controllers-74b8fbdb46-zth25   1/1     Running   0          5m51s
kube-system    calico-node-8l4d2                          1/1     Running   0          5m52s
kube-system    coredns-78fcd69978-dxjbr                   1/1     Running   0          7m1s
kube-system    coredns-78fcd69978-xzqt7                   1/1     Running   0          7m1s
kube-system    etcd-123456                               1/1     Running   0          7m20s
kube-system    kube-apiserver-123456                     1/1     Running   0          7m13s
kube-system    kube-controller-manager-123456            1/1     Running   0          7m17s
kube-system    kube-proxy-h9f5n                           1/1     Running   0          7m1s
kube-system    kube-scheduler-123456                     1/1     Running   0          7m15s
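Instead of re-running kubectl get pods until every row shows Running, the table can be filtered for Pods that are not ready yet. A sketch using awk on the column layout shown above (NAMESPACE, NAME, READY, STATUS, ...); the function name is hypothetical, and Completed Pods (finished Jobs such as the ingress-nginx admission Jobs) are skipped.

```shell
# List namespace/name of Pods whose READY column is not n/n.
# Reads the output of `kubectl get pods --all-namespaces` on stdin.
not_ready_pods() {
  awk 'NR > 1 {
    split($3, ready, "/")               # READY column, e.g. "1/1"
    if ($4 != "Completed" && ready[1] != ready[2])
      print $1 "/" $2                   # namespace/name
  }'
}
```

Usage: kubectl get pods --all-namespaces | not_ready_pods prints nothing once everything is up. To block until a single namespace is ready, kubectl wait --for=condition=Ready pods --all -n cert-manager --timeout=300s does the same job.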

Setting up automatic Let's Encrypt renewal

clusterissuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: sample-issuer-prod
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: yourmail@gmail.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: sample.tls.prod
    # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress:
          class: nginx

kubectl apply -f clusterissuer.yaml
clusterissuer.cert-manager.io/sample-issuer-prod created
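Let's Encrypt rate-limits its production endpoint, and it provides a staging endpoint (https://acme-staging-v02.api.letsencrypt.org/directory) for trying out a setup. A sketch that prints a ClusterIssuer with the same HTTP-01/nginx solver as clusterissuer.yaml for any ACME server; the function name and the "-account-key" suffix for the account Secret are hypothetical choices.

```shell
# Print a ClusterIssuer manifest.
# $1: issuer name, $2: registration e-mail, $3: ACME directory URL
cluster_issuer_manifest() {
  local name="$1" email="$2" server="$3"
  cat <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: ${name}
spec:
  acme:
    server: ${server}
    email: ${email}
    privateKeySecretRef:
      name: ${name}-account-key
    solvers:
    - http01:
        ingress:
          class: nginx
EOF
}
```

Usage: cluster_issuer_manifest sample-issuer-staging yourmail@gmail.com https://acme-staging-v02.api.letsencrypt.org/directory | kubectl apply -f -. Once certificates come through on staging, point the Ingress annotation back at sample-issuer-prod.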

kubectl describe clusterissuer

Name:         sample-issuer-prod
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  cert-manager.io/v1
Kind:         ClusterIssuer
Metadata:
  Creation Timestamp:  2021-08-11T04:10:43Z
  Generation:          1
  Managed Fields:
    API Version:  cert-manager.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:acme:
          .:
          f:email:
          f:privateKeySecretRef:
            .:
            f:name:
          f:server:
          f:solvers:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2021-08-11T04:10:43Z
    API Version:  cert-manager.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:acme:
          .:
          f:lastRegisteredEmail:
          f:uri:
        f:conditions:
    Manager:         controller
    Operation:       Update
    Time:            2021-08-11T04:10:50Z
  Resource Version:  2088
  UID:               255b505d-b0a4-48aa-8889-eaaa6e92197d
Spec:
  Acme:
    Email:            yourmail@gmail.com
    Preferred Chain:
    Private Key Secret Ref:
      Name:  sample.mydns.jp.tls.prod
    Server:  https://acme-v02.api.letsencrypt.org/directory
    Solvers:
      http01:
        Ingress:
          Class:  nginx
Status:
  Acme:
    Last Registered Email:  yourmail@gmail.com
    Uri:                    https://acme-v02.api.letsencrypt.org/acme/acct/1568585xx
  Conditions:
    Last Transition Time:  2021-08-11T04:10:50Z
    Message:               The ACME account was registered with the ACME server
    Observed Generation:   1
    Reason:                ACMEAccountRegistered
    Status:                True
    Type:                  Ready
Events:                    <none>

Creating the ingress-nginx controller

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.47.0/deploy/static/provider/baremetal/deploy.yaml

namespace/ingress-nginx created
serviceaccount/ingress-nginx created
configmap/ingress-nginx-controller created
clusterrole.rbac.authorization.k8s.io/ingress-nginx created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx created
role.rbac.authorization.k8s.io/ingress-nginx created
rolebinding.rbac.authorization.k8s.io/ingress-nginx created
service/ingress-nginx-controller-admission created
service/ingress-nginx-controller created
deployment.apps/ingress-nginx-controller created
validatingwebhookconfiguration.admissionregistration.k8s.io/ingress-nginx-admission created
serviceaccount/ingress-nginx-admission created
clusterrole.rbac.authorization.k8s.io/ingress-nginx-admission created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
role.rbac.authorization.k8s.io/ingress-nginx-admission created
rolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
job.batch/ingress-nginx-admission-create created
job.batch/ingress-nginx-admission-patch created

kubectl get pods --all-namespaces
NAMESPACE       NAME                                        READY   STATUS      RESTARTS   AGE
cert-manager    cert-manager-7c6f78c46d-n8dkt               1/1     Running     0          5m4s
cert-manager    cert-manager-cainjector-668d9c86df-t2hzl    1/1     Running     0          5m4s
cert-manager    cert-manager-webhook-764b556954-qm6nf       1/1     Running     0          5m3s
ingress-nginx   ingress-nginx-admission-create-ln2ld        0/1     Completed   0          2m19s
ingress-nginx   ingress-nginx-admission-patch-qqpj4         0/1     Completed   0          2m19s
ingress-nginx   ingress-nginx-controller-55bc4f5576-bnk8j   1/1     Running     0          2m19s
kube-system     calico-kube-controllers-74b8fbdb46-t9qb2    1/1     Running     0          5m22s
kube-system     calico-node-v6qqg                           1/1     Running     0          5m23s
kube-system     coredns-558bd4d5db-4frhq                    1/1     Running     0          6m33s
kube-system     coredns-558bd4d5db-4rrr5                    1/1     Running     0          6m33s
kube-system     etcd-123456                                1/1     Running     0          6m38s
kube-system     kube-apiserver-123456                      1/1     Running     0          6m38s
kube-system     kube-controller-manager-123456             1/1     Running     0          6m38s
kube-system     kube-proxy-4nw5k                            1/1     Running     0          6m33s
kube-system     kube-scheduler-123456                      1/1     Running     0          6m38s

Edit the Service settings with kubectl edit service ingress-nginx-controller -n ingress-nginx:

kubectl-edit-podel.yaml
- type: NodePort
+ type: LoadBalancer
+ externalIPs:
+ - 192.168.0.xx
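The same change can be applied non-interactively with kubectl patch, for example from a provisioning script. A minimal sketch; the function names are hypothetical, and the IP passed in stands for the 192.168.0.xx placeholder above.

```shell
# Build the merge patch: switch the Service to LoadBalancer and add an external IP.
# $1: the node's LAN address.
ingress_service_patch() {
  local ip="$1"
  printf '{"spec":{"type":"LoadBalancer","externalIPs":["%s"]}}' "$ip"
}

# Apply it to the ingress-nginx controller Service.
patch_ingress_service() {
  kubectl patch service ingress-nginx-controller -n ingress-nginx \
    --type merge -p "$(ingress_service_patch "$1")"
}
```

Usage: patch_ingress_service 192.168.0.xx with the real address filled in.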

Creating a sample service

ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sample-ingress
  annotations:
    cert-manager.io/cluster-issuer: sample-issuer-prod
    cert-manager.io/acme-challenge-type: http01
    kubernetes.io/ingress.class: "nginx"
spec:
  tls:
  - hosts:
    - sample.mydns.jp
    secretName: sample.jp.tls.prod
  rules:
  - host: sample.mydns.jp
    http:
      paths:
      - path: /testpath
        pathType: Prefix
        backend:
          service:
            name: hellok8s-service
            port:
              number: 8080

Installing NVIDIA Docker

Disabling the open-source driver (nouveau)

Blacklist it

/etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0

Rebuild the initramfs and reboot

sudo update-initramfs -u
sudo reboot
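The blacklist file can also be written from a script, and after the reboot lsmod shows whether nouveau is still loaded. A sketch; the function names and the optional root-directory argument (for trying the write against a scratch directory) are hypothetical.

```shell
# Write /etc/modprobe.d/blacklist-nouveau.conf under $1 (defaults to the real /).
write_nouveau_blacklist() {
  local root="${1:-}"
  mkdir -p "$root/etc/modprobe.d"
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' \
    > "$root/etc/modprobe.d/blacklist-nouveau.conf"
}

# Returns 0 if the nouveau module is currently loaded.
nouveau_loaded() {
  lsmod | grep -q '^nouveau '
}
```

Usage: after sudo update-initramfs -u and the reboot, nouveau_loaded should return non-zero before the NVIDIA installer is run.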

Installing the driver

apt-get -y install gcc make
wget "https://us.download.nvidia.com/XFree86/Linux-x86_64/470.57.02/NVIDIA-Linux-x86_64-470.57.02.run"
bash NVIDIA-Linux-x86_64-470.57.02.run

Installing the runtime

curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | sudo apt-key add -

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update

apt-get install -y nvidia-container-runtime

Switching the Docker default runtime and restarting

/etc/docker/daemon.json (full file after the change; note the comma added after "overlay2")
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

systemctl restart docker && systemctl enable docker

docker info | grep Runtime
 Runtimes: runc nvidia
 Default Runtime: nvidia
WARNING: No swap limit support
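Because the runtime switch is a hand edit of daemon.json, it is worth validating the file before restarting Docker: dockerd refuses to start when daemon.json is not valid JSON (for example when the comma after "overlay2" is missing). A sketch assuming python3 is available, as it is on a stock Ubuntu Server 20.04; the function name is hypothetical.

```shell
# Validate a Docker daemon.json file.
# $1 (optional): path to the file; defaults to /etc/docker/daemon.json.
check_daemon_json() {
  local file="${1:-/etc/docker/daemon.json}"
  # json.tool prints the parse error and exits non-zero on invalid JSON.
  python3 -m json.tool "$file" > /dev/null
}
```

Usage: check_daemon_json && systemctl restart docker.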

Installing the device plugin

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml

kubectl get pods --all-namespaces
NAMESPACE       NAME                                        READY   STATUS      RESTARTS   AGE
cert-manager    cert-manager-7c6f78c46d-n8dkt               1/1     Running     1          29m
cert-manager    cert-manager-cainjector-668d9c86df-t2hzl    1/1     Running     1          29m
cert-manager    cert-manager-webhook-764b556954-qm6nf       1/1     Running     1          29m
ingress-nginx   ingress-nginx-admission-create-ln2ld        0/1     Completed   0          26m
ingress-nginx   ingress-nginx-admission-patch-qqpj4         0/1     Completed   0          26m
ingress-nginx   ingress-nginx-controller-55bc4f5576-bnk8j   1/1     Running     1          26m
kube-system     calico-kube-controllers-74b8fbdb46-t9qb2    1/1     Running     1          29m
kube-system     calico-node-v6qqg                           1/1     Running     1          30m
kube-system     coredns-558bd4d5db-4frhq                    1/1     Running     1          31m
kube-system     coredns-558bd4d5db-4rrr5                    1/1     Running     1          31m
kube-system     etcd-123456                                1/1     Running     1          31m
kube-system     kube-apiserver-123456                      1/1     Running     1          31m
kube-system     kube-controller-manager-123456             1/1     Running     1          31m
kube-system     kube-proxy-4nw5k                            1/1     Running     1          31m
kube-system     kube-scheduler-123456                      1/1     Running     1          31m
kube-system     nvidia-device-plugin-daemonset-9d646        1/1     Running     0          63s
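A Running device-plugin Pod does not yet show that a workload can get the GPU. A common smoke test is a one-shot Pod that requests nvidia.com/gpu and runs nvidia-smi. A sketch; the Pod name gpu-test and the default image tag are assumptions (choose a CUDA image whose version the installed 470 driver supports).

```shell
# Print a one-shot Pod that requests one GPU and runs nvidia-smi.
# $1 (optional): container image; the default tag is an assumption.
gpu_test_pod_manifest() {
  local image="${1:-nvidia/cuda:11.4.2-base-ubuntu20.04}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: ${image}
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
}
```

Usage: gpu_test_pod_manifest | kubectl apply -f -. Once the Pod shows Completed, kubectl logs gpu-test should print the nvidia-smi table, and kubectl describe node | grep nvidia.com/gpu should list the GPU under Capacity and Allocatable.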

Creating an account for Jenkins

kubectl create clusterrolebinding default-admin --clusterrole cluster-admin --serviceaccount=default:default
clusterrolebinding.rbac.authorization.k8s.io/default-admin created
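The binding above grants cluster-admin to the default ServiceAccount of the default namespace, so every Pod in that namespace that does not set its own ServiceAccount gets full control of the cluster. A sketch of a narrower alternative: a dedicated ServiceAccount used only by Jenkins. The function and resource names are hypothetical, and it still binds cluster-admin, so Jenkins keeps the same rights while other Pods no longer inherit them.

```shell
# Print a ServiceAccount plus a ClusterRoleBinding to cluster-admin.
# $1: namespace, $2: ServiceAccount name (e.g. "default" "jenkins").
jenkins_rbac_manifest() {
  local ns="$1" sa="$2"
  cat <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${sa}
  namespace: ${ns}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ${sa}-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: ${sa}
  namespace: ${ns}
EOF
}
```

Usage: jenkins_rbac_manifest default jenkins | kubectl apply -f -, then configure Jenkins to use the jenkins ServiceAccount. Swapping cluster-admin for a narrower ClusterRole restricts it further.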