Introduction
I installed Anthos clusters on bare metal, the on-premises GKE (Kubernetes) offering from Google Cloud, in my home lab, and this post describes the process.
It covers cluster creation (IPv4/IPv6 dual stack), the cloud-side configuration, and a quick trial of adding/removing nodes and creating a Pod after the build.
In the end, the on-premises cluster appears in the Web Console on the same screen as regular GKE clusters, so it can be operated through the same interface as cloud clusters.
It is shown in the Web Console under Kubernetes Engine > Clusters
(the type is displayed as "External").
Environment and Versions
This section lists the home-lab hardware and the Anthos software versions used.
Network
Item | Router | L2 Switch |
---|---|---|
Model | EdgeRouterX | NETGEAR GS108T |
Functions used | Routing, VLAN (sub-interfaces) | Access/Trunk VLAN |
Compute
The required number of VMs are created on an ESXi environment running on Intel NUCs for the tests.
Item | Node1 | Node2 |
---|---|---|
Model | NUC8I5BEH | NUC12WSHi5 |
CPU | i5-8259U 4 cores / 8 threads | i5-1240P 12 cores (P:4, E:8) / 16 threads |
Memory | 32GiB DDR4-2400 | 64GiB DDR4-3200 |
Disk | 500GB | 1TB |
Hypervisor | ESXi 8.0 | ESXi 8.0 |
NIC | Onboard NIC 1000BASE-T, USB NIC 1000BASE-T | Onboard NIC 1000BASE-T, USB NIC 1000BASE-T |
Notes | - | P/E core configuration not supported |
Hostname in lab | NUC02 | NUC03 |
* The Ansible server used to configure the VMs is not part of this setup
* ESXi setup was done separately (see a previous article)
Software
The latest version at the time of writing (2023-06-11), Anthos clusters on bare metal 1.15.1 / Kubernetes 1.26 (v1.26.2-gke.1001) (version list), is used.
Configuration
The multi-cluster deployment model is used for this trial.
* This layout was chosen as a trial ahead of any production use; the smallest option would be a standalone cluster deployment.
The OS is Ubuntu 22.04 LTS on all nodes (supported OS versions).
Node type | Count | Spec (cpu/memory/disk) | Host |
---|---|---|---|
Admin workstation | 1 | 2 cores / 4GiB / 128GiB (minimum spec) | Node01 |
Admin cluster control plane node | 1 | 4 cores / 16GiB / 128GiB (minimum spec) | Node02 |
User cluster control plane node | 1 | 4 cores / 16GiB / 128GiB (minimum spec) | Node02 |
User cluster worker node | 1-2 | 4 cores / 16GiB / 128GiB (minimum spec) | Node01, 02 |
Total | 5 | 18 cores / 52GiB / 512GiB | Node01: 6 cores/20GiB/256GiB, Node02: 12 cores/48GiB/374GiB |
Google Cloud Project
Billing and other pre-build preparation is already done. Everything from project creation onward is performed with Terraform.
Network / Overall Diagram
Address assignments and the overall network layout are shown in the diagram below.
The L2 switch / router settings (VLANs, gateways, etc.) are omitted from this post.
The Ansible server used for server configuration is also omitted from the diagram.
Setup (Google Cloud)
Configure the project by following the documentation (Set up Google Cloud resources).
Aside from creating the project, everything in this step can also be configured automatically by the bmctl tool (with the --create-service-accounts and --enable-apis options).
Here, however, the project settings are managed as IaC with Terraform (choose whichever suits your environment).
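For reference, a minimal sketch of the bmctl-driven alternative, based on the options mentioned above (it assumes the project already exists, that you are authenticated with gcloud, and that bmctl has been installed on the admin workstation as described later; the cluster name is just a placeholder):
# Let bmctl enable the required APIs and create the service accounts while generating a cluster config
gcloud auth login
gcloud auth application-default login
export CLOUD_PROJECT_ID=<your project ID>   # placeholder
bmctl create config -c admincluster \
  --enable-apis \
  --create-service-accounts \
  --project-id=$CLOUD_PROJECT_ID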
Full code sample (for reference; differs in places): https://github.com/suzuyu/terraform-public/releases/tag/v1.2.0
The following is an excerpt.
If you create the project manually and let the bmctl options configure it, the settings in this section can be skipped.
# Create the folder
resource "google_folder" "folder" {
display_name = var.folder_name
parent = var.parent_folder_name
}
# Create the project
resource "google_project" "main" {
name = var.project_id
project_id = var.project_id
billing_account = var.billing_account
folder_id = google_folder.folder.name
}
# Folder IAM settings
resource "google_folder_iam_binding" "admin" {
folder = google_folder.folder.name
for_each = toset([
"roles/storage.admin",
"roles/serviceusage.serviceUsageAdmin",
"roles/logging.admin",
# https://cloud.google.com/anthos/clusters/docs/bare-metal/latest/installing/configure-sa#before_you_begin
"roles/compute.viewer",
"roles/iam.serviceAccountAdmin",
"roles/iam.securityAdmin",
"roles/iam.serviceAccountKeyAdmin",
"roles/serviceusage.serviceUsageAdmin",
"roles/gkeonprem.admin",
"roles/gkehub.viewer",
"roles/container.viewer",
# Added to allow viewing Monitoring
"roles/monitoring.viewer",
])
role = each.value
members = [
join(":", ["group", var.admin_group_email]),
]
}
# Manage the project's APIs
resource "google_project_service" "main" {
project = google_project.main.id
disable_dependent_services = true
for_each = toset([
# https://cloud.google.com/anthos/clusters/docs/bare-metal/latest/installing/configure-sa#enable_apis
"anthos.googleapis.com",
"anthosaudit.googleapis.com",
"anthosgke.googleapis.com",
"cloudresourcemanager.googleapis.com",
"connectgateway.googleapis.com",
"container.googleapis.com",
"gkeconnect.googleapis.com",
"gkehub.googleapis.com",
"gkeonprem.googleapis.com",
"iam.googleapis.com",
"logging.googleapis.com",
"monitoring.googleapis.com",
"opsconfigmonitoring.googleapis.com",
"serviceusage.googleapis.com",
"stackdriver.googleapis.com",
"storage.googleapis.com",
])
service = each.value
}
# Create service accounts
## https://cloud.google.com/anthos/clusters/docs/bare-metal/latest/installing/configure-sa#configure_service_accounts_manually
resource "google_service_account" "anthos-baremetal-gcr" {
account_id = "anthos-baremetal-gcr"
display_name = "anthos-baremetal-gcr"
project = google_project.main.name
}
resource "google_service_account" "anthos-baremetal-connect" {
account_id = "anthos-baremetal-connect"
display_name = "anthos-baremetal-connect"
project = google_project.main.name
}
resource "google_service_account" "anthos-baremetal-register" {
account_id = "anthos-baremetal-register"
display_name = "anthos-baremetal-register"
project = google_project.main.name
}
resource "google_service_account" "anthos-baremetal-cloud-ops" {
account_id = "anthos-baremetal-cloud-ops"
display_name = "anthos-baremetal-cloud-ops"
project = google_project.main.name
}
## https://cloud.google.com/anthos/clusters/docs/bare-metal/latest/installing/configure-sa#bucket-sa
resource "google_service_account" "anthos-baremetal-snapshotupload" {
account_id = "anthos-baremetal-ssupload"
display_name = "anthos-baremetal-snapshotupload"
project = google_project.main.name
}
# https://cloud.google.com/anthos/clusters/docs/bare-metal/latest/installing/configure-sa#configure_service_accounts_manually
resource "google_project_iam_binding" "gkehub_connect" {
project = google_project.main.id
role = "roles/gkehub.connect"
members = [
"serviceAccount:${google_service_account.anthos-baremetal-connect.email}",
]
}
resource "google_project_iam_binding" "gkehub_admin" {
project = google_project.main.id
role = "roles/gkehub.admin"
members = [
"serviceAccount:${google_service_account.anthos-baremetal-register.email}",
]
}
resource "google_project_iam_binding" "logging_logwriter" {
project = google_project.main.id
role = "roles/logging.logWriter"
members = [
"serviceAccount:${google_service_account.anthos-baremetal-cloud-ops.email}",
]
}
resource "google_project_iam_binding" "monitoring_metricwriter" {
project = google_project.main.id
role = "roles/monitoring.metricWriter"
members = [
"serviceAccount:${google_service_account.anthos-baremetal-cloud-ops.email}",
]
}
resource "google_project_iam_binding" "resourcemetadata_writer" {
project = google_project.main.id
role = "roles/stackdriver.resourceMetadata.writer"
members = [
"serviceAccount:${google_service_account.anthos-baremetal-cloud-ops.email}",
]
}
resource "google_project_iam_binding" "opsconfigmonitoring_resourcemetadata_writer" {
project = google_project.main.id
role = "roles/opsconfigmonitoring.resourceMetadata.writer"
members = [
"serviceAccount:${google_service_account.anthos-baremetal-cloud-ops.email}",
]
}
resource "google_project_iam_binding" "monitoring_dashboardeditor" {
project = google_project.main.id
role = "roles/monitoring.dashboardEditor"
members = [
"serviceAccount:${google_service_account.anthos-baremetal-cloud-ops.email}",
]
}
# https://cloud.google.com/anthos/clusters/docs/bare-metal/latest/installing/configure-sa#bucket-sa
resource "google_project_iam_custom_role" "snapshotupload" {
role_id = "snapshotUpload"
title = "snapshotUpload"
description = "Anthos Baremetal Snapshot Upload"
permissions = ["storage.buckets.create", "storage.buckets.get", "storage.buckets.list", "storage.objects.create"]
}
resource "google_project_iam_binding" "snapshotupload" {
project = google_project.main.id
role = google_project_iam_custom_role.snapshotupload.id
members = [
"serviceAccount:${google_service_account.anthos-baremetal-snapshotupload.email}",
]
}
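The Terraform above is applied with the usual workflow (a sketch; variable values such as the folder name, project ID, billing account, and admin group email are assumed to be supplied via a tfvars file or CLI flags):
cd <terraform working directory>   # directory containing the .tf files
terraform init      # download the Google provider
terraform plan      # review the folder / project / IAM / API / service account changes
terraform apply     # create the resources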
Setup (VM)
Configure the VMs by following the documentation (Set up root SSH access to nodes and Configure Ubuntu, under Installation prerequisites overview).
VM creation
Download Ubuntu 22.04 LTS and create the VMs on ESXi.
The creation steps were described here as well, so they are omitted.
* With Ubuntu 22.04.1 LTS, the screen below appeared on reboot. It can be dismissed with [OK] and the boot continues, but with Secure Boot enabled this is required every time. The issue did not occur with 22.04.3 LTS.
Admin workstation VM setup
Following the documentation, perform the steps below:
- Install Docker (20.10.0 or higher)
- Configure non-root execution
- Install the Google Cloud SDK
- Install kubectl
- Install bmctl
Docker, the Google Cloud SDK, and kubectl are installed with Ansible; the remaining steps are done on the CLI.
# https://cloud.google.com/anthos/clusters/docs/bare-metal/latest/installing/workstation-prerequisites
# Docker-CE
- name: PreInstall Docker Ubuntu
apt:
pkg:
- apt-transport-https
- ca-certificates
- curl
- gnupg-agent
update_cache: yes
when:
- ansible_distribution == "Ubuntu"
become: yes
- name: Add Docker GPG apt Key
apt_key:
url: https://download.docker.com/linux/ubuntu/gpg
state: present
when:
- ansible_distribution == "Ubuntu"
become: yes
- name: Add Docker Repository
apt_repository:
repo: deb https://download.docker.com/linux/ubuntu jammy stable
state: present
when:
- ansible_distribution == "Ubuntu"
become: yes
- name: Install docker-ce
apt:
name: docker-ce>=20.10.0
update_cache: true
when:
- ansible_distribution == "Ubuntu"
become: yes
- name: Create docker group
group:
name: docker
state: present
become: yes
- name: Add user to the docker group
user:
name: "{{ item }}"
groups: docker
append: yes
with_items: "{{ bmctl_users }}"
become: yes
# Google Cloud SDK
- name: Add google cloud sdk GPG apt Key
apt_key:
url: https://packages.cloud.google.com/apt/doc/apt-key.gpg
state: present
when:
- ansible_distribution == "Ubuntu"
become: yes
- name: Add google cloud sdk Repository
apt_repository:
repo: deb https://packages.cloud.google.com/apt cloud-sdk main
state: present
when:
- ansible_distribution == "Ubuntu"
become: yes
- name: Install google-cloud-sdk
apt:
name: google-cloud-sdk
update_cache: true
when:
- ansible_distribution == "Ubuntu"
become: yes
# kubectl
- name: Add kubectl GPG apt Key
apt_key:
url: https://packages.cloud.google.com/apt/doc/apt-key.gpg
state: present
when:
- ansible_distribution == "Ubuntu"
become: yes
- name: Add kubectl Repository
apt_repository:
repo: deb https://apt.kubernetes.io/ kubernetes-xenial main
state: present
when:
- ansible_distribution == "Ubuntu"
become: yes
- name: Install kubectl
apt:
name: kubectl
update_cache: true
when:
- ansible_distribution == "Ubuntu"
become: yes
# Create SSH Keys
- name: generate SSH keys
openssh_keypair:
path: "/home/{{ item }}/.ssh/id_rsa"
type: rsa
size: 4096
state: present
force: no
owner: "{{ item }}"
with_items: "{{ bmctl_users }}"
become: yes
# get SSH Pub key
- name: Get SSH Pub key to /tmp
fetch:
src: "/home/{{ item }}/.ssh/id_rsa.pub"
dest: "/tmp/ansible_anthosbaremetal/"
with_items: "{{ bmctl_users }}"
become: yes
---
bmctl_users:
- suzuyu # login user name
---
- hosts: adminws01
become: true
gather_facts: yes
roles:
- anthos-baremetal-ws
ansible-playbook playbook_anthosws.yml --ask-pass -Kk -u [admin user name]
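For reference, a minimal inventory sketch matching the playbook above (host names and the workstation address are examples for illustration; the node addresses are the ones used later in the cluster configs):
# hosts (inventory) - minimal sketch, adjust names/addresses to your lab
cat <<'EOF' > hosts
# admin workstation (address is an example)
adminws01 ansible_host=192.168.133.3

# cluster nodes (addresses match the cluster configs used later)
[anthos-baremetal-nodes]
admincl01 ansible_host=192.168.133.2
usercl01  ansible_host=192.168.133.11
worker01  ansible_host=192.168.133.21
worker02  ansible_host=192.168.133.22
EOF
Pass it with -i hosts when running the playbooks.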
After running the Ansible playbook above, log in to the admin workstation, download bmctl, and add it to the PATH.
mkdir ~/anthos
mkdir ~/anthos/bin
cd ~/anthos
gcloud auth login --no-launch-browser
export ANTHOSVERSION=1.15.1
gsutil cp gs://anthos-baremetal-release/bmctl/$ANTHOSVERSION/linux-amd64/bmctl ~/anthos/bin/
chmod +x ~/anthos/bin/bmctl
echo 'export PATH="$PATH:$HOME/anthos/bin/"' >> ~/.bashrc
source ~/.bashrc
bmctl version
It is OK if the following can be output at the end:
# bmctl version
[2023-06-09 10:08:36+0000] bmctl version: 1.15.1-gke.10, git commit: a452617fd520ed73bb1bc23254a8515d31ade553, build date: Fri May 26 16:55:09 PDT 2023
Node VM pre-setup
Following the documentation, perform the steps below on the VMs that will become control plane / worker nodes:
- Disable Uncomplicated Firewall (UFW)
- Allow root SSH access and register the workstation's SSH key
Also, after running the cluster for a while, Pods started failing with the error failed to create fsnotify watcher: too many open files.
A quick search suggested that sysctl parameters needed tuning, so additional settings were added based on the OpenShift tuned parameters.
The Ansible code example is below.
# Disable UFW
- name: Disable UFW
ufw:
state: disabled
when:
- ansible_distribution == "Ubuntu"
become: yes
# Enable SSH Root Login
- name: Enable root Login
lineinfile:
dest: /etc/ssh/sshd_config
regexp: '^PermitRootLogin'
line: "PermitRootLogin yes"
state: present
backup: yes
become: yes
register: root_login_result
- name: restart sshd
when: root_login_result is changed
systemd:
name: sshd
state: restarted
become: yes
# Register the adminws01 SSH public key for root on the nodes
- name: Set authorized key taken from file
authorized_key:
user: root
state: present
key: "{{ lookup('file', '/tmp/ansible_anthosbaremetal/adminws01/home/{{ item }}/.ssh/id_rsa.pub') }}"
with_items: "{{ bmctl_users }}"
become: yes
# Workaround for bm-system-create-cluster-xxx failing with: failed to create fsnotify watcher: too many open files
## Reference: https://github.com/openshift/cluster-node-tuning-operator/blob/release-4.15/assets/tuned/daemon/profiles/openshift-node/tuned.conf
- name: Sysctl set fs.inotify.max_user_instances
sysctl:
state: present
name: "fs.inotify.max_user_instances"
value: "8192"
sysctl_set: yes
reload: yes
become: yes
when:
- ansible_distribution == "Ubuntu"
- ansible_distribution_major_version == "22"
- name: Sysctl set fs.inotify.max_user_watches
sysctl:
state: present
name: "fs.inotify.max_user_watches"
value: "65536"
sysctl_set: yes
reload: yes
become: yes
when:
- ansible_distribution == "Ubuntu"
- ansible_distribution_major_version == "22"
- name: Sysctl set net.ipv4.tcp_fastopen
sysctl:
state: present
name: "net.ipv4.tcp_fastopen"
value: "3"
sysctl_set: yes
reload: yes
become: yes
when:
- ansible_distribution == "Ubuntu"
- ansible_distribution_major_version == "22"
IPv6 is also used this time, so configure it as well (a sample configuration is below).
# /etc/netplan/
- name: Add /etc/netplan/99-manual-config.yaml
template:
src: 99-manual-config.yaml.j2
dest: /etc/netplan/99-manual-config.yaml
backup: yes
become: yes
register: change_netplan
- name: reboot when change netplan config
when: change_netplan is changed
reboot:
become: yes
network:
ethernets:
ens34:
addresses:
- "{{ ipv6 }}"
routes:
- to: default
via: "{{ ipv6_gateway }}"
version: 2
* Define each node's IPv6 address and gateway in the Ansible inventory (ansible/hosts) or in host_vars.
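For example, per-node values could be placed in host_vars like this (a sketch; the file name and addresses are placeholders, and the variable names match the netplan template above):
mkdir -p host_vars
cat <<'EOF' > host_vars/admincl01.yml
ipv6: "fd12::3:2/112"        # address assigned to ens34 (placeholder)
ipv6_gateway: "fd12::3:1"    # default gateway of the segment (placeholder)
EOF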
---
- hosts: anthos-baremetal-nodes
become: true
gather_facts: yes
roles:
- anthos-baremetal-nodes
- set-netplan-ipv6
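The node playbook is run the same way as the workstation one (a sketch; the file name playbook_anthosnodes.yml is a placeholder):
ansible-playbook -i hosts playbook_anthosnodes.yml --ask-pass -Kk -u [admin user name]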
The VM pre-setup is now complete with the Ansible configuration above.
Setup (Anthos clusters on bare metal)
Now that the servers that will run Kubernetes (VMs in this case) are prepared as described above, create the configuration files (IaC / YAML) and build the clusters with bmctl.
Generating the configuration file template
A configuration file template can be generated by following the documentation:
cd ~/anthos
gcloud auth application-default login
export PROJECT_ID="YOUR_PROJECT_NAME"
export ADMIN_CLUSTER_NAME="admincluster"
bmctl create config -c $ADMIN_CLUSTER_NAME --project-id=$PROJECT_ID
cat bmctl-workspace/$ADMIN_CLUSTER_NAME/$ADMIN_CLUSTER_NAME.yaml
Creating service account keys
A service account key is required for each feature that communicates with the cloud, so create the keys beforehand.
(The bmctl --create-service-accounts option can also create the accounts, set up RBAC, and generate the keys automatically. Since the cloud side is managed as IaC this time, this part is done manually.)
mkdir ~/anthos/bmctl-workspace/.sa-keys
cd ~/anthos/bmctl-workspace/.sa-keys
gcloud --project $PROJECT_ID iam service-accounts keys create anthos-baremetal-cloud-ops.json --iam-account anthos-baremetal-cloud-ops@$PROJECT_ID.iam.gserviceaccount.com
gcloud --project $PROJECT_ID iam service-accounts keys create connect-agent.json --iam-account anthos-baremetal-connect@$PROJECT_ID.iam.gserviceaccount.com
gcloud --project $PROJECT_ID iam service-accounts keys create anthos-baremetal-gcr.json --iam-account anthos-baremetal-gcr@$PROJECT_ID.iam.gserviceaccount.com
gcloud --project $PROJECT_ID iam service-accounts keys create connect-register.json --iam-account anthos-baremetal-register@$PROJECT_ID.iam.gserviceaccount.com
gcloud --project $PROJECT_ID iam service-accounts keys create anthos-baremetal-ssupload.json --iam-account anthos-baremetal-ssupload@$PROJECT_ID.iam.gserviceaccount.com
anthos-baremetal-cloud-ops.json
anthos-baremetal-gcr.json
anthos-baremetal-ssupload.json
connect-agent.json
connect-register.json
Creating the admin cluster
Creating the configuration file
You can edit the generated template, but sample files are also provided, so the following was created from the sample
(Admin Cluster non-HA sample format).
Below is the admin cluster YAML used in this environment
(placed at bmctl-workspace/[admin cluster name]/[admin cluster name].yaml).
# https://cloud.google.com/anthos/clusters/docs/bare-metal/latest/reference/config-samples#admin-basic
gcrKeyPath: bmctl-workspace/.sa-keys/anthos-baremetal-gcr.json
sshPrivateKeyPath: ../.ssh/id_rsa
gkeConnectAgentServiceAccountKeyPath: bmctl-workspace/.sa-keys/connect-agent.json
gkeConnectRegisterServiceAccountKeyPath: bmctl-workspace/.sa-keys/connect-register.json
cloudOperationsServiceAccountKeyPath: bmctl-workspace/.sa-keys/anthos-baremetal-cloud-ops.json
---
apiVersion: v1
kind: Namespace
metadata:
name: cluster-admincluster
---
apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
name: admincluster
namespace: cluster-admincluster
spec:
type: admin
profile: default
anthosBareMetalVersion: 1.15.1
gkeConnect:
    projectID: <CLOUD_PROJECT_ID>
controlPlane:
nodePoolSpec:
nodes:
- address: 192.168.133.2
clusterNetwork:
pods:
cidrBlocks:
- 10.4.0.0/16
services:
cidrBlocks:
- 10.96.0.0/20
loadBalancer:
mode: bundled
ports:
controlPlaneLBPort: 443
vips:
controlPlaneVIP: 192.168.133.65
clusterOperations:
    projectID: <CLOUD_PROJECT_ID>
location: asia-northeast1
storage:
lvpNodeMounts:
path: /mnt/localpv-disk
storageClassName: local-disks
lvpShare:
path: /mnt/localpv-share
storageClassName: local-shared
numPVUnderSharedPath: 5
nodeConfig:
podDensity:
maxPodsPerNode: 250
Run the following:
cd ~/anthos/
bmctl create cluster -c admincluster
Example output:
suzuyu@adminws01:~/anthos$ bmctl create cluster -c admincluster
Please check the logs at bmctl-workspace/admincluster/log/create-cluster-20230611-034021/create-cluster.log
[2023-06-11 03:40:27+0000] Creating bootstrap cluster... OK
[2023-06-11 03:42:05+0000] Installing dependency components... ⠼ W0611 03:42:57.443254 74119 schema.go:149] unexpected field validation directive: validator, skipping validation
[2023-06-11 03:42:05+0000] Installing dependency components... OK
[2023-06-11 03:43:45+0000] Waiting for preflight check job to finish... OK
[2023-06-11 03:46:55+0000] - Validation Category: machines and network
[2023-06-11 03:46:55+0000] - [PASSED] node-network
[2023-06-11 03:46:55+0000] - [PASSED] pod-cidr
[2023-06-11 03:46:55+0000] - [PASSED] 192.168.133.2
[2023-06-11 03:46:55+0000] - [PASSED] 192.168.133.2-gcp
[2023-06-11 03:46:55+0000] - [PASSED] gcp
[2023-06-11 03:46:55+0000] Flushing logs... OK
[2023-06-11 03:46:56+0000] Applying resources for new cluster
[2023-06-11 03:46:57+0000] Waiting for cluster kubeconfig to become ready OK
[2023-06-11 03:58:47+0000] Writing kubeconfig file
[2023-06-11 03:58:47+0000] kubeconfig of cluster being created is present at bmctl-workspace/admincluster/admincluster-kubeconfig
[2023-06-11 03:58:47+0000] Please restrict access to this file as it contains authentication credentials of your cluster.
[2023-06-11 03:58:47+0000] Waiting for cluster to become ready OK
[2023-06-11 04:04:47+0000] Please run
[2023-06-11 04:04:47+0000] kubectl --kubeconfig bmctl-workspace/admincluster/admincluster-kubeconfig get nodes
[2023-06-11 04:04:47+0000] to get cluster nodes status.
[2023-06-11 04:04:47+0000] Waiting for node pools to become ready OK
[2023-06-11 04:05:07+0000] Waiting for metrics to become ready in GCP OK
[2023-06-11 04:05:17+0000] Waiting for cluster API provider to install in the created admin cluster OK
[2023-06-11 04:05:27+0000] Moving admin cluster resources to the created admin cluster
[2023-06-11 04:05:32+0000] Waiting for node update jobs to finish OK
[2023-06-11 04:08:52+0000] Flushing logs... OK
[2023-06-11 04:08:52+0000] Deleting bootstrap cluster... OK
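A quick check of the created admin cluster from the workstation (a sketch; the first command is the one bmctl itself suggests above, and the Cluster custom resource lives in the cluster-admincluster namespace defined in the config):
cd ~/anthos
kubectl --kubeconfig bmctl-workspace/admincluster/admincluster-kubeconfig get nodes
kubectl --kubeconfig bmctl-workspace/admincluster/admincluster-kubeconfig \
  get clusters.baremetal.cluster.gke.io -n cluster-admincluster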
Creating the user cluster
Create the configuration file based on the user cluster sample file.
mkdir ~/anthos/bmctl-workspace/usercluster1
# https://cloud.google.com/anthos/clusters/docs/bare-metal/latest/reference/config-samples#user-basic
apiVersion: v1
kind: Namespace
metadata:
name: cluster-usercluster1
---
apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
name: usercluster1
namespace: cluster-usercluster1
spec:
type: user
profile: default
anthosBareMetalVersion: 1.15.1
gkeConnect:
    projectID: <CLOUD_PROJECT_ID>
controlPlane:
nodePoolSpec:
nodes:
- address: 192.168.133.11
clusterNetwork:
pods:
cidrBlocks:
- 10.4.0.0/16
services:
cidrBlocks:
- 10.96.0.0/20
- "fd12::5:0/116"
loadBalancer:
mode: bundled
ports:
controlPlaneLBPort: 443
vips:
controlPlaneVIP: 192.168.133.66
ingressVIP: 192.168.133.129 # IngressVIP must be included in load balancer address pools
addressPools:
- name: pool1
addresses:
- 192.168.133.129-192.168.133.142
- "fd12::4:101-fd12::4:110"
clusterOperations:
    projectID: <CLOUD_PROJECT_ID>
location: asia-northeast1
storage:
lvpNodeMounts:
path: /mnt/localpv-disk
storageClassName: local-disks
lvpShare:
path: /mnt/localpv-share
storageClassName: local-shared
numPVUnderSharedPath: 5
nodeConfig:
podDensity:
maxPodsPerNode: 110
---
apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
name: np1
namespace: cluster-usercluster1
spec:
clusterName: usercluster1
nodes:
- address: 192.168.133.21
- address: 192.168.133.22
---
# https://cloud.google.com/anthos/clusters/docs/bare-metal/latest/how-to/dual-stack-networking#fill_in_a_configuration_file
apiVersion: baremetal.cluster.gke.io/v1alpha1
kind: ClusterCIDRConfig
metadata:
name: "cluster-wide-ranges"
namespace: cluster-usercluster1
spec:
ipv4:
cidr: "10.4.0.0/16" # For island mode, must be the same as the Cluster CIDR.
perNodeMaskSize: 24 # must be equal to the IPv4 PerNodeMaskSize
ipv6:
cidr: "fd12::1:0/112"
perNodeMaskSize: 120
Create the user cluster with bmctl, specifying the admin cluster kubeconfig (generated when the admin cluster was created) with --kubeconfig.
bmctl create cluster -c usercluster1 --kubeconfig bmctl-workspace/admincluster/admincluster-kubeconfig
Below is the output in this environment (an error occurred partway and the results were stitched together, so in practice it should finish in roughly 30 minutes).
suzuyu@adminws01:~/anthos$ bmctl create cluster -c usercluster1 --kubeconfig bmctl-workspace/admincluster/admincluster-kubeconfig
Please check the logs at bmctl-workspace/usercluster1/log/create-cluster-20230611-043329/create-cluster.log
[2023-06-11 04:33:32+0000] Waiting for preflight check job to finish... OK
[2023-06-11 04:35:02+0000] - Validation Category: machines and network
[2023-06-11 04:35:02+0000] - [PASSED] 192.168.133.11
[2023-06-11 04:35:02+0000] - [PASSED] 192.168.133.11-gcp
[2023-06-11 04:35:02+0000] - [PASSED] 192.168.133.21-gcp
[2023-06-11 04:35:02+0000] - [PASSED] 192.168.133.22-gcp
[2023-06-11 04:35:02+0000] - [PASSED] node-network
[2023-06-11 04:35:02+0000] - [PASSED] 192.168.133.21
[2023-06-11 04:35:02+0000] - [PASSED] 192.168.133.22
[2023-06-11 04:35:02+0000] - [PASSED] gcp
[2023-06-11 04:35:02+0000] - [PASSED] pod-cidr
[2023-06-11 04:35:02+0000] Flushing logs... OK
[2023-06-11 04:35:02+0000] Applying resources for new cluster
[2023-06-11 04:35:03+0000] Waiting for cluster kubeconfig to become ready OK
[2023-06-11 04:47:03+0000] Writing kubeconfig file
[2023-06-11 04:47:03+0000] kubeconfig of cluster being created is present at bmctl-workspace/usercluster1/usercluster1-kubeconfig
[2023-06-11 04:47:03+0000] Please restrict access to this file as it contains authentication credentials of your cluster.
[2023-06-11 09:48:47+0000] Waiting for cluster to become ready OK
[2023-06-11 09:48:57+0000] Please run
[2023-06-11 09:48:57+0000] kubectl --kubeconfig bmctl-workspace/usercluster1/usercluster1-kubeconfig get nodes
[2023-06-11 09:48:57+0000] to get cluster nodes status.
[2023-06-11 09:48:57+0000] Waiting for node pools to become ready OK
[2023-06-11 09:49:17+0000] Flushing logs... OK
Cluster creation is now complete.
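A quick sanity check of the new user cluster (a sketch; the node names are the ones shown in later output):
cd ~/anthos
export USER1_KUBECONFIG=bmctl-workspace/usercluster1/usercluster1-kubeconfig
kubectl --kubeconfig $USER1_KUBECONFIG get nodes -o wide    # expect admin01, worker01, worker02 in Ready state
kubectl --kubeconfig $USER1_KUBECONFIG get pods -A -o wide  # system Pods should be Running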
Accessing from the Web Console
To access the clusters from the Web Console, create a bearer token (service account token).
Below is an example of generating the service account / role manifests used for the integration (based on the referenced manifests, partly generated with commands and with some variables hard-coded).
cd ~/anthos
cat <<EOF > cloud-console-reader.yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: cloud-console-reader
rules:
- apiGroups: [""]
resources: ["nodes", "persistentvolumes", "pods"]
verbs: ["get", "list", "watch"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses"]
verbs: ["get", "list", "watch"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
creationTimestamp: null
name: cloud-console-reader
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
creationTimestamp: null
name: cloud-console-reader-view
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: view
subjects:
- kind: ServiceAccount
name: cloud-console-reader
namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
creationTimestamp: null
name: cloud-console-reader
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cloud-console-reader
subjects:
- kind: ServiceAccount
name: cloud-console-reader
namespace: default
---
apiVersion: v1
kind: Secret
metadata:
name: "cloud-console-reader-token"
annotations:
kubernetes.io/service-account.name: "cloud-console-reader"
type: kubernetes.io/service-account-token
EOF
Apply the created manifest to each cluster and retrieve the service account secret (token).
cd ~/anthos
export ADMIN_KUBECONFIG=bmctl-workspace/admincluster/admincluster-kubeconfig
export USER1_KUBECONFIG=bmctl-workspace/usercluster1/usercluster1-kubeconfig
kubectl --kubeconfig $ADMIN_KUBECONFIG apply -f cloud-console-reader.yaml
kubectl --kubeconfig $USER1_KUBECONFIG apply -f cloud-console-reader.yaml
kubectl --kubeconfig $ADMIN_KUBECONFIG get secret cloud-console-reader-token -o jsonpath='{$.data.token}' | base64 --decode
kubectl --kubeconfig $USER1_KUBECONFIG get secret cloud-console-reader-token -o jsonpath='{$.data.token}' | base64 --decode
Note the tokens printed by the last two commands and move on.
Log in to each cluster using the retrieved service account tokens.
Once logged in, the entries that previously showed a warning (orange) turn healthy (green) and cluster information becomes visible.
The on-premises clusters can now also be viewed in the GKE screen.
This completes the cluster-to-Google Cloud integration covered in this trial.
Node operations
I tried checking node status, reducing the node count, and deleting a node pool; the steps are described below.
Node status
Anthos > Overview shows overall CPU and memory utilization.
Monitoring > Dashboards shows node metrics.
Logging > Logs Explorer lets you view and query node logs.
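The same information can also be checked from the CLI (a sketch, reusing the USER1_KUBECONFIG variable from earlier; kubectl top only works if the metrics API is being served in the cluster):
kubectl --kubeconfig $USER1_KUBECONFIG get nodes -o wide
kubectl --kubeconfig $USER1_KUBECONFIG top nodes                              # requires the metrics API
kubectl --kubeconfig $USER1_KUBECONFIG describe node worker01 | grep -A 10 Conditions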
Reducing the node count
Remove a node by following the documentation.
Comment out one of the IP addresses in the kind: NodePool resource in the config file, as shown below.
apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
name: np1
namespace: cluster-usercluster1
spec:
clusterName: usercluster1
nodes:
- address: 192.168.133.21
# - address: 192.168.133.22
Apply it:
cd ~/anthos
bmctl update cluster -c usercluster1 \
--kubeconfig=bmctl-workspace/admincluster/admincluster-kubeconfig
Example output when applying; after a while the node is removed.
suzuyu@adminws01:~/anthos$ bmctl update cluster -c usercluster1 \
--kubeconfig=bmctl-workspace/admincluster/admincluster-kubeconfig
Please check the logs at bmctl-workspace/usercluster1/log/update-cluster-20230611-104830/update-cluster.log
[2023-06-11 10:48:31+0000] Deleting bootstrap cluster...
suzuyu@adminws01:~/anthos$ kubectl --kubeconfig $USER1_KUBECONFIG get node
NAME STATUS ROLES AGE VERSION
admin01 Ready control-plane 6h v1.26.2-gke.1001
worker01 Ready worker 5h58m v1.26.2-gke.1001
worker02 NotReady,SchedulingDisabled worker 5h58m v1.26.2-gke.1001
suzuyu@adminws01:~/anthos$ kubectl --kubeconfig $USER1_KUBECONFIG get node
NAME STATUS ROLES AGE VERSION
admin01 Ready control-plane 6h2m v1.26.2-gke.1001
worker01 Ready worker 5h59m v1.26.2-gke.1001
Once the change is applied, the reduction is also visible in the Cloud Console (the node count went from 2 to 1).
Deleting the node pool
The node pool itself can be deleted with the following commands:
kubectl --kubeconfig=bmctl-workspace/admincluster/admincluster-kubeconfig get nodepools.baremetal.cluster.gke.io -n cluster-usercluster1
kubectl --kubeconfig=bmctl-workspace/admincluster/admincluster-kubeconfig delete nodepool np1 -n cluster-usercluster1
suzuyu@adminws01:~/anthos$ kubectl --kubeconfig=bmctl-workspace/admincluster/admincluster-kubeconfig get nodepools.baremetal.cluster.gke.io -n cluster-usercluster1
NAME READY RECONCILING STALLED UNDERMAINTENANCE UNKNOWN
np1 1 0 0 0 0
usercluster1 1 0 0 0 0
suzuyu@adminws01:~/anthos$ kubectl --kubeconfig=bmctl-workspace/admincluster/admincluster-kubeconfig delete nodepool np1 -n cluster-usercluster1
nodepool.baremetal.cluster.gke.io "np1" deleted
suzuyu@adminws01:~/anthos$ kubectl --kubeconfig=bmctl-workspace/admincluster/admincluster-kubeconfig get nodepools.baremetal.cluster.gke.io -n cluster-usercluster1
NAME READY RECONCILING STALLED UNDERMAINTENANCE UNKNOWN
usercluster1 1 0 0 0 0
suzuyu@adminws01:~/anthos$
After the deletion completes, the node count is no longer shown in the Cloud Console.
To bring it back, add the node pool again or run bmctl update cluster.
suzuyu@adminws01:~/anthos$ bmctl update cluster -c usercluster1 --kubeconfig=bmctl-workspace/admincluster/admincluster-kubeconfig
Please check the logs at bmctl-workspace/usercluster1/log/update-cluster-20230611-111351/update-cluster.log
[2023-06-11 11:13:52+0000] Deleting bootstrap cluster...
suzuyu@adminws01:~/anthos$ kubectl --kubeconfig $USER1_KUBECONFIG get node
NAME STATUS ROLES AGE VERSION
admin01 Ready control-plane 6h27m v1.26.2-gke.1001
worker01 Ready worker 54s v1.26.2-gke.1001
Pod deployment test
Deploy an nginx Pod and confirm it can be reached via its LoadBalancer IP.
kubectl --kubeconfig $USER1_KUBECONFIG run nginx --image nginx
kubectl --kubeconfig $USER1_KUBECONFIG expose pod nginx --type=LoadBalancer --port=80
kubectl --kubeconfig $USER1_KUBECONFIG get svc
curl $(kubectl --kubeconfig $USER1_KUBECONFIG get svc -o jsonpath='{.items[?(@.metadata.name=="nginx")].status.loadBalancer.ingress[0].ip}')
Next, change the Service to dual stack.
kubectl --kubeconfig $USER1_KUBECONFIG patch service nginx -p '{"spec":{"ipFamilyPolicy": "PreferDualStack"}}'
As shown below, after switching to dual stack an IPv6 address is added to EXTERNAL-IP, and accessing that IP returns "Welcome to nginx!".
suzuyu@adminws01:~/anthos$ kubectl --kubeconfig $USER1_KUBECONFIG get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 7h2m
nginx LoadBalancer 10.96.13.209 192.168.133.130 80:31248/TCP 13s
suzuyu@adminws01:~/anthos$ kubectl --kubeconfig $USER1_KUBECONFIG patch service nginx -p '{"spec":{"ipFamilyPolicy": "PreferDualStack"}}'
service/nginx patched
suzuyu@adminws01:~/anthos$ kubectl --kubeconfig $USER1_KUBECONFIG get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 7h29m
nginx LoadBalancer 10.96.13.209 192.168.133.130,fd12::4:101 80:31248/TCP 26m
suzuyu@admincl01:~$ curl -g 'http://[fd12::4:101]'
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
The Pod and Service status can also be seen in the Cloud Console, as shown below.
Clean up with:
kubectl --kubeconfig $USER1_KUBECONFIG delete svc nginx
kubectl --kubeconfig $USER1_KUBECONFIG delete po nginx
That completes the quick test.
(Addendum) Policy Controller
Policy Controller can check (dry run) or enforce things like best-practice adoption and industry-standard compliance across cluster resources, and the results can be viewed on the dashboard screen below.
To set it up, use "+INSTALL POLICY CONTROLLER" on that screen (Kubernetes Engine > Policy).
Uncheck Config Sync if you are not going to use it (not used this time).
Finally, click "Complete" to apply.
Next, apply the configuration to the Kubernetes cluster by following the documentation.
Below is an example of applying the CIS Kubernetes Benchmark, Pod Security Standards Baseline, Pod Security Standards Restricted, Anthos Service Mesh security, Policy Essentials, and NIST 800 r5 bundles as checks (dry run).
export KUBECONFIG=$HOME/anthos/bmctl-workspace/usercluster1/usercluster1-kubeconfig
cd ~/anthos/
cat <<EOF > policycontroller-config.yaml
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
name: config
namespace: "gatekeeper-system"
spec:
sync:
syncOnly:
- group: ""
version: "v1"
kind: "Namespace"
- group: "networking.k8s.io"
version: "v1"
kind: "NetworkPolicy"
- group: "apps"
version: "v1"
kind: "DaemonSet"
- group: "configsync.gke.io"
version: "v1beta1"
kind: "RootSync"
- group: "storage.k8s.io"
version: "v1"
kind: "StorageClass"
- group: "admissionregistration.k8s.io"
version: "v1"
kind: "ValidatingWebhookConfiguration"
- group: "security.istio.io"
version: "v1beta1"
kind: "AuthorizationPolicy"
- group: "security.istio.io"
version: "v1beta1"
kind: "PeerAuthentication"
EOF
kubectl apply -f policycontroller-config.yaml
kubectl apply -k https://github.com/GoogleCloudPlatform/acm-policy-controller-library.git/bundles/cis-k8s-v1.5.1
kubectl apply -k https://github.com/GoogleCloudPlatform/acm-policy-controller-library.git/bundles/policy-essentials-v2022
kubectl apply -k https://github.com/GoogleCloudPlatform/acm-policy-controller-library.git/bundles/pss-baseline-v2022
kubectl apply -k https://github.com/GoogleCloudPlatform/acm-policy-controller-library.git/anthos-bundles/pss-restricted-v2022
kubectl apply -k https://github.com/GoogleCloudPlatform/acm-policy-controller-library.git/anthos-bundles/nist-800-r5
kubectl apply -k https://github.com/GoogleCloudPlatform/acm-policy-controller-library.git/bundles/asm-policy-v0.0.1
Once applied, the bundles appear in the dashboard, and clicking the "Violations" tab shows where the violations occur, as below.
Below is an example of clicking "cis-k8s-v1.5.1-prohibit-role-wildcard-access" to display the violation details.
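The violations can also be inspected from the CLI; the bundles install Gatekeeper constraints, which are ordinary custom resources (a sketch using the same user cluster kubeconfig as above; the columns are the standard Gatekeeper status fields):
export KUBECONFIG=$HOME/anthos/bmctl-workspace/usercluster1/usercluster1-kubeconfig
# list all installed constraints
kubectl get constraints
# enforcement action and audit violation counts per constraint
kubectl get constraints -o custom-columns='NAME:.metadata.name,ACTION:.spec.enforcementAction,VIOLATIONS:.status.totalViolations'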
Conclusion
I was able to create GKE clusters (Anthos clusters on bare metal) in my home lab.
The networking features are also described here.
Miscellaneous
Cost control
According to the price list, the unit price is $24/vCPU per month or $0.03288/vCPU per hour, counted as
"(for the on-premises option) excluding both the admin cluster and the control plane (master) nodes",
so billing is based on the vCPU count of the user cluster worker nodes.
Therefore, a single user cluster worker node with 4 vCPU (the minimum) costs roughly 13,700 JPY per month (24h * 31 days * 4 vCPU * $0.03288/vCPU * 140 JPY/USD).
So when the cluster is not in use, delete the node pool to bring it down to 0 vCPU and add nodes back only when needed.
With weekend daytime use only (12h), home-lab usage should stay around 1,800 JPY: 12h * 8 days * 4 vCPU * $0.03288/vCPU * 140 JPY/USD.
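The estimates above can be reproduced with a quick calculation (the 140 JPY/USD rate is the assumption used in this article):
echo "24 * 31 * 4 * 0.03288 * 140" | bc   # full month, one 4-vCPU worker -> about 13,699 JPY
echo "12 * 8 * 4 * 0.03288 * 140" | bc    # weekend daytime only (12h x 8 days) -> about 1,768 JPY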
cd ~/anthos
kubectl --kubeconfig=bmctl-workspace/admincluster/admincluster-kubeconfig delete nodepool np1 -n cluster-usercluster1
cd ~/anthos
bmctl update cluster -c usercluster1 --kubeconfig=bmctl-workspace/admincluster/admincluster-kubeconfig
After deleting the user cluster node pool (on 6/11 here), I confirmed that charges stopped accruing.
The admin cluster and its node remain registered to the fleet, but they do not incur Anthos charges.
Handling "failed preflight check" during setup
When Ubuntu 22.04 LTS is installed with the defaults, the partition was created leaving 64GB unallocated; in such a case, extend the LVM volume.
Below is the bmctl error output shown when a server does not have enough disk space.
[2023-06-11 03:03:15+0000] - [FAILED] 192.168.133.2 (log: bmctl-workspace/admincluster/log/preflight-20230611-025326/192.168.133.2)
[2023-06-11 03:03:15+0000] - following are the error messages for each failed preflight check {'check_disks_pass': 'isControl=true\nkubelet directory /var/lib/kubelet needs 524288KB and is backed by /\nroot directory / needs 17825792KB and is backed by /\netcd directory /var/lib/etcd needs 20971520KB and is backed by /\ncontainerd directory /var/lib/containerd needs 31457280KB and is backed by /\nDevice / does not have enough free space. Found 53544036KB, required: 70778880KB.'}
[2023-06-11 03:03:15+0000] - following are the error messages for each failed preflight check {'check_disks_pass': 'isControl=true\nkubelet directory /var/lib/kubelet needs 524288KB and is backed by /\nroot directory / needs 17825792KB and is backed by /\netcd directory /var/lib/etcd needs 20971520KB and is backed by /\ncontainerd directory /var/lib/containerd needs 31457280KB and is backed by /\nDevice / does not have enough free space. Found 53526916KB, required: 70778880KB.'}
[2023-06-11 03:03:15+0000] - following are the error messages for each failed preflight check {'check_disks_pass': 'isControl=true\nkubelet directory /var/lib/kubelet needs 524288KB and is backed by /\nroot directory / needs 17825792KB and is backed by /\netcd directory /var/lib/etcd needs 20971520KB and is backed by /\ncontainerd directory /var/lib/containerd needs 31457280KB and is backed by /\nDevice / does not have enough free space. Found 53526860KB, required: 70778880KB.'}
Add the free space and resize the filesystem with the following commands:
sudo lvextend -l +100%FREE /dev/mapper/ubuntu--vg-ubuntu--lv
sudo resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv
Example output from running the above:
suzuyu@admincl01:~$ sudo lvs
[sudo] password for suzuyu:
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
ubuntu-lv ubuntu-vg -wi-ao---- 62.47g
suzuyu@admincl01:~$ sudo vgs
VG #PV #LV #SN Attr VSize VFree
ubuntu-vg 1 1 0 wz--n- <124.95g 62.47g
suzuyu@admincl01:~$ sudo lvextend -L +62.47G /dev/mapper/ubuntu--vg-ubuntu--lv
Rounding size to boundary between physical extents: 62.47 GiB.
Size of logical volume ubuntu-vg/ubuntu-lv changed from 62.47 GiB (15993 extents) to <124.95 GiB (31986 extents).
Logical volume ubuntu-vg/ubuntu-lv successfully resized.
suzuyu@admincl01:~$ sudo lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
ubuntu-lv ubuntu-vg -wi-ao---- <124.95g
suzuyu@admincl01:~$ sudo vgs
VG #PV #LV #SN Attr VSize VFree
ubuntu-vg 1 1 0 wz--n- <124.95g 0
suzuyu@admincl01:~$ sudo resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv
resize2fs 1.46.5 (30-Dec-2021)
Filesystem at /dev/mapper/ubuntu--vg-ubuntu--lv is mounted on /; on-line resizing required
old_desc_blocks = 8, new_desc_blocks = 16
The filesystem on /dev/mapper/ubuntu--vg-ubuntu--lv is now 32753664 (4k) blocks long.
References