Introduction
This article walks through hands-on testing of the following networking features of Anthos clusters on bare metal:
- BGP connectivity: advertises LoadBalancer IPs externally via BGP and also enables ECMP load balancing
- Egress NAT Gateway: pins the source IP of egress traffic; likely useful in multi-tenant environments where multiple systems share a cluster and traffic is controlled by IP
Features used
Here is an overview of the two features verified in this article.
BGP connectivity
BGP connectivity lets the cluster advertise the virtual IP addresses (VIPs) of ServiceType: LoadBalancer Services via external Border Gateway Protocol (eBGP).
Because the VIPs neither consume nor are constrained by addresses in the L2 subnet the nodes belong to, IP addressing can scale much more freely.
Having multiple BGP peers also enables ECMP load balancing.
Only eBGP is supported.
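The feature is driven by the loadBalancer section of the cluster config; a minimal excerpt of the settings used later in this article:

loadBalancer:
  mode: bundled
  type: bgp              # advertise VIPs over eBGP instead of L2
  localASN: 65003        # AS number the cluster speaks from
  bgpPeers:
  - ip: 192.168.133.1    # upstream router in this lab
    asn: 65001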
Egress NAT Gateway
The Egress NAT Gateway lets you choose the IP that Pod traffic is SNATed to in the egress direction when it leaves the cluster.
Normally such traffic is NATed to the IP of the node the Pod runs on, so in a multi-tenant cluster the originating system can no longer be told apart by IP; in environments that control traffic by IP, pinning the source IP with the Egress NAT Gateway is therefore useful.
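Concretely, the mapping from Pods to a fixed egress IP is declared with an EgressNATPolicy resource; the minimal shape below is taken from the verification section later in this article:

kind: EgressNATPolicy
apiVersion: networking.gke.io/v1
metadata:
  name: egress
spec:
  sources:
  - podSelector:
      matchLabels:
        run: busybox     # Pods whose egress source IP should be fixed
  action: SNAT
  destinations:
  - cidr: 8.8.8.0/24     # destinations the SNAT applies to
  gatewayRef:
    name: default        # NetworkGatewayGroup that holds the floating IPs
    namespace: kube-system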
An excerpt of the current limitations (v1.15) follows; depending on the environment they may require operational design consideration (see the linked docs for details):
- only enabled for IPv4 mode: IPv6 is not supported
- same Layer 2 domain with Node IP addresses: consumes addresses from the same L2 subnet as the nodes
- "Upgrades are not supported for clusters that have been configured to use the Preview of the egress NAT gateway.": clusters that used the Preview (on Ubuntu 18.04) cannot be upgraded
Note: the feature cannot be added to an existing cluster after the fact.
Both features require spec.clusterNetwork.advancedNetworking: true to be set when the cluster is created.
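For reference, the flag sits under clusterNetwork in the Cluster spec (excerpt from the config used later in this article):

apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
spec:
  clusterNetwork:
    # ... pods/services CIDR blocks ...
    advancedNetworking: true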
This field is absent from the default config, so a cluster built without it has to be recreated.
Updating the YAML of an existing cluster and applying it with bmctl update fails with an error like the following:
[2023-06-17 01:44:34+0000] exit with error
[2023-06-17 01:44:34+0000] Failed to update the cluster: failed to patch cluster usercluster1: admission webhook "vcluster.kb.io" denied the request: spec: Forbidden: Fields should be immutable.
For that reason, the lab build described in this article starts from recreating the cluster.
Environment
This build is based on the home lab environment constructed previously.
Anthos clusters on bare metal v1.15.1, the latest version at the time of writing (2023-06-18), is used.
Configuration
The configuration is based on the previously built one, with IP address design changes and additions to accommodate BGP.
Build
Using the parameters from the configuration above, set up the home lab router and rebuild the cluster with the features enabled.
Router configuration (EdgeRouter X example)
Configure BGP on the router beforehand.
Below is the configuration used on the EdgeRouter X in this environment:
set protocols bgp 65001 parameters router-id 192.168.129.254
set protocols bgp 65001 parameters graceful-restart stalepath-time 300
set protocols bgp 65001 neighbor 192.168.133.11 peer-group usercluster1
set protocols bgp 65001 neighbor 192.168.133.80 peer-group usercluster1
set protocols bgp 65001 neighbor 192.168.133.81 peer-group usercluster1
set protocols bgp 65001 neighbor 192.168.133.82 peer-group usercluster1
set protocols bgp 65001 neighbor 192.168.133.83 peer-group usercluster1
set protocols bgp 65001 neighbor 192.168.133.84 peer-group usercluster1
set protocols bgp 65001 neighbor 192.168.133.85 peer-group usercluster1
set protocols bgp 65001 neighbor 192.168.133.86 peer-group usercluster1
set protocols bgp 65001 neighbor 192.168.133.87 peer-group usercluster1
set protocols bgp 65001 network 192.168.133.0/24
set protocols bgp 65001 peer-group usercluster1 default-originate
set protocols bgp 65001 peer-group usercluster1 remote-as 65003
set protocols bgp 65001 peer-group usercluster1 address-family ipv6-unicast default-originate
set protocols bgp 65001 redistribute connected
set protocols bgp 65001 redistribute static
(Skip if no rebuild is needed) Deleting the existing cluster
If a cluster already exists and its parameters are to be reused, delete the user cluster first, since the setting cannot be added in place:
bmctl reset --cluster CLUSTER_NAME --admin-kubeconfig ADMIN_KUBECONFIG_PATH
Below is an excerpt of the log from this run; it took roughly 10 minutes.
$ bmctl reset --cluster usercluster1 --admin-kubeconfig bmctl-workspace/admincluster/admincluster-kubeconfig
Please check the logs at bmctl-workspace/usercluster1/log/reset-20230617-054924/reset.log
[2023-06-17 05:49:25+0000] Waiting for reset jobs to finish...
[2023-06-17 05:49:35+0000] Resetting: 0 Completed: 0 Failed: 0
[2023-06-17 05:49:45+0000] Resetting: 0 Completed: 0 Failed: 0
[2023-06-17 05:49:55+0000] Resetting: 0 Completed: 0 Failed: 0
[2023-06-17 05:50:05+0000] Resetting: 1 Completed: 0 Failed: 0
...
[2023-06-17 05:59:05+0000] Resetting: 1 Completed: 0 Failed: 0
[2023-06-17 05:59:15+0000] Resetting: 0 Completed: 1 Failed: 0
[2023-06-17 05:59:15+0000] Deleting cluster... OK
[2023-06-17 05:59:35+0000] Deleting cluster namespace cluster-usercluster1...
[2023-06-17 05:59:35+0000] Flushing logs... OK
Cluster creation
Create the user cluster YAML by referring to the docs Configure bundled load balancers with BGP (configuration examples) and Configure an egress NAT gateway.
Below is the config used in this test environment:
# https://cloud.google.com/anthos/clusters/docs/bare-metal/latest/reference/config-samples#user-basic
apiVersion: v1
kind: Namespace
metadata:
  name: cluster-usercluster1
---
apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: usercluster1
  namespace: cluster-usercluster1
spec:
  type: user
  profile: default
  anthosBareMetalVersion: 1.15.1
  gkeConnect:
    projectID: <cloud project ID>
  controlPlane:
    nodePoolSpec:
      nodes:
      - address: 192.168.133.11
  clusterNetwork:
    pods:
      cidrBlocks:
      - 10.4.0.0/16
    services:
      cidrBlocks:
      - 10.96.0.0/20
      - "fd12::5:0/116"
    # https://cloud.google.com/anthos/clusters/docs/bare-metal/latest/how-to/egress-nat?hl=ja#create_the_networkgatewaygroup_custom_resource
    advancedNetworking: true
  loadBalancer:
    mode: bundled
    # type can be 'bgp' or 'layer2'. If no type is specified, we default to layer2.
    type: bgp
    # AS number for the cluster
    localASN: 65003
    bgpPeers:
    - ip: 192.168.133.1
      asn: 65001
      controlPlaneNodes:
      - 192.168.133.11
    ports:
      controlPlaneLBPort: 443
    # When type=bgp, the VIPs are advertised over BGP
    vips:
      controlPlaneVIP: 192.168.134.1
      ingressVIP: 192.168.134.2
    addressPools:
    - name: pool1
      addresses:
      # Each address must be either in the CIDR form (1.2.3.0/24)
      # or range form (1.2.3.1-1.2.3.5).
      - "192.168.134.0/24"
      - "fd13::/120" # Note the additional IPv6 range
    # # type can be 'bgp' or 'layer2'. If no type is specified, we default to layer2.
    # type: layer2
    # ports:
    #   controlPlaneLBPort: 443
    # vips:
    #   controlPlaneVIP: 192.168.133.66
    #   ingressVIP: 192.168.133.129
    # addressPools:
    # - name: pool1
    #   addresses:
    #   - 192.168.133.129-192.168.133.142
    #   - "fd12::4:101-fd12::4:110"
  clusterOperations:
    projectID: <cloud project ID>
    location: asia-northeast1
  storage:
    lvpNodeMounts:
      path: /mnt/localpv-disk
      storageClassName: local-disks
    lvpShare:
      path: /mnt/localpv-share
      storageClassName: local-shared
      numPVUnderSharedPath: 5
  nodeConfig:
    podDensity:
      maxPodsPerNode: 110
---
apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
  name: np1
  namespace: cluster-usercluster1
spec:
  clusterName: usercluster1
  nodes:
  - address: 192.168.133.21
  # - address: 192.168.133.22
---
# https://cloud.google.com/anthos/clusters/docs/bare-metal/latest/how-to/dual-stack-networking#fill_in_a_configuration_file
apiVersion: baremetal.cluster.gke.io/v1alpha1
kind: ClusterCIDRConfig
metadata:
  name: "cluster-wide-ranges"
  namespace: cluster-usercluster1
spec:
  ipv4:
    cidr: "10.4.0.0/16" # For island mode, must be the same as the Cluster CIDR.
    perNodeMaskSize: 24
  ipv6:
    cidr: "fd12::1:0/112"
    perNodeMaskSize: 120
---
# https://cloud.google.com/anthos/clusters/docs/bare-metal/latest/how-to/lb-bundled-bgp?hl=ja#cluster-config
# https://cloud.google.com/anthos/clusters/docs/bare-metal/latest/how-to/egress-nat?hl=ja#create_the_networkgatewaygroup_custom_resource
apiVersion: networking.gke.io/v1
kind: NetworkGatewayGroup
metadata:
  name: default
  namespace: cluster-usercluster1
spec:
  floatingIPs:
  - 192.168.133.80
  - 192.168.133.81
  - 192.168.133.82
  - 192.168.133.83
bmctl create cluster -c usercluster1 --kubeconfig bmctl-workspace/admincluster/admincluster-kubeconfig
Below is an excerpt of the output from this run:
$ bmctl create cluster -c usercluster1 --kubeconfig bmctl-workspace/admincluster/admincluster-kubeconfig
Please check the logs at bmctl-workspace/usercluster1/log/create-cluster-20230617-072040/create-cluster.log
[2023-06-17 07:20:44+0000] Waiting for preflight check job to finish... OK
[2023-06-17 07:22:14+0000] - Validation Category: machines and network
[2023-06-17 07:22:14+0000] - [PASSED] 192.168.133.11
[2023-06-17 07:22:14+0000] - [PASSED] 192.168.133.11-gcp
[2023-06-17 07:22:14+0000] - [PASSED] 192.168.133.21
[2023-06-17 07:22:14+0000] - [PASSED] 192.168.133.21-gcp
[2023-06-17 07:22:14+0000] - [PASSED] gcp
[2023-06-17 07:22:14+0000] - [PASSED] node-network
[2023-06-17 07:22:14+0000] - [PASSED] pod-cidr
[2023-06-17 07:22:14+0000] Flushing logs... OK
[2023-06-17 07:22:14+0000] Applying resources for new cluster
[2023-06-17 07:22:14+0000] Waiting for cluster kubeconfig to become ready OK
[2023-06-17 07:26:04+0000] Writing kubeconfig file
[2023-06-17 07:26:04+0000] kubeconfig of cluster being created is present at bmctl-workspace/usercluster1/usercluster1-kubeconfig
[2023-06-17 07:26:04+0000] Please restrict access to this file as it contains authentication credentials of your cluster.
[2023-06-17 07:26:04+0000] Waiting for cluster to become ready OK
[2023-06-17 07:31:34+0000] Please run
[2023-06-17 07:31:34+0000] kubectl --kubeconfig bmctl-workspace/usercluster1/usercluster1-kubeconfig get nodes
[2023-06-17 07:31:34+0000] to get cluster nodes status.
[2023-06-17 07:31:34+0000] Waiting for node pools to become ready OK
[2023-06-17 07:31:54+0000] Flushing logs... OK
That completes the cluster build. (Accessing the cluster from the web console requires separate setup.)
Status checks
Check the state of the parts changed in this build.
BGP
The Pod that speaks BGP for the control plane can be found with the following command:
$ kubectl --kubeconfig bmctl-workspace/usercluster1/usercluster1-kubeconfig get po -A | grep bgp
kube-system bgpadvertiser-admin01 1/1 Running 1 (8m37s ago) 10m
The BGP state for the worker nodes can be checked with the commands below (reference):
$ kubectl --kubeconfig bmctl-workspace/usercluster1/usercluster1-kubeconfig -n kube-system get bgpsessions
NAME LOCAL ASN PEER ASN LOCAL IP PEER IP STATE LAST REPORT
192-168-133-1-worker01 65003 65001 192.168.133.80 192.168.133.1 Established 2s
$ kubectl --kubeconfig bmctl-workspace/usercluster1/usercluster1-kubeconfig -n kube-system get bgpadvertisedroutes
NAME PREFIX METRIC
default-gke-system-istio-ingress 192.168.134.32/32
$ kubectl --kubeconfig bmctl-workspace/usercluster1/usercluster1-kubeconfig -n kube-system describe bgpsession 192-168-133-1-worker01
Name:         192-168-133-1-worker01
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>
API Version:  networking.gke.io/v1
Kind:         BGPSession
Metadata:
  Creation Timestamp:  2023-06-17T07:31:45Z
  Generation:          1
  Owner References:
    API Version:           networking.gke.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  BGPPeer
    Name:                  192-168-133-1
    UID:                   4cf81964-5214-497f-b1c4-e3b74b61c130
  Resource Version:  16297
  UID:               27318ab7-8371-4e49-a664-6f93c13c2709
Spec:
  Floating IP:  192.168.133.80
  Local ASN:    65003
  Local IP:     192.168.133.80
  Node Name:    worker01
  Peer ASN:     65001
  Peer IP:      192.168.133.1
Status:
  Advertised Routes:
    192.168.134.32/32
  Established At Least Once:  true
  Last Report Time:           2023-06-17T07:56:47Z
  Received Routes:
    10.2.0.0/20
    10.2.16.0/20
    172.18.0.0/24
    172.18.1.0/24
    172.18.2.0/28
    172.18.3.0/24
    172.18.5.1/32
    172.20.0.0/24
    172.20.10.0/24
    192.168.129.0/24
    192.168.133.0/24
    192.168.134.1/32
    199.36.153.4/30
    199.36.153.8/30
  State:  Established
Events:  <none>
Looking at the BGP peer state on the router, peering is established with the control plane node (admin01) and with one floating IP:
$ show ip bgp summary
BGP router identifier 192.168.129.254, local AS number 65001
BGP table version is 23
3 BGP AS-PATH entries
0 BGP community entries
32 Configured ebgp ECMP multipath: Currently set at 32
1 Configured ibgp ECMP multipath: Currently set at 1
Neighbor V AS MsgRcv MsgSen TblVer InQ OutQ Up/Down State/PfxRcd
...
192.168.133.11 4 65003 33 35 23 0 0 00:09:27 1
192.168.133.80 4 65003 14 18 23 0 0 00:05:59 1
192.168.133.81 4 65003 0 0 0 0 0 never Active
192.168.133.82 4 65003 0 0 0 0 0 never Active
192.168.133.83 4 65003 0 0 0 0 0 never Active
...
The routing table also shows the control plane VIP and the ingress VIP being received via BGP.
Pool IPs are likewise advertised as /32 routes, so depending on the environment they may need to be aggregated around this point.
$ show ip route | match 192.168.134
B *> 192.168.134.1/32 [20/0] via 192.168.133.11, eth4.300, 00:11:52
B *> 192.168.134.32/32 [20/0] via 192.168.133.11, eth4.300, 00:08:24
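If aggregation is needed, one option (not tried in this lab) is to aggregate on the upstream router; on EdgeOS/Vyatta-style firmware the following form should collapse the pool into a single advertisement, assuming aggregate-address is available on your version:

# Assumption: this EdgeOS firmware supports aggregate-address; verify before use.
set protocols bgp 65001 aggregate-address 192.168.134.0/24 summary-only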
Floating IP
The floating IP assignments can be checked with the command below (reference).
As the node-addition test later revealed, every listed IP is assigned up front and is not reassigned when a node is added, so it seems better to put only the minimum required number in floatingIPs and add more as needed.
$ kubectl --kubeconfig bmctl-workspace/usercluster1/usercluster1-kubeconfig -n kube-system get NetworkGatewayGroup.networking.gke.io default -o yaml
apiVersion: networking.gke.io/v1
kind: NetworkGatewayGroup
metadata:
  annotations:
    baremetal.cluster.gke.io/managed: "true"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"networking.gke.io/v1","kind":"NetworkGatewayGroup","metadata":{"annotations":{"baremetal.cluster.gke.io/managed":"true"},"creationTimestamp":null,"name":"default","namespace":"kube-system"},"spec":{"floatingIPs":["192.168.133.80","192.168.133.81","192.168.133.82","192.168.133.83"]},"status":{}}
  creationTimestamp: "2023-06-17T07:31:45Z"
  generation: 1
  name: default
  namespace: kube-system
  resourceVersion: "4220"
  uid: a5755e9f-cc15-4a78-8cbf-98f09620a016
spec:
  floatingIPs:
  - 192.168.133.80
  - 192.168.133.81
  - 192.168.133.82
  - 192.168.133.83
status:
  floatingIPs:
    192.168.133.80: worker01
    192.168.133.81: worker01
    192.168.133.82: worker01
    192.168.133.83: worker01
  nodes:
    worker01: Up
Functional tests
Verifying BGP
Adding a node to the cluster did not cause the already-assigned IPs in the NetworkGatewayGroup's spec.floatingIPs to be reassigned.
Therefore, add more floatingIPs and verify the behavior:
...
apiVersion: networking.gke.io/v1
kind: NetworkGatewayGroup
metadata:
  name: default
  namespace: cluster-usercluster1
spec:
  floatingIPs:
  - 192.168.133.80
  - 192.168.133.81
  - 192.168.133.82
  - 192.168.133.83
+ - 192.168.133.84
+ - 192.168.133.85
+ - 192.168.133.86
+ - 192.168.133.87
After the addition, apply it with bmctl update:
bmctl update cluster -c usercluster1 --kubeconfig=bmctl-workspace/admincluster/admincluster-kubeconfig
Checking afterwards confirms the IPs were added:
$ kubectl --kubeconfig bmctl-workspace/usercluster1/usercluster1-kubeconfig -n kube-system get NetworkGatewayGroup.networking.gke.io default -o yaml
apiVersion: networking.gke.io/v1
kind: NetworkGatewayGroup
metadata:
  annotations:
    baremetal.cluster.gke.io/managed: "true"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"networking.gke.io/v1","kind":"NetworkGatewayGroup","metadata":{"annotations":{"baremetal.cluster.gke.io/managed":"true"},"creationTimestamp":null,"name":"default","namespace":"kube-system"},"spec":{"floatingIPs":["192.168.133.80","192.168.133.81","192.168.133.82","192.168.133.83","192.168.133.84","192.168.133.85","192.168.133.86","192.168.133.87"]},"status":{}}
  creationTimestamp: "2023-06-17T07:31:45Z"
  generation: 2
  name: default
  namespace: kube-system
  resourceVersion: "37130"
  uid: a5755e9f-cc15-4a78-8cbf-98f09620a016
spec:
  floatingIPs:
  - 192.168.133.80
  - 192.168.133.81
  - 192.168.133.82
  - 192.168.133.83
  - 192.168.133.84
  - 192.168.133.85
  - 192.168.133.86
  - 192.168.133.87
status:
  floatingIPs:
    192.168.133.80: worker01
    192.168.133.81: worker01
    192.168.133.82: worker01
    192.168.133.83: worker01
    192.168.133.84: worker02
    192.168.133.85: worker02
    192.168.133.86: worker02
    192.168.133.87: wor02
  nodes:
    worker01: Up
    worker02: Up
$ kubectl --kubeconfig bmctl-workspace/usercluster1/usercluster1-kubeconfig -n kube-system get bgpsessions
NAME LOCAL ASN PEER ASN LOCAL IP PEER IP STATE LAST REPORT
192-168-133-1-worker01 65003 65001 192.168.133.80 192.168.133.1 Established 3s
192-168-133-1-worker02 65003 65001 192.168.133.84 192.168.133.1 Established 2s
$ kubectl --kubeconfig bmctl-workspace/usercluster1/usercluster1-kubeconfig -n kube-system describe bgpsession 192-168-133-1-worker02
Name:         192-168-133-1-worker02
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>
API Version:  networking.gke.io/v1
Kind:         BGPSession
Metadata:
  Creation Timestamp:  2023-06-17T08:38:47Z
  Generation:          1
  Owner References:
    API Version:           networking.gke.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  BGPPeer
    Name:                  192-168-133-1
    UID:                   4cf81964-5214-497f-b1c4-e3b74b61c130
  Resource Version:  41202
  UID:               e67d6800-b691-4135-8cba-3c789d1608ea
Spec:
  Floating IP:  192.168.133.84
  Local ASN:    65003
  Local IP:     192.168.133.84
  Node Name:    worker02
  Peer ASN:     65001
  Peer IP:      192.168.133.1
Status:
  Advertised Routes:
    192.168.134.32/32
  Established At Least Once:  true
  Last Report Time:           2023-06-17T08:45:56Z
  Received Routes:
    0.0.0.0/0
    10.2.0.0/20
    10.2.16.0/20
    172.18.0.0/24
    172.18.1.0/24
    172.18.2.0/28
    172.18.3.0/24
    172.18.5.1/32
    172.20.0.0/24
    172.20.10.0/24
    192.168.129.0/24
    192.168.133.0/24
    192.168.134.1/32
    192.168.134.32/32
    199.36.153.4/30
    199.36.153.8/30
  State:  Established
Events:  <none>
Test Pod for the LoadBalancer IP
Create an nginx Pod, expose it with type: LoadBalancer, and observe the state:
export USER1_KUBECONFIG=bmctl-workspace/usercluster1/usercluster1-kubeconfig
kubectl --kubeconfig $USER1_KUBECONFIG run nginx --image nginx
kubectl --kubeconfig $USER1_KUBECONFIG expose pod nginx --type=LoadBalancer --port=80
kubectl --kubeconfig $USER1_KUBECONFIG get svc
Below is the run in this lab. The EXTERNAL-IP is assigned from the pool IP range specified in the BGP configuration, and external access via curl also succeeds.
$ kubectl --kubeconfig $USER1_KUBECONFIG run nginx --image nginx
kubectl --kubeconfig $USER1_KUBECONFIG expose pod nginx --type=LoadBalancer --port=80
pod/nginx created
service/nginx exposed
$ kubectl --kubeconfig $USER1_KUBECONFIG get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 85m
nginx LoadBalancer 10.96.6.187 192.168.134.33 80:30212/TCP 12s
$ curl $(kubectl --kubeconfig $USER1_KUBECONFIG get svc -o jsonpath='{.items[?(@.metadata.name=="nginx")].status.loadBalancer.ingress[0].ip}')
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
From the router, a route for the LoadBalancer IP has been added:
$ show ip route | match 192.168.134.
B *> 192.168.134.1/32 [20/0] via 192.168.133.11, eth4.300, 01:27:41
B *> 192.168.134.32/32 [20/0] via 192.168.133.11, eth4.300, 00:25:12
B *> 192.168.134.33/32 [20/0] via 192.168.133.11, eth4.300, 00:04:58
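As a side note, if a specific VIP from pool1 is wanted instead of automatic assignment, the legacy core-Kubernetes field spec.loadBalancerIP may work with the bundled load balancer; this was not tried in this lab, and the Service name and the address 192.168.134.40 below are made up for illustration:

# Hypothetical sketch: pin the VIP rather than letting the controller pick one.
# spec.loadBalancerIP is deprecated upstream since Kubernetes 1.24 but is still
# honored by many load balancer implementations.
apiVersion: v1
kind: Service
metadata:
  name: nginx-fixed-vip
spec:
  type: LoadBalancer
  selector:
    run: nginx           # kubectl run nginx labels the Pod run=nginx
  ports:
  - port: 80
  loadBalancerIP: 192.168.134.40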
Next, to run the same check for IPv6, add spec.ipFamilyPolicy: PreferDualStack so the Service also gets an IPv6 address:
kubectl --kubeconfig $USER1_KUBECONFIG patch service nginx -p '{"spec":{"ipFamilyPolicy": "PreferDualStack"}}'
Below is the run in this lab; an IPv6 address is likewise assigned from the pool IP and is reachable:
$ kubectl --kubeconfig $USER1_KUBECONFIG patch service nginx -p '{"spec":{"ipFamilyPolicy": "PreferDualStack"}}'
service/nginx patched
$ kubectl --kubeconfig $USER1_KUBECONFIG get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 88m
nginx LoadBalancer 10.96.6.187 192.168.134.33,fd13:: 80:30212/TCP 3m
$ curl -g 'http://[fd13::]'
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
This can be confirmed on the router as well:
$ show ipv6 route | match fd13
B fd13::/128 [20/0] via fd12::2:11, eth4.300, 00:01:55
The additional routes advertised by the cluster can be checked with the command below (Status.Advertised Routes now contains 192.168.134.33/32 and fd13::/128):
$ kubectl --kubeconfig bmctl-workspace/usercluster1/usercluster1-kubeconfig -n kube-system describe bgpsession 192-168-133-1-worker01
Name:         192-168-133-1-worker01
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>
API Version:  networking.gke.io/v1
Kind:         BGPSession
Metadata:
  Creation Timestamp:  2023-06-17T07:31:45Z
  Generation:          1
  Owner References:
    API Version:           networking.gke.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  BGPPeer
    Name:                  192-168-133-1
    UID:                   4cf81964-5214-497f-b1c4-e3b74b61c130
  Resource Version:  57443
  UID:               27318ab7-8371-4e49-a664-6f93c13c2709
Spec:
  Floating IP:  192.168.133.80
  Local ASN:    65003
  Local IP:     192.168.133.80
  Node Name:    worker01
  Peer ASN:     65001
  Peer IP:      192.168.133.1
Status:
  Advertised Routes:
    192.168.134.32/32
    192.168.134.33/32
    fd13::/128
  Established At Least Once:  true
  Last Report Time:           2023-06-17T09:13:27Z
  Received Routes:
    0.0.0.0/0
    10.2.0.0/20
    10.2.16.0/20
    172.18.0.0/24
    172.18.1.0/24
    172.18.2.0/28
    172.18.3.0/24
    172.18.5.1/32
    172.20.0.0/24
    172.20.10.0/24
    192.168.129.0/24
    192.168.133.0/24
    192.168.134.1/32
    199.36.153.4/30
    199.36.153.8/30
  State:  Established
Events:  <none>
Verifying the Egress NAT Gateway
Next, verify that the Egress NAT Gateway can change the source IP of traffic sent from a Pod.
Create a busybox Pod as follows and leave a ping running from it:
$ kubectl --kubeconfig $USER1_KUBECONFIG run busybox --image busybox:1.28 --command -- sleep 1000
$ kubectl --kubeconfig $USER1_KUBECONFIG exec -it busybox -- sh
/ # ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=111 time=7.835 ms
64 bytes from 8.8.8.8: seq=1 ttl=111 time=4.972 ms
64 bytes from 8.8.8.8: seq=2 ttl=111 time=5.128 ms
64 bytes from 8.8.8.8: seq=3 ttl=111 time=5.000 ms
...
Checking the source IP on the router shows 192.168.133.22, the IP of worker02 where the Pod runs:
$ show conntrack table ipv4 | match 8.8.8.8
2383452672 192.168.133.22 8.8.8.8 icmp [1] 29
Now apply an EgressNATPolicy as follows:
cat <<EOF > test-EgressNATPolicy.yaml
kind: EgressNATPolicy
apiVersion: networking.gke.io/v1
metadata:
  name: egress
spec:
  sources:
  - podSelector:
      matchLabels:
        run: busybox
  action: SNAT
  destinations:
  - cidr: 8.8.8.0/24
  gatewayRef:
    name: default
    namespace: kube-system
EOF
kubectl --kubeconfig bmctl-workspace/usercluster1/usercluster1-kubeconfig apply -f test-EgressNATPolicy.yaml
Once this is applied, the source IP becomes 192.168.133.80, one of the floatingIPs defined in the NetworkGatewayGroup:
$ show conntrack table ipv4 | match 8.8.8.8
2413224192 192.168.133.80 8.8.8.8 icmp [1] 29
Next, create a NetworkGatewayGroup under a different name and confirm that pointing the policy's gatewayRef at it changes the source IP accordingly:
kubectl --kubeconfig bmctl-workspace/usercluster1/usercluster1-kubeconfig delete -f test-EgressNATPolicy.yaml
cat <<EOF > test-EgressNATPolicy.yaml
---
kind: NetworkGatewayGroup
apiVersion: networking.gke.io/v1
metadata:
  namespace: kube-system
  name: gateway1
spec:
  floatingIPs:
  - 192.168.133.100
---
kind: EgressNATPolicy
apiVersion: networking.gke.io/v1
metadata:
  name: egress
spec:
  sources:
  - podSelector:
      matchLabels:
        run: busybox
  action: SNAT
  destinations:
  - cidr: 8.8.8.0/24
  gatewayRef:
    name: gateway1
    namespace: kube-system
EOF
kubectl --kubeconfig bmctl-workspace/usercluster1/usercluster1-kubeconfig apply -f test-EgressNATPolicy.yaml
The source IP is now 192.168.133.100, the address defined in gateway1:
$ show conntrack table ipv4 | match 8.8.8.8 | grep 192.168.13
2269821184 192.168.133.100 8.8.8.8 icmp [1] 29
Next, check the behavior when a node is removed.
The assignments in the lab at this point are as follows: the Pod is on worker02, while the floating IP is assigned to worker01.
$ kubectl --kubeconfig $USER1_KUBECONFIG get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
busybox 1/1 Running 0 91s 10.4.4.173 worker02 <none> <none>
$ kubectl --kubeconfig bmctl-workspace/usercluster1/usercluster1-kubeconfig -n kube-system get NetworkGatewayGroup.networking.gke.io gateway1 -oyaml
apiVersion: networking.gke.io/v1
kind: NetworkGatewayGroup
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"networking.gke.io/v1","kind":"NetworkGatewayGroup","metadata":{"annotations":{},"name":"gateway1","namespace":"kube-system"},"spec":{"floatingIPs":["192.168.133.100"]}}
  creationTimestamp: "2023-06-18T03:55:39Z"
  generation: 3
  name: gateway1
  namespace: kube-system
  resourceVersion: "604455"
  uid: e4542acb-3ab4-4c70-a40f-fc9a0bbb69b4
spec:
  floatingIPs:
  - 192.168.133.100
status:
  floatingIPs:
    192.168.133.100: worker01
  nodes:
    worker01: Up
    worker02: Up
Remove worker01 and watch the status change:
...
apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
  name: np1
  namespace: cluster-usercluster1
spec:
  clusterName: usercluster1
  nodes:
- - address: 192.168.133.21
  - address: 192.168.133.22
...
$ bmctl update cluster -c usercluster1 --kubeconfig=bmctl-workspace/admincluster/admincluster-kubeconfig
Please check the logs at bmctl-workspace/usercluster1/log/update-cluster-20230618-040508/update-cluster.log
$ kubectl --kubeconfig $USER1_KUBECONFIG get node
NAME STATUS ROLES AGE VERSION
admin01 Ready control-plane 20h v1.26.2-gke.1001
worker01 Ready,SchedulingDisabled worker 12m v1.26.2-gke.1001
worker02 Ready worker 12m v1.26.2-gke.1001
$ kubectl --kubeconfig $USER1_KUBECONFIG get node
NAME STATUS ROLES AGE VERSION
admin01 Ready control-plane 20h v1.26.2-gke.1001
worker02 Ready worker 13m v1.26.2-gke.1001
After worker01 is removed, the floating IP assignment has moved to worker02, as shown below.
Note: the ping from busybox showed 0% packet loss; whether established sessions survive the move was not checked.
$ kubectl --kubeconfig bmctl-workspace/usercluster1/usercluster1-kubeconfig -n kube-system get NetworkGatewayGroup.networking.gke.io gateway1 -oyaml
apiVersion: networking.gke.io/v1
kind: NetworkGatewayGroup
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"networking.gke.io/v1","kind":"NetworkGatewayGroup","metadata":{"annotations":{},"name":"gateway1","namespace":"kube-system"},"spec":{"floatingIPs":["192.168.133.100"]}}
  creationTimestamp: "2023-06-18T03:55:39Z"
  generation: 3
  name: gateway1
  namespace: kube-system
  resourceVersion: "609198"
  uid: e4542acb-3ab4-4c70-a40f-fc9a0bbb69b4
spec:
  floatingIPs:
  - 192.168.133.100
status:
  floatingIPs:
    192.168.133.100: worker02
  nodes:
    worker02: Up
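As a follow-up to the session question noted earlier, a rough way to look for dataplane gaps during the failover would be to timestamp every ICMP reply and watch for a pause while the floating IP moves between nodes; a sketch, not run in this lab:

# Timestamp each reply inside the busybox Pod; a multi-second gap around the
# node removal would suggest the egress flow was interrupted.
kubectl --kubeconfig $USER1_KUBECONFIG exec busybox -- \
  sh -c 'ping 8.8.8.8 | while read line; do echo "$(date +%T) $line"; done'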
That concludes the functional tests.
Summary
This exercise confirmed advertising LoadBalancer IPs over BGP (decoupling them from the nodes' L2 subnet), and, with the Egress NAT Gateway, decoupling Pod source IPs from node IPs and assigning IPs per system (via podSelector and the like).
Since the Egress NAT Gateway is limited to IP addresses in the same L2 subnet as the node IPs, it would be good to see those addresses become advertisable over BGP like LoadBalancer IPs, for the sake of scalability and portability.
Other notes
With BGP in use, the API endpoint becomes unreachable some time after deleting a node pool
For some reason, the Kubernetes API endpoint stopped being advertised a while after a node pool was deleted.
The static Pods on the admin node should be doing the advertising, so this seemed like it should be fine, yet /etc/kubernetes/ on that node was empty and the static Pods themselves were gone.
The cause is unknown; when this happens, recreate the cluster itself and sign in to the Cloud Console again:
bmctl reset cluster -c usercluster1 --kubeconfig=bmctl-workspace/admincluster/admincluster-kubeconfig
bmctl create cluster -c usercluster1 --kubeconfig bmctl-workspace/admincluster/admincluster-kubeconfig
kubectl --kubeconfig bmctl-workspace/usercluster1/usercluster1-kubeconfig apply -f cloud-console-reader.yaml
kubectl --kubeconfig bmctl-workspace/usercluster1/usercluster1-kubeconfig get secret cloud-console-reader-token -o jsonpath='{$.data.token}' | base64 --decode
References