Background:
When you want to share data between pods, on AWS you can simply mount Amazon Elastic File System (Amazon EFS) over NFS, but on GCP a Persistent Disk can only be mounted read-write by one pod at a time. GCP does offer Cloud Filestore as a managed NFS service, but its documentation states that "The minimum Standard tier instance size is 1 terabyte (TB)", which makes it quite expensive. Deploying Ceph distributed storage on GKE therefore became the option of choice, and since deploying Ceph by hand is very tedious, we make use of the cloud-native Rook operator.
In addition, taints/tolerations and a node selector are used to deploy the Ceph cluster onto a designated node pool and to keep other pods from being scheduled onto the Ceph cluster nodes.
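For reference, a minimal sketch of creating such a dedicated node pool with the matching label and taint is shown below; the node pool name, cluster name, zone, and node count are illustrative placeholders, so adjust them to your environment.
gcloud container node-pools create app-node-pool \
  --cluster=my-gke-cluster \
  --zone=asia-northeast1-a \
  --num-nodes=3 \
  --node-labels=role=app-node \
  --node-taints=node-type=app-node:NoSchedule
The label role=app-node and the taint node-type=app-node:NoSchedule are the ones referenced throughout the rest of this article.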
Steps:
If you have already read through the detailed sections of this article, you can set everything up quickly with the following commands.
====================================================
1.
helm install --namespace rook-ceph-system --name rook-ceph rook-stable/rook-ceph -f values.yaml
kubectl --namespace rook-ceph-system get pods -l "app=rook-ceph-operator"
kubectl get po -n rook-ceph-system
2.
kubectl apply -f cluster.yaml
### If the rook-ceph-mon-* pods are stuck in CrashLoopBackOff, ssh into the VM and run sudo rm -rf /var/lib/rook ###
kubectl get po -n rook-ceph -o wide
kubectl logs `kubectl get pod -n rook-ceph-system | grep -F -i 'rook-ceph-operator' | awk '{print $1}'` -n rook-ceph-system
3.
kubectl apply -f filesystem.yaml
### It takes about 3-5 minutes for the CephFilesystem pods to come up ###
kubectl get po -n rook-ceph
4.
cat <<EOF > testfs.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
        volumeMounts:
        - name: nginx-data
          mountPath: /usr/share/nginx/html
      volumes:
      - name: nginx-data
        flexVolume:
          driver: ceph.rook.io/rook
          fsType: ceph
          options:
            fsName: myfs
            clusterNamespace: rook-ceph
EOF
kubectl apply -f testfs.yaml
kubectl delete -f testfs.yaml
5.
kubectl delete -f filesystem.yaml
kubectl delete -f cluster.yaml
kubectl -n rook-ceph patch cephclusters.ceph.rook.io rook-ceph -p '{"metadata":{"finalizers": []}}' --type=merge
kubectl delete ns rook-ceph
kubectl delete ns rook-ceph-system
helm del rook-ceph --purge
====================================================
The details follow below.
I. Deploying the Rook operator with Helm
cat <<EOF > values.yaml
nodeSelector:
  role: app-node
tolerations:
- key: node-type
  operator: Equal
  value: app-node
  effect: "NoSchedule"
agent:
  flexVolumeDirPath: /home/kubernetes/flexvolume
  toleration: "NoSchedule"
discover:
  toleration: "NoSchedule"
EOF
helm install --namespace rook-ceph-system --name rook-ceph rook-stable/rook-ceph -f values.yaml
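Note: if the rook-stable chart repository has not been added to Helm beforehand, the install above will fail because the chart cannot be found; in that case register the repository and re-run it. The repository URL below is the one given in the Rook documentation for this release series, so double-check it for the Rook version you actually use.
helm repo add rook-stable https://charts.rook.io/stable
helm repo update
Once the operator is installed, confirm that the operator pod is running:
kubectl --namespace rook-ceph-system get pods -l "app=rook-ceph-operator"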
II. Deploying the Ceph cluster with node affinity and node taints
The spec.placement section below (nodeAffinity and tolerations for the osd (object storage daemon), mon, and mgr components) controls which physical nodes the Ceph daemons are scheduled onto; be sure to keep it consistent with the node labels/taints and with the tolerations in values.yaml.
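As a quick sanity check, the labels and taints actually present on the nodes can be listed as follows (a simple sketch using the label key from this article):
kubectl get nodes -L role
kubectl describe nodes | grep Taints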
cluster.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: rook-ceph
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rook-ceph-osd
  namespace: rook-ceph
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rook-ceph-mgr
  namespace: rook-ceph
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: rook-ceph-osd
  namespace: rook-ceph
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: [ "get", "list", "watch", "create", "update", "delete" ]
---
# Aspects of ceph-mgr that require access to the system namespace
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: rook-ceph-mgr-system
  namespace: rook-ceph
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch
---
# Aspects of ceph-mgr that operate within the cluster's namespace
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: rook-ceph-mgr
  namespace: rook-ceph
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - services
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - delete
- apiGroups:
  - ceph.rook.io
  resources:
  - "*"
  verbs:
  - "*"
---
# Allow the operator to create resources in this cluster's namespace
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: rook-ceph-cluster-mgmt
  namespace: rook-ceph
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: rook-ceph-cluster-mgmt
subjects:
- kind: ServiceAccount
  name: rook-ceph-system
  namespace: rook-ceph-system
---
# Allow the osd pods in this namespace to work with configmaps
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: rook-ceph-osd
  namespace: rook-ceph
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rook-ceph-osd
subjects:
- kind: ServiceAccount
  name: rook-ceph-osd
  namespace: rook-ceph
---
# Allow the ceph mgr to access the cluster-specific resources necessary for the mgr modules
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: rook-ceph-mgr
  namespace: rook-ceph
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rook-ceph-mgr
subjects:
- kind: ServiceAccount
  name: rook-ceph-mgr
  namespace: rook-ceph
---
# Allow the ceph mgr to access the rook system resources necessary for the mgr modules
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: rook-ceph-mgr-system
  namespace: rook-ceph-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rook-ceph-mgr-system
subjects:
- kind: ServiceAccount
  name: rook-ceph-mgr
  namespace: rook-ceph
---
# Allow the ceph mgr to access cluster-wide resources necessary for the mgr modules
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: rook-ceph-mgr-cluster
  namespace: rook-ceph
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: rook-ceph-mgr-cluster
subjects:
- kind: ServiceAccount
  name: rook-ceph-mgr
  namespace: rook-ceph
---
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    # The container image used to launch the Ceph daemon pods (mon, mgr, osd, mds, rgw).
    # v12 is luminous, v13 is mimic, and v14 is nautilus.
    # RECOMMENDATION: In production, use a specific version tag instead of the general v13 flag, which pulls the latest release and could result in different
    # versions running within the cluster. See tags available at https://hub.docker.com/r/ceph/ceph/tags/.
    image: ceph/ceph:v13.2.4-20190109
    # Whether to allow unsupported versions of Ceph. Currently only luminous and mimic are supported.
    # After nautilus is released, Rook will be updated to support nautilus.
    # Do not set to true in production.
    allowUnsupported: false
  # The path on the host where configuration files will be persisted. If not specified, a kubernetes emptyDir will be created (not recommended).
  # Important: if you reinstall the cluster, make sure you delete this directory from each host or else the mons will fail to start on the new cluster.
  # In Minikube, the '/data' directory is configured to persist across reboots. Use "/data/rook" in Minikube environment.
  dataDirHostPath: /var/lib/rook
  # set the amount of mons to be started
  mon:
    count: 3
    allowMultiplePerNode: true
  # enable the ceph dashboard for viewing cluster status
  dashboard:
    enabled: true
    # serve the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
    # urlPrefix: /ceph-dashboard
    # serve the dashboard at the given port.
    # port: 8443
    # serve the dashboard using SSL
    # ssl: true
  network:
    # toggle to use hostNetwork
    hostNetwork: false
  rbdMirroring:
    # The number of daemons that will perform the rbd mirroring.
    # rbd mirroring must be configured with "rbd mirror" from the rook toolbox.
    workers: 0
  # To control where various services will be scheduled by kubernetes, use the placement configuration sections below.
  # The example under 'all' would have all services scheduled on kubernetes nodes labeled with 'role=storage-node' and
  # tolerate taints with a key of 'storage-node'.
  placement:
    osd:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: role
              operator: In
              values:
              - app-node
      tolerations:
      - key: node-type
        operator: Equal
        value: app-node
        effect: "NoSchedule"
    all:
      tolerations:
      - key: node-type
        operator: Equal
        value: app-node
        effect: "NoSchedule"
    # The above placement information can also be specified for mon, osd, and mgr components
    # mon:
    # osd:
    # mgr:
  resources:
    # The requests and limits set here, allow the mgr pod to use half of one CPU core and 1 gigabyte of memory
    # mgr:
    #   limits:
    #     cpu: "500m"
    #     memory: "1024Mi"
    #   requests:
    #     cpu: "500m"
    #     memory: "1024Mi"
    # The above example requests/limits can also be added to the mon and osd components
    # mon:
    # osd:
  storage: # cluster level storage configuration and selection
    useAllNodes: true
    useAllDevices: false
    deviceFilter:
    location:
    config:
      # The default and recommended storeType is dynamically set to bluestore for devices and filestore for directories.
      # Set the storeType explicitly only if it is required not to use the default.
      # storeType: bluestore
      databaseSizeMB: "1024" # this value can be removed for environments with normal sized disks (100 GB or larger)
      journalSizeMB: "1024" # this value can be removed for environments with normal sized disks (20 GB or larger)
      osdsPerDevice: "1" # this value can be overridden at the node or device level
    # Cluster level list of directories to use for storage. These values will be set for all nodes that have no `directories` set.
    # directories:
    # - path: /rook/storage-dir
    # Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
    # nodes below will be used as storage resources. Each node's 'name' field should match their 'kubernetes.io/hostname' label.
    # nodes:
    # - name: "172.17.4.101"
    #   directories: # specific directories to use for storage can be specified for each node
    #   - path: "/rook/storage-dir"
    #   resources:
    #     limits:
    #       cpu: "500m"
    #       memory: "1024Mi"
    #     requests:
    #       cpu: "500m"
    #       memory: "1024Mi"
    # - name: "172.17.4.201"
    #   devices: # specific devices to use for storage can be specified for each node
    #   - name: "sdb"
    #   - name: "nvme01" # multiple osds can be created on high performance devices
    #     config:
    #       osdsPerDevice: "5"
    #   config: # configuration can be specified at the node level which overrides the cluster level config
    #     storeType: filestore
    # - name: "172.17.4.301"
    #   deviceFilter: "^sd."
kubectl apply -f cluster.yaml
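After applying, it takes a while for the mon, mgr, and osd pods to appear in the rook-ceph namespace; the same commands as in the summary above can be used to watch the progress and to inspect the operator log:
kubectl get po -n rook-ceph -o wide
kubectl logs `kubectl get pod -n rook-ceph-system | grep -F -i 'rook-ceph-operator' | awk '{print $1}'` -n rook-ceph-system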
At this point, if a Ceph cluster has been deployed on these nodes before, the following error appears:
The keyring does not match the existing keyring in /var/lib/rook/mon-a/data/keyring. You may need to delete the contents of dataDirHostPath on the host from a previous deployment.
Workaround:
SSH into each node from the GCP VM instances page and run
sudo rm -rf /var/lib/rook
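If the node pool contains several nodes, logging in one by one is tedious. Below is a rough sketch that loops over the VM instances with gcloud; the name filter (and the omission of a --zone flag) is an assumption, so adapt it to how your GKE nodes are actually named and where they live.
for node in $(gcloud compute instances list --filter="name~app-node" --format="value(name)"); do
  gcloud compute ssh "$node" --command="sudo rm -rf /var/lib/rook"
done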
III. Updating the Rook operator
When you want to update the Rook operator, run:
helm upgrade --namespace rook-ceph-system rook-ceph rook-stable/rook-ceph -f values.yaml
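After the upgrade, it is worth confirming that the operator pod has been recreated and is running again (the same check as in step 1):
kubectl --namespace rook-ceph-system get pods -l "app=rook-ceph-operator"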
IV. Provisioning the filesystem with node affinity and node taints
filesystem.yaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  # The metadata pool spec
  metadataPool:
    replicated:
      # Increase the replication size if you have more than one osd
      size: 1
  # The list of data pool specs
  dataPools:
  - failureDomain: osd
    replicated:
      size: 1
  # The metadata service (mds) configuration
  metadataServer:
    # The number of active MDS instances
    activeCount: 1
    # Whether each active MDS instance will have an active standby with a warm metadata cache for faster failover.
    # If false, standbys will be available, but will not have a warm cache.
    activeStandby: true
    # The affinity rules to apply to the mds deployment
    placement:
      # nodeAffinity:
      #   requiredDuringSchedulingIgnoredDuringExecution:
      #     nodeSelectorTerms:
      #     - matchExpressions:
      #       - key: role
      #         operator: In
      #         values:
      #         - storage-node
      tolerations:
      - key: "node-type"
        operator: "Equal"
        value: "app-node"
        effect: "NoSchedule"
      # tolerations:
      # - key: mds-node
      #   operator: Exists
      # podAffinity:
      # podAntiAffinity:
    resources:
      # The requests and limits set here, allow the filesystem MDS Pod(s) to use half of one CPU core and 1 gigabyte of memory
      # limits:
      #   cpu: "500m"
      #   memory: "1024Mi"
      # requests:
      #   cpu: "500m"
      #   memory: "1024Mi"
kubectl apply -f filesystem.yaml
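It takes a few minutes for the MDS pods to come up. One way to check the result is shown below; the app=rook-ceph-mds label is, as far as I know, what the operator attaches to the MDS pods in this Rook version, so verify it if the query returns nothing.
kubectl -n rook-ceph get pod -l app=rook-ceph-mds
kubectl -n rook-ceph get cephfilesystem myfs
The nginx test Deployment in step 4 of the command summary above shows how a pod then mounts this filesystem through the ceph.rook.io/rook flexVolume driver.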
V. Uninstalling Rook Ceph
The brute-force way:
0. kubectl delete -f filesystem.yaml
1. kubectl delete -f cluster.yaml
2. helm del rook-ceph --purge
3. kubectl delete daemonset --all -n rook-ceph-system
4. kubectl delete ns rook-ceph
5. kubectl delete crd cephclusters.ceph.rook.io
6. kubectl delete ns rook-ceph-system
After running step 4, kubectl get ns may show the rook-ceph namespace stuck in the Terminating status. In that case, run:
kubectl -n rook-ceph patch cephclusters.ceph.rook.io rook-ceph -p '{"metadata":{"finalizers": []}}' --type=merge
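Before patching, you can confirm that a CephCluster resource with a leftover finalizer is in fact what is blocking the namespace:
kubectl -n rook-ceph get cephclusters.ceph.rook.io
kubectl -n rook-ceph get cephclusters.ceph.rook.io rook-ceph -o yaml | grep -A 2 finalizers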
A somewhat cleaner way:
0. kubectl delete -f filesystem.yaml
1. kubectl delete -f cluster.yaml
2. kubectl delete ns rook-ceph-system
3. helm del rook-ceph --purge
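Finally, a quick check that nothing from Rook is left behind; both commands should return nothing once the cleanup has finished.
kubectl get ns | grep rook
kubectl get crd | grep ceph.rook.io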
Very useful references: