OpenShift Virtualization
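Red Hat® OpenShift® Virtualization is a feature included in Red Hat OpenShift. It gives organizations a modern platform for running and deploying both new and existing virtual machine (VM) workloads, and it makes it easy to migrate traditional virtual machines onto a trusted, consistent, and comprehensive hybrid cloud application platform.
OpenShift Virtualization simplifies VM migration and offers a path to infrastructure modernization that takes advantage of the simplicity and speed of a cloud-native application platform. It aims to preserve existing virtualization investments while adopting modern management principles, and it forms the foundation of Red Hat's comprehensive virtualization solution.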
Postinstallation Configuration
Following the procedures below, this post reviews the postinstallation configuration of OpenShift Virtualization (OCP-V) in an OCP 4.17 bare-metal environment.
Specifying nodes for OpenShift Virtualization components
You can specify the nodes on which OCP-V resources are placed.
Subscription
To specify node placement for the OCP-V Operator, use the nodeSelector, affinity, and tolerations fields available under Subscription.spec.config.
At the time of writing, this cannot be configured from the OCP Console.
$ oc explain Subscription.spec.config
GROUP:      operators.coreos.com
KIND:       Subscription
VERSION:    v1alpha1
FIELD: config <Object>
DESCRIPTION:
    SubscriptionConfig contains configuration specified for a subscription.
FIELDS:
  affinity <Object>
    If specified, overrides the pod's scheduling constraints.
    nil sub-attributes will *not* override the original values in the pod.spec
    for those sub-attributes.
    Use empty object ({}) to erase original sub-attribute values.
  annotations <map[string]string>
    Annotations is an unstructured key value map stored with each Deployment,
    Pod, APIService in the Operator.
    Typically, annotations may be set by external tools to store and retrieve
    arbitrary metadata.
    Use this field to pre-define annotations that OLM should add to each of the
    Subscription's
    deployments, pods, and apiservices.
  env <[]Object>
    Env is a list of environment variables to set in the container.
    Cannot be updated.
  envFrom <[]Object>
    EnvFrom is a list of sources to populate environment variables in the
    container.
    The keys defined within a source must be a C_IDENTIFIER. All invalid keys
    will be reported as an event when the container is starting. When a key
    exists in multiple
    sources, the value associated with the last source will take precedence.
    Values defined by an Env with a duplicate key will take precedence.
    Immutable.
  nodeSelector <map[string]string>
    NodeSelector is a selector which must be true for the pod to fit on a node.
    Selector which must match a node's labels for the pod to be scheduled on
    that node.
    More info:
    https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
  resources <Object>
    Resources represents compute resources required by this container.
    Immutable.
    More info:
    https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
  selector <Object>
    Selector is the label selector for pods to be configured.
    Existing ReplicaSets whose pods are
    selected by this will be the ones affected by this deployment.
    It must match the pod template's labels.
  tolerations <[]Object>
    Tolerations are the pod's tolerations.
  volumeMounts <[]Object>
    List of VolumeMounts to set in the container.
  volumes <[]Object>
    List of Volumes to set in the podSpec.
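As a minimal sketch, a Subscription that pins the OCP-V Operator pods to labeled nodes might look like the following. It assumes the Subscription is named `hco-operatorhub` in the `openshift-cnv` namespace (an Operator installed from the console may use a different name, so check with `oc get subscription -n openshift-cnv`); the node label and taint key are hypothetical placeholders.

```yaml
# Sketch only: metadata.name, the node label, and the taint key are assumptions.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hco-operatorhub            # assumed name; verify in your cluster
  namespace: openshift-cnv
spec:
  name: kubevirt-hyperconverged    # OLM package name of the OCP-V Operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  channel: stable
  config:
    # Schedule the Operator pods only on nodes with this (hypothetical) label
    nodeSelector:
      example.io/example-infra-key: example-infra-value
    # Let the Operator pods tolerate a (hypothetical) taint on those nodes
    tolerations:
    - key: example.io/example-infra-key
      operator: Equal
      value: example-infra-value
      effect: NoSchedule
```

Because the console does not expose these fields, the usual route is to add `spec.config` with `oc edit subscription <name> -n openshift-cnv`.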
HyperConverged
To specify node placement for the OCP-V components, use the HyperConverged resource.
HyperConverged.spec.infra.nodePlacement
HyperConverged.spec.infra.nodePlacement specifies the node placement of the OCP-V infra components (the OCP-V Pods).
$ oc explain HyperConverged.spec.infra
GROUP:      hco.kubevirt.io
KIND:       HyperConverged
VERSION:    v1beta1
FIELD: infra <Object>
DESCRIPTION:
    infra HyperConvergedConfig influences the pod configuration (currently only
    placement)
    for all the infra components needed on the virtualization enabled cluster
    but not necessarily directly on each node running VMs/VMIs.
FIELDS:
  nodePlacement <Object>
    NodePlacement describes node scheduling configuration.
$ oc explain HyperConverged.spec.infra.nodePlacement
GROUP:      hco.kubevirt.io
KIND:       HyperConverged
VERSION:    v1beta1
FIELD: nodePlacement <Object>
DESCRIPTION:
    NodePlacement describes node scheduling configuration.
FIELDS:
  affinity <Object>
    affinity enables pod affinity/anti-affinity placement expanding the types of
    constraints
    that can be expressed with nodeSelector.
    affinity is going to be applied to the relevant kind of pods in parallel
    with nodeSelector
    See
    https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity
  nodeSelector <map[string]string>
    nodeSelector is the node selector applied to the relevant kind of pods
    It specifies a map of key-value pairs: for the pod to be eligible to run on
    a node,
    the node must have each of the indicated key-value pairs as labels
    (it can have additional labels as well).
    See
    https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  tolerations <[]Object>
    tolerations is a list of tolerations applied to the relevant kind of pods
    See https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
    for more info.
    These are additional tolerations other than default ones.
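A minimal sketch of infra placement, assuming the default HyperConverged CR `kubevirt-hyperconverged` in `openshift-cnv`; the label and taint key are hypothetical placeholders for whatever identifies your infra nodes.

```yaml
# Sketch only: the node label and taint key are hypothetical.
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  infra:
    nodePlacement:
      # Infra pods run only on nodes carrying this label
      nodeSelector:
        example.io/example-infra-key: example-infra-value
      # Added on top of the default tolerations, per the field description
      tolerations:
      - key: example.io/example-infra-key
        operator: Equal
        value: example-infra-value
        effect: NoSchedule
```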
HyperConverged.spec.workloads.nodePlacement
HyperConverged.spec.workloads.nodePlacement specifies the node placement of the OCP-V workload components (the VM Pods running under OCP-V management).
$ oc explain HyperConverged.spec.workloads
GROUP:      hco.kubevirt.io
KIND:       HyperConverged
VERSION:    v1beta1
FIELD: workloads <Object>
DESCRIPTION:
    workloads HyperConvergedConfig influences the pod configuration (currently
    only placement) of components
    which need to be running on a node where virtualization workloads should be
    able to run.
    Changes to Workloads HyperConvergedConfig can be applied only without
    existing workload.
FIELDS:
  nodePlacement <Object>
    NodePlacement describes node scheduling configuration.
$ oc explain HyperConverged.spec.workloads.nodePlacement
GROUP:      hco.kubevirt.io
KIND:       HyperConverged
VERSION:    v1beta1
FIELD: nodePlacement <Object>
DESCRIPTION:
    NodePlacement describes node scheduling configuration.
FIELDS:
  affinity <Object>
    affinity enables pod affinity/anti-affinity placement expanding the types of
    constraints
    that can be expressed with nodeSelector.
    affinity is going to be applied to the relevant kind of pods in parallel
    with nodeSelector
    See
    https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity
  nodeSelector <map[string]string>
    nodeSelector is the node selector applied to the relevant kind of pods
    It specifies a map of key-value pairs: for the pod to be eligible to run on
    a node,
    the node must have each of the indicated key-value pairs as labels
    (it can have additional labels as well).
    See
    https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  tolerations <[]Object>
    tolerations is a list of tolerations applied to the relevant kind of pods
    See https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
    for more info.
    These are additional tolerations other than default ones.
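As a sketch, workload placement can also be expressed with node affinity instead of a plain nodeSelector. The label key and value are hypothetical. Per the field description above, changes to the workloads placement can only be applied while no workloads exist, so this is best set before creating any VMs.

```yaml
# Sketch only: the label key/value are hypothetical.
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  workloads:
    nodePlacement:
      affinity:
        nodeAffinity:
          # Hard requirement: workload pods land only on matching nodes
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: example.io/example-workloads-key
                operator: In
                values:
                - example-workloads-value
```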
HostPathProvisioner (HPP)
When using the HostPathProvisioner, specify its node placement in HostPathProvisioner.spec.workload.
If you specify it, it must match the node placement of the VM Pods that use the HostPathProvisioner.
Configuring local storage by using the hostpath provisioner
$ oc explain HostPathProvisioner.spec.workload
GROUP:      hostpathprovisioner.kubevirt.io
KIND:       HostPathProvisioner
VERSION:    v1beta1
FIELD: workload <Object>
DESCRIPTION:
    Restrict on which nodes HPP workload pods will be scheduled
FIELDS:
  affinity <Object>
    affinity enables pod affinity/anti-affinity placement expanding the types of
    constraints
    that can be expressed with nodeSelector.
    affinity is going to be applied to the relevant kind of pods in parallel
    with nodeSelector
    See
    https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity
  nodeSelector <map[string]string>
    nodeSelector is the node selector applied to the relevant kind of pods
    It specifies a map of key-value pairs: for the pod to be eligible to run on
    a node,
    the node must have each of the indicated key-value pairs as labels
    (it can have additional labels as well).
    See
    https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  tolerations <[]Object>
    tolerations is a list of tolerations applied to the relevant kind of pods
    See https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
    for more info.
    These are additional tolerations other than default ones.
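A fragment showing only the placement part of a HostPathProvisioner CR. It assumes the CR is named `hostpath-provisioner`; the label is hypothetical and should be the same one used for the VM Pods' placement (for example in HyperConverged.spec.workloads.nodePlacement). A complete CR also needs a storage pool definition, which is omitted here.

```yaml
# Fragment only: metadata.name and the label are assumptions; storagePools omitted.
apiVersion: hostpathprovisioner.kubevirt.io/v1beta1
kind: HostPathProvisioner
metadata:
  name: hostpath-provisioner
spec:
  workload:
    # Keep HPP pods on the same nodes that run the VM Pods using HPP volumes
    nodeSelector:
      example.io/example-workloads-key: example-workloads-value
```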
Postinstallation network configuration
You can customize the networks used by the VM Pods managed by OCP-V. For example, you can use the following Operators:
- NMState Operator
- SR-IOV Operator
- MetalLB Operator
You can also use a NodeNetworkConfigurationPolicy (NNCP) and a NetworkAttachmentDefinition (NAD) to configure a Linux bridge network for VM Pod live migration or external access.
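A minimal sketch of a Linux bridge setup: an NNCP that creates a bridge on the nodes, plus a NAD that lets VMs attach to it. The bridge name `br1`, the NIC `ens5`, the namespace `example-vms`, and the object names are all hypothetical; adjust them to your hardware and projects.

```yaml
# Sketch only: br1, ens5, example-vms and the object names are hypothetical.
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-policy
spec:
  desiredState:
    interfaces:
    - name: br1
      type: linux-bridge
      state: up
      ipv4:
        enabled: false          # the bridge itself carries no host IP
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens5            # physical NIC enslaved to the bridge
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: br1-network
  namespace: example-vms
spec:
  # CNI config that attaches VM interfaces to the node bridge br1
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "br1-network",
      "type": "bridge",
      "bridge": "br1",
      "macspoofchk": false
    }
```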
To use a dedicated network for live migration, set HyperConverged.spec.liveMigrationConfig.network.
$ oc explain HyperConverged.spec.liveMigrationConfig.network
GROUP:      hco.kubevirt.io
KIND:       HyperConverged
VERSION:    v1beta1
FIELD: network <string>
DESCRIPTION:
    The migrations will be performed over a dedicated multus network to minimize
    disruption to tenant workloads due to network saturation when VM live
    migrations are triggered.
$ oc explain HyperConverged.spec.liveMigrationConfig
GROUP:      hco.kubevirt.io
KIND:       HyperConverged
VERSION:    v1beta1
FIELD: liveMigrationConfig <Object>
DESCRIPTION:
    Live migration limits and timeouts are applied so that migration processes
    do not
    overwhelm the cluster.
FIELDS:
  allowAutoConverge <boolean>
    AllowAutoConverge allows the platform to compromise performance/availability
    of VMIs to
    guarantee successful VMI live migrations. Defaults to false
  allowPostCopy <boolean>
    When enabled, KubeVirt attempts to use post-copy live-migration in case it
    reaches its completion timeout while attempting pre-copy live-migration.
    Post-copy migrations allow even the busiest VMs to successfully
    live-migrate.
    However, events like a network failure or a failure in any of the source or
    destination nodes can cause the migrated VM to crash or reach inconsistency.
    Enable this option when evicting nodes is more important than keeping VMs
    alive.
    Defaults to false.
  bandwidthPerMigration <string>
    Bandwidth limit of each migration, the value is quantity of bytes per second
    (e.g. 2048Mi = 2048MiB/sec)
  completionTimeoutPerGiB <integer>
    If a migrating VM is big and busy, while the connection to the destination
    node
    is slow, migration may never converge. The completion timeout is calculated
    based on completionTimeoutPerGiB times the size of the guest (both RAM and
    migrated disks, if any). For example, with completionTimeoutPerGiB set to
    800,
    a virtual machine instance with 6GiB memory will timeout if it has not
    completed migration in 1h20m. Use a lower completionTimeoutPerGiB to induce
    quicker failure, so that another destination or post-copy is attempted. Use
    a
    higher completionTimeoutPerGiB to let workload with spikes in its memory
    dirty
    rate to converge.
    The format is a number.
  network <string>
    The migrations will be performed over a dedicated multus network to minimize
    disruption to tenant workloads due to network saturation when VM live
    migrations are triggered.
  parallelMigrationsPerCluster <integer>
    Number of migrations running in parallel in the cluster.
  parallelOutboundMigrationsPerNode <integer>
    Maximum number of outbound migrations per node.
  progressTimeout <integer>
    The migration will be canceled if memory copy fails to make progress in this
    time, in seconds.
This setting can also be configured from the OCP Console.
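A sketch of a liveMigrationConfig that routes migrations over a dedicated network. It assumes a NetworkAttachmentDefinition named `migration-network` (hypothetical) exists in the `openshift-cnv` namespace; the numeric values are illustrative only, not recommendations or verified defaults.

```yaml
# Sketch only: the NAD name and all numeric values are illustrative assumptions.
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  liveMigrationConfig:
    network: migration-network          # NAD in openshift-cnv (hypothetical name)
    parallelMigrationsPerCluster: 5
    parallelOutboundMigrationsPerNode: 2
    bandwidthPerMigration: 64Mi         # bytes per second per migration
    # Timeout = completionTimeoutPerGiB x guest size (RAM + migrated disks):
    # a 6 GiB guest gets 800 s x 6 = 4800 s = 1h20m
    completionTimeoutPerGiB: 800
    progressTimeout: 150                # seconds without memory-copy progress
```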
Postinstallation storage configuration
An overview of OCP-V storage is available in the following documentation:
Storage configuration overview
The documentation lists the following required postinstallation configuration items:
- You must configure a default storage class for your cluster. Otherwise, the cluster cannot receive automated boot source updates.
- You must configure storage profiles if your storage provider is not recognized by CDI. A storage profile provides recommended storage settings based on the associated storage class.
Default StorageClass
This environment uses OpenShift Data Foundation (ODF), so ocs-storagecluster-ceph-rbd is set as the default StorageClass.
Note that ocs-storagecluster-ceph-rbd-virtualization is a StorageClass created when the OCP-V Operator was installed; the others were created when ODF was installed.
$ oc get sc
NAME                                         PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ocs-storagecluster-ceph-rbd (default)        openshift-storage.rbd.csi.ceph.com      Delete          Immediate           true                   1d
ocs-storagecluster-ceph-rbd-virtualization   openshift-storage.rbd.csi.ceph.com      Delete          Immediate           true                   1d
ocs-storagecluster-ceph-rgw                  openshift-storage.ceph.rook.io/bucket   Delete          Immediate           false                  1d
ocs-storagecluster-cephfs                    openshift-storage.cephfs.csi.ceph.com   Delete          Immediate           true                   1d
openshift-storage.noobaa.io                  openshift-storage.noobaa.io/obc         Delete          Immediate           false                  1d
$ oc -o yaml get sc ocs-storagecluster-ceph-rbd-virtualization
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    description: Provides RWO and RWX Block volumes suitable for Virtual Machine disks
    reclaimspace.csiaddons.openshift.io/schedule: '@weekly'
    storageclass.kubevirt.io/is-default-virt-class: "true"
  name: ocs-storagecluster-ceph-rbd-virtualization
parameters:
  clusterID: openshift-storage
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: openshift-storage
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: openshift-storage
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: openshift-storage
  imageFeatures: layering,deep-flatten,exclusive-lock,object-map,fast-diff
  imageFormat: "2"
  mapOptions: krbd:rxbounce
  mounter: rbd
  pool: ocs-storagecluster-cephblockpool
provisioner: openshift-storage.rbd.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
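The "default" flags are plain StorageClass annotations. The fragment below shows both: the standard Kubernetes annotation behind the "(default)" marker in `oc get sc`, and the KubeVirt-specific one visible in the output above, which is intended to select the default class for VM disks. Each should be set on at most one StorageClass.

```yaml
# Fragment of StorageClass metadata; set each annotation on only one class.
metadata:
  annotations:
    # Cluster-wide default StorageClass ("(default)" in `oc get sc`)
    storageclass.kubernetes.io/is-default-class: "true"
    # Default StorageClass for OCP-V virtual machine disks
    storageclass.kubevirt.io/is-default-virt-class: "true"
```

To move the cluster default, one option is `oc annotate storageclass <name> storageclass.kubernetes.io/is-default-class=true --overwrite`, and the same command with `=false` on the previous default.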
StorageProfile
When OCP-V is set up, a StorageProfile is created automatically for each StorageClass.
$ oc get StorageProfile -o wide
NAME                                         AGE
ocs-storagecluster-ceph-rbd                  1d
ocs-storagecluster-ceph-rbd-virtualization   1d
ocs-storagecluster-ceph-rgw                  1d
ocs-storagecluster-cephfs                    1d
openshift-storage.noobaa.io                  1d
For example, the StorageProfile for the ocs-storagecluster-ceph-rbd-virtualization StorageClass shown earlier looks like this:
$ oc -o yaml get StorageProfile ocs-storagecluster-ceph-rbd-virtualization
apiVersion: cdi.kubevirt.io/v1beta1
kind: StorageProfile
metadata:
  generation: 1
  labels:
    app: containerized-data-importer
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
    app.kubernetes.io/part-of: hyperconverged-cluster
    app.kubernetes.io/version: 4.17.4
    cdi.kubevirt.io: ""
  name: ocs-storagecluster-ceph-rbd-virtualization
  ownerReferences:
  - apiVersion: cdi.kubevirt.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: CDI
    name: cdi-kubevirt-hyperconverged
spec: {}
status:
  claimPropertySets:
  - accessModes:
    - ReadWriteMany
    volumeMode: Block
  - accessModes:
    - ReadWriteOnce
    volumeMode: Block
  - accessModes:
    - ReadWriteOnce
    volumeMode: Filesystem
  cloneStrategy: csi-clone
  dataImportCronSourceFormat: snapshot
  provisioner: openshift-storage.rbd.csi.ceph.com
  snapshotClass: ocs-storagecluster-rbdplugin-snapclass
  storageClass: ocs-storagecluster-ceph-rbd-virtualization
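For a recognized provisioner such as ODF, CDI fills in `status` and `spec` stays empty. For a storage provider CDI does not recognize, the recommended settings can be supplied in `spec` instead, for example with `oc edit storageprofile <name>`. A sketch, where `example-storage-class` is a hypothetical StorageClass name and the access mode, volume mode, and clone strategy are assumptions to be matched to what the provider actually supports:

```yaml
# Sketch only: the name and all claim properties are assumptions.
apiVersion: cdi.kubevirt.io/v1beta1
kind: StorageProfile
metadata:
  name: example-storage-class   # must equal the StorageClass name
spec:
  claimPropertySets:
  - accessModes:
    - ReadWriteOnce
    volumeMode: Filesystem
  cloneStrategy: copy           # other values: snapshot, csi-clone
```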
The documentation also includes the following note about using ODF:
Important
When using OpenShift Virtualization with Red Hat OpenShift Data Foundation, specify RBD block mode persistent volume claims (PVCs) when creating virtual machine disks. RBD block mode volumes are more efficient and provide better performance than Ceph FS or RBD filesystem-mode PVCs.
To specify RBD block mode PVCs, use the 'ocs-storagecluster-ceph-rbd' storage class and VolumeMode: Block.
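A minimal PVC following that note, as a sketch. The PVC name, namespace, and size are hypothetical. ReadWriteMany is requested because VM live migration needs shared access, and RBD supports RWX only in Block mode.

```yaml
# Sketch only: name, namespace, and size are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-vm-disk
  namespace: example-vms
spec:
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Block             # RBD block mode, as the note recommends
  accessModes:
  - ReadWriteMany               # shared access for live migration
  resources:
    requests:
      storage: 30Gi
```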
HostPathProvisioner (HPP)
Installing the OCP-V Operator makes HPP available. As an optional configuration, you can set up VM storage backed by HPP.
Configuring local storage by using the hostpath provisioner
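A sketch of a basic HPP setup: a HostPathProvisioner CR with one storage pool, and a StorageClass that uses it through the HPP CSI driver. The pool name, host path, and StorageClass name are hypothetical. `WaitForFirstConsumer` delays binding until a pod is scheduled, which matters because HPP volumes are local to one node.

```yaml
# Sketch only: pool name, host path, and StorageClass name are hypothetical.
apiVersion: hostpathprovisioner.kubevirt.io/v1beta1
kind: HostPathProvisioner
metadata:
  name: hostpath-provisioner
spec:
  imagePullPolicy: IfNotPresent
  storagePools:
  - name: example-pool
    path: /var/example-volumes    # directory on each node's local disk
  workload:
    nodeSelector:
      kubernetes.io/os: linux
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hostpath-csi
provisioner: kubevirt.io.hostpath-provisioner
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer   # bind on the node that runs the VM
parameters:
  storagePool: example-pool               # must match a pool name above
```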
Configuring higher VM workload density
You can increase the number of virtual machines per node by overcommitting memory (RAM).
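A sketch, assuming the `spec.higherWorkloadDensity.memoryOvercommitPercentage` field of HyperConverged is available in your version (check with `oc explain HyperConverged.spec.higherWorkloadDensity`). As we understand it, a value of 150 makes each VM Pod request roughly 100/150 of the guest memory. For example, a 12 GiB guest would request about 8 GiB, before virtualization overhead. The value 150 is illustrative only.

```yaml
# Sketch only: field availability and the value 150 are assumptions.
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  higherWorkloadDensity:
    # 100 = no overcommit; 150 = pods request about 2/3 of guest memory
    memoryOvercommitPercentage: 150
```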