OpenShift AI
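Red Hat® OpenShift® AI is a flexible, scalable artificial intelligence (AI) and machine learning (ML) platform. It enables enterprises to create and deliver AI-enabled applications at scale across hybrid cloud environments.
Built using open source technologies, OpenShift AI provides trusted, operationally consistent capabilities for experimentation, model serving, and the delivery of innovative applications.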
Computing Resources
This article checks how much computing resource is used in an environment running OpenShift AI 2.9.1 on Red Hat OpenShift on IBM Cloud (ROKS) 4.14.
The cluster uses three worker nodes of the mx2-8x64 flavor.
OpenShift AI Components
The computing resource requirements for installing OpenShift AI are documented as follows.
- Use an existing cluster or create a new cluster by following the OpenShift Container Platform documentation: OpenShift Container Platform installation overview. Your cluster must have at least 2 worker nodes with at least 8 CPUs and 32 GiB RAM available for OpenShift AI to use when you install the Operator. To ensure that OpenShift AI is usable, additional cluster resources are required beyond the minimum requirements.
- A default storage class that can be dynamically provisioned must be configured. Confirm that a default storage class is configured by running the oc get storageclass command. If no storage classes are noted with (default) beside the name, follow the OpenShift Container Platform documentation to configure a default storage class: Changing the default storage class. For more information about dynamic provisioning, see Dynamic provisioning.
- Open Data Hub must not be installed on the cluster.
For more information about managing the machines that make up an OpenShift cluster, see Overview of machine management.
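As a quick sanity check, the documented minimum can be compared with the worker pool used in this article. The sketch below assumes that the mx2-8x64 flavor name means 8 vCPUs and 64 GiB of memory per node, and it compares raw capacity only (allocatable resources on a real node are lower). The names `Node` and `meets_minimum` are hypothetical helpers, not part of any OpenShift tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    cpus: int
    memory_gib: int


# Minimum from the quoted requirement: at least 2 worker nodes,
# each with at least 8 CPUs and 32 GiB RAM.
MIN_NODES = 2
MIN_CPUS = 8
MIN_MEMORY_GIB = 32


def meets_minimum(nodes):
    """Return True if enough nodes individually satisfy the per-node minimum."""
    qualifying = [
        n for n in nodes
        if n.cpus >= MIN_CPUS and n.memory_gib >= MIN_MEMORY_GIB
    ]
    return len(qualifying) >= MIN_NODES


# Assumption: mx2-8x64 = 8 vCPU / 64 GiB per worker node, three nodes.
workers = [Node(cpus=8, memory_gib=64)] * 3
print(meets_minimum(workers))
```

Meeting the minimum only means the Operator can be installed; as the quoted text notes, additional resources are needed for actual workloads.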
In OpenShift AI, the Pods of the components that are installed and configured run in the redhat-ods-applications project.
$ oc get pod -n redhat-ods-applications --sort-by=.spec.nodeName
NAME READY STATUS RESTARTS AGE
notebook-controller-deployment-556969c7cf-jj5jj 1/1 Running 0 15d
rhods-dashboard-6f64f58c89-pwhqh 2/2 Running 0 15d
kserve-controller-manager-849dcf98b6-xq6k8 1/1 Running 0 15d
kuberay-operator-759464465c-xsl2j 1/1 Running 0 15d
kueue-controller-manager-56559d6db6-wkc7s 1/1 Running 0 15d
modelmesh-controller-58cc8bd7c5-d57mg 1/1 Running 0 15d
odh-model-controller-6d9b6db854-sbhvr 1/1 Running 0 15d
data-science-pipelines-operator-controller-manager-bc4f57dc2xtk 1/1 Running 0 15d
modelmesh-controller-58cc8bd7c5-594jp 1/1 Running 0 15d
codeflare-operator-manager-6cb5c976d5-zzrkw 1/1 Running 0 15d
rhods-dashboard-6f64f58c89-9b9dt 2/2 Running 0 15d
odh-model-controller-6d9b6db854-5zp9l 1/1 Running 0 15d
rhods-dashboard-6f64f58c89-khvd4 2/2 Running 0 15d
modelmesh-controller-58cc8bd7c5-xjkcm 1/1 Running 0 15d
odh-notebook-controller-manager-66d759cbb4-6bdql 1/1 Running 0 15d
odh-model-controller-6d9b6db854-wlxn5 1/1 Running 0 15d
rhods-dashboard-6f64f58c89-mn9mq 2/2 Running 0 15d
rhods-dashboard-6f64f58c89-nbfkd 2/2 Running 0 15d
etcd-7ddbc959b8-5wf2m 1/1 Running 0 15d
The components are left at their default configuration in this environment.
$ oc get DataScienceCluster default-dsc -o json | jq -r '.status.installedComponents'
{
"codeflare": true,
"dashboard": true,
"data-science-pipelines-operator": true,
"kserve": true,
"kueue": true,
"model-mesh": true,
"ray": true,
"trustyai": false,
"workbenches": true
}
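The same status can be split into enabled and disabled components with a few lines of Python. This is a minimal sketch that embeds the JSON shown above as a string; in practice the input would come from the `oc ... | jq` command.

```python
import json

# installedComponents status copied from the command output above.
STATUS_JSON = """
{
  "codeflare": true,
  "dashboard": true,
  "data-science-pipelines-operator": true,
  "kserve": true,
  "kueue": true,
  "model-mesh": true,
  "ray": true,
  "trustyai": false,
  "workbenches": true
}
"""

status = json.loads(STATUS_JSON)

# Partition component names by their boolean install flag.
enabled = sorted(name for name, installed in status.items() if installed)
disabled = sorted(name for name, installed in status.items() if not installed)

print("enabled: ", ", ".join(enabled))
print("disabled:", ", ".join(disabled))
```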
Immediately after installation, the computing resources requested by each Pod (Pod.spec.containers[].resources), along with related details, are as follows.
$ oc get pod -n redhat-ods-applications --sort-by=.spec.nodeName -o custom-columns-file=POD_CUSTOM.txt
NAME NODE OWNER_KIND CONTAINERS CPU_LIMITS MEM_LIMITS CPU_REQUESTS MEM_REQUESTS
notebook-controller-deployment-556969c7cf-jj5jj 11.222.3.4 ReplicaSet manager 500m 4Gi 500m 256Mi
rhods-dashboard-6f64f58c89-pwhqh 11.222.3.4 ReplicaSet rhods-dashboard,oauth-proxy 1,1 2Gi,2Gi 500m,500m 1Gi,1Gi
kserve-controller-manager-849dcf98b6-xq6k8 11.222.3.4 ReplicaSet manager 500m 5Gi 100m 200Mi
kuberay-operator-759464465c-xsl2j 11.222.3.4 ReplicaSet kuberay-operator 100m 512Mi 100m 512Mi
kueue-controller-manager-56559d6db6-wkc7s 11.222.3.4 ReplicaSet manager 500m 512Mi 500m 512Mi
modelmesh-controller-58cc8bd7c5-d57mg 11.222.3.4 ReplicaSet manager 1 2Gi 50m 96Mi
odh-model-controller-6d9b6db854-sbhvr 11.222.3.4 ReplicaSet manager 500m 2Gi 10m 64Mi
data-science-pipelines-operator-controller-manager-bc4f57dc2xtk 11.222.3.5 ReplicaSet manager 1 4Gi 10m 64Mi
modelmesh-controller-58cc8bd7c5-594jp 11.222.3.5 ReplicaSet manager 1 2Gi 50m 96Mi
codeflare-operator-manager-6cb5c976d5-zzrkw 11.222.3.5 ReplicaSet manager 1 1Gi 1 1Gi
rhods-dashboard-6f64f58c89-9b9dt 11.222.3.5 ReplicaSet rhods-dashboard,oauth-proxy 1,1 2Gi,2Gi 500m,500m 1Gi,1Gi
odh-model-controller-6d9b6db854-5zp9l 11.222.3.5 ReplicaSet manager 500m 2Gi 10m 64Mi
rhods-dashboard-6f64f58c89-khvd4 11.222.3.5 ReplicaSet rhods-dashboard,oauth-proxy 1,1 2Gi,2Gi 500m,500m 1Gi,1Gi
modelmesh-controller-58cc8bd7c5-xjkcm 11.222.3.6 ReplicaSet manager 1 2Gi 50m 96Mi
odh-notebook-controller-manager-66d759cbb4-6bdql 11.222.3.6 ReplicaSet manager 500m 4Gi 500m 256Mi
odh-model-controller-6d9b6db854-wlxn5 11.222.3.6 ReplicaSet manager 500m 2Gi 10m 64Mi
rhods-dashboard-6f64f58c89-mn9mq 11.222.3.6 ReplicaSet rhods-dashboard,oauth-proxy 1,1 2Gi,2Gi 500m,500m 1Gi,1Gi
rhods-dashboard-6f64f58c89-nbfkd 11.222.3.6 ReplicaSet rhods-dashboard,oauth-proxy 1,1 2Gi,2Gi 500m,500m 1Gi,1Gi
etcd-7ddbc959b8-5wf2m 11.222.3.6 ReplicaSet etcd 800m 800Mi 400m 200Mi
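rhods-dashboard and codeflare-operator-manager request comparatively large amounts of resources: each of the five rhods-dashboard Pods requests 1 CPU and 2Gi of memory across its two containers, and codeflare-operator-manager requests 1 CPU and 1Gi. There are no PVCs.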
$ oc get pvc -n redhat-ods-applications
No resources found in redhat-ods-applications namespace.
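To see how the requests in redhat-ods-applications add up on each worker node, the CPU_REQUESTS and MEM_REQUESTS columns can be totalled per node. The sketch below copies the NAME, NODE, CPU_REQUESTS, and MEM_REQUESTS values from the table above; the helper names are hypothetical, and the quantity parsing only covers the forms that appear in this article (`500m`, `1`, `Mi`, `Gi`, `<none>`).

```python
from collections import defaultdict

# NAME NODE CPU_REQUESTS MEM_REQUESTS, copied from the table above.
ROWS = """\
notebook-controller-deployment-556969c7cf-jj5jj 11.222.3.4 500m 256Mi
rhods-dashboard-6f64f58c89-pwhqh 11.222.3.4 500m,500m 1Gi,1Gi
kserve-controller-manager-849dcf98b6-xq6k8 11.222.3.4 100m 200Mi
kuberay-operator-759464465c-xsl2j 11.222.3.4 100m 512Mi
kueue-controller-manager-56559d6db6-wkc7s 11.222.3.4 500m 512Mi
modelmesh-controller-58cc8bd7c5-d57mg 11.222.3.4 50m 96Mi
odh-model-controller-6d9b6db854-sbhvr 11.222.3.4 10m 64Mi
data-science-pipelines-operator-controller-manager-bc4f57dc2xtk 11.222.3.5 10m 64Mi
modelmesh-controller-58cc8bd7c5-594jp 11.222.3.5 50m 96Mi
codeflare-operator-manager-6cb5c976d5-zzrkw 11.222.3.5 1 1Gi
rhods-dashboard-6f64f58c89-9b9dt 11.222.3.5 500m,500m 1Gi,1Gi
odh-model-controller-6d9b6db854-5zp9l 11.222.3.5 10m 64Mi
rhods-dashboard-6f64f58c89-khvd4 11.222.3.5 500m,500m 1Gi,1Gi
modelmesh-controller-58cc8bd7c5-xjkcm 11.222.3.6 50m 96Mi
odh-notebook-controller-manager-66d759cbb4-6bdql 11.222.3.6 500m 256Mi
odh-model-controller-6d9b6db854-wlxn5 11.222.3.6 10m 64Mi
rhods-dashboard-6f64f58c89-mn9mq 11.222.3.6 500m,500m 1Gi,1Gi
rhods-dashboard-6f64f58c89-nbfkd 11.222.3.6 500m,500m 1Gi,1Gi
etcd-7ddbc959b8-5wf2m 11.222.3.6 400m 200Mi
"""


def cpu_millicores(quantity):
    """Convert a CPU quantity such as '500m' or '1' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_mib(quantity):
    """Convert a memory quantity with a Mi or Gi suffix to MiB."""
    if quantity.endswith("Gi"):
        return int(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity}")


def cell_total(cell, convert):
    """Sum a comma-separated cell (one value per container); '<none>' counts as 0."""
    return sum(convert(q) for q in cell.split(",") if q != "<none>")


def requests_per_node(rows):
    """Return {node: (cpu_millicores, memory_mib)} summed over all Pods on the node."""
    totals = defaultdict(lambda: [0, 0])
    for line in rows.strip().splitlines():
        _name, node, cpu, mem = line.split()
        totals[node][0] += cell_total(cpu, cpu_millicores)
        totals[node][1] += cell_total(mem, memory_mib)
    return {node: tuple(values) for node, values in totals.items()}


totals = requests_per_node(ROWS)
for node, (cpu_m, mem_mi) in sorted(totals.items()):
    print(f"{node}: {cpu_m}m CPU, {mem_mi}Mi memory requested")
```

Requests, not limits, are what the scheduler reserves on a node, which is why the sketch totals the request columns.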
The redhat-ods-monitoring and redhat-ods-operator projects can be checked in the same way.
$ oc get pod -n redhat-ods-monitoring --sort-by=.spec.nodeName -o custom-columns-file=POD_CUSTOM.txt
NAME NODE OWNER_KIND CONTAINERS CPU_LIMITS MEM_LIMITS CPU_REQUESTS MEM_REQUESTS
$ oc get pvc -n redhat-ods-monitoring
No resources found in redhat-ods-monitoring namespace.
$ oc get pod -n redhat-ods-operator --sort-by=.spec.nodeName -o custom-columns-file=POD_CUSTOM.txt
NAME NODE OWNER_KIND CONTAINERS CPU_LIMITS MEM_LIMITS CPU_REQUESTS MEM_REQUESTS
rhods-operator-7666b99787-lsn6z 11.222.3.0.4 ReplicaSet rhods-operator 500m 4Gi 500m 256Mi
$ oc get pvc -n redhat-ods-operator
No resources found in redhat-ods-operator namespace.
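The commands above rely on a template file, POD_CUSTOM.txt, whose contents are not shown in this article. A custom-columns template for `oc get` has the column headers on the first line and the field expressions on the second. The sketch below generates a template that would plausibly produce the columns seen above; the field expressions are an assumption reconstructed from the column names, not the original file.

```python
# (header, field expression) pairs.
# Assumption: reconstructed from the column names in the output above.
COLUMNS = [
    ("NAME", "metadata.name"),
    ("NODE", "spec.nodeName"),
    ("OWNER_KIND", "metadata.ownerReferences[*].kind"),
    ("CONTAINERS", "spec.containers[*].name"),
    ("CPU_LIMITS", "spec.containers[*].resources.limits.cpu"),
    ("MEM_LIMITS", "spec.containers[*].resources.limits.memory"),
    ("CPU_REQUESTS", "spec.containers[*].resources.requests.cpu"),
    ("MEM_REQUESTS", "spec.containers[*].resources.requests.memory"),
]


def render_template(columns):
    """Render a two-line template: aligned headers, then field expressions."""
    widths = [max(len(header), len(expr)) for header, expr in columns]
    header_line = "  ".join(
        header.ljust(width) for (header, _), width in zip(columns, widths)
    ).rstrip()
    expr_line = "  ".join(
        expr.ljust(width) for (_, expr), width in zip(columns, widths)
    ).rstrip()
    return header_line + "\n" + expr_line + "\n"


template = render_template(COLUMNS)
print(template, end="")
```

Saving the printed output as POD_CUSTOM.txt would make it usable with `-o custom-columns-file=POD_CUSTOM.txt`. The `[*]` expressions yield one comma-separated value per container, which matches cells such as `500m,500m` in the tables.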
Jupyter Notebook
Right after installation, only Jupyter Notebook is available. It can be set up with a few simple settings: Deployment size selects the CPU/memory size, and PVC size under Settings -> Cluster settings specifies the size of the PVC.
The Pod is placed in the rhods-notebooks project. With Deployment size = Small and PVC size = 20GiB (Default), the computing resources are as follows.
$ oc get pod -n rhods-notebooks -o custom-columns-file=POD_CUSTOM.txt
NAME NODE OWNER_KIND CONTAINERS CPU_LIMITS MEM_LIMITS CPU_REQUESTS MEM_REQUESTS
jupyter-nb-iam-xxue91db-40jp-2exyz-2eabc-0 11.222.3.4 StatefulSet jupyter-nb-iam-xxue91db-40jp-2exyz-2eabc,oauth-proxy 2,100m 8Gi,64Mi 1,100m 8Gi,64Mi
$ oc get pvc -n rhods-notebooks
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
jupyter-nb-iam-xxue91db-40jp-2exyz-2eabc-pvc Bound pvc-bb0ee0cc-2797-4dcb-a8ec 20Gi RWO ibmc-vpc-block-10iops-tier 13d
Data Science Project (OpenShift AI tutorial - Fraud detection example)
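As an example of a Data Science Project, this section looks at the OpenShift AI tutorial - Fraud detection example.
Running the tutorial places Pods in the fraud-detection project. Depending on whether any settings were changed while following the tutorial, the computing resources are roughly as follows.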
$ oc get pod -n fraud-detection --sort-by=.spec.nodeName -o custom-columns-file=POD_CUSTOM.txt
NAME NODE OWNER_KIND CONTAINERS CPU_LIMITS MEM_LIMITS CPU_REQUESTS MEM_REQUESTS
ds-pipeline-metadata-grpc-dspa-57d47dd549-r5zds 11.222.3.4 ReplicaSet container 100m 256Mi 100m 256Mi
create-minio-buckets-sqstp 11.222.3.4 Job create-buckets <none> <none> <none> <none>
fraud-detection-0 11.222.3.4 StatefulSet fraud-detection,oauth-proxy 2,100m 8Gi,64Mi 1,100m 8Gi,64Mi
ds-pipeline-persistenceagent-dspa-6b96b9c8d8-sgwkj 11.222.3.4 ReplicaSet ds-pipeline-persistenceagent 250m 1Gi 120m 500Mi
ds-pipeline-workflow-controller-dspa-5b4cdb559-bzg6g 11.222.3.5 ReplicaSet ds-pipeline-workflow-controller <none> <none> 100m 500Mi
ds-pipeline-scheduledworkflow-dspa-64c767c896-lv8d2 11.222.3.5 ReplicaSet ds-pipeline-scheduledworkflow 250m 250Mi 120m 100Mi
create-minio-root-user-gxcpj 11.222.3.5 Job create-minio-root-user <none> <none> <none> <none>
minio-6bff775986-hh2kz 11.222.3.5 ReplicaSet minio 2 2Gi 200m 1Gi
modelmesh-serving-model-server-778b47cc76-487x2 11.222.3.5 ReplicaSet rest-proxy,oauth-proxy,ovms,ovms-adapter,mm 1,100m,2,2,3 512Mi,256Mi,8Gi,512Mi,448Mi 50m,100m,1,50m,300m 96Mi,256Mi,4Gi,96Mi,448Mi
ds-pipeline-metadata-envoy-dspa-9c745bb5d-bphwb 11.222.3.6 ReplicaSet container,oauth-proxy 100m,100m 256Mi,256Mi 100m,100m 256Mi,256Mi
ds-pipeline-dspa-7ffd8ff549-dhtnx 11.222.3.6 ReplicaSet ds-pipeline-api-server,oauth-proxy 500m,100m 1Gi,256Mi 250m,100m 500Mi,256Mi
create-s3-storage-hjqr2 11.222.3.6 Job create-s3-storage <none> <none> <none> <none>
create-ds-connections-j62fq 11.222.3.6 Job create-ds-connections <none> <none> <none> <none>
mariadb-dspa-8685d54888-8qhhj
$ oc get pvc -n fraud-detection
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
fraud-detection Bound pvc-32db862a-b2bb-45c7-8218-6462b18a2472 20Gi RWO ibmc-vpc-block-10iops-tier 13d
mariadb-dspa Bound pvc-a1be0870-cf1e-49fe-a564-b0c6d6e23fd0 10Gi RWO ibmc-vpc-block-10iops-tier 13d
minio Bound pvc-1ec0b7a6-9d38-4d0c-b76b-2fdd5599ba9b 10Gi RWO ibmc-vpc-block-10iops-tier 13d
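The storage footprint of the tutorial can be totalled from the PVC listing. This is a minimal sketch that parses the output shown above as whitespace-separated text; it assumes every capacity uses the Gi suffix, as it does here, and `capacities_gib` is a hypothetical helper name.

```python
# Output of `oc get pvc -n fraud-detection`, copied from above.
PVC_OUTPUT = """\
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
fraud-detection Bound pvc-32db862a-b2bb-45c7-8218-6462b18a2472 20Gi RWO ibmc-vpc-block-10iops-tier 13d
mariadb-dspa Bound pvc-a1be0870-cf1e-49fe-a564-b0c6d6e23fd0 10Gi RWO ibmc-vpc-block-10iops-tier 13d
minio Bound pvc-1ec0b7a6-9d38-4d0c-b76b-2fdd5599ba9b 10Gi RWO ibmc-vpc-block-10iops-tier 13d
"""


def capacities_gib(output):
    """Map each PVC name to its capacity in GiB (Gi suffix assumed)."""
    result = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        name, capacity = fields[0], fields[3]  # CAPACITY is the 4th field
        if not capacity.endswith("Gi"):
            raise ValueError(f"unsupported capacity: {capacity}")
        result[name] = int(capacity[:-2])
    return result


capacities = capacities_gib(PVC_OUTPUT)
total_gib = sum(capacities.values())
print(f"{len(capacities)} PVCs, {total_gib}Gi in total")
```

The header row is skipped rather than used for column lookup because the ACCESS MODES header contains a space and would shift the indexes.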