OpenShift AI - Computing Resources

Posted at 2024-06-28

OpenShift AI

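Red Hat® OpenShift® AI is a flexible, scalable artificial intelligence (AI) and machine learning (ML) platform. It enables enterprises to create and deliver AI-enabled applications at scale across hybrid cloud environments.

OpenShift AI is built using open source technologies and provides trusted, operationally consistent capabilities for experimenting, serving models, and delivering innovative applications.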

Computing Resources

This article checks how much computing resource is actually requested in an environment running OpenShift AI 2.9.1 on Red Hat OpenShift on IBM Cloud (ROKS) 4.14.

The cluster used here has three worker nodes of the mx2-8x64 flavor.

image.png

OpenShift AI Components

The computing resource requirements for installing OpenShift AI are documented as follows.

  • Use an existing cluster or create a new cluster by following the OpenShift Container Platform documentation: OpenShift Container Platform installation overview.

    Your cluster must have at least 2 worker nodes with at least 8 CPUs and 32 GiB RAM available for OpenShift AI to use when you install the Operator. To ensure that OpenShift AI is usable, additional cluster resources are required beyond the minimum requirements.

  • A default storage class that can be dynamically provisioned must be configured.

    Confirm that a default storage class is configured by running the oc get storageclass command. If no storage classes are noted with (default) beside the name, follow the OpenShift Container Platform documentation to configure a default storage class: Changing the default storage class. For more information about dynamic provisioning, see Dynamic provisioning.

  • Open Data Hub must not be installed on the cluster.

    For more information about managing the machines that make up an OpenShift cluster, see Overview of machine management.
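The default storage class prerequisite quoted above can be checked with a small filter over the `oc get storageclass` listing. The following is a minimal sketch, assuming the listing marks the default class with `(default)` right after its name, as the quoted documentation describes. The function name `default_storageclass` and the sample class names in the usage comment are hypothetical.

```shell
# Hypothetical helper: reads an `oc get storageclass` listing on stdin,
# prints the name of every class marked "(default)", and returns
# non-zero when no default class is found.
default_storageclass() {
  awk '$2 == "(default)" { print $1; found = 1 } END { exit found ? 0 : 1 }'
}

# Usage against a live cluster (not executed here):
#   oc get storageclass | default_storageclass || echo "no default storage class"
```

Reading from stdin keeps the check independent of the cluster, so the same function can be run against saved command output.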

In OpenShift AI, the Pods of the installed and configured components run in the redhat-ods-applications Project.

$ oc get pod -n redhat-ods-applications --sort-by=.spec.nodeName
NAME                                                              READY   STATUS    RESTARTS       AGE
notebook-controller-deployment-556969c7cf-jj5jj                   1/1     Running   0              15d
rhods-dashboard-6f64f58c89-pwhqh                                  2/2     Running   0              15d
kserve-controller-manager-849dcf98b6-xq6k8                        1/1     Running   0              15d
kuberay-operator-759464465c-xsl2j                                 1/1     Running   0              15d
kueue-controller-manager-56559d6db6-wkc7s                         1/1     Running   0              15d
modelmesh-controller-58cc8bd7c5-d57mg                             1/1     Running   0              15d
odh-model-controller-6d9b6db854-sbhvr                             1/1     Running   0              15d
data-science-pipelines-operator-controller-manager-bc4f57dc2xtk   1/1     Running   0              15d
modelmesh-controller-58cc8bd7c5-594jp                             1/1     Running   0              15d
codeflare-operator-manager-6cb5c976d5-zzrkw                       1/1     Running   0              15d
rhods-dashboard-6f64f58c89-9b9dt                                  2/2     Running   0              15d
odh-model-controller-6d9b6db854-5zp9l                             1/1     Running   0              15d
rhods-dashboard-6f64f58c89-khvd4                                  2/2     Running   0              15d
modelmesh-controller-58cc8bd7c5-xjkcm                             1/1     Running   0              15d
odh-notebook-controller-manager-66d759cbb4-6bdql                  1/1     Running   0              15d
odh-model-controller-6d9b6db854-wlxn5                             1/1     Running   0              15d
rhods-dashboard-6f64f58c89-mn9mq                                  2/2     Running   0              15d
rhods-dashboard-6f64f58c89-nbfkd                                  2/2     Running   0              15d
etcd-7ddbc959b8-5wf2m                                             1/1     Running   0              15d

This environment uses the default configuration.

$ oc get DataScienceCluster default-dsc -o json | jq -r '.status.installedComponents'
{
  "codeflare": true,
  "dashboard": true,
  "data-science-pipelines-operator": true,
  "kserve": true,
  "kueue": true,
  "model-mesh": true,
  "ray": true,
  "trustyai": false,
  "workbenches": true
}

Checking the computing resources that each Pod requests right after installation (Pod.spec.containers[].resources) gives the following.

$ oc get pod -n redhat-ods-applications --sort-by=.spec.nodeName -o custom-columns-file=POD_CUSTOM.txt
NAME                                                              NODE         OWNER_KIND   CONTAINERS                    CPU_LIMITS   MEM_LIMITS   CPU_REQUESTS   MEM_REQUESTS
notebook-controller-deployment-556969c7cf-jj5jj                   11.222.3.4   ReplicaSet   manager                       500m         4Gi          500m           256Mi
rhods-dashboard-6f64f58c89-pwhqh                                  11.222.3.4   ReplicaSet   rhods-dashboard,oauth-proxy   1,1          2Gi,2Gi      500m,500m      1Gi,1Gi
kserve-controller-manager-849dcf98b6-xq6k8                        11.222.3.4   ReplicaSet   manager                       500m         5Gi          100m           200Mi
kuberay-operator-759464465c-xsl2j                                 11.222.3.4   ReplicaSet   kuberay-operator              100m         512Mi        100m           512Mi
kueue-controller-manager-56559d6db6-wkc7s                         11.222.3.4   ReplicaSet   manager                       500m         512Mi        500m           512Mi
modelmesh-controller-58cc8bd7c5-d57mg                             11.222.3.4   ReplicaSet   manager                       1            2Gi          50m            96Mi
odh-model-controller-6d9b6db854-sbhvr                             11.222.3.4   ReplicaSet   manager                       500m         2Gi          10m            64Mi
data-science-pipelines-operator-controller-manager-bc4f57dc2xtk   11.222.3.5   ReplicaSet   manager                       1            4Gi          10m            64Mi
modelmesh-controller-58cc8bd7c5-594jp                             11.222.3.5   ReplicaSet   manager                       1            2Gi          50m            96Mi
codeflare-operator-manager-6cb5c976d5-zzrkw                       11.222.3.5   ReplicaSet   manager                       1            1Gi          1              1Gi
rhods-dashboard-6f64f58c89-9b9dt                                  11.222.3.5   ReplicaSet   rhods-dashboard,oauth-proxy   1,1          2Gi,2Gi      500m,500m      1Gi,1Gi
odh-model-controller-6d9b6db854-5zp9l                             11.222.3.5   ReplicaSet   manager                       500m         2Gi          10m            64Mi
rhods-dashboard-6f64f58c89-khvd4                                  11.222.3.5   ReplicaSet   rhods-dashboard,oauth-proxy   1,1          2Gi,2Gi      500m,500m      1Gi,1Gi
modelmesh-controller-58cc8bd7c5-xjkcm                             11.222.3.6   ReplicaSet   manager                       1            2Gi          50m            96Mi
odh-notebook-controller-manager-66d759cbb4-6bdql                  11.222.3.6   ReplicaSet   manager                       500m         4Gi          500m           256Mi
odh-model-controller-6d9b6db854-wlxn5                             11.222.3.6   ReplicaSet   manager                       500m         2Gi          10m            64Mi
rhods-dashboard-6f64f58c89-mn9mq                                  11.222.3.6   ReplicaSet   rhods-dashboard,oauth-proxy   1,1          2Gi,2Gi      500m,500m      1Gi,1Gi
rhods-dashboard-6f64f58c89-nbfkd                                  11.222.3.6   ReplicaSet   rhods-dashboard,oauth-proxy   1,1          2Gi,2Gi      500m,500m      1Gi,1Gi
etcd-7ddbc959b8-5wf2m                                             11.222.3.6   ReplicaSet   etcd                          800m         800Mi        400m           200Mi
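
The POD_CUSTOM.txt file passed to `-o custom-columns-file` is not shown in this article. The following is a hypothetical reconstruction that should yield the same eight columns, assuming the standard template layout of one header line followed by one line of field paths. The exact field paths are an assumption, not the author's file.

```shell
# Hypothetical reconstruction of POD_CUSTOM.txt (the original file is not shown).
# Line 1: column headers. Line 2: the field path for each column.
# A [*] path returns one value per container, printed comma-separated.
cat > POD_CUSTOM.txt <<'EOF'
NAME           NODE            OWNER_KIND                         CONTAINERS                CPU_LIMITS                                MEM_LIMITS                                   CPU_REQUESTS                                MEM_REQUESTS
metadata.name  spec.nodeName   metadata.ownerReferences[*].kind   spec.containers[*].name   spec.containers[*].resources.limits.cpu   spec.containers[*].resources.limits.memory   spec.containers[*].resources.requests.cpu   spec.containers[*].resources.requests.memory
EOF

# Usage against a live cluster (not executed here):
#   oc get pod -n redhat-ods-applications --sort-by=.spec.nodeName -o custom-columns-file=POD_CUSTOM.txt
```

Containers without a value for a field are printed as `<none>`, which matches the Job Pods shown later in the fraud-detection Project.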

rhods-dashboard and codeflare-operator-manager request comparatively large amounts of resources. There are no PVCs.

$ oc get pvc -n redhat-ods-applications
No resources found in redhat-ods-applications namespace.
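
To put a number on the per-Pod values in the table, the CPU_REQUESTS column can be totalled in millicores. The following is a minimal sketch, assuming each value is either a millicore quantity such as `500m` or a plain core count such as `1`, with multi-container Pods listed comma-separated. The function name `sum_millicores` is hypothetical.

```shell
# Hypothetical helper: reads CPU quantities on stdin (one Pod per line,
# containers comma-separated) and prints the total in millicores.
# Values that are not CPU quantities, such as <none> or a header, are skipped.
sum_millicores() {
  tr ',' '\n' | awk '
    /^[0-9]+m$/           { sub(/m$/, ""); total += $0; next }  # millicores, e.g. 500m
    /^[0-9]+(\.[0-9]+)?$/ { total += $0 * 1000; next }          # cores, e.g. 1 or 0.5
    END { print total + 0 }
  '
}

# Usage against a live cluster (not executed here); column 7 is CPU_REQUESTS:
#   oc get pod -n redhat-ods-applications -o custom-columns-file=POD_CUSTOM.txt \
#     | awk 'NR > 1 { print $7 }' | sum_millicores
```

Switching the column number to 5 totals CPU_LIMITS in the same way.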

The redhat-ods-monitoring and redhat-ods-operator Projects can be checked in the same way.

$ oc get pod -n redhat-ods-monitoring --sort-by=.spec.nodeName -o custom-columns-file=POD_CUSTOM.txt
NAME   NODE   OWNER_KIND   CONTAINERS   CPU_LIMITS   MEM_LIMITS   CPU_REQUESTS   MEM_REQUESTS

$ oc get pvc -n redhat-ods-monitoring
No resources found in redhat-ods-monitoring namespace.

$ oc get pod -n redhat-ods-operator --sort-by=.spec.nodeName -o custom-columns-file=POD_CUSTOM.txt
NAME                              NODE         OWNER_KIND   CONTAINERS       CPU_LIMITS   MEM_LIMITS   CPU_REQUESTS   MEM_REQUESTS
rhods-operator-7666b99787-lsn6z   11.222.3.4   ReplicaSet   rhods-operator   500m         4Gi          500m           256Mi

$ oc get pvc -n redhat-ods-operator
No resources found in redhat-ods-operator namespace.

Jupyter Notebook

Right after installation, only Jupyter Notebook is available. It can be set up with a few simple settings, and the CPU/Memory size can be selected with Deployment size.

1.png

The PVC size can also be specified with PVC size under Settings -> Cluster settings.

6.png

The Pod is placed in the rhods-notebooks Project. With Deployment size = Small and PVC size = 20GiB (Default), the computing resources are as follows.

$ oc get pod -n rhods-notebooks -o custom-columns-file=POD_CUSTOM.txt
NAME                                         NODE         OWNER_KIND    CONTAINERS                                             CPU_LIMITS   MEM_LIMITS   CPU_REQUESTS   MEM_REQUESTS
jupyter-nb-iam-xxue91db-40jp-2exyz-2eabc-0   11.222.3.4   StatefulSet   jupyter-nb-iam-xxue91db-40jp-2exyz-2eabc,oauth-proxy   2,100m       8Gi,64Mi     1,100m         8Gi,64Mi

$ oc get pvc -n rhods-notebooks
NAME                                                  STATUS   VOLUME                        CAPACITY   ACCESS MODES   STORAGECLASS                 AGE
jupyter-nb-iam-xxue91db-40jp-2exyz-2eabc-pvc          Bound    pvc-bb0ee0cc-2797-4dcb-a8ec   20Gi       RWO            ibmc-vpc-block-10iops-tier   13d

Data Science Project (OpenShift AI tutorial - Fraud detection example)

As an example of a Data Science Project, this section looks at OpenShift AI tutorial - Fraud detection example.

Running the tutorial places Pods in the fraud-detection Project. Depending on whether any settings were changed during the tutorial, the computing resources are roughly as follows.

$ oc get pod -n fraud-detection --sort-by=.spec.nodeName -o custom-columns-file=POD_CUSTOM.txt
NAME                                                   NODE         OWNER_KIND    CONTAINERS                                    CPU_LIMITS     MEM_LIMITS                    CPU_REQUESTS          MEM_REQUESTS
ds-pipeline-metadata-grpc-dspa-57d47dd549-r5zds        11.222.3.4   ReplicaSet    container                                     100m           256Mi                         100m                  256Mi
create-minio-buckets-sqstp                             11.222.3.4   Job           create-buckets                                <none>         <none>                        <none>                <none>
fraud-detection-0                                      11.222.3.4   StatefulSet   fraud-detection,oauth-proxy                   2,100m         8Gi,64Mi                      1,100m                8Gi,64Mi
ds-pipeline-persistenceagent-dspa-6b96b9c8d8-sgwkj     11.222.3.4   ReplicaSet    ds-pipeline-persistenceagent                  250m           1Gi                           120m                  500Mi
ds-pipeline-workflow-controller-dspa-5b4cdb559-bzg6g   11.222.3.5   ReplicaSet    ds-pipeline-workflow-controller               <none>         <none>                        100m                  500Mi
ds-pipeline-scheduledworkflow-dspa-64c767c896-lv8d2    11.222.3.5   ReplicaSet    ds-pipeline-scheduledworkflow                 250m           250Mi                         120m                  100Mi
create-minio-root-user-gxcpj                           11.222.3.5   Job           create-minio-root-user                        <none>         <none>                        <none>                <none>
minio-6bff775986-hh2kz                                 11.222.3.5   ReplicaSet    minio                                         2              2Gi                           200m                  1Gi
modelmesh-serving-model-server-778b47cc76-487x2        11.222.3.5   ReplicaSet    rest-proxy,oauth-proxy,ovms,ovms-adapter,mm   1,100m,2,2,3   512Mi,256Mi,8Gi,512Mi,448Mi   50m,100m,1,50m,300m   96Mi,256Mi,4Gi,96Mi,448Mi
ds-pipeline-metadata-envoy-dspa-9c745bb5d-bphwb        11.222.3.6   ReplicaSet    container,oauth-proxy                         100m,100m      256Mi,256Mi                   100m,100m             256Mi,256Mi
ds-pipeline-dspa-7ffd8ff549-dhtnx                      11.222.3.6   ReplicaSet    ds-pipeline-api-server,oauth-proxy            500m,100m      1Gi,256Mi                     250m,100m             500Mi,256Mi
create-s3-storage-hjqr2                                11.222.3.6   Job           create-s3-storage                             <none>         <none>                        <none>                <none>
create-ds-connections-j62fq                            11.222.3.6   Job           create-ds-connections                         <none>         <none>                        <none>                <none>
mariadb-dspa-8685d54888-8qhhj

$ oc get pvc -n fraud-detection
NAME              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                 AGE
fraud-detection   Bound    pvc-32db862a-b2bb-45c7-8218-6462b18a2472   20Gi       RWO            ibmc-vpc-block-10iops-tier   13d
mariadb-dspa      Bound    pvc-a1be0870-cf1e-49fe-a564-b0c6d6e23fd0   10Gi       RWO            ibmc-vpc-block-10iops-tier   13d
minio             Bound    pvc-1ec0b7a6-9d38-4d0c-b76b-2fdd5599ba9b   10Gi       RWO            ibmc-vpc-block-10iops-tier   13d
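
The storage claimed by a Project can be totalled from the same listing. The following is a minimal sketch, assuming the CAPACITY column is the fourth field and is expressed in whole `Gi`, as in the output above. The function name `sum_pvc_gi` is hypothetical.

```shell
# Hypothetical helper: reads an `oc get pvc` listing on stdin and prints the
# total CAPACITY in Gi. The header and any row whose fourth field is not a
# whole-Gi quantity (for example an unbound claim) are skipped.
sum_pvc_gi() {
  awk 'NR > 1 && $4 ~ /^[0-9]+Gi$/ { sub(/Gi$/, "", $4); total += $4 }
       END { print total + 0 }'
}

# Usage against a live cluster (not executed here):
#   oc get pvc -n fraud-detection | sum_pvc_gi
```

For the three claims listed above (20Gi, 10Gi, and 10Gi), this prints 40.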