Deploying an AI app on IBM Power + Spyre with the ai-services CLI

Posted at 2026-06-15 (last updated at 2026-06-15)

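With the ai-services CLI published by IBM, you can build an AI assistant with RAG and summarization features on an IBM Power + Spyre accelerator environment using close to a single command.
In this article I walk through the whole flow: setting up the environment with bootstrap, deploying and operating the AI assistant, and testing the Summarize API.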

See the official documentation for details.

Environment

Server: IBM Power (with IBM Spyre accelerators)
OS: RHEL 9.6 (ppc64le)
Container runtime: Podman
ai-services version: v0.2.0
Models: ibm-granite/granite-3.3-8b-instruct (inference) / ibm-granite/granite-embedding-278m-multilingual (embedding) / BAAI/bge-reranker-v2-m3 (reranking)

Installing the ai-services CLI

Download the ppc64le binary from GitHub Releases and place it in a directory on your PATH.

curl -Lo ./ai-services https://github.com/IBM/project-ai-services/releases/download/v0.2.0/ai-services-linux-ppc64le
chmod +x ai-services
mv ai-services /usr/local/bin/
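The download URL above embeds both the release tag and the architecture. As a minimal sketch, the hypothetical helper `release_url` below composes that URL from two arguments. It assumes other releases use the same `ai-services-linux-<arch>` asset naming as v0.2.0, which this article only confirms for ppc64le.

```shell
# Hypothetical helper: compose the release asset URL from a tag and an architecture.
# Assumption: other releases follow the same asset naming as v0.2.0.
release_url() {
  printf 'https://github.com/IBM/project-ai-services/releases/download/%s/ai-services-linux-%s\n' "$1" "$2"
}
```

On the target LPAR it could be used as `curl -Lo ./ai-services "$(release_url v0.2.0 "$(uname -m)")"`.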

After installation, the following command sets up Podman and checks the Spyre card configuration in one go.

ai-services bootstrap --runtime podman

The command first configures Podman and then runs the Spyre card configuration checks.
On the first run, several configuration files are created automatically (shown as Auto Fixed).

servicereport 2.2.5

Spyre configuration checks                          PASS

  VFIO Driver configuration                         PASS     Auto Fixed
  User memlock configuration                        PASS     Auto Fixed
  sos config                                        PASS     Auto Fixed
  sos package                                       PASS
  VFIO udev rules configuration                     PASS     Auto Fixed
  User group configuration                          PASS     Auto Fixed
  VFIO device permission                            PASS
  VFIO kernel module loaded                         PASS
  VFIO module dep configuration                     PASS     Auto Fixed

Bootstrap is complete when the output ends like this:

✔ Spyre cards configuration validated successfully.
All validations passed
LPAR bootstrapped successfully
----------------------------------------------------------------------------
Re-login to the shell to reflect necessary permissions assigned to vfio cards

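Note: a warning about NUMA node alignment may appear.
✖ current NUMA node configuration (4) is not aligned for maximum efficiency.
This does not affect operation, but aligning the configuration to a single NUMA node is recommended for maximum performance. See the IBM documentation for details.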
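To see at a glance which checks bootstrap changed on the first run, the check table shown earlier in this section can be filtered for rows marked Auto Fixed. This is only a sketch: `auto_fixed_checks` is a hypothetical helper, and it assumes the report is plain text in the column layout shown in this article.

```shell
# Hypothetical helper: print the names of checks that were marked "Auto Fixed".
# Reads the report on stdin; assumes the "<name>  PASS  Auto Fixed" layout shown above.
auto_fixed_checks() {
  sed -n 's/^ *\(.*[^ ]\) *PASS *Auto Fixed *$/\1/p'
}
```

If the report is written to standard output, `ai-services bootstrap --runtime podman | auto_fixed_checks` would list only the auto-fixed items.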

Logging in to registry.redhat.io

The RAG template uses the Red Hat AI Inference Server (RHAIIS) container image (registry.redhat.io/rhaiis/vllm-spyre-rhel9).
Log in with your Red Hat account beforehand.

podman login registry.redhat.io

Checking the Spyre devices

Use lspci to check the installed Spyre accelerators.

lspci

In this example, five IBM Spyre Accelerator cards are detected in addition to the NVMe controller.

0182:70:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller ...
0381:50:00.0 Processing accelerators: IBM Spyre Accelerator (rev 02)
0481:50:00.0 Processing accelerators: IBM Spyre Accelerator (rev 02)
0482:60:00.0 Processing accelerators: IBM Spyre Accelerator (rev 02)
0483:70:00.0 Processing accelerators: IBM Spyre Accelerator (rev 02)
0484:80:00.0 Processing accelerators: IBM Spyre Accelerator (rev 02)

Checking each card in detail with lspci -v shows that the Spyre cards are managed by the vfio-pci driver.

Kernel driver in use: vfio-pci
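When scripting this check, counting the matching lspci lines is usually enough. A minimal sketch follows; `count_spyre_cards` is a hypothetical helper that matches the device string exactly as it appears in the listing above.

```shell
# Hypothetical helper: count Spyre cards in lspci output read from stdin.
count_spyre_cards() {
  # grep -c prints 0 but exits non-zero when nothing matches;
  # "|| true" keeps the exit status at 0 so the count is still usable.
  grep -c 'IBM Spyre Accelerator' || true
}
```

Used as `lspci | count_spyre_cards`, it prints the number of detected cards (five for the listing in this article).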

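Deploying the AI assistant

Once everything is ready, create the app with ai-services application create.
Specifying the template with -t rag is all it takes; the required container images are pulled and the models are downloaded automatically.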

ai-services application create rag-app -t rag --runtime podman

The command first validates the LPAR and then starts downloading the container images and models.
The following images and models are downloaded.

Component	Image / Model
RAG UI	`icr.io/ai-services/rag-ui:v0.0.26`
RAG backend	`icr.io/ai-services/rag:v0.0.80`
Digitize UI	`icr.io/ai-services/digitize-ui:v0.0.13`
OpenSearch	`icr.io/ppc64le-oss/opensearch-ppc64le:3.3.0`
vLLM (for Spyre)	`registry.redhat.io/rhaiis/vllm-spyre-rhel9:3.3.0`
vLLM (ppc64le)	`icr.io/ppc64le-oss/vllm-ppc64le:0.9.1`
Reranking model	`BAAI/bge-reranker-v2-m3`
Embedding model	`ibm-granite/granite-embedding-278m-multilingual`
Inference model	`ibm-granite/granite-3.3-8b-instruct` (about 16 GB)

All models are stored under /var/lib/ai-services/models.
After the downloads finish, the pods are deployed in three layers, one after another.

Executing Layer 1/3: [opensearch.yaml.tmpl vllm-server.yaml.tmpl]
Executing Layer 2/3: [clean-docs.yaml.tmpl]
Executing Layer 3/3: [ingest-docs.yaml.tmpl digitize.yaml.tmpl summarize-api.yaml.tmpl chat-bot.yaml.tmpl]

Deployment is complete when the following message appears:

✔ Application 'rag-app' deployed successfully

Checking the access information

After deployment, you can check the URL of each service with the application info command.

ai-services application info rag-app --runtime podman

Application Name: rag-app
Application Template: rag
Version: 0.0.1
Info:
-------
Day N:

- Q&A Chatbot is available to use at http://xxx.xxx.xxx.xxx:xxxxx.

- Add documents to your RAG application using the Digitize Documents UI: http://xxx.xxx.xxx.xxx:xxxxx.

- Digitize Documents API is available to use at http://xxx.xxx.xxx.xxx:xxxxx.

- Summarize API is available to use at http://xxx.xxx.xxx.xxx:xxxxx.

Each endpoint is used as follows.

Service	Description
Q&A Chatbot	Chat UI backed by RAG
Digitize Documents UI	UI for uploading documents and ingesting them into the RAG index
Digitize Documents API	API endpoint for document ingestion
Summarize API	API endpoint for text summarization
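If you want to feed these URLs into other scripts, they can be pulled out of the info output. The sketch below uses a hypothetical helper, `extract_urls`, and assumes each URL is followed by a sentence-ending period as in the output above.

```shell
# Hypothetical helper: print every http:// URL found on stdin, one per line,
# with the trailing period of the sentence removed.
extract_urls() {
  grep -o 'http://[^ ]*' | sed 's/\.$//'
}
```

For example, `ai-services application info rag-app --runtime podman | extract_urls` would print one URL per service, assuming the info text goes to standard output.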

By default, each endpoint is automatically assigned a free high port.
To choose the ports yourself, specify them when creating the application, as follows.

ai-services application create rag-app -t rag --params "ui.port=4000,digitizeUi.port=4001,summarize.port=4002" --runtime podman
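The value passed to --params is a comma-separated list of key=value pairs. When the pairs come from a script, a small helper can join them. `join_params` below is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: join key=value arguments into one comma-separated string.
join_params() {
  # "$*" joins the arguments with the first character of IFS;
  # the subshell keeps the IFS change local to this function.
  ( IFS=,; printf '%s\n' "$*" )
}
```

It could be used as `ai-services application create rag-app -t rag --params "$(join_params ui.port=4000 digitizeUi.port=4001 summarize.port=4002)" --runtime podman`.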

The parameters you can specify are listed in the template information.

ai-services application templates --runtime podman

Example output

# ai-services application templates --runtime podman
Available application templates:
- rag
  Description: Retrieval Augmented Generation (RAG) application that combines a vector database, a large language model, and a retrieval mechanism to provide accurate and context-aware responses based on ingested documents.

  Supported Parameters:
	backend.port:  Host port for the OpenAI-compatible RAG service. Defaults to unexposed; assign a port to enable external access.
	digitize.port:  Host port for the DIGITIZE API. If unspecified, a random available port is assigned. Specify a port number to use a custom value.
	digitizeUi.port:  Host port for the DIGITIZE UI. If unspecified, a random available port is assigned. Specify a port number to use a custom value.
	opensearch.memoryLimit:  Sets the memory limit for the Opensearch service(Default: 8Gi). Override by passing a value with a unit suffix (e.g., Mi, Gi).
	opensearch.auth.password:  Password for OpenSearch authentication. Must be at least 15 characters and contain at least one uppercase letter, one lowercase letter, one digit, and one special character. Avoid common words, predictable patterns, or dictionary terms. Use this to override the default admin password.
	summarize.port:  Host port for the Summarize API. If unspecified, a random available port is assigned. Specify a port number to use a custom value.
	ui.port:  Host port for the RAG UI. If unspecified, a random available port is assigned. Specify a port number to use a custom value.
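The description of opensearch.auth.password above lists concrete character rules, so a candidate password can be pre-checked before it is passed to --params. This is a rough sketch, not the CLI's own validation: `valid_opensearch_password` is a hypothetical helper, and it assumes a "special character" means any non-alphanumeric character.

```shell
# Hypothetical helper: succeed only if the password meets the documented character rules.
# Assumption: "special character" is treated as any non-alphanumeric character.
# It cannot detect common words or predictable patterns; review those by hand.
valid_opensearch_password() {
  pw=$1
  [ "${#pw}" -ge 15 ] || return 1                            # at least 15 characters
  case $pw in *[[:upper:]]*) ;; *) return 1 ;; esac          # one uppercase letter
  case $pw in *[[:lower:]]*) ;; *) return 1 ;; esac          # one lowercase letter
  case $pw in *[[:digit:]]*) ;; *) return 1 ;; esac          # one digit
  case $pw in *[![:alnum:]]*) ;; *) return 1 ;; esac         # one special character
  return 0
}
```

A typical use would be `valid_opensearch_password "$PW" || echo "password does not meet the rules"`, where `PW` is a variable you set yourself.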

Checking pod status

ai-services application ps rag-app --runtime podman

 APPLICATION NAME    POD NAME                  STATUS
───────────────────────────────────────────────────────────────────
 rag-app             rag-app--opensearch       Running (healthy)
                     rag-app--vllm-server      Running (healthy)
                     rag-app--clean-docs       Created
                     rag-app--ingest-docs      Created
                     rag-app--summarize-api    Running (healthy)
                     rag-app--digitize         Running (healthy)
                     rag-app--chat-bot         Running (healthy)

clean-docs and ingest-docs are not long-running services, so they remain in the Created state.
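For a quick health check, the ps table can be filtered for pods that are neither Running (healthy) nor Created. The sketch below relies on the table layout shown above. `unhealthy_pods` is a hypothetical helper, and which other status strings the CLI can print is not covered in this article.

```shell
# Hypothetical helper: print pods whose status is neither "Running (healthy)" nor "Created".
# Reads the ps table on stdin; assumes pod names contain "--" as in the output above.
unhealthy_pods() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /--/) {
        # Everything after the pod name is the status text.
        status = ""
        for (j = i + 1; j <= NF; j++) status = status (j > i + 1 ? " " : "") $j
        if (status != "Running (healthy)" && status != "Created") print $i ": " status
        break
      }
    }
  }'
}
```

Run as `ai-services application ps rag-app --runtime podman | unhealthy_pods`, it prints nothing for the table above and one line per pod in any other state.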

Stopping and restarting the app

Stop

ai-services application stop rag-app --runtime podman

The pods to be stopped are listed, followed by a confirmation prompt.
Select y or Yes to start the stop process.

Stopping the pod: rag-app--opensearch
Successfully stopped the pod: rag-app--opensearch
...
Successfully stopped the pod: rag-app--chat-bot

Restart

ai-services application start rag-app --runtime podman

As with stop, the pods start one after another once you confirm the prompt.
Note that clean-docs and ingest-docs are not long-running services, so start does not include them.

Starting the pod: rag-app--opensearch
Successfully started the pod: rag-app--opensearch
...
Successfully started the pod: rag-app--chat-bot

Testing the Summarize API

The Summarize API is a REST endpoint that accepts a JSON request body.
It is easy to test with curl.

curl -X POST http://xxx.xxx.xxx.xxx:xxxxx/v1/summarize \
  -H "Content-Type: application/json" \
  -d "{
    \"text\": $(jq -Rs . < input.txt),
    \"length\": 25
  }" | jq .
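The request above splices the jq-escaped text into a hand-written JSON string. An alternative is to let jq build the whole body, which avoids the nested quoting. This is a sketch using the same two fields as the request above; `build_summarize_body` is a hypothetical helper name.

```shell
# Hypothetical helper: read text on stdin and emit the JSON request body on stdout.
# $1 is the numeric value for the "length" field used in the request above.
build_summarize_body() {
  # -R -s reads all of stdin as one raw string; --argjson passes $1 as a JSON number.
  jq -c -R -s --argjson len "$1" '{text: ., length: $len}'
}
```

The body can then be piped to curl with `-d @-`, which reads the request data from standard input: `build_summarize_body 25 < input.txt | curl -X POST http://xxx.xxx.xxx.xxx:xxxxx/v1/summarize -H "Content-Type: application/json" -d @- | jq .`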

Example response:

{
  "data": {
    "summary": "IBM launches the Sports Tech Startup Challenge, a global initiative to identify AI-driven startups revolutionizing the sports industry. The program includes regional showcases and a prize competition at Web Summit Lisbon, with a $100,000 proof-of-concept prize for the winner.",
    "original_length": 951,
    "summary_length": 77
  },
  "meta": {
    "model": "ibm-granite/granite-3.3-8b-instruct",
    "processing_time_ms": 8675,
    "input_type": "text"
  },
  "usage": {
    "input_tokens": 1652,
    "output_tokens": 133,
    "total_tokens": 1785
  }
}

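I passed English text this time, and Granite returned the summary in English.
Processing took about 8.7 seconds. The response reports an original_length of 951 and a summary_length of 77. The unit of these two fields is not stated, and they are not token counts: the usage block reports 1,652 input tokens and 133 output tokens.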
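The usage and meta fields in the response also allow a rough throughput estimate: output tokens divided by processing time. This is a back-of-the-envelope sketch. `tokens_per_second` is a hypothetical helper, and it assumes processing_time_ms covers the whole request, so the result is a lower bound on pure generation speed.

```shell
# Hypothetical helper: output tokens per second.
# $1 = usage.output_tokens, $2 = meta.processing_time_ms
tokens_per_second() {
  awk -v tokens="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", tokens / (ms / 1000) }'
}
```

For the response above, `tokens_per_second 133 8675` prints 15.3.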

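Summary

The ai-services CLI makes building a RAG app on an IBM Power + Spyre environment considerably easier.

A single bootstrap command configures Podman and validates and auto-fixes the Spyre card configuration.
Specifying a template with application create runs everything automatically, from downloading the images and models through deployment.
The Chatbot UI, Digitize UI, Summarize API, and other services are ready to use right after deployment.
The app lifecycle can be managed with CLI operations such as start, stop, ps, and info.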
