AI Search: Creating a Multimodal Index (REST API)

Posted at 2025-10-24

To get familiar with Azure AI Search indexer definitions, I worked through the official tutorial.

I changed a few things along the way, so I am recording them here.

Prerequisites

I run the requests from VS Code using REST Client 0.25.1.

Turning on Decode Escaped Unicode Characters in the REST Client settings decodes Japanese text in the HTTP response body.

Steps 1 through 5 of the following article are also prerequisites.

Completed flow

Viewed in a debug session, the flow looks like this.

REST

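Creating the index

Every operation is written as a create-or-update (upsert), that is, a PUT against the named resource.

Fixed value definitions

Note that api_version is set to a preview version.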

## Azure AI Search endpoint
@searchUrl = https://<AI Search resource name>.search.windows.net

## Azure AI Search API key
@searchApiKey=<key>

## Blob Storage connection string
@storageConnection=<connection string>

## Azure AI Services endpoint
@cognitiveServicesUrl = https://<ai service resource name>.cognitiveservices.azure.com/

## Model version
@modelVersion = 2023-04-15

## Blob Storage container name (image storage)
@imageProjectionContainer=sustainable-ai-pdf-images

## Blob Storage container name (PDF storage)
@blob_container_name=rag-doc-test

## Azure AI Search API version
## 2025-09-01 causes errors in the index and skillset definitions
@api_version=2025-08-01-preview

## Data source name
@datasource_name=doc-intelligence-multimodal-embedding-ds

## Index name
@index_name=doc-intelligence-multimodal-embedding-index

## Skillset name
@skillset_name=doc-intelligence-multimodal-embedding-skillset

## Indexer name
@indexer_name=doc-intelligence-multimodal-embedding-indexer
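
For anyone scripting the same calls outside REST Client, the sketch below shows how these variables combine into a request URL and how a preview version can be told apart from a GA one. It is only an illustration: `resource_url` and `is_preview` are hypothetical helpers, and `example-search` is a placeholder resource name.

```python
from urllib.parse import quote, urlencode

API_VERSION = "2025-08-01-preview"  # mirrors @api_version above


def resource_url(search_url: str, collection: str, name: str,
                 api_version: str = API_VERSION) -> str:
    """Build the URL of one named resource (hypothetical helper)."""
    base = search_url.rstrip("/")
    query = urlencode({"api-version": api_version})
    return f"{base}/{collection}/{quote(name)}?{query}"


def is_preview(api_version: str) -> bool:
    """Preview API versions end in '-preview'; GA versions are a bare date."""
    return api_version.endswith("-preview")


# "example-search" is a placeholder resource name, not a real service.
index_url = resource_url("https://example-search.search.windows.net",
                         "indexes",
                         "doc-intelligence-multimodal-embedding-index")
print(index_url)
print(is_preview(API_VERSION), is_preview("2025-09-01"))
```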

Create the data source

Using ADLS Gen2 is where I deviated from the official tutorial.

### Create the data source
PUT {{searchUrl}}/datasources('{{datasource_name}}')?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}

  {
    "name": "{{datasource_name}}",
    "description": "A data source to store multimodal documents",
    "type": "adlsgen2",
    "subtype": null,
    "credentials":{
      "connectionString":"{{storageConnection}}"
    },
    "container": {
      "name": "{{blob_container_name}}",
      "query": null
    },
    "dataChangeDetectionPolicy": null,
    "dataDeletionDetectionPolicy": null,
    "encryptionKey": null
  }
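
The upsert behaviour comes from the HTTP method: a PUT against a named resource creates it or replaces it. As a rough equivalent of the request above, the following sketch prepares the same kind of call with Python's standard library without sending it. `build_upsert` is a hypothetical helper, and the URL, key, and connection string are placeholders.

```python
import json
import urllib.request


def build_upsert(url: str, api_key: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a PUT that creates or replaces a resource."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )


datasource = {
    "name": "doc-intelligence-multimodal-embedding-ds",
    "type": "adlsgen2",  # ADLS Gen2, the type used in this post
    "credentials": {"connectionString": "<connection string>"},
    "container": {"name": "rag-doc-test"},
}

req = build_upsert(
    "https://example-search.search.windows.net/datasources/"
    f"{datasource['name']}?api-version=2025-08-01-preview",
    api_key="<key>",
    body=datasource,
)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req), which needs a real service.
```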

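Create the index

I changed the following relative to the official version:

Removed offset (nothing populates it).
Renamed the field to locationMetadata (the field name in the original looked wrong).

Also, with api-version 2025-09-01 this request fails at vectorizers -> aiServicesVisionParameters, so I use the preview version.

### Create the index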
PUT {{searchUrl}}/indexes('{{index_name}}')?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}

  {
    "name": "{{index_name}}",
    "fields": [
        {
            "name": "content_id",
            "type": "Edm.String",
            "retrievable": true,
            "key": true,
            "analyzer": "keyword"
        },
        {
            "name": "text_document_id",
            "type": "Edm.String",
            "searchable": false,
            "filterable": true,
            "retrievable": true,
            "stored": true,
            "sortable": false,
            "facetable": false
        },          
        {
            "name": "document_title",
            "type": "Edm.String",
            "searchable": true
        },
        {
            "name": "image_document_id",
            "type": "Edm.String",
            "filterable": true,
            "retrievable": true
        },
        {
            "name": "content_text",
            "type": "Edm.String",
            "searchable": true,
            "retrievable": true
        },
        {
            "name": "content_embedding",
            "type": "Collection(Edm.Single)",
            "dimensions": 1024,
            "searchable": true,
            "retrievable": true,
            "vectorSearchProfile": "hnsw"
        },
        {
            "name": "content_path",
            "type": "Edm.String",
            "searchable": false,
            "retrievable": true
        },
        {
            "name": "locationMetadata",
            "type": "Edm.ComplexType",
            "fields": [
                {
                "name": "boundingPolygons",
                "type": "Edm.String",
                "searchable": false,
                "retrievable": true,
                "filterable": false,
                "sortable": false,
                "facetable": false
                },
                {
                "name": "pageNumber",
                "type": "Edm.Int32",
                "searchable": false,
                "retrievable": true
                }
            ]
        }         
    ],
    "vectorSearch": {
        "profiles": [
            {
                "name": "hnsw",
                "algorithm": "defaulthnsw",
                "vectorizer": "demo-vectorizer"
            }
        ],
        "algorithms": [
            {
                "name": "defaulthnsw",
                "kind": "hnsw",
                "hnswParameters": {
                    "m": 4,
                    "efConstruction": 400,
                    "metric": "cosine"
                }
            }
        ],
        "vectorizers": [
            {
                "name": "demo-vectorizer",
                "kind": "aiServicesVision",
                "aiServicesVisionParameters": {
                    "resourceUri": "{{cognitiveServicesUrl}}",
                    "authIdentity": null,
                    "modelVersion": "{{modelVersion}}"
                }
            }
        ]     
    },
    "semantic": {
        "defaultConfiguration": "semanticconfig",
        "configurations": [
            {
                "name": "semanticconfig",
                "prioritizedFields": {
                    "titleField": {
                        "fieldName": "document_title"
                    },
                    "prioritizedContentFields": [
                    ],
                    "prioritizedKeywordsFields": []
                }
            }
        ]
    }
  }
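
The definition above links names together: the vector field points at a profile, and the profile points at an algorithm and a vectorizer. A typo in any of these names is easy to miss, so a small local check can help before sending the request. The sketch below is a hypothetical helper (`check_vector_config`) run against a trimmed copy of the index definition; it is not part of the Azure API.

```python
def check_vector_config(index: dict) -> list[str]:
    """Return problems found in vector search name references (empty if none)."""
    vs = index.get("vectorSearch", {})
    algorithms = {a["name"] for a in vs.get("algorithms", [])}
    vectorizers = {v["name"] for v in vs.get("vectorizers", [])}
    profiles = {p["name"] for p in vs.get("profiles", [])}
    problems = []
    for p in vs.get("profiles", []):
        if p["algorithm"] not in algorithms:
            problems.append(f"profile '{p['name']}': unknown algorithm '{p['algorithm']}'")
        vectorizer = p.get("vectorizer")
        if vectorizer is not None and vectorizer not in vectorizers:
            problems.append(f"profile '{p['name']}': unknown vectorizer '{vectorizer}'")
    for f in index.get("fields", []):
        profile = f.get("vectorSearchProfile")
        if profile is None:
            continue  # not a vector field
        if profile not in profiles:
            problems.append(f"field '{f['name']}': unknown profile '{profile}'")
        if not isinstance(f.get("dimensions"), int):
            problems.append(f"field '{f['name']}': missing dimensions")
    return problems


# Trimmed copy of the index definition above (only the parts being checked).
index = {
    "fields": [
        {"name": "content_id", "type": "Edm.String", "key": True},
        {"name": "content_embedding", "type": "Collection(Edm.Single)",
         "dimensions": 1024, "vectorSearchProfile": "hnsw"},
    ],
    "vectorSearch": {
        "profiles": [{"name": "hnsw", "algorithm": "defaulthnsw",
                      "vectorizer": "demo-vectorizer"}],
        "algorithms": [{"name": "defaulthnsw", "kind": "hnsw"}],
        "vectorizers": [{"name": "demo-vectorizer", "kind": "aiServicesVision"}],
    },
}
print(check_vector_config(index))
```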

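Register the skillset

The skillset requirements restrict which regions you can use. Specifically, it must be a region where Azure AI Services is available and it must be the same region as Azure AI Search.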

### Create a skillset
PUT {{searchUrl}}/skillsets/{{skillset_name}}?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}

{
  "name": "{{skillset_name}}",
  "description": "A sample skillset for multimodal using multimodal embedding",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
      "name": "document-layout-skill",
      "description": "Document Intelligence skill for document cracking",
      "context": "/document",
      "outputMode": "oneToMany",
      "outputFormat": "text",
      "extractionOptions": ["images", "locationMetadata"],
      "chunkingProperties": {     
          "unit": "characters",
          "maximumLength": 2000, 
          "overlapLength": 200
      },
      "inputs": [
        {
          "name": "file_data",
          "source": "/document/file_data"
        }
      ],
      "outputs": [
        { 
          "name": "text_sections", 
          "targetName": "text_sections" 
        }, 
        { 
          "name": "normalized_images", 
          "targetName": "normalized_images" 
        } 
      ]
    },
    { 
      "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill", 
      "name": "text-embedding-skill",
      "description": "Vision Vectorization skill for text",
      "context": "/document/text_sections/*", 
      "modelVersion": "{{modelVersion}}", 
      "inputs": [ 
        { 
          "name": "text", 
          "source": "/document/text_sections/*/content" 
        } 
      ], 
      "outputs": [ 
        { 
          "name": "vector",
          "targetName": "text_vector"
        } 
      ] 
    },    
    { 
      "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill", 
      "name": "image-embedding-skill",
      "description": "Vision Vectorization skill for images",
      "context": "/document/normalized_images/*", 
      "modelVersion": "{{modelVersion}}", 
      "inputs": [ 
        { 
          "name": "image", 
          "source": "/document/normalized_images/*" 
        } 
      ], 
      "outputs": [ 
        { 
          "name": "vector",
          "targetName": "image_vector"
        } 
      ] 
    },
    {
      "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
      "name": "shaper-skill",
      "context": "/document/normalized_images/*",
      "inputs": [
        {
          "name": "normalized_images",
          "source": "/document/normalized_images/*",
          "inputs": []
        },
        {
          "name": "imagePath",
          "source": "='{{imageProjectionContainer}}/'+$(/document/normalized_images/*/imagePath)",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "output",
          "targetName": "new_normalized_images"
        }
      ]
    }      
  ], 
   "indexProjections": {
      "selectors": [
        {
          "targetIndexName": "{{index_name}}",
          "parentKeyFieldName": "text_document_id",
          "sourceContext": "/document/text_sections/*",
          "mappings": [    
            {
            "name": "content_embedding",
            "source": "/document/text_sections/*/text_vector"
            },                      
            {
              "name": "content_text",
              "source": "/document/text_sections/*/content"
            },
            {
              "name": "locationMetadata",
              "source": "/document/text_sections/*/locationMetadata"
            },                
            {
              "name": "document_title",
              "source": "/document/document_title"
            }   
          ]
        },        
        {
          "targetIndexName": "{{index_name}}",
          "parentKeyFieldName": "image_document_id",
          "sourceContext": "/document/normalized_images/*",
          "mappings": [    
            {
            "name": "content_embedding",
            "source": "/document/normalized_images/*/image_vector"
            },                                           
            {
              "name": "content_path",
              "source": "/document/normalized_images/*/new_normalized_images/imagePath"
            },                    
            {
              "name": "document_title",
              "source": "/document/document_title"
            },
            {
              "name": "locationMetadata",
              "source": "/document/normalized_images/*/locationMetadata"
            }             
          ]
        }
      ],
      "parameters": {
        "projectionMode": "skipIndexingParentDocuments"
      }
  },
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.AIServicesByIdentity",
    "subdomainUrl": "{{cognitiveServicesUrl}}",
    "identity": null
  },
  "knowledgeStore": {
    "storageConnectionString": "{{storageConnection}}",
    "identity": null,
    "projections": [
      {
        "files": [
          {
            "storageContainer": "{{imageProjectionContainer}}",
            "source": "/document/normalized_images/*"
          }
        ]
      }
    ]
  }
}
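
Every mapping name and every parentKeyFieldName in indexProjections has to exist as a field in the target index, which is why a field removed from the index (offset in my case) must also be removed from the mappings. The sketch below checks that locally. `unknown_targets` is a hypothetical helper, and the selectors are trimmed copies of the ones above.

```python
# Field names taken from the index definition in this post.
INDEX_FIELDS = {
    "content_id", "text_document_id", "document_title", "image_document_id",
    "content_text", "content_embedding", "content_path", "locationMetadata",
}


def unknown_targets(selectors: list[dict], index_fields: set[str]) -> list[str]:
    """List projection targets that are not fields of the index."""
    missing = []
    for sel in selectors:
        names = [sel["parentKeyFieldName"]] + [m["name"] for m in sel["mappings"]]
        for name in names:
            if name not in index_fields:
                missing.append(f"{sel['sourceContext']}: {name}")
    return missing


selectors = [
    {
        "parentKeyFieldName": "text_document_id",
        "sourceContext": "/document/text_sections/*",
        "mappings": [
            {"name": "content_embedding"},
            {"name": "content_text"},
            {"name": "locationMetadata"},
            {"name": "document_title"},
        ],
    },
    {
        "parentKeyFieldName": "image_document_id",
        "sourceContext": "/document/normalized_images/*",
        "mappings": [
            {"name": "content_embedding"},
            {"name": "content_path"},
            {"name": "document_title"},
            {"name": "locationMetadata"},
        ],
    },
]
print(unknown_targets(selectors, INDEX_FIELDS))
```

Adding a mapping named offset to the first selector would make the helper report it, since that field is no longer in the index.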

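I had an AI write the field flow in Mermaid notation. It is slightly incomplete, but what it shows is correct.
It renders small on Qiita, though, so try viewing it with a tool such as the one below.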
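
As a rough alternative, Mermaid text for the field flow can be generated from the projection mappings and pasted into any Mermaid viewer. The sketch below is an illustration under assumptions, not the diagram mentioned above: `to_mermaid` is a hypothetical helper and the mapping list is a hand-picked subset of the skillset's indexProjections.

```python
def to_mermaid(mappings: list[tuple[str, str]]) -> str:
    """Render (source path, index field) pairs as a Mermaid flowchart."""
    ids: dict[str, str] = {}

    def node(label: str) -> str:
        # Reuse one node id per label so shared targets are drawn once.
        if label not in ids:
            ids[label] = f"n{len(ids)}"
        return f'{ids[label]}["{label}"]'

    lines = ["flowchart LR"]
    for source, target in mappings:
        lines.append(f"    {node(source)} --> {node(target)}")
    return "\n".join(lines)


# Subset of the mappings defined in the skillset above.
mappings = [
    ("/document/text_sections/*/text_vector", "content_embedding"),
    ("/document/text_sections/*/content", "content_text"),
    ("/document/normalized_images/*/image_vector", "content_embedding"),
    ("/document/normalized_images/*/new_normalized_images/imagePath", "content_path"),
]
print(to_mermaid(mappings))
```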

Create the indexer

### Create the indexer
PUT {{searchUrl}}/indexers/{{indexer_name}}?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}

{
  "name": "{{indexer_name}}",
  "dataSourceName": "{{datasource_name}}",
  "targetIndexName": "{{index_name}}",
  "skillsetName": "{{skillset_name}}",
  "parameters": {
    "maxFailedItems": -1,
    "maxFailedItemsPerBatch": 0,
    "batchSize": 1,
    "configuration": {
      "allowSkillsetToReadFileData": true
    }
  },
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_name",
      "targetFieldName": "document_title"
    }
  ],
  "outputFieldMappings": []
}

Search

Full search

### Query the index
POST {{searchUrl}}/indexes/{{index_name}}/docs/search?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}
  
{
  "search": "*",
  "count": true
}

Search images only

### Query for only images
POST {{searchUrl}}/indexes/{{index_name}}/docs/search?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}
  
  {
    "search": "*",
    "count": true,
    "filter": "image_document_id ne null"
  }
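
The filter works because only documents projected from the image selector have image_document_id populated, so `ne null` keeps image chunks and `eq null` keeps the rest. The sketch below builds the request body for either case. `search_body` and the `kind` labels are hypothetical names for illustration.

```python
import json

# OData filters keyed by the kind of chunk to return.
FILTERS = {
    "all": None,
    "image": "image_document_id ne null",  # image chunks only
    "text": "image_document_id eq null",   # everything that is not an image chunk
}


def search_body(kind: str = "all", search: str = "*") -> dict:
    """Build a search request body restricted to one kind of chunk."""
    if kind not in FILTERS:
        raise ValueError(f"unknown kind: {kind}")
    body = {"search": search, "count": True}
    if FILTERS[kind] is not None:
        body["filter"] = FILTERS[kind]
    return body


print(json.dumps(search_body("image")))
print(json.dumps(search_body("text")))
```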

Search with selected fields

### Query 
POST {{searchUrl}}/indexes/{{index_name}}/docs/search?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}
  
  {
    "search": "保険",
    "count": true,
    "select": "content_id, document_title, content_text, content_path"
  }
