
AI Search: Creating a Multimodal Index (REST API)


To get familiar with writing Azure AI Search indexer definitions, I worked through the following tutorial.

I changed a few things along the way, so I am recording them here.

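Prerequisites

I run every request with REST Client 0.25.1 from VS Code.

If you turn on Decode Escaped Unicode Characters in the REST Client settings, Japanese text in the HTTP response body is shown decoded instead of as escape sequences.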
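For context on what that setting changes: JSON allows non-ASCII characters to be written as `\uXXXX` escape sequences, which are unreadable until decoded. A minimal Python illustration (standard library only) of the escaped and decoded forms; the string is just an example.

```python
import json

# Japanese text as it might appear in a search result
text = "保険"

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# producing the kind of body you see when the setting is off
escaped = json.dumps(text)
print(escaped)  # "\u4fdd\u967a"

# Parsing the JSON turns the escapes back into readable characters,
# which is roughly what the setting shows you
decoded = json.loads(escaped)
print(decoded)  # 保険
```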

Steps 1 to 5 of the following article are also prerequisites.

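The finished pipeline

Viewed in a debug session, the flow looks like this.
[Figure: the finished pipeline flow as shown in a debug session]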

REST

Building the index

  • Every operation is a create-or-update (upsert) request sent with PUT.

Constant definitions

Note that api_version is set to a preview version.

## Azure AI Search endpoint
@searchUrl = https://<AI Search resource name>.search.windows.net

## Azure AI Search API key
@searchApiKey=<key>

## Blob Storage connection string
@storageConnection=<connection string>

## Azure AI Services endpoint
@cognitiveServicesUrl = https://<ai service resource name>.cognitiveservices.azure.com/

## Model version
@modelVersion = 2023-04-15

## Blob Storage container name (for extracted images)
@imageProjectionContainer=sustainable-ai-pdf-images

## Blob Storage container name (for source PDFs)
@blob_container_name=rag-doc-test

## Azure AI Search API version
## With 2025-09-01, the index and skillset definitions fail
@api_version=2025-08-01-preview

## Data source name
@datasource_name=doc-intelligence-multimodal-embedding-ds

## Index name
@index_name=doc-intelligence-multimodal-embedding-index

## Skillset name
@skillset_name=doc-intelligence-multimodal-embedding-skillset

## Indexer name
@indexer_name=doc-intelligence-multimodal-embedding-indexer
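REST Client replaces each `{{name}}` in a request with the value of the matching `@name` file variable before sending. If you want to reuse these definitions from a script, the substitution can be approximated in Python as below. This is a minimal sketch: `parse_vars` and `expand` are hypothetical helpers that only handle the plain `@name = value` form used above (no variables referencing other variables, no system variables such as `{{$guid}}`).

```python
import re

# Variable lines in the same style as the REST Client file above (placeholder values)
HTTP_FILE = """
## Azure AI Search endpoint
@searchUrl = https://<AI Search resource name>.search.windows.net
@api_version=2025-08-01-preview
@index_name=doc-intelligence-multimodal-embedding-index
"""

def parse_vars(text: str) -> dict[str, str]:
    """Collect '@name = value' lines; whitespace around '=' is optional."""
    variables = {}
    for line in text.splitlines():
        m = re.match(r"^@(\w+)\s*=\s*(.*?)\s*$", line)
        if m:
            variables[m.group(1)] = m.group(2)
    return variables

def expand(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} with its value; unknown names are left untouched."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: variables.get(m.group(1), m.group(0)),
        template,
    )

variables = parse_vars(HTTP_FILE)
url = expand("{{searchUrl}}/indexes/{{index_name}}/docs/search?api-version={{api_version}}", variables)
print(url)
```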

Creating the data source

Using ADLS Gen2 here is one of my changes from the official tutorial.

### Create the data source
PUT {{searchUrl}}/datasources('{{datasource_name}}')?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}

  {
    "name": "{{datasource_name}}",
    "description": "A data source to store multimodal documents",
    "type": "adlsgen2",
    "subtype": null,
    "credentials":{
      "connectionString":"{{storageConnection}}"
    },
    "container": {
      "name": "{{blob_container_name}}",
      "query": null
    },
    "dataChangeDetectionPolicy": null,
    "dataDeletionDetectionPolicy": null,
    "encryptionKey": null
  }
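The same PUT can also be sent without REST Client. A minimal sketch with Python's standard `urllib.request`, assuming placeholder values for the endpoint, key, and connection string; `build_datasource_request` is a hypothetical helper. It uses the `/datasources/{name}` path form instead of the `datasources('{name}')` form above, and omits `subtype` and the null-valued optional properties. The request is only built here; passing it to `urllib.request.urlopen` would actually send it.

```python
import json
import urllib.request

def build_datasource_request(search_url: str, api_key: str, api_version: str,
                             name: str, connection_string: str,
                             container: str) -> urllib.request.Request:
    # Essentially the body of the REST Client request above: an ADLS Gen2 data source
    body = {
        "name": name,
        "description": "A data source to store multimodal documents",
        "type": "adlsgen2",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container, "query": None},
    }
    return urllib.request.Request(
        url=f"{search_url}/datasources/{name}?api-version={api_version}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="PUT",  # PUT creates the data source or updates it if it exists
    )

req = build_datasource_request(
    "https://example.search.windows.net", "<key>", "2025-08-01-preview",
    "doc-intelligence-multimodal-embedding-ds", "<connection string>", "rag-doc-test",
)
# urllib.request.urlopen(req) would send it; not called here
```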

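Creating the index

Changes from the official tutorial:

  • Removed offset (nothing is ever written to it)
  • Changed the location field to locationMetadata, with subfields pageNumber and boundingPolygons (the field names in the original did not match)

Also, with api-version 2025-09-01 this request fails at vectorizers -> aiServicesVisionParameters, so I use the preview version.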
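The locationMetadata change matters because the index projections map the skill's whole `locationMetadata` object onto one complex field, so the keys the skill emits presumably have to line up with the subfield names declared in the index. A small Python check of that idea; `missing_subfields` is a hypothetical helper, and the sample object is made up (shaped after the subfields used in this index, not copied from real skill output).

```python
# Subfields declared under "locationMetadata" in the index definition
INDEX_SUBFIELDS = {"pageNumber", "boundingPolygons"}

def missing_subfields(skill_output: dict, index_subfields: set[str]) -> set[str]:
    """Return keys in the skill output that have no matching index subfield."""
    return set(skill_output) - index_subfields

# Hypothetical locationMetadata value, for illustration only
sample = {"pageNumber": 3, "boundingPolygons": "<polygon JSON>"}
print(missing_subfields(sample, INDEX_SUBFIELDS))  # set()

# Differently spelled subfields would leave both keys unmatched
print(missing_subfields(sample, {"page_number", "bounding_polygons"}))
```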

### Create the index
PUT {{searchUrl}}/indexes('{{index_name}}')?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}

  {
    "name": "{{index_name}}",
    "fields": [
        {
            "name": "content_id",
            "type": "Edm.String",
            "retrievable": true,
            "key": true,
            "analyzer": "keyword"
        },
        {
            "name": "text_document_id",
            "type": "Edm.String",
            "searchable": false,
            "filterable": true,
            "retrievable": true,
            "stored": true,
            "sortable": false,
            "facetable": false
        },          
        {
            "name": "document_title",
            "type": "Edm.String",
            "searchable": true
        },
        {
            "name": "image_document_id",
            "type": "Edm.String",
            "filterable": true,
            "retrievable": true
        },
        {
            "name": "content_text",
            "type": "Edm.String",
            "searchable": true,
            "retrievable": true
        },
        {
            "name": "content_embedding",
            "type": "Collection(Edm.Single)",
            "dimensions": 1024,
            "searchable": true,
            "retrievable": true,
            "vectorSearchProfile": "hnsw"
        },
        {
            "name": "content_path",
            "type": "Edm.String",
            "searchable": false,
            "retrievable": true
        },
        {
            "name": "locationMetadata",
            "type": "Edm.ComplexType",
            "fields": [
                {
                "name": "boundingPolygons",
                "type": "Edm.String",
                "searchable": false,
                "retrievable": true,
                "filterable": false,
                "sortable": false,
                "facetable": false
                },
                {
                "name": "pageNumber",
                "type": "Edm.Int32",
                "searchable": false,
                "retrievable": true
                }
            ]
        }         
    ],
    "vectorSearch": {
        "profiles": [
            {
                "name": "hnsw",
                "algorithm": "defaulthnsw",
                "vectorizer": "demo-vectorizer"
            }
        ],
        "algorithms": [
            {
                "name": "defaulthnsw",
                "kind": "hnsw",
                "hnswParameters": {
                    "m": 4,
                    "efConstruction": 400,
                    "metric": "cosine"
                }
            }
        ],
        "vectorizers": [
            {
                "name": "demo-vectorizer",
                "kind": "aiServicesVision",
                "aiServicesVisionParameters": {
                    "resourceUri": "{{cognitiveServicesUrl}}",
                    "authIdentity": null,
                    "modelVersion": "{{modelVersion}}"
                }
            }
        ]     
    },
    "semantic": {
        "defaultConfiguration": "semanticconfig",
        "configurations": [
            {
                "name": "semanticconfig",
                "prioritizedFields": {
                    "titleField": {
                        "fieldName": "document_title"
                    },
                    "prioritizedContentFields": [
                    ],
                    "prioritizedKeywordsFields": []
                }
            }
        ]
    }
  }
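The `defaulthnsw` algorithm behind the `hnsw` profile uses `"metric": "cosine"`, so vector queries against `content_embedding` rank text and image chunks by (approximate) cosine similarity between 1024-dimensional embeddings. A tiny Python illustration of that measure, using made-up 3-dimensional vectors in place of real embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| |b|); 1.0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (real content_embedding values have 1024 dimensions)
query = [1.0, 0.0, 1.0]
text_chunk = [2.0, 0.0, 2.0]   # same direction as the query
image_chunk = [0.0, 1.0, 0.0]  # orthogonal to the query

print(cosine_similarity(query, text_chunk))   # approximately 1.0
print(cosine_similarity(query, image_chunk))  # 0.0
```

Because cosine ignores vector length, only the direction of each embedding affects the ranking.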

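Registering the skillset

The skillset's requirements restrict which regions you can use: Azure AI Services must be available in the region, and it must be the same region as Azure AI Search.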

### Create a skillset
PUT {{searchUrl}}/skillsets/{{skillset_name}}?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}

{
  "name": "{{skillset_name}}",
  "description": "A sample skillset for multimodal using multimodal embedding",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
      "name": "document-layout-skill",
      "description": "Document Intelligence skill for document cracking",
      "context": "/document",
      "outputMode": "oneToMany",
      "outputFormat": "text",
      "extractionOptions": ["images", "locationMetadata"],
      "chunkingProperties": {     
          "unit": "characters",
          "maximumLength": 2000, 
          "overlapLength": 200
      },
      "inputs": [
        {
          "name": "file_data",
          "source": "/document/file_data"
        }
      ],
      "outputs": [
        { 
          "name": "text_sections", 
          "targetName": "text_sections" 
        }, 
        { 
          "name": "normalized_images", 
          "targetName": "normalized_images" 
        } 
      ]
    },
    { 
      "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill", 
      "name": "text-embedding-skill",
      "description": "Vision Vectorization skill for text",
      "context": "/document/text_sections/*", 
      "modelVersion": "2023-04-15", 
      "inputs": [ 
        { 
          "name": "text", 
          "source": "/document/text_sections/*/content" 
        } 
      ], 
      "outputs": [ 
        { 
          "name": "vector",
          "targetName": "text_vector"
        } 
      ] 
    },    
    { 
      "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill", 
      "name": "image-embedding-skill",
      "description": "Vision Vectorization skill for images",
      "context": "/document/normalized_images/*", 
      "modelVersion": "2023-04-15", 
      "inputs": [ 
        { 
          "name": "image", 
          "source": "/document/normalized_images/*" 
        } 
      ], 
      "outputs": [ 
        { 
          "name": "vector",
          "targetName": "image_vector"
        } 
      ] 
    },
    {
      "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
      "name": "shaper-skill",
      "context": "/document/normalized_images/*",
      "inputs": [
        {
          "name": "normalized_images",
          "source": "/document/normalized_images/*",
          "inputs": []
        },
        {
          "name": "imagePath",
          "source": "='{{imageProjectionContainer}}/'+$(/document/normalized_images/*/imagePath)",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "output",
          "targetName": "new_normalized_images"
        }
      ]
    }      
  ], 
   "indexProjections": {
      "selectors": [
        {
          "targetIndexName": "{{index_name}}",
          "parentKeyFieldName": "text_document_id",
          "sourceContext": "/document/text_sections/*",
          "mappings": [    
            {
            "name": "content_embedding",
            "source": "/document/text_sections/*/text_vector"
            },                      
            {
              "name": "content_text",
              "source": "/document/text_sections/*/content"
            },
            {
              "name": "locationMetadata",
              "source": "/document/text_sections/*/locationMetadata"
            },                
            {
              "name": "document_title",
              "source": "/document/document_title"
            }   
          ]
        },        
        {
          "targetIndexName": "{{index_name}}",
          "parentKeyFieldName": "image_document_id",
          "sourceContext": "/document/normalized_images/*",
          "mappings": [    
            {
            "name": "content_embedding",
            "source": "/document/normalized_images/*/image_vector"
            },                                           
            {
              "name": "content_path",
              "source": "/document/normalized_images/*/new_normalized_images/imagePath"
            },                    
            {
              "name": "document_title",
              "source": "/document/document_title"
            },
            {
              "name": "locationMetadata",
              "source": "/document/normalized_images/*/locationMetadata"
            }             
          ]
        }
      ],
      "parameters": {
        "projectionMode": "skipIndexingParentDocuments"
      }
  },
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.AIServicesByIdentity",
    "subdomainUrl": "{{cognitiveServicesUrl}}",
    "identity": null
  },
  "knowledgeStore": {
    "storageConnectionString": "{{storageConnection}}",
    "identity": null,
    "projections": [
      {
        "files": [
          {
            "storageContainer": "{{imageProjectionContainer}}",
            "source": "/document/normalized_images/*"
          }
        ]
      }
    ]
  }
}
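For intuition about `chunkingProperties` (`"unit": "characters"`, `"maximumLength": 2000`, `"overlapLength": 200`), here is a plain fixed-window character splitter in Python. This is a simplification under an explicit assumption: the Document Layout skill's actual splitting may take the document's layout into account, so treat the sketch only as a picture of how maximum length and overlap relate. `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, maximum_length: int = 2000, overlap_length: int = 200) -> list[str]:
    """Split text into windows of at most maximum_length characters; each window
    starts overlap_length characters before the previous one ended."""
    if not 0 <= overlap_length < maximum_length:
        raise ValueError("overlap_length must be in [0, maximum_length)")
    step = maximum_length - overlap_length
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + maximum_length])
        if start + maximum_length >= len(text):
            break  # this window already reaches the end of the text
    return chunks

chunks = chunk_text("x" * 5000)
print([len(c) for c in chunks])  # [2000, 2000, 1400]
```

Each chunk shares its last 200 characters with the start of the next one, so a sentence cut at a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.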

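I had an AI draw the field flow as a Mermaid diagram. It leaves out a few details, but what it shows is correct.
It renders small on Qiita, though, so try viewing it with a tool such as the following.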
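The field flow can also be listed straight from the skillset definition. A minimal Python sketch that walks `indexProjections.selectors` and prints "source path -> index field" for each mapping; `field_flow` is a hypothetical helper, and the dictionary is a trimmed copy of the selectors in the skillset above.

```python
# Trimmed copy of "indexProjections" from the skillset above
INDEX_PROJECTIONS = {
    "selectors": [
        {
            "parentKeyFieldName": "text_document_id",
            "sourceContext": "/document/text_sections/*",
            "mappings": [
                {"name": "content_embedding", "source": "/document/text_sections/*/text_vector"},
                {"name": "content_text", "source": "/document/text_sections/*/content"},
                {"name": "locationMetadata", "source": "/document/text_sections/*/locationMetadata"},
                {"name": "document_title", "source": "/document/document_title"},
            ],
        },
        {
            "parentKeyFieldName": "image_document_id",
            "sourceContext": "/document/normalized_images/*",
            "mappings": [
                {"name": "content_embedding", "source": "/document/normalized_images/*/image_vector"},
                {"name": "content_path", "source": "/document/normalized_images/*/new_normalized_images/imagePath"},
                {"name": "document_title", "source": "/document/document_title"},
                {"name": "locationMetadata", "source": "/document/normalized_images/*/locationMetadata"},
            ],
        },
    ]
}

def field_flow(projections: dict) -> list[str]:
    """Return one header line per selector and one 'source -> field' line per mapping."""
    lines = []
    for selector in projections["selectors"]:
        lines.append(f"[{selector['sourceContext']}] parent key: {selector['parentKeyFieldName']}")
        for mapping in selector["mappings"]:
            lines.append(f"  {mapping['source']} -> {mapping['name']}")
    return lines

print("\n".join(field_flow(INDEX_PROJECTIONS)))
```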

Creating the indexer

### Create the indexer
PUT {{searchUrl}}/indexers/{{indexer_name}}?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}

{
  "name": "{{indexer_name}}",
  "dataSourceName": "{{datasource_name}}",
  "targetIndexName": "{{index_name}}",
  "skillsetName": "{{skillset_name}}",
  "parameters": {
    "maxFailedItems": -1,
    "maxFailedItemsPerBatch": 0,
    "batchSize": 1,
    "configuration": {
      "allowSkillsetToReadFileData": true
    }
  },
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_name",
      "targetFieldName": "document_title"
    }
  ],
  "outputFieldMappings": []
}
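Creating the indexer also starts a run. Because `maxFailedItems` is `-1`, failed documents do not stop that run, so it is worth checking the result afterwards with `GET {{searchUrl}}/indexers/{{indexer_name}}/status?api-version={{api_version}}`. A minimal Python sketch that summarizes such a response; `summarize_status` is hypothetical, it reads only the `lastResult` fields `status`, `itemsProcessed`, `itemsFailed`, and `errors`, and the sample dictionary is made up for illustration rather than real output.

```python
def summarize_status(status: dict) -> str:
    """Summarize the lastResult section of an indexer status response."""
    last = status.get("lastResult")
    if last is None:
        return "no run yet"
    summary = (f"{last['status']}: {last['itemsProcessed']} processed, "
               f"{last['itemsFailed']} failed")
    # List per-document errors, which maxFailedItems = -1 would otherwise hide
    for error in last.get("errors", []):
        summary += f"\n  {error.get('key')}: {error.get('errorMessage')}"
    return summary

# Made-up example shaped like a status response (not real output)
sample_status = {
    "status": "running",
    "lastResult": {
        "status": "success",
        "itemsProcessed": 2,
        "itemsFailed": 0,
        "errors": [],
    },
}
print(summarize_status(sample_status))  # success: 2 processed, 0 failed
```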

Searching

Searching all documents

### Query the index
POST {{searchUrl}}/indexes/{{index_name}}/docs/search?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}
  
{
  "search": "*",
  "count": true
}

Searching image chunks only

### Query for only images
POST {{searchUrl}}/indexes/{{index_name}}/docs/search?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}
  
  {
    "search": "*",
    "count": true,
    "filter": "image_document_id ne null"
  }
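Each chunk has only one of the two parent-key fields set (`text_document_id` for chunks from the text selector, `image_document_id` for chunks from the image selector), so a `ne null` filter on either field keeps one kind of chunk. A small Python sketch that builds both request bodies; `chunk_filter_body` is a hypothetical helper.

```python
import json

# Parent-key field that each index projection selector fills in
PARENT_KEY_FIELDS = {"text": "text_document_id", "image": "image_document_id"}

def chunk_filter_body(kind: str, search: str = "*") -> dict:
    """Search body that keeps only text chunks or only image chunks."""
    field = PARENT_KEY_FIELDS[kind]
    return {"search": search, "count": True, "filter": f"{field} ne null"}

print(json.dumps(chunk_filter_body("image")))  # same filter as the request above
print(json.dumps(chunk_filter_body("text")))   # text chunks only
```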

Searching with selected fields

### Query 
POST {{searchUrl}}/indexes/{{index_name}}/docs/search?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}
  
  {
    "search": "保険",
    "count": true,
    "select": "content_id, document_title, content_text, content_path"
  }
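With `"count": true`, the response carries the total in `@odata.count` and the hits in `value`; each hit contains the `select`ed fields plus `@search.score`. A minimal Python sketch that prints one line per hit and tells text chunks from image chunks by whether `content_path` is set; `format_hits` is hypothetical and the sample response is made up for illustration.

```python
def format_hits(response: dict) -> list[str]:
    """One line per hit: score, title, and either the image path or the start of the text."""
    lines = [f"total: {response.get('@odata.count')}"]
    for hit in response.get("value", []):
        if hit.get("content_path"):
            detail = f"image {hit['content_path']}"
        else:
            detail = f"text {(hit.get('content_text') or '')[:40]!r}"
        lines.append(f"{hit['@search.score']:.3f} {hit['document_title']} {detail}")
    return lines

# Made-up response shaped like the select query above (not real output)
sample_response = {
    "@odata.count": 2,
    "value": [
        {"@search.score": 1.5, "content_id": "a", "document_title": "sample.pdf",
         "content_text": "保険の概要", "content_path": None},
        {"@search.score": 0.75, "content_id": "b", "document_title": "sample.pdf",
         "content_text": None, "content_path": "sustainable-ai-pdf-images/img.jpg"},
    ],
}
print("\n".join(format_hits(sample_response)))
```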