To get familiar with Azure AI Search indexer definitions, I worked through the tutorial below.
I changed a few things along the way, so I am recording them here.
Prerequisites
I run the requests from VS Code with REST Client 0.25.1.
Turning ON Decode Escaped Unicode Characters in the REST Client settings decodes Japanese text in the HTTP response body.
Steps 1 through 5 of the article below are also prerequisites.
Completed flow
REST
Index creation
- Every operation is written as a create-or-update (upsert).
Fixed value definitions
Note that the api_version value is a preview version.
## Azure AI Search endpoint
@searchUrl = https://<AI Search resource name>.search.windows.net
## Azure AI Search API key
@searchApiKey=<key>
## Blob Storage connection string
@storageConnection=<connection string>
## Azure AI Services endpoint
@cognitiveServicesUrl = https://<ai service resource name>.cognitiveservices.azure.com/
## Model version
@modelVersion = 2023-04-15
## Blob Storage container name (image storage)
@imageProjectionContainer=sustainable-ai-pdf-images
## Blob Storage container name (PDF storage)
@blob_container_name=rag-doc-test
## Azure AI Search API version
## With 2025-09-01, the index and skillset definitions return errors
@api_version=2025-08-01-preview
## Data source name
@datasource_name=doc-intelligence-multimodal-embedding-ds
## Index name
@index_name=doc-intelligence-multimodal-embedding-index
## Skillset name
@skillset_name=doc-intelligence-multimodal-embedding-skillset
## Indexer name
@indexer_name=doc-intelligence-multimodal-embedding-indexer
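The definitions above are REST Client file variables, and each request later refers to them as `{{name}}`. As a rough illustration of how those placeholders expand into a request URL, here is a minimal Python sketch. It is a simplified model and not the extension's actual implementation; the resource name `example-search` and the helper names `parse_variables` and `render` are hypothetical.

```python
import re

# Simplified model of REST Client file variables: "@name = value" or "@name=value".
VARIABLE_LINE = re.compile(r"^@(\w+)\s*=\s*(.*)$")
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def parse_variables(lines):
    """Collect @name=value definitions, skipping comments and other lines."""
    variables = {}
    for line in lines:
        match = VARIABLE_LINE.match(line.strip())
        if match:
            variables[match.group(1)] = match.group(2).strip()
    return variables


def render(template, variables):
    """Replace every {{name}} placeholder; an undefined name raises KeyError."""
    return PLACEHOLDER.sub(lambda m: variables[m.group(1)], template)


definitions = [
    "## hypothetical resource name, for illustration only",
    "@searchUrl = https://example-search.search.windows.net",
    "@api_version=2025-08-01-preview",
    "@index_name=doc-intelligence-multimodal-embedding-index",
]
variables = parse_variables(definitions)
url = render(
    "{{searchUrl}}/indexes('{{index_name}}')?api-version={{api_version}}",
    variables,
)
print(url)
```

Since `api_version` is defined once and reused by every request, switching between the preview and a GA version only requires changing that one line.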
Creating the data source
Unlike the official tutorial, this uses ADLS Gen2.
### Create the data source
PUT {{searchUrl}}/datasources('{{datasource_name}}')?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}
{
"name": "{{datasource_name}}",
"description": "A data source to store multimodal documents",
"type": "adlsgen2",
"subtype": null,
"credentials":{
"connectionString":"{{storageConnection}}"
},
"container": {
"name": "{{blob_container_name}}",
"query": null
},
"dataChangeDetectionPolicy": null,
"dataDeletionDetectionPolicy": null,
"encryptionKey": null
}
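Creating the index
Changes from the official tutorial:
- Removed offset (nothing is ever written to it)
- Renamed the field to locationMetadata (the field name in the original looks wrong)
Also, with api-version 2025-09-01 an error occurs at vectorizers -> aiServicesVisionParameters, so I use the preview version.
### Create the index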
PUT {{searchUrl}}/indexes('{{index_name}}')?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}
{
"name": "{{index_name}}",
"fields": [
{
"name": "content_id",
"type": "Edm.String",
"retrievable": true,
"key": true,
"analyzer": "keyword"
},
{
"name": "text_document_id",
"type": "Edm.String",
"searchable": false,
"filterable": true,
"retrievable": true,
"stored": true,
"sortable": false,
"facetable": false
},
{
"name": "document_title",
"type": "Edm.String",
"searchable": true
},
{
"name": "image_document_id",
"type": "Edm.String",
"filterable": true,
"retrievable": true
},
{
"name": "content_text",
"type": "Edm.String",
"searchable": true,
"retrievable": true
},
{
"name": "content_embedding",
"type": "Collection(Edm.Single)",
"dimensions": 1024,
"searchable": true,
"retrievable": true,
"vectorSearchProfile": "hnsw"
},
{
"name": "content_path",
"type": "Edm.String",
"searchable": false,
"retrievable": true
},
{
"name": "locationMetadata",
"type": "Edm.ComplexType",
"fields": [
{
"name": "boundingPolygons",
"type": "Edm.String",
"searchable": false,
"retrievable": true,
"filterable": false,
"sortable": false,
"facetable": false
},
{
"name": "pageNumber",
"type": "Edm.Int32",
"searchable": false,
"retrievable": true
}
]
}
],
"vectorSearch": {
"profiles": [
{
"name": "hnsw",
"algorithm": "defaulthnsw",
"vectorizer": "demo-vectorizer"
}
],
"algorithms": [
{
"name": "defaulthnsw",
"kind": "hnsw",
"hnswParameters": {
"m": 4,
"efConstruction": 400,
"metric": "cosine"
}
}
],
"vectorizers": [
{
"name": "demo-vectorizer",
"kind": "aiServicesVision",
"aiServicesVisionParameters": {
"resourceUri": "{{cognitiveServicesUrl}}",
"authIdentity": null,
"modelVersion": "{{modelVersion}}"
}
}
]
},
"semantic": {
"defaultConfiguration": "semanticconfig",
"configurations": [
{
"name": "semanticconfig",
"prioritizedFields": {
"titleField": {
"fieldName": "document_title"
},
"prioritizedContentFields": [
],
"prioritizedKeywordsFields": []
}
}
]
}
}
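The index definition links names across several sections: the vector field points to a profile, and the profile points to an algorithm and a vectorizer. A typo in any of these is easy to miss in a long JSON body. The sketch below shows one way to check those references locally before sending the request. The function name `check_vector_references` is hypothetical, the sample is a trimmed copy of the definition above, and only top-level fields are checked.

```python
def check_vector_references(index_definition):
    """Return a list of dangling names (empty when the references are consistent)."""
    vector_search = index_definition.get("vectorSearch", {})
    profiles = {p["name"]: p for p in vector_search.get("profiles", [])}
    algorithms = {a["name"] for a in vector_search.get("algorithms", [])}
    vectorizers = {v["name"] for v in vector_search.get("vectorizers", [])}

    problems = []
    # Each vector field must name an existing profile.
    for field in index_definition.get("fields", []):
        profile_name = field.get("vectorSearchProfile")
        if profile_name is not None and profile_name not in profiles:
            problems.append(f"field {field['name']}: unknown profile {profile_name}")
    # Each profile must name an existing algorithm and, if set, an existing vectorizer.
    for name, profile in profiles.items():
        algorithm = profile.get("algorithm")
        if algorithm not in algorithms:
            problems.append(f"profile {name}: unknown algorithm {algorithm}")
        vectorizer = profile.get("vectorizer")
        if vectorizer is not None and vectorizer not in vectorizers:
            problems.append(f"profile {name}: unknown vectorizer {vectorizer}")
    return problems


# Trimmed copy of the index definition used in this article.
index_definition = {
    "fields": [
        {"name": "content_id", "type": "Edm.String", "key": True},
        {
            "name": "content_embedding",
            "type": "Collection(Edm.Single)",
            "dimensions": 1024,
            "vectorSearchProfile": "hnsw",
        },
    ],
    "vectorSearch": {
        "profiles": [
            {"name": "hnsw", "algorithm": "defaulthnsw", "vectorizer": "demo-vectorizer"}
        ],
        "algorithms": [{"name": "defaulthnsw", "kind": "hnsw"}],
        "vectorizers": [{"name": "demo-vectorizer", "kind": "aiServicesVision"}],
    },
}
problems = check_vector_references(index_definition)
print(problems)  # [] because every referenced name is defined
```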
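Registering the skillset
The skillset requirements restrict which regions you can use. Specifically, the region must be one where Azure AI Services is available and must be the same region as Azure AI Search.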
### Create a skillset
PUT {{searchUrl}}/skillsets/{{skillset_name}}?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}
{
"name": "{{skillset_name}}",
"description": "A sample skillset for multimodal using multimodal embedding",
"skills": [
{
"@odata.type": "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
"name": "document-layout-skill",
"description": "Document Intelligence skill for document cracking",
"context": "/document",
"outputMode": "oneToMany",
"outputFormat": "text",
"extractionOptions": ["images", "locationMetadata"],
"chunkingProperties": {
"unit": "characters",
"maximumLength": 2000,
"overlapLength": 200
},
"inputs": [
{
"name": "file_data",
"source": "/document/file_data"
}
],
"outputs": [
{
"name": "text_sections",
"targetName": "text_sections"
},
{
"name": "normalized_images",
"targetName": "normalized_images"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill",
"name": "text-embedding-skill",
"description": "Vision Vectorization skill for text",
"context": "/document/text_sections/*",
"modelVersion": "2023-04-15",
"inputs": [
{
"name": "text",
"source": "/document/text_sections/*/content"
}
],
"outputs": [
{
"name": "vector",
"targetName": "text_vector"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill",
"name": "image-embedding-skill",
"description": "Vision Vectorization skill for images",
"context": "/document/normalized_images/*",
"modelVersion": "2023-04-15",
"inputs": [
{
"name": "image",
"source": "/document/normalized_images/*"
}
],
"outputs": [
{
"name": "vector",
"targetName": "image_vector"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
"name": "shaper-skill",
"context": "/document/normalized_images/*",
"inputs": [
{
"name": "normalized_images",
"source": "/document/normalized_images/*",
"inputs": []
},
{
"name": "imagePath",
"source": "='{{imageProjectionContainer}}/'+$(/document/normalized_images/*/imagePath)",
"inputs": []
}
],
"outputs": [
{
"name": "output",
"targetName": "new_normalized_images"
}
]
}
],
"indexProjections": {
"selectors": [
{
"targetIndexName": "{{index_name}}",
"parentKeyFieldName": "text_document_id",
"sourceContext": "/document/text_sections/*",
"mappings": [
{
"name": "content_embedding",
"source": "/document/text_sections/*/text_vector"
},
{
"name": "content_text",
"source": "/document/text_sections/*/content"
},
{
"name": "locationMetadata",
"source": "/document/text_sections/*/locationMetadata"
},
{
"name": "document_title",
"source": "/document/document_title"
}
]
},
{
"targetIndexName": "{{index_name}}",
"parentKeyFieldName": "image_document_id",
"sourceContext": "/document/normalized_images/*",
"mappings": [
{
"name": "content_embedding",
"source": "/document/normalized_images/*/image_vector"
},
{
"name": "content_path",
"source": "/document/normalized_images/*/new_normalized_images/imagePath"
},
{
"name": "document_title",
"source": "/document/document_title"
},
{
"name": "locationMetadata",
"source": "/document/normalized_images/*/locationMetadata"
}
]
}
],
"parameters": {
"projectionMode": "skipIndexingParentDocuments"
}
},
"cognitiveServices": {
"@odata.type": "#Microsoft.Azure.Search.AIServicesByIdentity",
"subdomainUrl": "{{cognitiveServicesUrl}}",
"identity": null
},
"knowledgeStore": {
"storageConnectionString": "{{storageConnection}}",
"identity": null,
"projections": [
{
"files": [
{
"storageContainer": "{{imageProjectionContainer}}",
"source": "/document/normalized_images/*"
}
]
}
]
}
}
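The `chunkingProperties` in the layout skill (`unit` of characters, `maximumLength` 2000, `overlapLength` 200) can be hard to picture from the JSON alone. The sketch below shows plain fixed-size character chunking with overlap using the same numbers. It is a simplified assumption for illustration: the real skill may also take document layout into account, so its chunk boundaries may differ. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, maximum_length=2000, overlap_length=200):
    """Split text into chunks of at most maximum_length characters,
    where each chunk repeats the last overlap_length characters of the previous one."""
    if overlap_length >= maximum_length:
        raise ValueError("overlap_length must be smaller than maximum_length")
    step = maximum_length - overlap_length
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + maximum_length])
        # Stop once the current chunk reaches the end of the text.
        if start + maximum_length >= len(text):
            break
        start += step
    return chunks


# Hypothetical 4500-character input made of repeating digits.
sample = "".join(str(i % 10) for i in range(4500))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [2000, 2000, 900]
```

With these settings each new chunk starts 1800 characters after the previous one, so a 4500-character text yields three chunks.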
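I had AI write the field flow in Mermaid notation. A few details are missing, but what it shows is correct.
It renders small on Qiita, though, so try viewing it with a tool such as the one below.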
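As a text-only view of the same field flow, the two `indexProjections` selectors can be modeled as a fan-out: one enriched PDF becomes one index document per text section and one per image, all in the same index. The Python sketch below mirrors the mappings in the skillset under simplifying assumptions. The sample values and the function name `project` are hypothetical, and `content_id` is omitted on the assumption that the service generates the key of each child document.

```python
def project(enriched_document, parent_key):
    """Fan one enriched document out into child index documents."""
    title = enriched_document["document_title"]
    children = []
    # Selector 1: sourceContext /document/text_sections/*
    for section in enriched_document.get("text_sections", []):
        children.append({
            "text_document_id": parent_key,  # parentKeyFieldName
            "document_title": title,
            "content_text": section["content"],
            "content_embedding": section["text_vector"],
            "locationMetadata": section.get("locationMetadata"),
        })
    # Selector 2: sourceContext /document/normalized_images/*
    for image in enriched_document.get("normalized_images", []):
        children.append({
            "image_document_id": parent_key,  # parentKeyFieldName
            "document_title": title,
            "content_path": image["new_normalized_images"]["imagePath"],
            "content_embedding": image["image_vector"],
            "locationMetadata": image.get("locationMetadata"),
        })
    # projectionMode "skipIndexingParentDocuments": the parent itself is not added.
    return children


# Hypothetical enriched document with two text sections and one image.
enriched = {
    "document_title": "sample.pdf",
    "text_sections": [
        {"content": "first chunk", "text_vector": [0.1, 0.2]},
        {"content": "second chunk", "text_vector": [0.3, 0.4]},
    ],
    "normalized_images": [
        {
            "image_vector": [0.5, 0.6],
            "new_normalized_images": {"imagePath": "sustainable-ai-pdf-images/sample/image_0.jpg"},
        }
    ],
}
children = project(enriched, parent_key="parent-1")
print(len(children))  # 3
```

Text documents carry `text_document_id` and image documents carry `image_document_id`, which is what makes it possible to tell the two kinds apart with a filter at query time.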
Creating the indexer
### Create the indexer
PUT {{searchUrl}}/indexers/{{indexer_name}}?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}
{
"name": "{{indexer_name}}",
"dataSourceName": "{{datasource_name}}",
"targetIndexName": "{{index_name}}",
"skillsetName": "{{skillset_name}}",
"parameters": {
"maxFailedItems": -1,
"maxFailedItemsPerBatch": 0,
"batchSize": 1,
"configuration": {
"allowSkillsetToReadFileData": true
}
},
"fieldMappings": [
{
"sourceFieldName": "metadata_storage_name",
"targetFieldName": "document_title"
}
],
"outputFieldMappings": []
}
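The `fieldMappings` entry in the indexer copies the blob's `metadata_storage_name` into `document_title`, which the skillset's projections then read as `/document/document_title`. Here is a minimal sketch of that renaming step. It covers explicit mappings only; implicit same-name mappings and mapping functions are left out, and the function name `apply_field_mappings` and the sample blob metadata are hypothetical.

```python
def apply_field_mappings(source_document, field_mappings):
    """Copy source fields to their target names according to explicit mappings."""
    mapped = {}
    for mapping in field_mappings:
        source_name = mapping["sourceFieldName"]
        # When targetFieldName is omitted, the source name is reused.
        target_name = mapping.get("targetFieldName", source_name)
        if source_name in source_document:
            mapped[target_name] = source_document[source_name]
    return mapped


field_mappings = [
    {"sourceFieldName": "metadata_storage_name", "targetFieldName": "document_title"}
]
# Hypothetical blob metadata for one PDF.
blob = {"metadata_storage_name": "sample.pdf", "metadata_storage_size": 12345}
mapped = apply_field_mappings(blob, field_mappings)
print(mapped)  # {'document_title': 'sample.pdf'}
```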
Search
Search everything
### Query the index
POST {{searchUrl}}/indexes/{{index_name}}/docs/search?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}
{
"search": "*",
"count": true
}
Search images only
### Query for only images
POST {{searchUrl}}/indexes/{{index_name}}/docs/search?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}
{
"search": "*",
"count": true,
"filter": "image_document_id ne null"
}
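The filter `image_document_id ne null` works because only documents projected from images have that field populated. The sketch below reproduces the same selection locally on hypothetical sample documents. It only models the null check and is not a general OData filter evaluator; the function name `split_by_kind` is made up for illustration.

```python
def split_by_kind(documents):
    """Separate image documents from text documents using image_document_id."""
    # Equivalent in spirit to the filter: image_document_id ne null
    images = [d for d in documents if d.get("image_document_id") is not None]
    # Equivalent in spirit to the filter: image_document_id eq null
    texts = [d for d in documents if d.get("image_document_id") is None]
    return images, texts


# Hypothetical index contents: two text chunks and one image.
documents = [
    {"content_id": "a", "text_document_id": "p1", "image_document_id": None},
    {"content_id": "b", "text_document_id": "p1", "image_document_id": None},
    {"content_id": "c", "text_document_id": None, "image_document_id": "p1"},
]
images, texts = split_by_kind(documents)
print([d["content_id"] for d in images])  # ['c']
print([d["content_id"] for d in texts])   # ['a', 'b']
```

To query only text chunks instead, the opposite condition on the same field, or `text_document_id ne null`, should serve the same purpose, since `text_document_id` is also defined as filterable in the index.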
Search with selected fields
### Query
POST {{searchUrl}}/indexes/{{index_name}}/docs/search?api-version={{api_version}}
Content-Type: application/json
api-key: {{searchApiKey}}
{
"search": "保険",
"count": true,
"select": "content_id, document_title, content_text, content_path"
}
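The queries above are keyword searches only, although the index also defines a vectorizer on `content_embedding`. As a possible next step, the sketch below builds a request body that adds a text-based vector query to the same keyword search. It assumes the `vectorQueries` form with `kind` set to `text`, which relies on the index's vectorizer to embed the query string. I have not run it against this index, so treat the value of `k` and the overall shape as assumptions to verify against the API version in use.

```python
import json


def build_hybrid_query(text, k=5):
    """Build a search body that combines keyword search with a text vector query."""
    return {
        "search": text,
        "count": True,
        "select": "content_id, document_title, content_text, content_path",
        "vectorQueries": [
            {
                "kind": "text",                 # the service vectorizes the text itself
                "text": text,
                "fields": "content_embedding",  # vector field defined in the index
                "k": k,                         # number of nearest neighbors (assumed value)
            }
        ],
    }


# ensure_ascii=False keeps the Japanese search term readable in the output.
body = json.dumps(build_hybrid_query("保険"), ensure_ascii=False, indent=2)
print(body)
```

The printed JSON can be pasted as the body of a `POST {{searchUrl}}/indexes/{{index_name}}/docs/search` request like the ones above.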
