Background and goal
In the previous post I organized my notes on vector search. This time I try it out with OpenSearch Service Serverless.
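Summary
The main features are summarized below.
| Feature | Description |
|---|---|
| settings | Index settings (for example, enabling k-NN search). index.knn: true enables k-NN search. |
| mappings | Field definitions (types, vector dimension, and so on). The text fields store values such as the document title and support full-text (free-word) search. A text value is split into tokens by the analyzer, so it can be matched word by word. |
| embedding | The field used for vector search. |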
Overview
CloudFormation
Create the OpenSearch Service Serverless resources with CloudFormation.
Collection
Type: AWS::OpenSearchServerless::Collection
Properties:
  Description: String
  Name: String
  StandbyReplicas: String
  Tags:
    - Tag
  Type: String
- StandbyReplicas
  - Whether to keep standby replicas. This is only a test, so it is set to DISABLED
- Type
  - The collection type
  - Choose one of the following
    - SEARCH
    - TIMESERIES
    - VECTORSEARCH
SecurityPolicy
There are two types: network and encryption
Type: AWS::OpenSearchServerless::SecurityPolicy
Properties:
  Description: String
  Name: String
  Policy: String
  Type: String
- Type
  - The security policy type
  - Choose one of the following
    - encryption
    - network
- Policy
  - Write the policy document that matches the Type
encryption
Description: OpenSearch Serverless encryption policy template
Resources:
  TestSecurityPolicy:
    Type: 'AWS::OpenSearchServerless::SecurityPolicy'
    Properties:
      Name: logs-encryption-policy
      Type: encryption
      Description: Encryption policy for test collections
      Policy: >-
        {"Rules":[{"ResourceType":"collection","Resource":["collection/logs*"]}],"AWSOwnedKey":true}
- When AWSOwnedKey is true, an AWS owned KMS key is used
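Writing the policy JSON by hand inside YAML is easy to get wrong, so it can help to generate the string instead. The following is a minimal sketch, not part of the original setup: build_encryption_policy is a hypothetical helper name, and it only covers the AWS owned key case used in this post.

```python
import json


def build_encryption_policy(collection_pattern: str) -> str:
    """Return an encryption policy document that uses an AWS owned key.

    collection_pattern is the part after "collection/", e.g. "logs*".
    (Hypothetical helper for illustration only.)
    """
    policy = {
        "Rules": [
            {
                "ResourceType": "collection",
                "Resource": [f"collection/{collection_pattern}"],
            }
        ],
        "AWSOwnedKey": True,
    }
    # Compact separators produce the same one-line form as the template above.
    return json.dumps(policy, separators=(",", ":"))


policy_json = build_encryption_policy("logs*")
print(policy_json)
```

The printed string can be pasted into the Policy property of the template.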
network
Description: OpenSearch Serverless network policy template
Resources:
  SecurityPolicy:
    Type: 'AWS::OpenSearchServerless::SecurityPolicy'
    Properties:
      Name: logs-network-policy
      Type: network
      Description: Network policy for test collections
      Policy: >-
        [{"Rules":[{"ResourceType":"collection","Resource":["collection/logs*"]},
        {"ResourceType":"dashboard","Resource":["collection/logs*"]}],"AllowFromPublic":true}]
- AllowFromPublic: when true, the collection is publicly accessible. When false, access is limited to VPC endpoints, so SourceVPCEs must be specified
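The two access modes can be sketched as one small function that switches between public access and VPC-only access. This is an illustrative sketch: build_network_policy and the VPC endpoint ID are hypothetical, and the SourceVPCEs field name is taken from the AWS documentation as I understand it.

```python
import json
from typing import List, Optional


def build_network_policy(
    collection_pattern: str, vpc_endpoint_ids: Optional[List[str]] = None
) -> str:
    """Return a network policy document (hypothetical helper).

    Without vpc_endpoint_ids the policy allows public access.
    With them, access is restricted to the listed VPC endpoints.
    """
    resource = [f"collection/{collection_pattern}"]
    statement = {
        "Rules": [
            {"ResourceType": "collection", "Resource": resource},
            {"ResourceType": "dashboard", "Resource": resource},
        ]
    }
    if vpc_endpoint_ids:
        statement["AllowFromPublic"] = False
        statement["SourceVPCEs"] = list(vpc_endpoint_ids)
    else:
        statement["AllowFromPublic"] = True
    # A network policy is a JSON array of statements.
    return json.dumps([statement])


public_policy = build_network_policy("logs*")
# "vpce-0123456789abcdef0" is a made-up endpoint ID.
private_policy = build_network_policy("logs*", ["vpce-0123456789abcdef0"])
print(public_policy)
print(private_policy)
```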
AccessPolicy
Type: AWS::OpenSearchServerless::AccessPolicy
Properties:
  Description: String
  Name: String
  Policy: String
  Type: String
- Specify data for Type
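As a rough illustration of what goes into the Policy string of a data access policy, the sketch below builds a document with index and collection rules for a list of principals. build_data_access_policy and the role ARN are hypothetical. Unlike the template later in this post, which uses index/*/*, this sketch scopes the index rule to a single collection.

```python
import json
from typing import List


def build_data_access_policy(collection_name: str, principals: List[str]) -> str:
    """Return a data access policy document (hypothetical helper)."""
    statement = {
        "Description": f"Access for {collection_name}",
        "Rules": [
            {
                "ResourceType": "index",
                # All indexes inside the given collection.
                "Resource": [f"index/{collection_name}/*"],
                "Permission": ["aoss:*"],
            },
            {
                "ResourceType": "collection",
                "Resource": [f"collection/{collection_name}"],
                "Permission": ["aoss:*"],
            },
        ],
        "Principal": list(principals),
    }
    return json.dumps([statement], indent=2)


# The ARN below is a placeholder, not a real role.
access_policy = build_data_access_policy(
    "vector-search-collection",
    ["arn:aws:iam::123456789012:role/example-role"],
)
print(access_policy)
```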
Hands-on
Prerequisites
Environment
OS: macOS
Editor: Cursor
Language: Python
IAM role
Set the following IAM policy as an inline policy on the SSO account beforehand
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "aoss:CreateCollection",
        "aoss:ListCollections",
        "aoss:BatchGetCollection",
        "aoss:DeleteCollection",
        "aoss:CreateAccessPolicy",
        "aoss:ListAccessPolicies",
        "aoss:UpdateAccessPolicy",
        "aoss:CreateSecurityPolicy",
        "aoss:GetSecurityPolicy",
        "aoss:UpdateSecurityPolicy",
        "iam:ListUsers",
        "iam:ListRoles"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
Preparation
- Create a repository
% gh repo create opensearch-test --public --clone
✓ Created repository XXXXX/opensearch-test on GitHub
  https://github.com/XXXXX/opensearch-test
%
% gh repo view XXXXX/opensearch-test
XXXXX/opensearch-test
No description provided

  This repository does not have a README

View this repository on GitHub: https://github.com/XXXXX/opensearch-test
%
- Use "Add Folder to Workspace" and select the folder you created
Create OpenSearch Serverless with a CloudFormation template
- Create a templates folder
- Create templates/opensearch-serveless.yml and write the following code
AWSTemplateFormatVersion: '2010-09-09'
Description: 'AWS OpenSearch Serverless Deployment Template'

Parameters:
  DataAccessPrincipal:
    Type: String

Resources:
  VectorSearchCollection:
    Type: AWS::OpenSearchServerless::Collection
    Properties:
      Name: vector-search-collection
      Description: 'OpenSearch Serverless collection for vector search'
      StandbyReplicas: DISABLED
      Type: VECTORSEARCH
    DependsOn:
      - EncryptionPolicy
      - NetworkPolicy

  EncryptionPolicy:
    Type: AWS::OpenSearchServerless::SecurityPolicy
    Properties:
      Name: encryption-policy
      Description: 'Encryption policy for OpenSearch Serverless'
      Type: encryption
      Policy: >-
        {
          "Rules": [
            {
              "ResourceType": "collection",
              "Resource": ["collection/vector-search-collection"]
            }
          ],
          "AWSOwnedKey": true
        }

  NetworkPolicy:
    Type: AWS::OpenSearchServerless::SecurityPolicy
    Properties:
      Name: network-policy
      Description: 'Network policy for OpenSearch Serverless'
      Type: network
      Policy: >-
        [
          {
            "Rules": [
              {
                "ResourceType": "collection",
                "Resource": ["collection/vector-search-collection"]
              },
              {
                "ResourceType": "dashboard",
                "Resource": ["collection/vector-search-collection"]
              }
            ],
            "AllowFromPublic": true
          }
        ]

  # Access policy for vector search
  VectorSearchAccessPolicy:
    Type: AWS::OpenSearchServerless::AccessPolicy
    Properties:
      Name: vector-search-access-policy
      Description: 'Access policy for vector search collection'
      Type: data
      Policy: !Sub >-
        [
          {
            "Description": "Access for cfn user",
            "Rules": [
              {
                "ResourceType": "index",
                "Resource": ["index/*/*"],
                "Permission": ["aoss:*"]
              },
              {
                "ResourceType": "collection",
                "Resource": ["collection/vector-search-collection"],
                "Permission": ["aoss:*"]
              }
            ],
            "Principal": ["${DataAccessPrincipal}"]
          }
        ]
    DependsOn:
      - VectorSearchCollection
Create a Makefile
- Create a Makefile in the project root and write the following code
.PHONY: deploy validate destroy help

# AWS profile and region settings
AWS_PROFILE ?= default
AWS_REGION ?= ap-northeast-1
STACK_NAME ?= opensearch-serverless-stack
# Defaults to the current account ID; override it with an IAM role ARN when deploying
DATA_ACCESS_PRINCIPAL ?= $(shell aws sts get-caller-identity --query 'Account' --output text --profile $(AWS_PROFILE) --region $(AWS_REGION))

# CloudFormation template path
TEMPLATE_FILE = templates/opensearch-serveless.yml

validate:
	@echo "Validating CloudFormation template..."
	aws cloudformation validate-template \
		--template-body file://$(TEMPLATE_FILE) \
		--profile $(AWS_PROFILE) \
		--region $(AWS_REGION)

deploy:
	@echo "Deploying OpenSearch Serverless stack..."
	@echo "Using DataAccessPrincipal: $(DATA_ACCESS_PRINCIPAL)"
	aws cloudformation deploy \
		--template-file $(TEMPLATE_FILE) \
		--stack-name $(STACK_NAME) \
		--capabilities CAPABILITY_IAM \
		--profile $(AWS_PROFILE) \
		--region $(AWS_REGION) \
		--parameter-overrides DataAccessPrincipal=$(DATA_ACCESS_PRINCIPAL)

destroy:
	@echo "Destroying OpenSearch Serverless stack..."
	aws cloudformation delete-stack \
		--stack-name $(STACK_NAME) \
		--profile $(AWS_PROFILE) \
		--region $(AWS_REGION)

help:
	@echo "Available commands:"
	@echo " make validate - Validate the CloudFormation template"
	@echo " make deploy - Deploy the OpenSearch Serverless stack"
	@echo " make destroy - Delete the OpenSearch Serverless stack"
	@echo " make help - Show this help message"
	@echo ""
	@echo "Environment variables:"
	@echo " AWS_PROFILE - AWS profile to use (default: default)"
	@echo " AWS_REGION - AWS region to deploy to (default: ap-northeast-1)"
	@echo " STACK_NAME - CloudFormation stack name (default: opensearch-serverless-stack)"
	@echo " DATA_ACCESS_PRINCIPAL - AWS account ID (default: current account)"
Create the resources
- Run the following command
% make deploy DATA_ACCESS_PRINCIPAL=arn:aws:iam::XXXXX:role/aws-reserved/sso.amazonaws.com/ap-northeast-1/AWSReservedSSO_XXXXX
Deploying OpenSearch Serverless stack...
Using DataAccessPrincipal: arn:aws:iam::XXXXX:role/aws-reserved/sso.amazonaws.com/ap-northeast-1/AWSReservedSSO_XXXXX
aws cloudformation deploy \
	--template-file templates/opensearch-serveless.yml \
	--stack-name opensearch-serverless-stack \
	--capabilities CAPABILITY_IAM \
	--profile XXXXX \
	--region ap-northeast-1 \
	--parameter-overrides DataAccessPrincipal=arn:aws:iam::XXXXX:role/aws-reserved/sso.amazonaws.com/ap-northeast-1/AWSReservedSSO_XXXXX

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - opensearch-serverless-stack
%
From indexing data to searching
Everything is done from the OpenSearch Dev Tools
- Go to OpenSearch Serverless
- From the dashboard, click the collection you created
Create an index for vector search
- Paste and run the following
PUT my-vector-index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "description": { "type": "text" },
      "embedding": {
        "type": "knn_vector",
        "dimension": 4,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "faiss"
        }
      }
    }
  }
}
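- PUT my-vector-index creates an index named my-vector-index
- settings holds the index settings (for example, enabling k-NN search)
  - index.knn: true enables k-NN search
- mappings defines the fields (types, vector dimension, and so on)
  - title, description
    - Fields that store text such as the document title. They support full-text (free-word) search
    - A text value is split into tokens by the analyzer, so it can be matched word by word
  - embedding
    - type: knn_vector
      - A type dedicated to vector search
    - dimension: 4
      - The vector has 4 dimensions
      - Every document you index must provide a vector of the same length
    - method
      - Settings for the vector search algorithm
      - name: hnsw
        - HNSW (Hierarchical Navigable Small World)
      - space_type
        - l2: the L2 norm (Euclidean distance)
      - engine: faiss
        - A vector search library developed by Facebook (Meta)
  - With this mapping, the index supports both keyword search on the title and similarity search based on how close the vectors are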
- The response came back with acknowledged set to true
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "my-vector-index"
}
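Because every document must carry an embedding whose length equals dimension, a quick client-side check before indexing can catch mistakes early. The following is a minimal sketch under that assumption: it holds the same index body as a Python dict, and validate_document is a hypothetical helper, not an OpenSearch API.

```python
from typing import Any, Dict

# Same index definition as the Dev Tools request above, as a Python dict.
INDEX_BODY: Dict[str, Any] = {
    "settings": {"index.knn": True},
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "description": {"type": "text"},
            "embedding": {
                "type": "knn_vector",
                "dimension": 4,
                "method": {"name": "hnsw", "space_type": "l2", "engine": "faiss"},
            },
        }
    },
}


def validate_document(doc: Dict[str, Any], index_body: Dict[str, Any] = INDEX_BODY) -> None:
    """Raise ValueError if the embedding length differs from the mapping's dimension."""
    expected = index_body["mappings"]["properties"]["embedding"]["dimension"]
    embedding = doc.get("embedding")
    if not isinstance(embedding, list) or len(embedding) != expected:
        raise ValueError(f"embedding must be a list of {expected} numbers")


# A 4-dimensional vector passes silently.
validate_document({"title": "Java Cookbook", "embedding": [0.0535, 0.1386, 0.0729, 0.9075]})
```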
Indexing data
- Index the data from Dev Tools (only the first few documents are shown here, but 100 were indexed)
POST _bulk
{"index": {"_index": "my-vector-index"}}
{"title": "Distributed Systems in Action", "description": "Distributed Systems in Action - A comprehensive and practical book in the field of software and technology.", "embedding": [0.1038, 0.0859, 0.0608, 0.9006]}
{"index": {"_index": "my-vector-index"}}
{"title": "Java Cookbook", "description": "Java Cookbook - A comprehensive and practical book in the field of software and technology.", "embedding": [0.0535, 0.1386, 0.0729, 0.9075]}
...
- The following response was returned
{
  "took": 330,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "my-vector-index",
        "_id": "1%3A0%3ABcLMCZYBtRq-o6tNRFQB",
        "_version": 1,
        "result": "created",
        "_shards": { "total": 0, "successful": 0, "failed": 0 },
        "_seq_no": 0,
        "_primary_term": 0,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "my-vector-index",
        "_id": "1%3A0%3ABsLMCZYBtRq-o6tNRFQB",
        "_version": 1,
        "result": "created",
        "_shards": { "total": 0, "successful": 0, "failed": 0 },
        "_seq_no": 0,
        "_primary_term": 0,
        "status": 201
      }
    },
    ...
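Typing 100 action/document pairs by hand is tedious, so the bulk body can be generated. The sketch below shows one way to build the newline-delimited body that can be pasted under POST _bulk in Dev Tools. build_bulk_body is a hypothetical helper, and the two sample documents are the ones shown above.

```python
import json
from typing import Any, Dict, Iterable


def build_bulk_body(index_name: str, docs: Iterable[Dict[str, Any]]) -> str:
    """Return an NDJSON bulk body: one action line followed by one document line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


suffix = " - A comprehensive and practical book in the field of software and technology."
sample_docs = [
    {
        "title": "Distributed Systems in Action",
        "description": "Distributed Systems in Action" + suffix,
        "embedding": [0.1038, 0.0859, 0.0608, 0.9006],
    },
    {
        "title": "Java Cookbook",
        "description": "Java Cookbook" + suffix,
        "embedding": [0.0535, 0.1386, 0.0729, 0.9075],
    },
]

bulk_body = build_bulk_body("my-vector-index", sample_docs)
print(bulk_body, end="")
```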
- Check the number of documents
GET my-vector-index/_search
{
  "size": 0
}
- 104 documents were reported (I ran the bulk request by hand a few times, so the count is 104 rather than 100)
{
  "took": 17,
  "timed_out": false,
  "_shards": { "total": 0, "successful": 0, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 104, "relation": "eq" },
    "max_score": null,
    "hits": []
  }
}
Keyword search (match)
Search for books whose title contains "Python"
- Run the following command
GET my-vector-index/_search
{
  "query": {
    "match": {
      "title": "Python"
    }
  }
}
- The following result was returned
{
  "took": 35,
  "timed_out": false,
  "_shards": { "total": 0, "successful": 0, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 5, "relation": "eq" },
    "max_score": 3.6306305,
    "hits": [
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3AUsLMCZYBtRq-o6tNRFQE",
        "_score": 3.6306305,
        "_source": {
          "title": "Python Essentials",
          "description": "Python Essentials - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.1098, 0.0933, 0.0612, 0.8775]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3AE8LMCZYBtRq-o6tNRFQC",
        "_score": 3.207633,
        "_source": {
          "title": "Python Handbook",
          "description": "Python Handbook - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.0945, 0.0828, 0.0987, 0.8597]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3AWcLMCZYBtRq-o6tNRFQF",
        "_score": 3.207633,
        "_source": {
          "title": "Python Guide",
          "description": "Python Guide - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.1017, 0.0959, 0.1301, 0.9075]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3AIcLMCZYBtRq-o6tNRFQC",
        "_score": 3.160477,
        "_source": {
          "title": "Python for Professionals",
          "description": "Python for Professionals - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.1142, 0.0918, 0.0605, 0.9455]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3AL8LMCZYBtRq-o6tNRFQD",
        "_score": 2.2064066,
        "_source": {
          "title": "Python From Zero to Hero",
          "description": "Python From Zero to Hero - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.0737, 0.0824, 0.0896, 0.9137]
        }
      }
    ]
  }
}
Phrase search (match_phrase)
Search for books whose description contains the exact phrase
- Run the following command
GET my-vector-index/_search
{
  "query": {
    "match_phrase": {
      "description": "comprehensive and practical"
    }
  }
}
- The following result was returned
{
  "took": 38,
  "timed_out": false,
  "_shards": { "total": 0, "successful": 0, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 104, "relation": "eq" },
    "max_score": 0.029437244,
    "hits": [
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3ABMLJCZYBtRq-o6tN-VS_",
        "_score": 0.029437244,
        "_source": {
          "title": "Java Cookbook",
          "description": "Java Cookbook - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.0535, 0.1386, 0.0729, 0.9075]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3ABsLMCZYBtRq-o6tNRFQB",
        "_score": 0.029437244,
        "_source": {
          "title": "Java Cookbook",
          "description": "Java Cookbook - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.0535, 0.1386, 0.0729, 0.9075]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3AGsLMCZYBtRq-o6tNRFQC",
        "_score": 0.029437244,
        "_source": {
          "title": "Algorithms Practices",
          "description": "Algorithms Practices - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.0501, 0.0905, 0.0558, 0.9357]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3AIsLMCZYBtRq-o6tNRFQC",
        "_score": 0.029437244,
        "_source": {
          "title": "DevOps Handbook",
          "description": "DevOps Handbook - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.0669, 0.0532, 0.1023, 0.8631]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3ANcLMCZYBtRq-o6tNRFQD",
        "_score": 0.029437244,
        "_source": {
          "title": "Microservices Handbook",
          "description": "Microservices Handbook - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.1049, 0.0722, 0.0817, 0.8607]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3AP8LMCZYBtRq-o6tNRFQD",
        "_score": 0.029437244,
        "_source": {
          "title": "Kubernetes Practices",
          "description": "Kubernetes Practices - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.0568, 0.0639, 0.1313, 0.9223]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3AQ8LMCZYBtRq-o6tNRFQE",
        "_score": 0.029437244,
        "_source": {
          "title": "Kubernetes Guide",
          "description": "Kubernetes Guide - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.12, 0.1067, 0.1307, 0.9149]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3AT8LMCZYBtRq-o6tNRFQE",
        "_score": 0.029437244,
        "_source": {
          "title": "Rust Principles",
          "description": "Rust Principles - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.0849, 0.1196, 0.1029, 0.8612]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3AUcLMCZYBtRq-o6tNRFQE",
        "_score": 0.029437244,
        "_source": {
          "title": "Databases Essentials",
          "description": "Databases Essentials - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.0975, 0.0597, 0.0758, 0.8662]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3AUsLMCZYBtRq-o6tNRFQE",
        "_score": 0.029437244,
        "_source": {
          "title": "Python Essentials",
          "description": "Python Essentials - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.1098, 0.0933, 0.0612, 0.8775]
        }
      }
    ]
  }
}
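All 104 documents match here because every description in the sample data contains the phrase, and only 10 hits are listed because that is the default page size. To make the difference between match and match_phrase concrete, here is a deliberately simplified model. It is not OpenSearch's actual analyzer, and tokenize, matches_any_term and matches_phrase are hypothetical names. match is modelled as "at least one query term appears" and match_phrase as "the query terms appear consecutively in the same order".

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Very rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_any_term(field_value: str, query: str) -> bool:
    """Simplified match: true if any query term occurs in the field."""
    return bool(set(tokenize(field_value)) & set(tokenize(query)))


def matches_phrase(field_value: str, phrase: str) -> bool:
    """Simplified match_phrase: the terms must be adjacent and in order."""
    tokens = tokenize(field_value)
    wanted = tokenize(phrase)
    n = len(wanted)
    return n > 0 and any(
        tokens[i : i + n] == wanted for i in range(len(tokens) - n + 1)
    )


description = (
    "Java Cookbook - A comprehensive and practical book "
    "in the field of software and technology."
)
print(matches_phrase(description, "comprehensive and practical"))
print(matches_phrase(description, "practical and comprehensive"))
print(matches_any_term(description, "practical and comprehensive"))
```

In this model the reversed phrase fails the phrase check but still passes the term check, which is the behaviour to expect when switching between the two query types.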
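Similar vector search (knn)
Search for the top 3 documents whose embedding is closest to a query vector that stands for "highly specialized books"
- Run the following command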
GET my-vector-index/_search
{
  "size": 3,
  "query": {
    "knn": {
      "embedding": {
        "vector": [0.1, 0.1, 0.1, 0.9],
        "k": 3
      }
    }
  }
}
- The following result was returned
{
  "took": 38,
  "timed_out": false,
  "_shards": { "total": 0, "successful": 0, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 10, "relation": "eq" },
    "max_score": 0.99969876,
    "hits": [
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3AaMLMCZYBtRq-o6tNRFQF",
        "_score": 0.99969876,
        "_source": {
          "title": "Machine Learning Handbook",
          "description": "Machine Learning Handbook - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.1123, 0.0923, 0.0914, 0.9041]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3ATsLMCZYBtRq-o6tNRFQE",
        "_score": 0.9994855,
        "_source": {
          "title": "Java for Professionals",
          "description": "Java for Professionals - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.0971, 0.0913, 0.0895, 0.9179]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3AKsLMCZYBtRq-o6tNRFQD",
        "_score": 0.99919564,
        "_source": {
          "title": "Big Data From Zero to Hero",
          "description": "Big Data From Zero to Hero - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.0913, 0.102, 0.0905, 0.9252]
        }
      }
    ]
  }
}
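The _score values can be reproduced by hand. With space_type l2, the k-NN plugin documents the score as 1 / (1 + d), where d is the squared Euclidean distance between the query vector and the document vector. The sketch below recomputes the scores for the three hits above. l2_score is a hypothetical helper name, and small differences in the last digits are expected because the service works in single precision.

```python
from typing import Sequence


def l2_score(query: Sequence[float], doc: Sequence[float]) -> float:
    """Score for space_type l2: 1 / (1 + squared Euclidean distance)."""
    if len(query) != len(doc):
        raise ValueError("vectors must have the same dimension")
    squared_distance = sum((q - d) ** 2 for q, d in zip(query, doc))
    return 1.0 / (1.0 + squared_distance)


query_vector = [0.1, 0.1, 0.1, 0.9]
hits = {
    "Machine Learning Handbook": [0.1123, 0.0923, 0.0914, 0.9041],
    "Java for Professionals": [0.0971, 0.0913, 0.0895, 0.9179],
    "Big Data From Zero to Hero": [0.0913, 0.102, 0.0905, 0.9252],
}

for title, embedding in hits.items():
    print(f"{title}: {l2_score(query_vector, embedding):.6f}")
```

A score close to 1 therefore means the document vector is almost identical to the query vector.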
Conditional search (bool)
Search for documents whose title contains DevOps and whose description does not contain beginner
- Run the following
GET my-vector-index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "DevOps" } }
      ],
      "must_not": [
        { "match": { "description": "beginner" } }
      ]
    }
  }
}
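- The following result was returned. Note that "DevOps for Beginners" is still included: its description contains the word Beginners, which is not the same term as beginner, so the must_not clause does not exclude it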
{
  "took": 30,
  "timed_out": false,
  "_shards": { "total": 0, "successful": 0, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 5, "relation": "eq" },
    "max_score": 3.6306305,
    "hits": [
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3AIsLMCZYBtRq-o6tNRFQC",
        "_score": 3.6306305,
        "_source": {
          "title": "DevOps Handbook",
          "description": "DevOps Handbook - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.0669, 0.0532, 0.1023, 0.8631]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3AQsLMCZYBtRq-o6tNRFQE",
        "_score": 3.207633,
        "_source": {
          "title": "DevOps Principles",
          "description": "DevOps Principles - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.0555, 0.133, 0.0582, 0.8911]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3AEsLMCZYBtRq-o6tNRFQC",
        "_score": 2.7861922,
        "_source": {
          "title": "DevOps in Action",
          "description": "DevOps in Action - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.0695, 0.1037, 0.1464, 0.932]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3ARsLMCZYBtRq-o6tNRFQE",
        "_score": 2.7861922,
        "_source": {
          "title": "DevOps for Beginners",
          "description": "DevOps for Beginners - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.0932, 0.0961, 0.0921, 0.8663]
        }
      },
      {
        "_index": "my-vector-index",
        "_id": "1%3A0%3ALsLMCZYBtRq-o6tNRFQD",
        "_score": 2.5103216,
        "_source": {
          "title": "DevOps From Zero to Hero",
          "description": "DevOps From Zero to Hero - A comprehensive and practical book in the field of software and technology.",
          "embedding": [0.0749, 0.1446, 0.0961, 0.8671]
        }
      }
    ]
  }
}
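A rough model of term matching helps to see which documents a must_not clause actually removes. The sketch below assumes the fields use the default analyzer, which lowercases and splits text but does not reduce plurals to their singular form. tokenize and bool_search are hypothetical names, and the logic only approximates what the bool query does.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Rough stand-in for the default analyzer: lowercase and split, no stemming."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bool_search(
    docs: List[Dict[str, str]], must_title: str, must_not_description: str
) -> List[str]:
    """Keep docs whose title has the must term and whose description lacks the must_not term."""
    results = []
    for doc in docs:
        title_ok = must_title.lower() in tokenize(doc["title"])
        excluded = must_not_description.lower() in tokenize(doc["description"])
        if title_ok and not excluded:
            results.append(doc["title"])
    return results


suffix = " - A comprehensive and practical book in the field of software and technology."
books = [
    {"title": t, "description": t + suffix}
    for t in ["DevOps Handbook", "DevOps for Beginners", "Python Guide"]
]

# "beginner" is not a token of "DevOps for Beginners ...", so nothing is excluded.
print(bool_search(books, "DevOps", "beginner"))
# The exact token "beginners" does exclude it.
print(bool_search(books, "DevOps", "beginners"))
```

To exclude beginner-level titles in the real index, the must_not term would need to match the indexed token, for example by querying beginners.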
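Discussion
I tried a range of search queries (match, knn, and others) on OpenSearch Serverless.
Next, I would like to dig deeper into vector search with real embedding data and into how to make use of the search results.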