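What is this article about?
- Build a Kendra Index as the search backend for a RAG pattern implementation using Amazon Bedrock on AWS
- After this, the goal is to implement RAG by calling the Kendra API and the Bedrock (Claude-v2) API from LangChain
Reference document
The Index is created by following the builders.flash article below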
https://aws.amazon.com/jp/builders-flash/202302/kendra-search-system/?awsf.filter-name=*all
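Creating the Index
- The Index is created from the console
- Open Kendra from the AWS Management Console
- The region is Tokyo
- Click Create Index
- Create a new role
- Leave everything else at the defaults and click Next
- Leave the defaults and click Next
- Build with the Developer Edition (this is my first Index, so the free tier should apply)
- Click Create
- Creation takes a really long time
- Creation succeeded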
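The console steps above could also be scripted from CloudShell. The following is only a minimal sketch, assuming the AWS CLI is available and that an IAM role for Kendra already exists (the article creates the role in the console instead). The index name, role ARN, and function names are hypothetical placeholders. Because creation takes a long time, the second function polls `describe-index` until the status becomes `ACTIVE`.

```shell
# Sketch only: the function names and all arguments are illustrative assumptions.

create_index() {
  # $1: index name, $2: ARN of the IAM role that Kendra assumes
  # Prints the ID of the new index.
  aws kendra create-index \
    --name "$1" \
    --edition DEVELOPER_EDITION \
    --role-arn "$2" \
    --query 'Id' --output text
}

wait_for_index() {
  # $1: index ID, $2: polling interval in seconds (default 60)
  interval="${2:-60}"
  while :; do
    # Stop if the CLI call itself fails (for example, missing permissions).
    index_status=$(aws kendra describe-index --id "$1" \
      --query 'Status' --output text) || return 1
    echo "index status: ${index_status}"
    case "$index_status" in
      ACTIVE) return 0 ;;
      FAILED|DELETING) return 1 ;;
    esac
    sleep "$interval"
  done
}

# Example usage (not executed here):
#   INDEX_ID=$(create_index my-kendra-index "arn:aws:iam::<account-id>:role/<kendra-role>")
#   wait_for_index "$INDEX_ID"
```

Polling once a minute is an arbitrary choice; any interval works since the status only changes once creation finishes.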
Creating the Data Source
- Create the S3 bucket and upload the data from CloudShell
cloudshell .
[cloudshell-user@ip-10-2-77-33 ~]$ BUCKET_NAME=kendra-datasource-bucket-test-20231018
[cloudshell-user@ip-10-2-77-33 ~]$ aws s3 mb s3://${BUCKET_NAME}
make_bucket: kendra-datasource-bucket-test-20231018
[cloudshell-user@ip-10-2-77-33 ~]$ mkdir awsdoc
[cloudshell-user@ip-10-2-77-33 ~]$ pushd awsdoc
~/awsdoc ~
[cloudshell-user@ip-10-2-77-33 awsdoc]$ wget https://docs.aws.amazon.com/ja_jp/amazondynamodb/latest/developerguide/dynamodb-dg.pdf -O DynamoDB.pdf
--2023-10-18 14:31:36-- https://docs.aws.amazon.com/ja_jp/amazondynamodb/latest/developerguide/dynamodb-dg.pdf
Resolving docs.aws.amazon.com (docs.aws.amazon.com)... 99.84.133.100, 99.84.133.122, 99.84.133.24, ...
Connecting to docs.aws.amazon.com (docs.aws.amazon.com)|99.84.133.100|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 34032310 (32M) [application/pdf]
Saving to: ‘DynamoDB.pdf’
100%[==============================================================================================================================================================================================>] 34,032,310 26.4MB/s in 1.2s
2023-10-18 14:31:38 (26.4 MB/s) - ‘DynamoDB.pdf’ saved [34032310/34032310]
[cloudshell-user@ip-10-2-77-33 awsdoc]$ wget https://docs.aws.amazon.com/ja_jp/lambda/latest/dg/lambda-dg.pdf -O Lambda.pdf
--2023-10-18 14:31:47-- https://docs.aws.amazon.com/ja_jp/lambda/latest/dg/lambda-dg.pdf
Resolving docs.aws.amazon.com (docs.aws.amazon.com)... 99.84.133.105, 99.84.133.24, 99.84.133.122, ...
Connecting to docs.aws.amazon.com (docs.aws.amazon.com)|99.84.133.105|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20743829 (20M) [application/pdf]
Saving to: ‘Lambda.pdf’
100%[==============================================================================================================================================================================================>] 20,743,829 28.1MB/s in 0.7s
2023-10-18 14:31:47 (28.1 MB/s) - ‘Lambda.pdf’ saved [20743829/20743829]
[cloudshell-user@ip-10-2-77-33 awsdoc]$ wget https://docs.aws.amazon.com/ja_jp/vpc/latest/userguide/vpc-ug.pdf -O VPC.pdf
--2023-10-18 14:31:57-- https://docs.aws.amazon.com/ja_jp/vpc/latest/userguide/vpc-ug.pdf
Resolving docs.aws.amazon.com (docs.aws.amazon.com)... 99.84.133.100, 99.84.133.105, 99.84.133.24, ...
Connecting to docs.aws.amazon.com (docs.aws.amazon.com)|99.84.133.100|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4310360 (4.1M) [application/pdf]
Saving to: ‘VPC.pdf’
100%[==============================================================================================================================================================================================>] 4,310,360 5.98MB/s in 0.7s
2023-10-18 14:31:57 (5.98 MB/s) - ‘VPC.pdf’ saved [4310360/4310360]
[cloudshell-user@ip-10-2-77-33 awsdoc]$ wget https://docs.aws.amazon.com/ja_jp/kendra/latest/dg/kendra-dg.pdf -O Kendra.pdf
--2023-10-18 14:32:06-- https://docs.aws.amazon.com/ja_jp/kendra/latest/dg/kendra-dg.pdf
Resolving docs.aws.amazon.com (docs.aws.amazon.com)... 99.84.133.122, 99.84.133.100, 99.84.133.105, ...
Connecting to docs.aws.amazon.com (docs.aws.amazon.com)|99.84.133.122|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6436120 (6.1M) [application/pdf]
Saving to: ‘Kendra.pdf’
100%[==============================================================================================================================================================================================>] 6,436,120 8.85MB/s in 0.7s
2023-10-18 14:32:07 (8.85 MB/s) - ‘Kendra.pdf’ saved [6436120/6436120]
[cloudshell-user@ip-10-2-77-33 awsdoc]$ wget https://docs.aws.amazon.com/ja_jp/Route53/latest/DeveloperGuide/route53-dg.pdf -O Route53.pdf
--2023-10-18 14:32:16-- https://docs.aws.amazon.com/ja_jp/Route53/latest/DeveloperGuide/route53-dg.pdf
Resolving docs.aws.amazon.com (docs.aws.amazon.com)... 99.84.133.24, 99.84.133.122, 99.84.133.100, ...
Connecting to docs.aws.amazon.com (docs.aws.amazon.com)|99.84.133.24|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8692485 (8.3M) [application/pdf]
Saving to: ‘Route53.pdf’
100%[==============================================================================================================================================================================================>] 8,692,485 9.30MB/s in 0.9s
2023-10-18 14:32:17 (9.30 MB/s) - ‘Route53.pdf’ saved [8692485/8692485]
[cloudshell-user@ip-10-2-77-33 awsdoc]$ popd
~
[cloudshell-user@ip-10-2-77-33 ~]$ aws s3 cp awsdoc s3://${BUCKET_NAME}/awsdoc/ --recursive
upload: awsdoc/VPC.pdf to s3://kendra-datasource-bucket-test-20231018/awsdoc/VPC.pdf
upload: awsdoc/Kendra.pdf to s3://kendra-datasource-bucket-test-20231018/awsdoc/Kendra.pdf
upload: awsdoc/DynamoDB.pdf to s3://kendra-datasource-bucket-test-20231018/awsdoc/DynamoDB.pdf
upload: awsdoc/Route53.pdf to s3://kendra-datasource-bucket-test-20231018/awsdoc/Route53.pdf
upload: awsdoc/Lambda.pdf to s3://kendra-datasource-bucket-test-20231018/awsdoc/Lambda.pdf
- Also place a text file like the following in S3
office_info.txt
■オフィス間取りについて
一般受付は2Fの東側にあります。
オフィスフロアは19Fから35Fです。
来客向け会議室は36Fの南側にあります。
展望室は37Fです。
■ワークフロー申請方法について
交通費申請はグループ会社共有Webシステムからワークフローの申請をしてください。
在宅勤務申請は自社Webシステムから申請してください。
部内備品の持ち出しはExcelにて申請してください。
cloudshell .
[cloudshell-user@ip-10-2-77-33 ~]$ mkdir officedoc
[cloudshell-user@ip-10-2-77-33 ~]$ pushd officedoc
~/officedoc ~
[cloudshell-user@ip-10-2-77-33 officedoc]$ touch office_info.txt
[cloudshell-user@ip-10-2-77-33 officedoc]$ vim office_info.txt
[cloudshell-user@ip-10-2-77-33 officedoc]$ popd
~
[cloudshell-user@ip-10-2-77-33 ~]$ aws s3 cp officedoc s3://${BUCKET_NAME}/officedoc/ --recursive
upload: officedoc/office_info.txt to s3://kendra-datasource-bucket-test-20231018/officedoc/office_info.txt
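The transcript above creates the file interactively with vim. As a non-interactive alternative, the same content could be written with a heredoc. This is a sketch, and the function name `write_office_info` is a hypothetical helper, not something from the original steps.

```shell
# Sketch only: write office_info.txt without opening an editor.

write_office_info() {
  # $1: target directory (created if it does not exist)
  mkdir -p "$1"
  # The quoted 'EOF' delimiter keeps the text literal (no variable expansion).
  cat > "$1/office_info.txt" <<'EOF'
■オフィス間取りについて
一般受付は2Fの東側にあります。
オフィスフロアは19Fから35Fです。
来客向け会議室は36Fの南側にあります。
展望室は37Fです。
■ワークフロー申請方法について
交通費申請はグループ会社共有Webシステムからワークフローの申請をしてください。
在宅勤務申請は自社Webシステムから申請してください。
部内備品の持ち出しはExcelにて申請してください。
EOF
}

# Example usage (not executed here):
#   write_office_info officedoc
#   aws s3 cp officedoc "s3://${BUCKET_NAME}/officedoc/" --recursive
```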
- Enter a name
- Set Default language of source documents to Japanese
![image.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/573632/b81c5606-bb5c-fcef-9eba-eea1585eb211.png)
- Specify the bucket and set the S3 bucket prefix
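The data source settings above (name, Japanese as the default language, bucket, and prefix) map roughly onto the AWS CLI as follows. This is a sketch under assumptions: the article does not state the prefix value or the role used, so the prefix, role ARN, data source name, and function names here are all hypothetical.

```shell
# Sketch only: CLI counterpart of the console settings; all arguments are placeholders.

s3_config_json() {
  # $1: bucket name, $2: key prefix to include (for example "officedoc/")
  printf '{"S3Configuration":{"BucketName":"%s","InclusionPrefixes":["%s"]}}' "$1" "$2"
}

create_s3_data_source() {
  # $1: index ID, $2: data source name, $3: role ARN, $4: bucket name, $5: prefix
  # --language-code ja corresponds to "Default language of source documents".
  aws kendra create-data-source \
    --index-id "$1" \
    --name "$2" \
    --type S3 \
    --role-arn "$3" \
    --language-code ja \
    --configuration "$(s3_config_json "$4" "$5")" \
    --query 'Id' --output text
}

start_sync() {
  # $1: index ID, $2: data source ID
  # Documents become searchable only after a sync job has completed.
  aws kendra start-data-source-sync-job --index-id "$1" --id "$2"
}

# Example usage (not executed here):
#   DS_ID=$(create_s3_data_source "$INDEX_ID" office-docs "$ROLE_ARN" "$BUCKET_NAME" "officedoc/")
#   start_sync "$INDEX_ID" "$DS_ID"
```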
Trying a search
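This section does not show how the search was run. One way to try it from CloudShell is the `aws kendra query` command, sketched below. The index ID, the sample question, and the function name are hypothetical. Since the documents were indexed as Japanese, the sketch filters on `_language_code`; without a language filter, Kendra queries default to English documents.

```shell
# Sketch only: query the index for Japanese documents and print a compact result list.

kendra_search_ja() {
  # $1: index ID, $2: query text
  # --query-text is the search string; --query is the CLI's JMESPath output filter.
  aws kendra query \
    --index-id "$1" \
    --query-text "$2" \
    --attribute-filter '{"EqualsTo":{"Key":"_language_code","Value":{"StringValue":"ja"}}}' \
    --query 'ResultItems[].{type:Type,title:DocumentTitle.Text,excerpt:DocumentExcerpt.Text}' \
    --output json
}

# Example usage (not executed here):
#   kendra_search_ja "$INDEX_ID" "展望室は何階ですか"
```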
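Impressions
- The search turned out to be less semantic than I expected, so the documents placed in the data source may need some careful preparation
- If the use case does not really fit semantic search, I think something like OpenSearch is simply good enough as the search backend for RAG
- Being able to use S3 as a data source so easily is an overwhelming strength
- However, the cost is high
That's all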