Building an Amazon Kendra Index

Posted at 2023-10-18

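What is this article about?

  • Build an Amazon Kendra index as the retrieval backend for a RAG pattern that uses Amazon Bedrock on AWS
  • As a follow-up, the goal is to implement RAG by calling the Kendra API and the Bedrock (Claude-v2) API from LangChain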

Reference

The index is created by following the builders.flash article below
https://aws.amazon.com/jp/builders-flash/202302/kendra-search-system/?awsf.filter-name=*all

Creating the Index

  • Create the index from the console
  • Open Kendra from the AWS Management Console
  • Use the Tokyo region (ap-northeast-1)
  • Click Create Index
  • Create a new role
  • Leave everything else at the defaults and click Next
  • Leave the defaults and click Next
  • Build with Developer Edition (this is the first index, so the free tier should apply)
  • Click Create
  • Creation takes a very long time
  • Creation succeeded
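
The console steps above can also be scripted. The following is a minimal sketch with boto3, not what this article actually ran: it assumes an IAM role for Kendra already exists (the console creates one for you, the API does not), and the helper names `build_create_index_params`, `wait_until_active`, and `create_index` are made up for illustration. Because creation is slow, the sketch polls `describe_index` until the status leaves `CREATING`.

```python
import time


def build_create_index_params(name: str, role_arn: str) -> dict:
    """Build keyword arguments for the Kendra CreateIndex call."""
    return {
        "Name": name,
        "Edition": "DEVELOPER_EDITION",  # same edition as chosen in the console
        "RoleArn": role_arn,  # assumption: this role was created beforehand
    }


def wait_until_active(get_status, interval_sec: float = 60.0, max_checks: int = 60) -> str:
    """Call get_status() repeatedly until the index is no longer CREATING."""
    for _ in range(max_checks):
        status = get_status()
        if status != "CREATING":
            return status  # typically ACTIVE or FAILED
        time.sleep(interval_sec)
    raise TimeoutError("index is still CREATING")


def create_index(name: str, role_arn: str, region: str = "ap-northeast-1") -> str:
    """Create a Developer Edition index and return its ID once it is ACTIVE."""
    import boto3  # imported here so the helpers above work without boto3

    kendra = boto3.client("kendra", region_name=region)
    index_id = kendra.create_index(**build_create_index_params(name, role_arn))["Id"]
    status = wait_until_active(lambda: kendra.describe_index(Id=index_id)["Status"])
    if status != "ACTIVE":
        raise RuntimeError(f"index creation ended with status {status}")
    return index_id
```

Passing the status lookup in as a callable keeps the polling loop independent of AWS, so it can be exercised without credentials.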

Creating the Data Source

  • Create the S3 bucket and upload the data from CloudShell
cloudshell
[cloudshell-user@ip-10-2-77-33 ~]$ BUCKET_NAME=kendra-datasource-bucket-test-20231018
[cloudshell-user@ip-10-2-77-33 ~]$ aws s3 mb s3://${BUCKET_NAME}
make_bucket: kendra-datasource-bucket-test-20231018
[cloudshell-user@ip-10-2-77-33 ~]$ mkdir awsdoc
[cloudshell-user@ip-10-2-77-33 ~]$ pushd awsdoc
~/awsdoc ~
[cloudshell-user@ip-10-2-77-33 awsdoc]$ wget https://docs.aws.amazon.com/ja_jp/amazondynamodb/latest/developerguide/dynamodb-dg.pdf -O DynamoDB.pdf
--2023-10-18 14:31:36--  https://docs.aws.amazon.com/ja_jp/amazondynamodb/latest/developerguide/dynamodb-dg.pdf
Resolving docs.aws.amazon.com (docs.aws.amazon.com)... 99.84.133.100, 99.84.133.122, 99.84.133.24, ...
Connecting to docs.aws.amazon.com (docs.aws.amazon.com)|99.84.133.100|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 34032310 (32M) [application/pdf]
Saving to: ‘DynamoDB.pdf’

100%[==============================================================================================================================================================================================>] 34,032,310  26.4MB/s   in 1.2s   

2023-10-18 14:31:38 (26.4 MB/s) - ‘DynamoDB.pdf’ saved [34032310/34032310]

[cloudshell-user@ip-10-2-77-33 awsdoc]$ wget https://docs.aws.amazon.com/ja_jp/lambda/latest/dg/lambda-dg.pdf -O Lambda.pdf
--2023-10-18 14:31:47--  https://docs.aws.amazon.com/ja_jp/lambda/latest/dg/lambda-dg.pdf
Resolving docs.aws.amazon.com (docs.aws.amazon.com)... 99.84.133.105, 99.84.133.24, 99.84.133.122, ...
Connecting to docs.aws.amazon.com (docs.aws.amazon.com)|99.84.133.105|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20743829 (20M) [application/pdf]
Saving to: ‘Lambda.pdf’

100%[==============================================================================================================================================================================================>] 20,743,829  28.1MB/s   in 0.7s   

2023-10-18 14:31:47 (28.1 MB/s) - ‘Lambda.pdf’ saved [20743829/20743829]

[cloudshell-user@ip-10-2-77-33 awsdoc]$ wget https://docs.aws.amazon.com/ja_jp/vpc/latest/userguide/vpc-ug.pdf -O VPC.pdf
--2023-10-18 14:31:57--  https://docs.aws.amazon.com/ja_jp/vpc/latest/userguide/vpc-ug.pdf
Resolving docs.aws.amazon.com (docs.aws.amazon.com)... 99.84.133.100, 99.84.133.105, 99.84.133.24, ...
Connecting to docs.aws.amazon.com (docs.aws.amazon.com)|99.84.133.100|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4310360 (4.1M) [application/pdf]
Saving to: ‘VPC.pdf’

100%[==============================================================================================================================================================================================>] 4,310,360   5.98MB/s   in 0.7s   

2023-10-18 14:31:57 (5.98 MB/s) - ‘VPC.pdf’ saved [4310360/4310360]

[cloudshell-user@ip-10-2-77-33 awsdoc]$ wget https://docs.aws.amazon.com/ja_jp/kendra/latest/dg/kendra-dg.pdf -O Kendra.pdf
--2023-10-18 14:32:06--  https://docs.aws.amazon.com/ja_jp/kendra/latest/dg/kendra-dg.pdf
Resolving docs.aws.amazon.com (docs.aws.amazon.com)... 99.84.133.122, 99.84.133.100, 99.84.133.105, ...
Connecting to docs.aws.amazon.com (docs.aws.amazon.com)|99.84.133.122|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6436120 (6.1M) [application/pdf]
Saving to: ‘Kendra.pdf’

100%[==============================================================================================================================================================================================>] 6,436,120   8.85MB/s   in 0.7s   

2023-10-18 14:32:07 (8.85 MB/s) - ‘Kendra.pdf’ saved [6436120/6436120]

[cloudshell-user@ip-10-2-77-33 awsdoc]$ wget https://docs.aws.amazon.com/ja_jp/Route53/latest/DeveloperGuide/route53-dg.pdf -O Route53.pdf
--2023-10-18 14:32:16--  https://docs.aws.amazon.com/ja_jp/Route53/latest/DeveloperGuide/route53-dg.pdf
Resolving docs.aws.amazon.com (docs.aws.amazon.com)... 99.84.133.24, 99.84.133.122, 99.84.133.100, ...
Connecting to docs.aws.amazon.com (docs.aws.amazon.com)|99.84.133.24|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8692485 (8.3M) [application/pdf]
Saving to: ‘Route53.pdf’

100%[==============================================================================================================================================================================================>] 8,692,485   9.30MB/s   in 0.9s   

2023-10-18 14:32:17 (9.30 MB/s) - ‘Route53.pdf’ saved [8692485/8692485]

[cloudshell-user@ip-10-2-77-33 awsdoc]$ popd
~
[cloudshell-user@ip-10-2-77-33 ~]$ aws s3 cp awsdoc s3://${BUCKET_NAME}/awsdoc/ --recursive
upload: awsdoc/VPC.pdf to s3://kendra-datasource-bucket-test-20231018/awsdoc/VPC.pdf
upload: awsdoc/Kendra.pdf to s3://kendra-datasource-bucket-test-20231018/awsdoc/Kendra.pdf
upload: awsdoc/DynamoDB.pdf to s3://kendra-datasource-bucket-test-20231018/awsdoc/DynamoDB.pdf
upload: awsdoc/Route53.pdf to s3://kendra-datasource-bucket-test-20231018/awsdoc/Route53.pdf
upload: awsdoc/Lambda.pdf to s3://kendra-datasource-bucket-test-20231018/awsdoc/Lambda.pdf
  • Also place a text file like the following in S3
office_info.txt
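■ Office floor plan
The general reception desk is on the east side of 2F.
Office floors are 19F through 35F.
Meeting rooms for visitors are on the south side of 36F.
The observation room is on 37F.

■ How to submit workflow requests
For transportation expenses, submit a workflow request from the group-wide shared web system.
For remote work, apply from the company's own web system.
To take department equipment off site, apply using Excel.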
cloudshell
[cloudshell-user@ip-10-2-77-33 ~]$ mkdir officedoc
[cloudshell-user@ip-10-2-77-33 ~]$ pushd officedoc
~/officedoc ~
[cloudshell-user@ip-10-2-77-33 officedoc]$ touch office_info.txt
[cloudshell-user@ip-10-2-77-33 officedoc]$ vim office_info.txt 
[cloudshell-user@ip-10-2-77-33 officedoc]$ popd
~
[cloudshell-user@ip-10-2-77-33 ~]$ aws s3 cp officedoc s3://${BUCKET_NAME}/officedoc/ --recursive
upload: officedoc/office_info.txt to s3://kendra-datasource-bucket-test-20231018/officedoc/office_info.txt
  • Check the bucket
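
For reference, the two `aws s3 cp ... --recursive` uploads could also be done from Python. This is only a sketch of the same idea with boto3; the function names `plan_uploads` and `upload_dir` are hypothetical, and the bucket is assumed to exist already.

```python
from pathlib import Path


def plan_uploads(local_dir: str, prefix: str) -> list[tuple[str, str]]:
    """Map every file under local_dir to an S3 key below prefix."""
    root = Path(local_dir)
    return sorted(
        (str(path), f"{prefix.rstrip('/')}/{path.relative_to(root).as_posix()}")
        for path in root.rglob("*")
        if path.is_file()
    )


def upload_dir(local_dir: str, bucket: str, prefix: str) -> None:
    """Upload a directory tree, similar to `aws s3 cp --recursive`."""
    import boto3  # imported here so plan_uploads works without boto3

    s3 = boto3.client("s3")
    for local_path, key in plan_uploads(local_dir, prefix):
        s3.upload_file(local_path, bucket, key)
```

With the directories from the log above, `upload_dir("awsdoc", bucket_name, "awsdoc/")` and `upload_dir("officedoc", bucket_name, "officedoc/")` would mirror the two CLI commands.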

  • Click Add data sources

  • Enter S3 in the search box and select the Amazon S3 connector

  • Enter a name

  • Set Default language of source documents to Japanese

  • Select Create New Role

  • Specify the bucket and set the S3 bucket prefix

  • Set the Sync schedule to Run on demand

  • Click Next

  • Review the settings and click Add data source

  • The data source is created

  • Click Sync Now

  • The sync takes a very long time
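
The data source steps above have a rough API equivalent. The sketch below uses the `S3` data source type of the Kendra API, which may not be identical to what the console's Amazon S3 connector creates. The role ARN, the helper names, and the polling interval are assumptions. Leaving out `Schedule` corresponds to Run on demand, so the sync is started explicitly.

```python
import time

# Sync job statuses after which polling can stop.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "INCOMPLETE", "ABORTED"}


def build_s3_data_source_params(index_id, name, bucket, prefixes, role_arn) -> dict:
    """Build keyword arguments for the Kendra CreateDataSource call."""
    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "S3",
        "RoleArn": role_arn,  # assumption: a role that can read the bucket
        "LanguageCode": "ja",  # default language of the source documents
        "Configuration": {
            "S3Configuration": {
                "BucketName": bucket,
                "InclusionPrefixes": list(prefixes),
            }
        },
        # No "Schedule" key: the data source only syncs on demand.
    }


def sync_and_wait(kendra, index_id, data_source_id,
                  interval_sec: float = 60.0, max_checks: int = 120) -> str:
    """Start a sync job and poll its status until it finishes."""
    execution_id = kendra.start_data_source_sync_job(
        Id=data_source_id, IndexId=index_id
    )["ExecutionId"]
    for _ in range(max_checks):
        history = kendra.list_data_source_sync_jobs(
            Id=data_source_id, IndexId=index_id
        )["History"]
        status = next(
            (job["Status"] for job in history if job["ExecutionId"] == execution_id),
            None,
        )
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval_sec)
    raise TimeoutError("sync job did not finish in time")
```

Here `kendra` would be `boto3.client("kendra", region_name="ap-northeast-1")`, and the data source ID would come from `kendra.create_data_source(**params)["Id"]`.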

Trying a Search

  • Click Search Indexed Content

  • Click the wrench icon on the right side of the screen and set the language to Japanese

  • The search works
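
The same search can be sent through the Query API, which is also what a RAG implementation would call later. This is a minimal sketch, not code from this article: the helper names are hypothetical, and the `_language_code` attribute filter plays the role of the language setting behind the wrench icon.

```python
def build_query_params(index_id: str, text: str, language: str = "ja") -> dict:
    """Build keyword arguments for the Kendra Query call."""
    return {
        "IndexId": index_id,
        "QueryText": text,
        # Restrict the search to documents indexed in the given language.
        "AttributeFilter": {
            "EqualsTo": {
                "Key": "_language_code",
                "Value": {"StringValue": language},
            }
        },
    }


def summarize_results(response: dict, limit: int = 3) -> list[dict]:
    """Pick the fields of the top results that are useful as RAG context."""
    rows = []
    for item in response.get("ResultItems", [])[:limit]:
        rows.append({
            "type": item.get("Type"),
            "title": item.get("DocumentTitle", {}).get("Text", ""),
            "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
            "uri": item.get("DocumentURI", ""),
        })
    return rows
```

With a boto3 client, `summarize_results(kendra.query(**build_query_params(index_id, question)))` would return the top excerpts for a question.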

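Impressions

  • Results were less semantic than expected, so the documents placed in the data source may need careful preparation
  • If semantic search does not fit the use case, something like OpenSearch is probably good enough as the retrieval backend for RAG
  • Being able to use S3 as a data source so easily is by far the biggest strength
  • However, the cost is high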
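
To make the cost point concrete, here is a rough back-of-the-envelope calculation. The hourly rate of 1.125 USD for Developer Edition and the 750 free hours are assumptions based on the public pricing page at the time and should be checked before relying on them. The function name is made up for illustration.

```python
def monthly_cost(hourly_rate_usd: float, hours: float = 24 * 30,
                 free_hours: float = 0.0) -> float:
    """Estimate the charge for keeping an index running for `hours` hours."""
    billable_hours = max(hours - free_hours, 0.0)
    return round(billable_hours * hourly_rate_usd, 2)


# Assumed Developer Edition rate; an index is billed while it exists,
# whether or not it is queried.
print(monthly_cost(1.125))                  # 810.0
print(monthly_cost(1.125, free_hours=750))  # 0.0
```

Under these assumptions an always-on index costs about 810 USD per month once the free tier runs out, so deleting a test index after use matters.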

That's all.
