
Cohere Command R and Command R+ have arrived on Amazon Bedrock! Their RAG support is amazing!

Posted at 2024-04-30

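On April 30, right in the middle of Golden Week, Cohere Command R and Command R+ arrived on Amazon Bedrock!! 🎊🎊🎊

Command R+ is an API that goes beyond plain text generation! (Or at least, that's my own take on it!!)

It seems to have lots of interesting features, but the first thing I tried was RAG.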

What's so great about it?

The request body for Command R+'s Invoke Model is distinctive: it has a dedicated field for passing documents.

```
{
    "message": string,
    "documents": [
        {"title": string, "snippet": string},
    ]
}
```

Put the question in message and the related documents in documents.
Yes, that's RAG exactly.

The keys of each JSON object in documents are not fixed to the ones above; any structure is fine. Cohere's documentation says:

> A document object is a dictionary containing the content and the metadata of the text. We recommend using a few descriptive keys such as "title", "snippet", or "last updated" and only including semantically relevant data.
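As a minimal sketch of the request body described above, the hypothetical helper `build_chat_body` below serializes a `message` together with a list of `documents` whose keys differ from entry to entry. The sample documents are illustrative only, and the helper just builds the JSON string; it does not call Bedrock.

```python
import json


def build_chat_body(message: str, documents: list[dict]) -> str:
    """Build the JSON body for a Command R / R+ call with grounding documents.

    Hypothetical helper: it only serializes the body and does not call Bedrock.
    """
    # "message" carries the question, "documents" carries the passages to ground on.
    # Document keys are free-form, so each dict is passed through unchanged.
    return json.dumps({"message": message, "documents": documents}, ensure_ascii=False)


# Illustrative documents: the keys differ between entries.
body = build_chat_body(
    "What is RAG?",
    [
        {"title": "RAG overview", "snippet": "RAG grounds an LLM on external sources."},
        {"title": "Release notes", "snippet": "Command R+ is now on Bedrock.", "last updated": "2024-04-30"},
    ],
)
print(body)
```

The resulting string is what would be passed as `body` to `client.invoke_model`.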

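Trying it out

I had the model generate an answer to the following question (sent in Japanese, exactly as in the code):

AWS上でRAGを構築するときのおすすめのLLMは？検索はどうするといいの？フレームワークも教えて
("Which LLM do you recommend for building RAG on AWS? How should I handle search? Please tell me about frameworks too.")

Reference: Cohere's official documentation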

Preparation: installing libraries and so on

Installing the libraries

Install Boto3 and duckduckgo-search.
```
pip install boto3 duckduckgo-search
```

Importing the libraries

```python
import json

import boto3
from duckduckgo_search import DDGS
```

Other setup

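```python
# Bedrock runtime client; the region comes from your AWS configuration,
# and access to Command R+ must be enabled in that region
client = boto3.client("bedrock-runtime")

model_id = "cohere.command-r-plus-v1:0"

question = "AWS上でRAGを構築するときのおすすめのLLMは？検索はどうするといいの？フレームワークも教えて"
```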

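The model ID is easier to remember than Claude's!

Step 1: Generate search queries

Using the user's question as-is won't necessarily produce a good search, since the user may be asking about several things at once.
For this reason, Command R / Command R+ offer a way of calling the API that is dedicated to generating search queries.

Call it with search_queries_only set to True.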

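```python
def generate_search_queries(question: str) -> list[dict]:
    # search_queries_only=True asks the model for search queries instead of an answer
    body = json.dumps(
        {
            "message": question,
            "search_queries_only": True,
        }
    )

    response = client.invoke_model(modelId=model_id, body=body)
    response_body = json.loads(response.get("body").read())

    # Each entry looks like {"generation_id": ..., "text": ...}
    return response_body["search_queries"]
```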

response_body comes back as JSON like this:

```json
{
  "chat_history": [],
  "finish_reason": "COMPLETE",
  "generation_id": "a1988f49-f20c-478a-be69-1508739417cb",
  "is_search_required": true,
  "response_id": "440d1b90-9d9d-492c-9e46-03267312a029",
  "search_queries": [
    {
      "generation_id": "a1988f49-f20c-478a-be69-1508739417cb",
      "text": "AWS上でRAGを構築するおすすめのLLM"
    },
    {
      "generation_id": "a1988f49-f20c-478a-be69-1508739417cb",
      "text": "RAG フレームワーク"
    }
  ],
  "text": ""
}
```

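The model does not necessarily generate just one search query. In this example it returned two: 「AWS上でRAGを構築するおすすめのLLM」 ("recommended LLM for building RAG on AWS") and 「RAG フレームワーク」 ("RAG framework").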
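Since the number of queries varies, a small helper that pulls out just the query strings can be handy. A minimal sketch: `extract_query_texts` is a hypothetical name, and treating a missing or empty `search_queries` as "no search needed" is an assumption (the response also carries an `is_search_required` flag).

```python
def extract_query_texts(response_body: dict) -> list[str]:
    """Return the query strings from a search_queries_only response, in order.

    Hypothetical helper; a missing or empty "search_queries" is treated as
    "no search needed" (an assumption) and yields an empty list.
    """
    texts: list[str] = []
    for query in response_body.get("search_queries", []):
        text = query.get("text", "").strip()
        # Skip blank strings and exact duplicates while keeping the model's order.
        if text and text not in texts:
            texts.append(text)
    return texts


# The response_body shown above, trimmed to the relevant fields.
sample = {
    "is_search_required": True,
    "search_queries": [
        {"generation_id": "a1988f49-f20c-478a-be69-1508739417cb", "text": "AWS上でRAGを構築するおすすめのLLM"},
        {"generation_id": "a1988f49-f20c-478a-be69-1508739417cb", "text": "RAG フレームワーク"},
    ],
    "text": "",
}
print(extract_query_texts(sample))
# ['AWS上でRAGを構築するおすすめのLLM', 'RAG フレームワーク']
```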

The function returns only the search_queries part of response_body.

```python
search_queries = generate_search_queries(question)

search_queries
```

```json
[
  {
    "generation_id": "8fdfec93-fc8c-4261-a277-29aa34e603d2",
    "text": "AWS上でRAGを構築するおすすめのLLM"
  },
  {
    "generation_id": "8fdfec93-fc8c-4261-a277-29aa34e603d2",
    "text": "RAG フレームワーク"
  }
]
```

Step 2: Search for documents

Now that we have search queries, we search for and retrieve documents. This time I use DuckDuckGo.

I write a search function and call it once per search query.

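```python
def search(query: str) -> list[dict]:
    # Each result is a dict with "title", "href" and "body"
    results = DDGS().text(query, max_results=5)

    return results


documents = []

# Run one search per generated query and collect all results in a single list
for search_query in search_queries:
    text = search_query["text"]

    search_results = search(text)
    documents.extend(search_results)
```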
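The queries are searched independently, so the same page could in principle come back for more than one of them. In this run the 10 URLs happen to be distinct, but a hypothetical dedupe step keyed on `href` (the URL field of the DDGS results) is cheap insurance. A minimal sketch with made-up sample results:

```python
def dedupe_by_href(results: list[dict]) -> list[dict]:
    """Drop search results whose URL ("href") was already seen, keeping the first one."""
    seen: set[str] = set()
    unique = []
    for result in results:
        href = result.get("href")
        if href in seen:
            continue
        # Results without an href are kept as-is.
        if href is not None:
            seen.add(href)
        unique.append(result)
    return unique


# Illustrative results in the same shape as the DDGS output (title / href / body).
results = [
    {"title": "Page A", "href": "https://example.com/a", "body": "..."},
    {"title": "Page B", "href": "https://example.com/b", "body": "..."},
    {"title": "Page A (again)", "href": "https://example.com/a", "body": "..."},
]
unique = dedupe_by_href(results)
print([r["title"] for r in unique])
```

In the loop above this would be `documents = dedupe_by_href(documents)` after the `for` loop. Judging from the IDs in this article's output, the model refers to documents by list position (`doc_0`, `doc_1`, ...), so deduplicating can change which ID a given page receives.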

Contents of documents (long):

```json
[
  {
    "title": "【Bedrock×Lambda】高精度なハイブリッド検索RAGをサーバレスで実装（Slack連携も可） #AWS - Qiita",
    "href": "https://qiita.com/Naoki_Ishihara/items/662d70a9bd0dc3a8c9ce",
    "body": "はじめに. 近年、大規模言語モデル（LLM） の発展により、LLMを活用する機会が増加しています。 その中でも、LLMを組み込んだ仕組みの一つとして、RAG（Retrieval-Augmented Generation） が注目を集めています。 本記事では、はじめに、検索精度を向上させたRAGアーキテクチャを紹介します。"
  },
  {
    "title": "【Bedrock / Claude】AWSオンリーでRAGを使った生成AIボットを構築してみた【Kendra】",
    "href": "https://dev.classmethod.jp/articles/implement-rag-with-aws-services/",
    "body": "Amazon Bedrockを利用しました。. 以下のように 2種類のモデルを利用 しました。. 情報抽出（Read処理）：Claude Instant V1（v1.2）. 回答生成（Compose処理）：Claude V2（v2）. 情報抽出部分もClaude V2にしても良いですが、Claude Instant V1を使うことで回答が始まるまでの時間 ..."
  },
  {
    "title": "RAGの実案件に取り組んできた今までの知見をまとめてみた",
    "href": "https://dev.classmethod.jp/articles/rag-knowledge-on-real-projects/",
    "body": "RAGを使った社内情報を回答できる生成AIボットで業務効率化してみた | DevelopersIO. 【Bedrock / Claude】AWSオンリーでRAGを使った生成AIボットを構築してみた【Kendra】 | DevelopersIO. また、ありがたいことにお客様からお仕事もいただき、実際の案件としてRAGを使った ..."
  },
  {
    "title": "AWSとGoogle Cloudで、とりあえず簡単にRAGを構築してみたい!（API呼び出しもあるよ） - Qiita",
    "href": "https://qiita.com/ckw-1227/items/d8f89757349b90e35fa7",
    "body": "AWS、Google Cloudともにストレージバケットからテキストファイルをアップして、その情報をコンソール上で指定するだけでRAGのデータソースにすることができるという手軽さ。 コードを全く書かずにRAGを構築することができ、コンソール上でテストまでできる。"
  },
  {
    "title": "Ragを活用した生成aiチャットボット提供。Llm選定やインフラを1カ月で構築 | クラスメソッド株式会社",
    "href": "https://classmethod.jp/cases/kusurinomadoguchi/",
    "body": "プロジェクトのキックオフは2023年10月。. その後、約1カ月の開発期間を経てRAGを活用した生成AIチャットボットのPoC検証用の環境を構築しました。. プロジェクト期間中は、週次定例会やツールを活用し、コミュニケーションを密に取るようにしました ..."
  },
  {
    "title": "Retrieval-Augmented Generation(RAG)とは？ - IBM",
    "href": "https://www.ibm.com/blogs/solutions/jp-ja/retrieval-augmented-generation-rag/",
    "body": "Retrieval-augmented Generation（RAG、検索により強化した文章生成）は、LLMが持つ知識の内部表現を補うために外部の知識ソースにモデルを接地させる（グラウンドさせる）ことで、LLMが生成する回答の質を向上するAIのフレームワークです。LLMベースの質問応答 ..."
  },
  {
    "title": "What is retrieval-augmented generation? | IBM Research Blog",
    "href": "https://research.ibm.com/blog/retrieval-augmented-generation-RAG",
    "body": "Retrieval-augmented generation (RAG) is an AI framework for improving the quality of LLM-generated responses by grounding the model on external sources of knowledge to supplement the LLM's internal representation of information. Implementing RAG in an LLM-based question answering system has two main benefits: It ensures that the model has access to the most current, reliable facts, and that ..."
  },
  {
    "title": "Practical Guide to Crafting your first LLM-powered App Using RAG Framework",
    "href": "https://deepchecks.com/practical-guide-to-crafting-your-first-llm-powered-app-using-rag-framework/",
    "body": "The RAG framework operates in two distinct phases: retrieval and generation. During the retrieval phase, it employs algorithms to identify and gather pertinent information snippets based on the user's input or query. This process taps into various data sources, including documents, databases, or APIs, to obtain the needed context. ..."
  },
  {
    "title": "What is RAG? - Retrieval-Augmented Generation Explained - AWS",
    "href": "https://aws.amazon.com/what-is/retrieval-augmented-generation/",
    "body": "Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response. Large Language Models (LLMs) are trained on vast volumes of data and use billions of parameters to generate original output ..."
  },
  {
    "title": "GitHub - pinecone-io/canopy: Retrieval Augmented Generation (RAG ...",
    "href": "https://github.com/pinecone-io/canopy",
    "body": "Canopy is an open-source Retrieval Augmented Generation (RAG) framework and context engine built on top of the Pinecone vector database. Canopy enables you to quickly and easily experiment with and build applications using RAG. Start chatting with your documents or text data with a few simple commands."
  }
]
```

Each search query returns 5 results, so there are 10 search results in total.
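Here the raw DDGS dicts (`title`, `href`, `body`) are sent to the model unchanged, which works because document keys are free-form. To follow the quoted recommendation of descriptive keys such as "snippet", a hypothetical conversion step could rename the fields first. This renaming is an untested assumption, not something tried in this article, and if you use it, `create_text_with_citation` further down would need to read `url` instead of `href`.

```python
def to_cohere_documents(results: list[dict]) -> list[dict]:
    """Rename DDGS result fields to more descriptive document keys.

    Hypothetical helper: "snippet" follows the key Cohere suggests, while
    "url" is an arbitrary name chosen for this sketch.
    """
    return [
        {
            "title": result.get("title", ""),
            "snippet": result.get("body", ""),
            "url": result.get("href", ""),
        }
        for result in results
    ]


# One illustrative DDGS-style result.
print(to_cohere_documents([
    {"title": "Page A", "href": "https://example.com/a", "body": "Summary of page A"},
]))
```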

Generating the answer

Now that we have the documents, let's have the model generate an answer.

message is the question, and documents holds the search results.

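```python
def generate_response(question: str, documents: list[dict]) -> dict:
    # "message" is the user's question; "documents" are the search results used for grounding
    body = json.dumps({"message": question, "documents": documents})

    response = client.invoke_model(modelId=model_id, body=body)
    response_body = json.loads(response.get("body").read())

    # Pretty-print the raw response; ensure_ascii=False keeps Japanese text readable
    print(json.dumps(response_body, ensure_ascii=False, indent=2))
    return response_body


generate = generate_response(question, documents)
```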

The API response includes chat_history. Nice! (It's great to see they've thought about developers here!)

```json
{
  "chat_history": [
    {
      "message": "AWS上でRAGを構築するときのおすすめのLLMは？検索はどうするといいの？フレームワークも教えて",
      "role": "USER"
    },
    {
      "message": "RAG (Retrieval-Augmented Generation) は、LLMが生成する回答の質を向上させるための AI フレームワークです。RAG では、LLM が持つ知識の内部表現を補うために、モデルを外部の知識ソースに接地 (グラウンド) させます。\n\nAWS 上で RAG を構築するおすすめの LLM は、Amazon Bedrock と Claude です。これらのモデルは、情報抽出 (Read 処理) と回答生成 (Compose 処理) という 2 種類の処理に使用されます。\n\n検索については、AWS と Google Cloud では、ストレージバケットからテキストファイルをアップロードして、その情報をコンソール上で指定するだけで RAG のデータソースにすることができます。\n\nRAG のフレームワークは、情報取得 (retrieval) と生成 (generation) という 2 つの異なるフェーズで構成されています。情報取得フェーズでは、ユーザーの入力またはクエリに基づいて関連性の高い情報スニペットを識別して収集するために、アルゴリズムが使用されます。このプロセスでは、文書、データベース、API などのさまざまなデータソースにアクセスして、必要なコンテキストを取得します。",
      "role": "CHATBOT"
    }
  ],
  "citations": [],
  "documents": [],
  "finish_reason": "COMPLETE",
  "generation_id": "cd8234e5-0717-43bc-9a2e-e5ec4e67c94e",
  "response_id": "17e4448e-0b55-4e95-bf06-fedbb96bc8ce",
  "text": "RAG (Retrieval-Augmented Generation) は、LLMが生成する回答の質を向上させるための AI フレームワークです。RAG では、LLM が持つ知識の内部表現を補うために、モデルを外部の知識ソースに接地 (グラウンド) させます。\n\nAWS 上で RAG を構築するおすすめの LLM は、Amazon Bedrock と Claude です。これらのモデルは、情報抽出 (Read 処理) と回答生成 (Compose 処理) という 2 種類の処理に使用されます。\n\n検索については、AWS と Google Cloud では、ストレージバケットからテキストファイルをアップロードして、その情報をコンソール上で指定するだけで RAG のデータソースにすることができます。\n\nRAG のフレームワークは、情報取得 (retrieval) と生成 (generation) という 2 つの異なるフェーズで構成されています。情報取得フェーズでは、ユーザーの入力またはクエリに基づいて関連性の高い情報スニペットを識別して収集するために、アルゴリズムが使用されます。このプロセスでは、文書、データベース、API などのさまざまなデータソースにアクセスして、必要なコンテキストを取得します。"
}
```
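Because `chat_history` comes back with both the USER and CHATBOT turns, it can be fed into the next request to ask a follow-up question. A minimal sketch, assuming the request body accepts a `chat_history` field in the same `{"role", "message"}` shape the response returns (check the current Bedrock parameter reference for Command R / R+). `build_follow_up_body` is hypothetical and only builds the JSON string.

```python
import json
from typing import Optional


def build_follow_up_body(previous_response: dict, follow_up: str, documents: Optional[list[dict]] = None) -> str:
    """Build a request body for the next turn, reusing the returned chat_history.

    Hypothetical helper; assumes the request accepts "chat_history" entries
    shaped like the response's {"role": ..., "message": ...} turns.
    """
    # Copy only the role and message of each earlier turn (USER / CHATBOT).
    history = [
        {"role": turn["role"], "message": turn["message"]}
        for turn in previous_response.get("chat_history", [])
    ]
    body = {"message": follow_up, "chat_history": history}
    # Fresh search results for the follow-up are optional.
    if documents:
        body["documents"] = documents
    return json.dumps(body, ensure_ascii=False)


# Shortened stand-in for the response shown above.
previous = {
    "chat_history": [
        {"role": "USER", "message": "AWS上でRAGを構築するときのおすすめのLLMは？..."},
        {"role": "CHATBOT", "message": "RAG (Retrieval-Augmented Generation) は、..."},
    ]
}
print(build_follow_up_body(previous, "Tell me more about Amazon Kendra."))
```

The resulting string would go to `client.invoke_model(modelId=model_id, body=...)` just like the earlier bodies.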

Here is the generated answer:

RAG (Retrieval-Augmented Generation) は、LLMが生成する回答の質を向上させるための AI フレームワークです。RAG では、LLM が持つ知識の内部表現を補うために、モデルを外部の知識ソースに接地 (グラウンド) させます。

AWS 上で RAG を構築するおすすめの LLM は、Amazon Bedrock と Claude です。これらのモデルは、情報抽出 (Read 処理) と回答生成 (Compose 処理) という 2 種類の処理に使用されます。

検索については、AWS と Google Cloud では、ストレージバケットからテキストファイルをアップロードして、その情報をコンソール上で指定するだけで RAG のデータソースにすることができます。

RAG のフレームワークは、情報取得 (retrieval) と生成 (generation) という 2 つの異なるフェーズで構成されています。情報取得フェーズでは、ユーザーの入力またはクエリに基づいて関連性の高い情報スニペットを識別して収集するために、アルゴリズムが使用されます。このプロセスでは、文書、データベース、API などのさまざまなデータソースにアクセスして、必要なコンテキストを取得します。

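It will need some tuning (for example, it lists Amazon Bedrock, which is a service rather than an LLM, as a recommended LLM), but it returned a plausible-looking answer.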

Bonus: Using citations

The API response includes a field called citations, which indicates the sources the information was drawn from.

An excerpt from citations:

```json
{
  "document_ids": [
    "doc_5",
    "doc_6",
    "doc_8"
  ],
  "end": 71,
  "start": 39,
  "text": "LLMが生成する回答の質を向上させるための AI フレームワーク"
}
```

This means that the span given in text (characters start through end of the generated answer) was generated by citing doc_5, doc_6, and doc_8.
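To make `document_ids` readable, each ID can be looked up in the `documents` list of the response (each entry there has an `id` such as `doc_5`, the same field the citation code below relies on). A minimal sketch with a hypothetical `resolve_citation` helper; it also checks that `start`/`end` slice the answer back to the cited phrase, using the first sentence of the answer above and titles from the search results.

```python
def resolve_citation(citation: dict, returned_documents: list[dict]) -> list[str]:
    """Map a citation's document_ids (e.g. "doc_5") to document titles.

    Hypothetical helper; relies on each returned document having an "id" key.
    """
    by_id = {doc["id"]: doc for doc in returned_documents if "id" in doc}
    titles = []
    for doc_id in citation.get("document_ids", []):
        doc = by_id.get(doc_id)
        # Fall back to the raw id when the document is missing from the response.
        titles.append(doc.get("title", doc_id) if doc else doc_id)
    return titles


# The citation excerpt from above, plus the opening sentence of the generated answer.
citation = {
    "document_ids": ["doc_5", "doc_6", "doc_8"],
    "end": 71,
    "start": 39,
    "text": "LLMが生成する回答の質を向上させるための AI フレームワーク",
}
answer = "RAG (Retrieval-Augmented Generation) は、LLMが生成する回答の質を向上させるための AI フレームワークです。"
returned_documents = [
    {"id": "doc_5", "title": "Retrieval-Augmented Generation(RAG)とは？ - IBM"},
    {"id": "doc_6", "title": "What is retrieval-augmented generation? | IBM Research Blog"},
    {"id": "doc_8", "title": "What is RAG? - Retrieval-Augmented Generation Explained - AWS"},
]

# start/end are character offsets, so slicing recovers the cited phrase.
print(answer[citation["start"]:citation["end"]])
print(resolve_citation(citation, returned_documents))
```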

Using this information, I turned the citations into Markdown footnotes.
(It's pure brute force... could someone write a more professional version?)

```python
def create_text_with_citation(generate: dict) -> dict:
    citations = generate["citations"]

    answer_with_citation: str = generate["text"]
    citation_documents = []
    for citation in citations:
        document_ids = citation["document_ids"]

        # Build footnote markers such as "[^5] [^6] " from ids like "doc_5", "doc_6"
        citation_text = ""
        for document_id in document_ids:
            doc_number = document_id.replace("doc_", "")
            citation_text = citation_text + f"[^{doc_number}] "

        citation_documents.extend(document_ids)

        # Append the markers right after the cited span (str.replace tags every occurrence)
        answer_with_citation = answer_with_citation.replace(
            citation["text"], f"{citation['text']}{citation_text}"
        )

    # Footnote definitions, only for documents that were actually cited
    citation_text = ""
    for document in generate["documents"]:
        document_id = document["id"]  # e.g. doc_0
        if document_id in citation_documents:
            doc_number = document_id.replace("doc_", "")
            citation_text = citation_text + f"[^{doc_number}]: {document['title']} {document['href']} \n"

    return {
        "answer_with_citation": answer_with_citation,
        "citation_text": citation_text,
    }
```

Running it

```python
text_with_citation = create_text_with_citation(generate)

print("")
print(text_with_citation["answer_with_citation"])
print("")
print(text_with_citation["citation_text"])
```

Output (shown as a code block):

```
RAG (Retrieval-Augmented Generation)[^0] [^5] [^6] [^8]  は、LLMが生成する回答の質を向上させるための AI フレームワーク[^5] [^6] [^8] です。RAG では、LLM が持つ知識の内部表現を補うために、モデルを外部の知識ソースに接地 (グラウンド) させます。[^5] [^6] 

AWS 上で RAG を構築するおすすめの LLM は、Amazon Bedrock と Claude[^1] [^2]  です。これらのモデルは、情報抽出 (Read 処理) と回答生成 (Compose 処理) という 2 種類の処理に使用[^1] されます。

検索については、AWS と Google Cloud では、ストレージバケットからテキストファイルをアップロードして、その情報をコンソール上で指定するだけで RAG のデータソースにすることができます。[^3] 

RAG のフレームワークは、情報取得 (retrieval) と生成 (generation) という 2 つの異なるフェーズで構成[^7] されています。情報取得フェーズでは、ユーザーの入力またはクエリに基づいて関連性の高い情報スニペットを識別して収集するために、アルゴリズムが使用[^7] されます。このプロセスでは、文書、データベース、API などのさまざまなデータソースにアクセスして、必要なコンテキストを取得[^7] します。

[^0]: 【Bedrock×Lambda】高精度なハイブリッド検索RAGをサーバレスで実装（Slack連携も可） #AWS - Qiita https://qiita.com/Naoki_Ishihara/items/662d70a9bd0dc3a8c9ce 
[^5]: Retrieval-Augmented Generation(RAG)とは？ - IBM https://www.ibm.com/blogs/solutions/jp-ja/retrieval-augmented-generation-rag/ 
[^6]: What is retrieval-augmented generation? | IBM Research Blog https://research.ibm.com/blog/retrieval-augmented-generation-RAG 
[^8]: What is RAG? - Retrieval-Augmented Generation Explained - AWS https://aws.amazon.com/what-is/retrieval-augmented-generation/ 
[^1]: 【Bedrock / Claude】AWSオンリーでRAGを使った生成AIボットを構築してみた【Kendra】 https://dev.classmethod.jp/articles/implement-rag-with-aws-services/ 
[^2]: RAGの実案件に取り組んできた今までの知見をまとめてみた https://dev.classmethod.jp/articles/rag-knowledge-on-real-projects/ 
[^3]: AWSとGoogle Cloudで、とりあえず簡単にRAGを構築してみたい!（API呼び出しもあるよ） - Qiita https://qiita.com/ckw-1227/items/d8f89757349b90e35fa7 
[^7]: Practical Guide to Crafting your first LLM-powered App Using RAG Framework https://deepchecks.com/practical-guide-to-crafting-your-first-llm-powered-app-using-rag-framework/
```

Pasted in as Markdown, it renders like this! The footnotes look nice too ↓↓

RAG (Retrieval-Augmented Generation)¹ ² ³ ⁴ は、LLMが生成する回答の質を向上させるための AI フレームワーク² ³ ⁴ です。RAG では、LLM が持つ知識の内部表現を補うために、モデルを外部の知識ソースに接地 (グラウンド) させます。² ³

AWS 上で RAG を構築するおすすめの LLM は、Amazon Bedrock と Claude⁵ ⁶ です。これらのモデルは、情報抽出 (Read 処理) と回答生成 (Compose 処理) という 2 種類の処理に使用⁵ されます。

検索については、AWS と Google Cloud では、ストレージバケットからテキストファイルをアップロードして、その情報をコンソール上で指定するだけで RAG のデータソースにすることができます。⁷

RAG のフレームワークは、情報取得 (retrieval) と生成 (generation) という 2 つの異なるフェーズで構成⁸ されています。情報取得フェーズでは、ユーザーの入力またはクエリに基づいて関連性の高い情報スニペットを識別して収集するために、アルゴリズムが使用⁸ されます。このプロセスでは、文書、データベース、API などのさまざまなデータソースにアクセスして、必要なコンテキストを取得⁸ します。

1. 【Bedrock×Lambda】高精度なハイブリッド検索RAGをサーバレスで実装（Slack連携も可） #AWS - Qiita https://qiita.com/Naoki_Ishihara/items/662d70a9bd0dc3a8c9ce
2. Retrieval-Augmented Generation(RAG)とは？ - IBM https://www.ibm.com/blogs/solutions/jp-ja/retrieval-augmented-generation-rag/
3. What is retrieval-augmented generation? | IBM Research Blog https://research.ibm.com/blog/retrieval-augmented-generation-RAG
4. What is RAG? - Retrieval-Augmented Generation Explained - AWS https://aws.amazon.com/what-is/retrieval-augmented-generation/
5. 【Bedrock / Claude】AWSオンリーでRAGを使った生成AIボットを構築してみた【Kendra】 https://dev.classmethod.jp/articles/implement-rag-with-aws-services/
6. RAGの実案件に取り組んできた今までの知見をまとめてみた https://dev.classmethod.jp/articles/rag-knowledge-on-real-projects/
7. AWSとGoogle Cloudで、とりあえず簡単にRAGを構築してみたい!（API呼び出しもあるよ） - Qiita https://qiita.com/ckw-1227/items/d8f89757349b90e35fa7
8. Practical Guide to Crafting your first LLM-powered App Using RAG Framework https://deepchecks.com/practical-guide-to-crafting-your-first-llm-powered-app-using-rag-framework/
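A possibly tidier variant of `create_text_with_citation`: instead of `str.replace`, which tags every occurrence of a cited phrase, it inserts markers at each citation's `end` offset. This assumes `start`/`end` are character offsets into `text`, which matches the excerpt above (characters 39 to 71 of the answer are exactly the cited phrase). It also numbers footnotes 1, 2, 3, ... in order of first citation, like the rendered output above. `add_footnotes` is a hypothetical name and the sample data is made up.

```python
def add_footnotes(response: dict) -> str:
    """Return the answer with Markdown footnote markers and definitions appended.

    Sketch under assumptions: each citation has character offsets "start"/"end"
    into response["text"] plus "document_ids", and each entry of
    response["documents"] has "id", "title" and "href".
    """
    text: str = response["text"]
    citations = response.get("citations", [])
    docs_by_id = {doc["id"]: doc for doc in response.get("documents", [])}

    # Number the documents 1, 2, 3, ... in the order they are first cited.
    numbers: dict[str, int] = {}
    for citation in sorted(citations, key=lambda c: c["start"]):
        for doc_id in citation["document_ids"]:
            numbers.setdefault(doc_id, len(numbers) + 1)

    # Insert markers at each span's end offset, largest offset first,
    # so that offsets not yet processed still point at the right characters.
    for citation in sorted(citations, key=lambda c: c["end"], reverse=True):
        markers = "".join(f"[^{numbers[doc_id]}]" for doc_id in citation["document_ids"])
        end = citation["end"]
        text = text[:end] + markers + text[end:]

    # One definition line per cited document, in footnote-number order.
    definitions = []
    for doc_id, number in numbers.items():
        doc = docs_by_id.get(doc_id, {})
        definitions.append(f"[^{number}]: {doc.get('title', doc_id)} {doc.get('href', '')}".rstrip())

    return text + "\n\n" + "\n".join(definitions) if definitions else text


sample = {
    "text": "RAG is a framework. Bedrock hosts models.",
    "citations": [
        {"start": 0, "end": 19, "text": "RAG is a framework.", "document_ids": ["doc_5", "doc_6"]},
        {"start": 20, "end": 27, "text": "Bedrock", "document_ids": ["doc_1"]},
    ],
    "documents": [
        {"id": "doc_1", "title": "Bedrock article", "href": "https://example.com/bedrock"},
        {"id": "doc_5", "title": "RAG article", "href": "https://example.com/rag"},
        {"id": "doc_6", "title": "Another RAG article", "href": "https://example.com/rag2"},
    ],
}
print(add_footnotes(sample))
```

Working from the largest offset backwards is the key design choice: every insertion lengthens the string, but only after the point where it happens, so the remaining (smaller) offsets stay valid.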
