How to retrieve sources in a chat that implements RAG

Last updated at 2024-04-24, posted at 2024-04-18

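Introduction

This article summarizes how to retrieve the sources in RAG-based document search.
Using langchain.chains.RetrievalQAWithSourcesChain is one option.
Below, I show how to write the same thing in LCEL notation.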

Environment

langchain_core: 0.1.44
langchain_community: 0.0.33

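Checking the documentation

A quick search shows that this is covered in the official documentation.
https://python.langchain.com/docs/use_cases/question_answering/sources/
An excerpt follows. It shows that both the chunks retrieved by RAG and their sources are returned.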

from langchain_core.runnables import RunnableParallel

rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)

rag_chain_with_source.invoke("What is Task Decomposition")

{'context': [Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 1585}),
  Document(page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 2192}),
  Document(page_content='The AI assistant can parse user input to several tasks: [{"task": task, "id", task_id, "dep": dependency_task_ids, "args": {"text": text, "image": URL, "audio": URL, "video": URL}}]. The "dep" field denotes the id of the previous task which generates a new resource that the current task relies on. A special tag "-task_id" refers to the generated text image, audio and video in the dependency task with id as task_id. The task MUST be selected from the following options: {{ Available Task List }}. There is a logical relationship between tasks, please note their order. If the user input can\'t be parsed, you need to reply empty JSON. Here are several cases for your reference: {{ Demonstrations }}. The chat history is recorded as {{ Chat History }}. From this chat history, you can find the path of the user-mentioned resources for your task planning.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 17804}),
  Document(page_content='Fig. 11. Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)\nThe system comprises of 4 stages:\n(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.\nInstruction:', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 17414}),
  Document(page_content='Resources:\n1. Internet access for searches and information gathering.\n2. Long Term memory management.\n3. GPT-3.5 powered Agents for delegation of simple tasks.\n4. File output.\n\nPerformance Evaluation:\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\n2. Constructively self-criticize your big-picture behavior constantly.\n3. Reflect on past decisions and strategies to refine your approach.\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 29630}),
  Document(page_content="(3) Task execution: Expert models execute on the specific tasks and log results.\nInstruction:\n\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user's request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.", metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 19373})],
 'question': 'What is Task Decomposition',
 'answer': 'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It involves transforming big tasks into multiple manageable tasks, allowing for a more systematic and organized approach to problem-solving. Thanks for asking!'}

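Implementation

Let's build a RAG that actually outputs its source.
This time a Wikipedia page serves as the search target.

Reference page: https://ja.wikipedia.org/wiki/ロバート・オッペンハイマー

Preparing the models

Get the LLM and the embedding model.
When running on Azure, Azure OpenAI models are one option.

Since LangChain is used, the models are obtained with functions like the following.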

from langchain_openai import AzureOpenAIEmbeddings

def get_embedding():
    # Build the Azure OpenAI embedding model; replace the {PLACEHOLDER} values with your own.
    embedding = AzureOpenAIEmbeddings(
        openai_api_type='azure',
        openai_api_version='{API_VERSION}',
        azure_endpoint='{API_ENDPOINT}',
        openai_api_key='{API_KEYS}',
        model='{EMBEDDING_MODEL}',
        azure_deployment='{MODEL_NAME}'
    )
    return embedding
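
The `get_llm()` helper called in the next step is not shown in this article. A minimal sketch of what it might look like, assuming the chat model is created with `AzureChatOpenAI` from `langchain_openai` and configured with the same kind of placeholders as `get_embedding()`. The `{LLM_DEPLOYMENT_NAME}` placeholder is hypothetical.

```python
def get_llm():
    # Imported inside the function so that defining the helper does not
    # require langchain_openai to be importable yet.
    from langchain_openai import AzureChatOpenAI

    # Assumption: the placeholders mirror those of get_embedding();
    # '{LLM_DEPLOYMENT_NAME}' is a hypothetical placeholder for the chat deployment.
    llm = AzureChatOpenAI(
        openai_api_version='{API_VERSION}',
        azure_endpoint='{API_ENDPOINT}',
        openai_api_key='{API_KEYS}',
        azure_deployment='{LLM_DEPLOYMENT_NAME}',
    )
    return llm
```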

Instantiate the models.

# get models
llm = get_llm()
embedding = get_embedding()

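Preparing the `retriever`

Chunk settings

Configure the splitter to cut the text into chunks of a suitable length.
Since the source is Wikipedia, shorter chunks seemed like a better fit, so the settings below are a first attempt.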

from langchain.text_splitter import CharacterTextSplitter
# chunk setting
text_splitter = CharacterTextSplitter(
    separator='\n\n',
    # separator = '。',
    chunk_size=1000,
    chunk_overlap=50
)
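
`CharacterTextSplitter` splits on the separator first and then merges the pieces up to `chunk_size`, so a single paragraph longer than `chunk_size` stays in one oversized chunk. The sketch below is a simplified illustration of that behaviour, not LangChain's actual implementation: `merge_paragraphs` is a hypothetical helper, and `chunk_overlap` is ignored.

```python
def merge_paragraphs(text, separator="\n\n", chunk_size=1000):
    """Greedily merge separator-delimited pieces (simplified, no overlap)."""
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Either the merged text still fits, or nothing is accumulated yet,
            # in which case the piece is kept whole even if it is too long.
            current = candidate
        else:
            # Adding the piece would overflow: close the chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\n" + "c" * 30 + "\n\ndd"
chunks = merge_paragraphs(sample, chunk_size=12)
print([len(c) for c in chunks])
```

If many chunks come out longer than intended, a finer separator such as the commented-out `'。'` is one thing to try.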

Getting the `documents`

Load and split the page as follows.

from langchain_community.document_loaders import WebBaseLoader
# get documents
document_url = 'https://ja.wikipedia.org/wiki/ロバート・オッペンハイマー'
raw_documents = WebBaseLoader(document_url).load()
documents = text_splitter.split_documents(raw_documents)

Preparing the database

Prepare a vector database. FAISS is used here.

from langchain_community.vectorstores import FAISS
db = FAISS.from_documents(
    documents=documents,
    embedding=embedding
)

To save the index locally and reuse it later, add the following.

db.save_local(folder_path='{SAVE_PATH}')
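
To reuse the saved index, it can be loaded back with `FAISS.load_local`. A minimal sketch, assuming the same embedding model as at save time; `load_db` is a hypothetical helper. Recent `langchain_community` versions require `allow_dangerous_deserialization=True` because part of the index is stored with pickle, so only load files you created yourself.

```python
def load_db(folder_path, embedding):
    # Imported lazily so the helper can be defined without loading the package.
    from langchain_community.vectorstores import FAISS

    # The embedding object must match the one used when the index was built.
    return FAISS.load_local(
        folder_path,
        embedding,
        allow_dangerous_deserialization=True,
    )
```

It would then be called as `db = load_db('{SAVE_PATH}', embedding)`.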

Getting the `retriever`

Finally, get the retriever.

retriever = db.as_retriever()

Plain RAG

Without referencing the source, the chain looks like this.

from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# get templates
## prompt
# The Japanese instruction reads: "You answer the question with reference to the context."
prompt = PromptTemplate.from_template("""
あなたはcontextを参考に、questionに回答します。
<context>{context}</context>
<question>{question}</question>
""")

# get chain
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# rag
# The query asks: "Please tell me about the movie Oppenheimer."
query = '映画オッペンハイマーについて教えて下さい。'
answer = chain.invoke(query)
print(answer)

Result

映画「オッペンハイマー」はクリストファー・ノーラン監督による作品で、2024年に日本で公開される予定です。配給はビターズ・エンドが行うとの情報があります。この映画は、ロバート・オッペンハイマーという科学者に関連していると思われますが、提供されたコンテキストからは、映画の詳細な内容やキャスト、ストーリーについての具体的な情報は得られません。より詳しい情報を得るためには、映画の公式発表やプレスリリース、映画の予告編などを参照する必要があります。

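The answer (directed by Christopher Nolan, Japanese release in 2024, distributed by Bitters End) reflects what the Wikipedia page says.

Incidentally, dropping StrOutputParser from the chain as follows changes the response to this.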

chain = (
    {"context":retriever, "question":RunnablePassthrough()}
    | prompt
    | llm
)

content='映画「オッペンハイマー」は、クリストファー・ノーラン監督による作品で、2024年に日本で公開されることが決定しています。配給はビターズ・エンドが行うとの情報があります。この映画は、ロバート・オッペンハイマーという実在の物理学者を題材にしており、彼は「原爆の父」として知られています。オッペンハイマーはマンハッタン計画において原子爆弾の開発を指揮しましたが、戦後は水爆の開発に反対する活動を行いました。映画は、彼の科学者としての業績や、核兵器開発に対する複雑な感情、公職追放などの波乱に満ちた人生を描くと考えられます。公開日は2024年3月29日とされています。' response_metadata={'token_usage': {'completion_tokens': 291, 'prompt_tokens': 3150, 'total_tokens': 3441}, 'model_name': 'gpt-4', 'system_fingerprint': 'fp_2f57f81c11', 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}], 'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}} id='run-2b4741cf-c4cc-4249-9906-8a1b37a42337-0'

As shown, the response now comes with various metadata attached.
In particular, with langchain_core 0.1.44 the consumed tokens can also be retrieved.

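Adding the source

Following the official documentation, make the chain show where the answer came from.
The answer should also follow a template.

The response template is as follows.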

## answer
completion = PromptTemplate.from_template("""
question:{question}
answer:{content}

source:{source}
prompt_tokens:{prompt_tokens}
completion_tokens:{completion_tokens}
total_tokens:{total_tokens}
""")

The RAG implementation then looks like this.

from langchain_core.runnables import RunnableParallel

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# get chain with source
chain_rag_from_docs = (
    # Overwrite "context" with the formatted text; the prompt reads {context}, not {content}.
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt
    | llm
)
chain_rag_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=chain_rag_from_docs)
# get response chain
chain_answer = completion

# rag
query = '映画オッペンハイマーについて教えて下さい。'
answer = chain_rag_with_source.invoke(query)

response = chain_answer.invoke({
    "question":answer['question'],
    "content":answer['answer'].content,
    "source":answer['context'][0].metadata['source'],
    "prompt_tokens":answer['answer'].response_metadata['token_usage']['prompt_tokens'],
    "completion_tokens":answer['answer'].response_metadata['token_usage']['completion_tokens'],
    "total_tokens":answer['answer'].response_metadata['token_usage']['total_tokens']
})
print(response.text)

question:映画オッペンハイマーについて教えて下さい。

answer:映画「オッペンハイマー」は、クリストファー・ノーラン監督による作品で、2024年に日本での公開が決定しています。配給はビターズ・エンドが行うとのことです。この映画は、ロバート・オッペンハイマーという実在の物理学者を題材にしており、彼は「原爆の父」として知られています。オッペンハイマーは、第二次世界大戦中のマンハッタン計画において原子爆弾の開発を指揮しましたが、戦後は水爆開発に反対する活動を行いました。

映画の具体的な内容やキャストについての情報は、提供された文書からは読み取れませんが、ノーラン監督の作品であることから、高い期待が寄せられていることが予想されます。また、映画の公開日は2024年3月29日であることが確認できます。

source:https://ja.wikipedia.org/wiki/ロバート・オッペンハイマー
prompt_tokens:3150
completion_tokens:322
total_tokens:3472

The answer follows the template!
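
The example above reports only the source of the first retrieved chunk (`answer['context'][0]`). When the retriever draws on several pages, it may be preferable to list every distinct source. A minimal sketch, assuming each retrieved document carries a `source` entry in its metadata; `Doc` is a stand-in for LangChain's `Document` so the snippet runs on its own, and `unique_sources` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Stand-in for langchain_core.documents.Document (illustration only).
    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_sources(docs):
    """Return the 'source' of each document, de-duplicated, in retrieval order."""
    seen = []
    for doc in docs:
        source = doc.metadata.get("source")
        if source is not None and source not in seen:
            seen.append(source)
    return seen


docs = [
    Doc("chunk 1", {"source": "https://example.com/a"}),
    Doc("chunk 2", {"source": "https://example.com/b"}),
    Doc("chunk 3", {"source": "https://example.com/a"}),
]
print(unique_sources(docs))
```

With the chain above, `", ".join(unique_sources(answer['context']))` could be passed as `source` in place of the single first entry.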

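Note that prompt_tokens is the number of tokens sent to the LLM, and completion_tokens is the number of tokens in GPT-4's answer.
RAG consumes quite a lot of tokens, since the retrieved chunks are sent as part of the prompt.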
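
To keep an eye on this, the counters can be read from `response_metadata` in one place. A minimal sketch, assuming the metadata has the `token_usage` layout printed earlier; whether that key is present depends on the model provider, so the hypothetical helper `token_usage` falls back to zeros.

```python
def token_usage(response_metadata):
    """Read the token counters from a response_metadata dict, defaulting to 0."""
    usage = response_metadata.get("token_usage") or {}
    keys = ("prompt_tokens", "completion_tokens", "total_tokens")
    return {key: usage.get(key, 0) for key in keys}


# Values copied from the metadata printed earlier in this article.
metadata = {
    "token_usage": {"completion_tokens": 291, "prompt_tokens": 3150, "total_tokens": 3441},
    "model_name": "gpt-4",
}
usage = token_usage(metadata)
print(usage)
# Share of the total that was spent on the prompt (question plus retrieved chunks).
print(f"prompt share: {usage['prompt_tokens'] / usage['total_tokens']:.0%}")
```

In the chain above it would be called as `token_usage(answer['answer'].response_metadata)`.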
