
[Azure OpenAI Development] Building a simple application that chats with a PDF by combining Azure services

Last updated at 2023-05-18, posted at 2023-05-17

Finished product

Demo

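The scenario assumed here is that a user uploads a PDF file like the one below
and wants answers based on its contents.
I believe this architecture is commonly used for use cases such as internal knowledge search.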

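In the first run, no PDF is uploaded; the model is only given a role along the lines of "explain using easy-to-understand analogies."
As a result, the answer is based on the model's training data.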

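In the second run, a PDF (text_p13.pdf) is uploaded first, and the model is then given a role along the lines of "answer based on the information in the PDF file."
As a result, the answer includes keywords from the PDF file such as "設定ミス" (misconfiguration), as well as which PDF file was referenced.
Note: because of how the screen recording software works, the file drop is not captured, but the PDF file was in fact uploaded.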

Overview

The conceptual diagrams are shown below.

a. Save the file uploaded by the user as vectors

b. Answer the user's question using the saved vectors

Prerequisites

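This walkthrough assumes that the environment built in the earlier article has already been provisioned.
The conceptual diagram is shown below.
It is a simple chat application with Static Web Apps as the frontend and Azure Functions as the backend.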

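This time, the target is a PDF stored locally.
In later installments, I will build an application that targets files stored externally, such as in Azure Files.

The PDF file used this time is the following:
クラウドサービス提供における情報セキュリティ対策ガイドライン（第3版） (Information Security Guidelines for the Provision of Cloud Services, 3rd edition)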

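1. Deploy an Azure OpenAI embedding model

First, deploy the model.
Go to Azure OpenAI Studio.

The model deployed in this environment is text-embedding-ada-002.

Set the endpoint, model deployment name, and API version as environment variables in Static Web Apps.
(Note: the API key is assumed to have been set during the previous build.)

I could not find a way to display the API version in the Azure portal, so I used the value given in the official documentation.
Since the documentation showed source code like the following, I set 2022-12-01 as EMBEDDING_API_VERSION.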

openai.api_type = "azure"
openai.api_key = API_KEY
openai.api_base = RESOURCE_ENDPOINT
openai.api_version = "2022-12-01"

2. Run the embedding model via the API

Install the packages needed to run the model.

pip install num2words scipy pandas transformers tiktoken  matplotlib plotly scikit-learn

api/requirements.txt

# DO NOT include azure-functions-worker in this file
# The Python Worker is managed by Azure Functions platform
# Manually managing azure-functions-worker may cause unexpected issues

azure-functions
flask
openai
pypdf
num2words
scipy 
pandas 
transformers 
tiktoken
matplotlib
plotly 
scikit-learn

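Test whether the deployed model can be called via the API.
Set environment variables to run it locally.
(Note: change the values to match your environment.)

I added ENDPOINT to the Static Web Apps environment variables.
(Belatedly, I realized that the endpoint should also be stored in an environment variable.)

On Windows (PowerShell)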

$Env:ENDPOINT = "your endpoint"
$Env:EMBEDDING_API_VERSION="2022-12-01"
$Env:EMBEDDING_MODEL_NAME="embedding-for-app"
$Env:OPENAI_API_KEY="your api key"
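
Because the Python code reads these values with os.getenv, a misspelled variable name silently yields None and only fails later at the API call. The following is a minimal sketch of a startup check, assuming the four variable names used in this article; the helper name load_settings is hypothetical and not part of the original application.

```python
import os

# Variable names taken from this article; the helper itself is a hypothetical addition.
REQUIRED_VARS = (
    "ENDPOINT",
    "EMBEDDING_API_VERSION",
    "EMBEDDING_MODEL_NAME",
    "OPENAI_API_KEY",
)


def load_settings(env=None):
    """Return the required settings as a dict, or raise if any are missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling load_settings() once before configuring openai makes a name mismatch between the shell and the code visible immediately.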

After setting the environment variables, run the following source code.

import openai
import os

# Basic Azure OpenAI settings
openai.api_type = "azure"
openai.api_base = os.getenv('ENDPOINT')
openai.api_version = os.getenv('EMBEDDING_API_VERSION')

# Set the Azure OpenAI API key and the model deployment name
openai.api_key = os.getenv('OPENAI_API_KEY')
model_name = os.getenv('EMBEDDING_MODEL_NAME')


def text_embedding(text_list):
    """Return one embedding vector per input string."""
    embeddings = []
    for text in text_list:
        # Call the embedding deployment once per string
        res = openai.Embedding.create(
            input=text,
            engine=model_name
        )

        embeddings.append(res['data'][0]['embedding'])

    return embeddings


if __name__ == "__main__":
    ### Embedding test ###
    test_text_list = ["私はたけのこの里派閥です。", "これはテストです。"]
    embeddings = text_embedding(test_text_list)
    print(embeddings)

Example output

[[-0.011617557145655155, -0.004620939493179321, -0.02420794405043125, -0.020217932760715485, -0.02161373198032379, 0.012160367332398891...

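3. Load a PDF file and save its vectors to a database

Now that the model can be called to vectorize text, implement the following steps.

Load the PDF file
Vectorize it with the deployed embedding model
Save the vectors to a database

① Load the PDF file

Start with the loading part.

In this environment, LangChain is used to save vectors to ChromaDB.
(Note: LangChain is being updated at a tremendous pace. At the time of writing it was v0.0.170.)

Add the packages to the local environment and to Azure Functions.

pip install langchain chromadb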

api/requirements.txt

# DO NOT include azure-functions-worker in this file
# The Python Worker is managed by Azure Functions platform
# Manually managing azure-functions-worker may cause unexpected issues

azure-functions
flask
openai
pypdf
num2words
scipy 
pandas 
transformers 
tiktoken
matplotlib
plotly 
scikit-learn
chromadb
langchain

Implement the loading part.

Loading the PDF

from langchain.document_loaders import PyPDFLoader
import os

data_folder = "../../../data"
file_name = "クラウドサービス提供における情報セキュリティ対策ガイドライン_ver3.pdf"

loader = PyPDFLoader(os.path.join(data_folder,file_name))
pages = loader.load_and_split()
pages[13]

### Output ###
Document(page_content='13 \n Ⅰ．1．はじめに  \nクラウドサービスの普及 及び高度化 並びに IoTの実用化に伴い、クラウドサービスの利用が拡\n⼤し、社会経済活動を⽀え

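A list of Document objects is returned. Each Document carries the following information.

page_content: the text that was read
metadata["source"]: which file it was read from
metadata["page"]: which page it came from

The code above reads the entire contents of the specified file, so all 240 pages would become input to the model.
For debugging, create a test PDF containing only page 13.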
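
The article does not show how the single-page test PDF was produced. One possible way, sketched here with pypdf (already listed in requirements.txt), is to copy one page into a new file. The function names and the 1-based page numbering are assumptions for illustration, and whether "page 13" means the 13th physical page or the printed page number depends on the PDF.

```python
def page_index(page_number, total_pages):
    """Convert a 1-based page number to a 0-based index, validating the range."""
    if not 1 <= page_number <= total_pages:
        raise ValueError(f"page_number must be between 1 and {total_pages}")
    return page_number - 1


def extract_page(src_path, dst_path, page_number):
    """Write a new PDF that contains only one page of the source PDF."""
    # Imported here so page_index can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    writer.add_page(reader.pages[page_index(page_number, len(reader.pages))])
    with open(dst_path, "wb") as f:
        writer.write(f)


# Example call (hypothetical paths, adjust to your environment):
# extract_page("full.pdf", "test.pdf", 13)
```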

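② Vectorize the loaded content & save it

Save the list of Document objects generated by loading the PDF file into a vector database.
When working out how to use LangChain, I get the impression that reading the source code is more helpful than reading the documentation.

The goal here is to save "text / vector" pairs keyed by file name.
Let's look for classes and functions that could do this.
Since Chroma is used as the vector database, I started reading from there on a hunch.

That led me to the from_documents function.
It looks like calling this function with the following arguments will do the job.

documents: a list of Document objects
embedding: an Embeddings class (this is where the Azure OpenAI model is swapped in)

langchain/vectorstores/chroma.py:385~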

 def from_documents(
        cls: Type[Chroma],
        documents: List[Document],
        embedding: Optional[Embeddings] = None,
        ids: Optional[List[str]] = None,
        collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
        persist_directory: Optional[str] = None,
        client_settings: Optional[chromadb.config.Settings] = None,
        client: Optional[chromadb.Client] = None,  # Add this line
        **kwargs: Any,
    ) -> Chroma:
        """Create a Chroma vectorstore from a list of documents.

        If a persist_directory is specified, the collection will be persisted there.
        Otherwise, the data will be ephemeral in-memory.

        Args:
            collection_name (str): Name of the collection to create.
            persist_directory (Optional[str]): Directory to persist the collection.
            ids (Optional[List[str]]): List of document IDs. Defaults to None.
            documents (List[Document]): List of documents to add to the vectorstore.
            embedding (Optional[Embeddings]): Embedding function. Defaults to None.
            client_settings (Optional[chromadb.config.Settings]): Chroma client settings
        Returns:
            Chroma: Chroma vectorstore.
        """

Run source code like the following.

import os
import openai
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma


openai.api_type = "azure"
openai.api_base = os.getenv('ENDPOINT')
openai.api_version = os.getenv('EMBEDDING_API_VERSION')

# Set the Azure OpenAI API key and the model deployment name
openai.api_key = os.getenv('OPENAI_API_KEY')
model_name = os.getenv('EMBEDDING_MODEL_NAME')


# Load the PDF file
data_folder = "../../../data"
loader = PyPDFLoader(os.path.join(data_folder, "test.pdf"))
documents = loader.load_and_split()

# Use the Azure OpenAI model when generating vectors
embedding=OpenAIEmbeddings(
    deployment=model_name,
    openai_api_type="azure",
    openai_api_key=os.getenv('OPENAI_API_KEY'),
    openai_api_base=os.getenv('ENDPOINT'),
    openai_api_version=os.getenv('EMBEDDING_API_VERSION')
)

# Save the text and embeddings, keyed by file name
db = Chroma.from_documents(
    documents=documents,
    embedding=embedding
)


### Output ###
Using embedded DuckDB without persistence: data will be transient

If no persistence option is specified, the output looks like the above,
but the necessary data is held in memory.
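
As a mental model of what "text / vector pairs keyed by file name, held in memory" means, here is a toy store in plain Python. It is only an illustration under that assumption: the class and method names are hypothetical, and this is not how Chroma is implemented internally.

```python
from collections import defaultdict


class InMemoryVectorStore:
    """Toy store mapping a file name to a list of (text, vector) pairs."""

    def __init__(self):
        self._records = defaultdict(list)

    def add(self, file_name, text, vector):
        # Copy the vector so later changes by the caller do not affect the store.
        self._records[file_name].append((text, list(vector)))

    def get(self, file_name):
        # Return a copy; unknown file names yield an empty list.
        return list(self._records.get(file_name, []))

    def file_names(self):
        return sorted(self._records)
```

Everything lives in a Python dict, so the data disappears when the process exits, which matches the "transient" warning in the output above.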

While reading the documentation and source code,
I found a class called VectorstoreIndexCreator that abstracts the above flow even further.

import os
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator

model_name = os.getenv('EMBEDDING_MODEL_NAME')

# Create the loader
data_folder = "../../../data"
loader = PyPDFLoader(os.path.join(data_folder, "test.pdf"))

# Specify the model
embedding=OpenAIEmbeddings(
    deployment=model_name,
    openai_api_type="azure",
    openai_api_key=os.getenv('OPENAI_API_KEY'),
    openai_api_base=os.getenv('ENDPOINT'),
    openai_api_version=os.getenv('EMBEDDING_API_VERSION')
)

# Instantiate the index creator with the model and build the index from the loader
index = VectorstoreIndexCreator(embedding=embedding).from_loaders([loader])

In effectively three lines, this handles the following:

Loading the PDF file
Converting it to vectors
Saving them to the vector database

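4. Try searching with the vector database

Next, implement the search process. The flow is as follows.
When a question arrives from the user:

1. Vectorize the question
2. Find similar vectors in the vector database
3. Convert the retrieved vectors back into text
4. Give 1 & 3 to the model as input and get its output
5. Return the output to the user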
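
To make steps 2 to 4 concrete, here is a minimal pure-Python sketch with toy two-dimensional vectors. Cosine similarity is used for illustration only; the distance metric Chroma actually uses may differ, and the function names and prompt layout are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, records, k=1):
    """records is a list of (text, vector); return the k most similar texts."""
    ranked = sorted(
        records,
        key=lambda record: cosine_similarity(query_vector, record[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question, contexts):
    """Combine the question (step 1) and the retrieved texts (step 3) into one input."""
    return "Context:\n" + "\n".join(contexts) + "\n\nQuestion: " + question
```

In the real application the vectors come from the embedding model, and the combined input is sent to the chat model.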

With LangChain, however, this flow is abstracted away,
so the actual source code is very concise.

from langchain.chat_models import AzureChatOpenAI
from langchain.chains import RetrievalQAWithSourcesChain

# Specify the model used for search and answer generation
llm = AzureChatOpenAI(
    deployment_name="gpt-35-turbo-for-app",
    openai_api_type="azure",
    openai_api_key=os.getenv('OPENAI_API_KEY'),
    openai_api_base=os.getenv('ENDPOINT'),
    openai_api_version = os.getenv('API_VERSION'),
    temperature=0,
)

# Create a chain that searches and reports the referenced file
retriever = db.as_retriever()
chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever
)

# Run the chain
query = "クラウドセキュリティについて50文字でまとめて"
chain({chain.question_key: query})


### Output ###
{'question': 'クラウドセキュリティについて50文字でまとめて',
 'answer': 'クラウドセキュリティは重要であり、情報漏洩や障害が多く報告されている。設定ミスや不十分な変更管理も問題。 (50 characters)\n',
 'sources': '../../../data\\test_p13.pdf'}

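The summary is based on the content of the document listed in sources.
The PDF does indeed contain statements about misconfiguration and insufficient change management.

5. Integrate into the web application

① Add API calls to Static Web Apps

Specifically, implement the following features.

Drag and drop a file
Send a POST request to /api/save_pdf with the dropped file
When asking a question about the PDF, send a POST request to /api/question_to_pdf

Note: about 90% of the frontend design was left to ChatGPT.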

dev-app/src/components/ChatOpenAI.vue

<template>
  <div>
    <div class="chat-container">
      <div 
        v-for="message in prev_messages" 
        :key="message.id" 
        :class="['chat-message', message.isResponse ? 'response' : 'user']"
        >
        <span class="chat-message-text">{{ message.text }}</span>
      </div>
    </div>
    <div class="input-and-dropzone">
      <div class="input-container">
        <input
          v-model="question"
          type="text"
          @keydown.enter="ExecOpenAIChat(question)"
          placeholder="質問を入力してください"
          class="chat-input"
        />
        <button 
          @click="ExecOpenAIChat(question)" 
          class="chat-button"
          :class="{ 'button-hover': isButtonHovered }"
          @mouseover="isButtonHovered = true"
          @mouseout="isButtonHovered = false"
        >
          送信</button>
      </div>

      <button class="toggle-dropzone-button"
         @click="toggleDropzone"
         :class="{'dropzone-enable': showDropzone}"
      >
        ChatPDF
      </button>
      
      <div v-if="showDropzone" class="dropzone"
        :class="{ active: isDragging }"
        @dragover.prevent="handleDragOver"
        @dragenter.prevent="handleDragEnter"
        @dragleave.prevent="handleDragLeave"
        @drop.prevent="handleDrop"
      >
        ファイルをドラッグ&ドロップ
      </div>

      <div v-if="showDropzone" class="filename-area">
        <pre>{{ file_name }}</pre>
      </div>
  </div>
</div>
</template>

<script>
    const axios = require('axios').create();

    export default{
        name: "ExecOpenAI",
        data:() => ({
            post_data:{
                question: ""
            },
            response: "",
            prev_messages:[],
            message:{
                id: 0,
                text: "",
                isResponse: false
            },
            isButtonHovered: false,
            showDropzone: false,
            isDragging: false,
            file_name: "現在対話中のファイル名:\nNot-Found"
        }),
        methods: {
            ExecOpenAIChat: async function (question){
              // Call Azure OpenAI
              this.post_data.question = question;
              // Add the user's input to the conversation history
              const UserMessage = {
                  id: Date.now(),
                  text: question,
                  isResponse: false
              }
              this.prev_messages.push(UserMessage);
              
              let res = ""
              // When PDF mode is ON, send the question to /api/question_to_pdf
              if (this.showDropzone){
                res = await axios.post("/api/question_to_pdf", this.post_data)
              }
              else{
                res = await axios.post("/api/question", this.post_data)
              }

              console.log(res)

              // Add the answer to the conversation history
              const ResponseMessage = {
                    id: Date.now(),
                    text: res.data['message'],
                    isResponse: true
              }
      
              this.prev_messages.push(ResponseMessage);
            },
          toggleDropzone() {
            this.showDropzone = !this.showDropzone;
          },
          // Highlight the dropzone while a file is dragged over it
          handleDragEnter() {
            this.isDragging = true;
          },
          handleDragOver() {
            this.isDragging = true;
          },
          handleDragLeave() {
            this.isDragging = false;
          },
          handleDrop: async function(e){
                this.isDragging = false;
                this.file_name = "Uploading..."
                
                // Generate embeddings from the dropped file
                const formData = new FormData();
                formData.append("file", e.dataTransfer.files[0])
                const res = await axios.post("/api/save_pdf", formData);

                this.file_name = "現在対話中のファイル名:\n" + res.data["message"]
          }
        }
    }
</script>


<style scoped>
.chat-container {
  max-height: 600px;
  overflow-y: scroll;
  border: 1px solid #6d6a6a;
  background-color: #555252; /* dark-mode background color */
  padding: 10px;
}

.chat-message {
  padding: 10px;
  margin-bottom: 10px;
  display: flex;
}

.response {
  justify-content: flex-start;
}

.user {
  justify-content: flex-end;
}

.chat-message-text {
  word-wrap: break-word;
  max-width: 80%;
  padding: 8px;
  border-radius: 10px;
  background-color: #353333;
  color: #dad5d5;
  text-align: left;
  font-family: "Yu Gothic", "ヒラギノ角ゴシック", "Hiragino Sans", "メイリオ", Meiryo, sans-serif;
  font-size: 18px
}

.response .chat-message-text {
  background-color: #2e2d2d;
}

.user .chat-message-text {
  background-color: #2e2d2d;
}

.input-and-dropzone {
  display: flex;
  align-items: center;
  width: 100%;
}

.input-container {
  display: flex;
  margin-top: auto;
  position: sticky;
  bottom: 0;
  left: 30px;
  right: 0;
  padding: 20px;
  height: 60px;
  width: 60%;
  background-color: #1a1919; /* dark-mode background color */
}

.chat-input {
  flex: 1;
  padding: 5px;
  border: 1px solid #3b3b3b; /* input border color */
  color: #0e0d0d; /* input text color */
}

.chat-button {
  margin-left: 5px;
  padding: 8px 15px;
  font-size: 16px;
  background-color: #4477c9; /* button background color */
  color: #ffffff;
  border: none;
  border-radius: 5px;
  transition: background-color 0.3s ease;
}

.chat-button:hover {
  background-color: #9fb2da; /* background color on hover */
}


.dropzone {
  width: 15%;
  height: 50px;
  border: 2px dashed #ccc;
  display: flex;
  align-items: center;
  justify-content: center;
  font-size: 12px;
  text-align: center;
  color: #cccccc;
  margin: 10px 10px;
  background-color: #5a5c5e;
}

.dropzone.active {
    background-color: #1065e4;
}

.toggle-dropzone-button {
  background-color: #b9b4b4;
  color: #0f0f0f;
  border: none;
  border-radius: 5px;
  padding: 10px 10px;
  font-size: 14px;
  cursor: pointer;
  margin: 10px 30px;
}

.dropzone-enable{
  background-color: #403eca;
  color: #f1efef;
  border: none;
  border-radius: 5px;
  padding: 10px 10px;
  font-size: 14px;
  cursor: pointer;
  margin: 10px 30px;
}

.filename-area {
  display: flex;
  align-items: center;
  justify-content: center;
  width: 15%;
  height: 50%;
  background-color: #f0f5ff;
}
</style>

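② Add features to Azure Functions

Implement the following features in Azure Functions.

When a file is uploaded (a)
- 1. Read the file and convert it to text
- 2. Vectorize the text
- 3. Save the vectors to the vector database
When a question arrives from the user (b)
- 1. Vectorize the question
- 2. Find similar vectors in the vector database
- 3. Convert the retrieved vectors back into text
- 4. Give 1 + 3 to the model as input and get its output
- 5. Return the output to the user

The actual work consists of integrating what was built up through step 4.

When a POST request arrives at /api/save_pdf, run process (a)
When a POST request arrives at /api/question_to_pdf, run process (b)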

api\flask_app\modules\pdf_chat.py

import openai
import os
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import RetrievalQAWithSourcesChain

class PdfChatClient:
    def __init__(self):
        # Basic Azure OpenAI settings
        openai.api_type = "azure"
        openai.api_base = os.getenv('ENDPOINT')
        openai.api_version = os.getenv('EMBEDDING_API_VERSION')
        
        # Set the Azure OpenAI API key and the model deployment name
        openai.api_key = os.getenv('OPENAI_API_KEY')
        self.model_name = os.getenv('EMBEDDING_MODEL_NAME')

        # Initialize the vector database
        self.db = None


    def create_vector(self, file_path):
        try:
            loader = PyPDFLoader(file_path)
            documents = loader.load_and_split()

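            # Use the Azure OpenAI model when generating vectors
            # (settings are passed explicitly; see Troubleshooting)
            embedding=OpenAIEmbeddings(
                deployment=self.model_name,
                openai_api_type="azure",
                openai_api_key=os.getenv('OPENAI_API_KEY'),
                openai_api_base=os.getenv('ENDPOINT'),
                openai_api_version=os.getenv('EMBEDDING_API_VERSION')
            )

            # Save the text and embeddings, keyed by file name
            self.db = Chroma.from_documents(
                documents=documents,
                embedding=embedding
            )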

            print(file_path)
            return f'{os.path.basename(file_path)}'
         
        except Exception as e:
            print(str(e))
            return 'Server Error.'
        
    def answer_from_pdf(self, question):
        try:
            # Specify the model used for search and answer generation
            llm = AzureChatOpenAI(
                deployment_name=os.getenv('MODEL_NAME'),
                openai_api_version=os.getenv('API_VERSION'),
                openai_api_base=os.getenv('ENDPOINT'),
                temperature=0,
            )

            # Create a chain that runs the search and reports the referenced file
            retriever = self.db.as_retriever()
            chain = RetrievalQAWithSourcesChain.from_chain_type(
                llm=llm,
                chain_type="stuff",
                retriever=retriever
            )

            # Run the chain
            res = chain({chain.question_key: question})
            print(res)
            answer = res['answer']

            return answer
        
        except Exception as e:
            print(str(e))
            return 'Server Error.'    


if __name__ == "__main__":

    test = PdfChatClient()
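
The route code that connects /api/save_pdf and /api/question_to_pdf to this class is not shown in the article. The following is a framework-independent sketch of what the two handlers could return, assuming only what the frontend code reveals: the question arrives as {"question": ...} and the response is read from res.data["message"]. The handler names and the FakePdfChatClient stand-in are hypothetical.

```python
class FakePdfChatClient:
    """Stand-in for PdfChatClient so the sketch runs without Azure access."""

    def create_vector(self, file_path):
        # Mimic returning the base name of the uploaded file.
        return file_path.replace("\\", "/").split("/")[-1]

    def answer_from_pdf(self, question):
        return "answer to: " + question


def handle_save_pdf(client, file_path):
    """Process (a): index the uploaded file and report its name."""
    return {"message": client.create_vector(file_path)}


def handle_question_to_pdf(client, payload):
    """Process (b): answer a question using the indexed PDF."""
    question = (payload or {}).get("question", "")
    if not question:
        return {"message": "Question is empty."}
    return {"message": client.answer_from_pdf(question)}
```

In the real application a single PdfChatClient instance would need to be shared between both routes, because the vector database is kept on the instance (self.db).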

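Troubleshooting

Is this how OpenAIEmbeddings() is supposed to behave?

When I ran the following source code, settings such as openai_api_key and openai_api_version were not picked up.
(So far, I have only confirmed this behavior in my own environment.)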

openai.api_type = "azure"
openai.api_base = os.getenv('ENDPOINT')
openai.api_version = os.getenv('EMBEDDING_API_VERSION')

# Set the Azure OpenAI API key and the model deployment name
openai.api_key = os.getenv('OPENAI_API_KEY')
model_name = os.getenv('EMBEDDING_MODEL_NAME')

embedding=OpenAIEmbeddings(deployment=model_name)
print(embedding)

I therefore edited the code as follows, and it worked.

Editing the arguments of OpenAIEmbeddings()

embedding=OpenAIEmbeddings(
    deployment=model_name,
    openai_api_type="azure",
    openai_api_key=os.getenv('OPENAI_API_KEY'),
    openai_api_base=os.getenv('ENDPOINT'),
    openai_api_version=os.getenv('EMBEDDING_API_VERSION')
)

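Miscellaneous: reading the source code

What prompt is actually being run?

The source code below contains the prompt used to generate an answer from the search results.
It seems to instruct the model along the lines of
"given documents like these, when a user question comes in, answer like this."
That means this whole prompt is sent as input every time a search is run.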

# flake8: noqa
from langchain.prompts import PromptTemplate

template = """Given the following extracted parts of a long document and a question, create a final answer with references ("SOURCES"). 
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
ALWAYS return a "SOURCES" part in your answer.

QUESTION: Which state/country's law governs the interpretation of the contract?
=========
Content: This Agreement is governed by English law and the parties submit to the exclusive jurisdiction of the English courts in  relation to any dispute (contractual or non-contractual) concerning this Agreement save that either party may apply to any court for an  injunction or other relief to protect its Intellectual Property Rights.
Source: 28-pl
Content: No Waiver. Failure or delay in exercising any right or remedy under this Agreement shall not constitute a waiver of such (or any other)  right or remedy.\n\n11.7 Severability. The invalidity, illegality or unenforceability of any term (or part of a term) of this Agreement shall not affect the continuation  in force of the remainder of the term (if any) and this Agreement.\n\n11.8 No Agency. Except as expressly stated otherwise, nothing in this Agreement shall create an agency, partnership or joint venture of any  kind between the parties.\n\n11.9 No Third-Party Beneficiaries.
Source: 30-pl
Content: (b) if Google believes, in good faith, that the Distributor has violated or caused Google to violate any Anti-Bribery Laws (as  defined in Clause 8.5) or that such a violation is reasonably likely to occur,
Source: 4-pl
=========
FINAL ANSWER: This Agreement is governed by English law.
SOURCES: 28-pl

QUESTION: What did the president say about Michael Jackson?
=========
Content: Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. \n\nGroups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.
Source: 0-pl
Content: And we won’t stop. \n\nWe have lost so much to COVID-19. Time with one another. And worst of all, so much loss of life. \n\nLet’s use this moment to reset. Let’s stop looking at COVID-19 as a partisan dividing line and see it for what it is: A God-awful disease.  \n\nLet’s stop seeing each other as enemies, and start seeing each other for who we really are: Fellow Americans.  \n\nWe can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n\nOfficer Mora was 27 years old. \n\nOfficer Rivera was 22. \n\nBoth Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers. \n\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves.
Source: 24-pl
Content: And a proud Ukrainian people, who have known 30 years  of independence, have repeatedly shown that they will not tolerate anyone who tries to take their country backwards.  \n\nTo all Americans, I will be honest with you, as I’ve always promised. A Russian dictator, invading a foreign country, has costs around the world. \n\nAnd I’m taking robust action to make sure the pain of our sanctions  is targeted at Russia’s economy. And I will use every tool at our disposal to protect American businesses and consumers. \n\nTonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world.  \n\nAmerica will lead that effort, releasing 30 Million barrels from our own Strategic Petroleum Reserve. And we stand ready to do more if necessary, unified with our allies.  \n\nThese steps will help blunt gas prices here at home. And I know the news about what’s happening can seem alarming. \n\nBut I want you to know that we are going to be okay.
Source: 5-pl
Content: More support for patients and families. \n\nTo get there, I call on Congress to fund ARPA-H, the Advanced Research Projects Agency for Health. \n\nIt’s based on DARPA—the Defense Department project that led to the Internet, GPS, and so much more.  \n\nARPA-H will have a singular purpose—to drive breakthroughs in cancer, Alzheimer’s, diabetes, and more. \n\nA unity agenda for the nation. \n\nWe can do this. \n\nMy fellow Americans—tonight , we have gathered in a sacred space—the citadel of our democracy. \n\nIn this Capitol, generation after generation, Americans have debated great questions amid great strife, and have done great things. \n\nWe have fought for freedom, expanded liberty, defeated totalitarianism and terror. \n\nAnd built the strongest, freest, and most prosperous nation the world has ever known. \n\nNow is the hour. \n\nOur moment of responsibility. \n\nOur test of resolve and conscience, of history itself. \n\nIt is in this moment that our character is formed. Our purpose is found. Our future is forged. \n\nWell I know this nation.
Source: 34-pl
=========
FINAL ANSWER: The president did not mention Michael Jackson.
SOURCES:

QUESTION: {question}
=========
{summaries}
=========
FINAL ANSWER:"""
PROMPT = PromptTemplate(template=template, input_variables=["summaries", "question"])

EXAMPLE_PROMPT = PromptTemplate(
    template="Content: {page_content}\nSource: {source}",
    input_variables=["page_content", "source"],
)

Where is it called from?

It is called from here.

def _load_stuff_chain(
    llm: BaseLanguageModel,
    prompt: BasePromptTemplate = stuff_prompt.PROMPT,
    document_prompt: BasePromptTemplate = stuff_prompt.EXAMPLE_PROMPT,
    document_variable_name: str = "summaries",
    verbose: Optional[bool] = None,
    **kwargs: Any,
) -> StuffDocumentsChain:
    llm_chain = LLMChain(llm=llm, prompt=prompt, verbose=verbose)
    return StuffDocumentsChain(
        llm_chain=llm_chain,
        document_variable_name=document_variable_name,
        document_prompt=document_prompt,
        verbose=verbose,
        **kwargs,
    )

About llm_chain

This determines how the model behaves when documents are given.
It is the prompt shown above.

About document_prompt

This seems to define the format in which each document is embedded into the prompt.
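
To illustrate how the two templates fit together, the sketch below formats each retrieved document with the EXAMPLE_PROMPT pattern and places the result into the {summaries} slot of the final part of the main prompt. It uses plain str.format instead of LangChain, and the blank-line separator between documents is an assumption; the function names are hypothetical.

```python
# Same pattern as EXAMPLE_PROMPT in the listing above.
DOCUMENT_TEMPLATE = "Content: {page_content}\nSource: {source}"

# Final part of the main prompt (the few-shot examples are omitted here).
QUESTION_TEMPLATE = "QUESTION: {question}\n=========\n{summaries}\n=========\nFINAL ANSWER:"


def format_documents(docs, separator="\n\n"):
    """docs is a list of dicts with 'page_content' and 'source' keys."""
    return separator.join(DOCUMENT_TEMPLATE.format(**doc) for doc in docs)


def build_final_prompt(question, docs):
    """Fill the question and the formatted documents into the prompt tail."""
    return QUESTION_TEMPLATE.format(question=question, summaries=format_documents(docs))
```

The "Source:" line attached to each document is what lets the model name a file in the SOURCES part of its answer.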
