データブリックス・ジャパン株式会社

Running Multimodal Report Generation with LlamaParse on Databricks

Posted at 2025-08-18

I found this video interesting, and its explanation is easy to follow.

The code was available here, so I tried running it on Databricks.

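Multimodal report generation agent

This cookbook shows how to build a multimodal report generation agent over a bank of research reports. It uses a set of ICLR papers, the same dataset used in a DeepLearning.ai course.

Using the Workflow abstraction, we define an agent system with two main phases:

Research phase: retrieve relevant files at the chunk level or the file level
Blog generation phase: synthesize the final report

Setup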

import nest_asyncio

nest_asyncio.apply()

!pip install -U llama-index

First, set the OpenAI and LlamaCloud API keys below.

# Load LLAMA_CLOUD_API_KEY (from a .env file)
from dotenv import load_dotenv
load_dotenv()

import os

# Read the OpenAI API key from a Databricks secret and set it as an environment variable
os.environ["OPENAI_API_KEY"] = dbutils.secrets.get(scope="demo-token-takaaki.yayoi", key="openai_api_key")
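Before moving on, it can help to confirm that both keys are actually visible to the Python process, since a missing key otherwise only surfaces later as an authentication error. The following is a minimal sketch, not part of the original notebook: missing_env_keys is a hypothetical helper, and the two key names are simply the ones this notebook relies on.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_keys(
    required: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


# Hypothetical check: report which of the two keys used in this notebook are missing
missing = missing_env_keys(["OPENAI_API_KEY", "LLAMA_CLOUD_API_KEY"])
if missing:
    print(f"Missing API keys: {', '.join(missing)}")
```

Passing the environment as a parameter keeps the helper easy to try with a plain dictionary instead of the real environment.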

Model setup

Set up the models used by the downstream orchestration.

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure the embedding model and the LLM
embed_model = OpenAIEmbedding(model="text-embedding-3-large")
llm = OpenAI(model="gpt-4o")

Settings.embed_model = embed_model
Settings.llm = llm

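Loading, parsing, and indexing the papers

Here we load popular ICLR 2024 papers and parse them with LlamaParse. The URL list contains 11 papers, but with the default papers list below only the first three are downloaded and parsed.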

urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=LzPWWPAdY4",
    "https://openreview.net/pdf?id=VTF8yNQM66",
    "https://openreview.net/pdf?id=hSyW5go0v8",
    "https://openreview.net/pdf?id=9WD9KwssyT",
    "https://openreview.net/pdf?id=yV6fD7LYkF",
    "https://openreview.net/pdf?id=hnrB5YHoYu",
    "https://openreview.net/pdf?id=WbWtOYIzIK",
    "https://openreview.net/pdf?id=c5pwL0Soay",
    "https://openreview.net/pdf?id=TpD2aG1h0D",
]

# NOTE: uncomment more entries if you want to research over more papers
papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "loftq.pdf",
    # "swebench.pdf",
    # "selfrag.pdf",
    # "zipformer.pdf",
    # "values.pdf",
    # "finetune_fair_diffusion.pdf",
    # "knowledge_card.pdf",
    # "metra.pdf",
    # "vr_mcl.pdf",
]

data_dir = "iclr_docs"

!mkdir "{data_dir}"
# Download each paper PDF
for url, paper in zip(urls, papers):
    !wget "{url}" -O "{data_dir}/{paper}"

mkdir: cannot create directory ‘iclr_docs’: File exists
--2025-08-18 01:27:30--  https://openreview.net/pdf?id=VtmBAGCN7o
Resolving openreview.net (openreview.net)... 35.184.86.251
Connecting to openreview.net (openreview.net)|35.184.86.251|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16911937 (16M) [application/pdf]
Saving to: ‘iclr_docs/metagpt.pdf’

iclr_docs/metagpt.p 100%[===================>]  16.13M  7.71MB/s    in 2.1s    

2025-08-18 01:27:33 (7.71 MB/s) - ‘iclr_docs/metagpt.pdf’ saved [16911937/16911937]

--2025-08-18 01:27:34--  https://openreview.net/pdf?id=6PmJoRfdaK
Resolving openreview.net (openreview.net)... 35.184.86.251
Connecting to openreview.net (openreview.net)|35.184.86.251|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1168720 (1.1M) [application/pdf]
Saving to: ‘iclr_docs/longlora.pdf’

iclr_docs/longlora. 100%[===================>]   1.11M   922KB/s    in 1.2s    

2025-08-18 01:27:36 (922 KB/s) - ‘iclr_docs/longlora.pdf’ saved [1168720/1168720]

--2025-08-18 01:27:36--  https://openreview.net/pdf?id=LzPWWPAdY4
Resolving openreview.net (openreview.net)... 35.184.86.251
Connecting to openreview.net (openreview.net)|35.184.86.251|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 366134 (358K) [application/pdf]
Saving to: ‘iclr_docs/loftq.pdf’

iclr_docs/loftq.pdf 100%[===================>] 357.55K   583KB/s    in 0.6s    

2025-08-18 01:27:37 (583 KB/s) - ‘iclr_docs/loftq.pdf’ saved [366134/366134]
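Only three files are downloaded above even though the URL list has 11 entries, because zip stops at the shorter of its two inputs and the papers list has three active names. The sketch below illustrates that pairing behavior on its own; plan_downloads is a hypothetical helper and the example.org URLs are placeholders, not part of the original notebook.

```python
from typing import List, Tuple


def plan_downloads(
    urls: List[str], names: List[str], data_dir: str
) -> List[Tuple[str, str]]:
    """Pair each URL with its destination path; zip() stops at the shorter list."""
    return [(url, f"{data_dir}/{name}") for url, name in zip(urls, names)]


# Placeholder inputs: three URLs but only two file names,
# similar to leaving entries of the papers list commented out
example_urls = [
    "https://example.org/a",
    "https://example.org/b",
    "https://example.org/c",
]
example_names = ["a.pdf", "b.pdf"]

for url, dest in plan_downloads(example_urls, example_names, "iclr_docs"):
    print(url, "->", dest)
```

Because the pairing is positional, the order of the names has to match the order of the URLs when more entries are uncommented.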

from llama_cloud_services import LlamaParse

# Configure LlamaParse
parser = LlamaParse(
    parse_mode="parse_page_with_agent",
    model="anthropic-sonnet-3.5",
    high_res_ocr=True,
    take_screenshot=True,
    extract_charts=True,
)

Note: this step can take a while. Saving the results as JSON lets you reuse them without parsing again.

If you have already saved the text nodes, skip ahead to the cell marked LOAD.
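One way to make that skip automatic is a small load-or-build helper. The sketch below is an assumption rather than part of the original notebook: load_or_build is a hypothetical helper, and it uses pickle (as the SAVE and LOAD cells later in this post do) instead of JSON.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_build(cache_path: str, build: Callable[[], Any]) -> Any:
    """Load a pickled result if the cache file exists; otherwise build and save it."""
    path = Path(cache_path)
    if path.exists():
        # Cache hit: skip the expensive build step entirely
        with path.open("rb") as f:
            return pickle.load(f)
    # Cache miss: run the expensive step once and store its result
    result = build()
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

With this in place, the expensive parsing step could be wrapped in a function and passed as build, so it only runs when the cache file is absent. As with any pickle file, only load files you created yourself.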

from pathlib import Path
import nest_asyncio

nest_asyncio.apply()
import asyncio

out_image_dir = "out_iclr_images"
!rm -rf "{out_image_dir}"
!mkdir "{out_image_dir}"

# Dictionary that holds the parse result and image directory for each paper
paper_dicts = {}

for paper_path in papers:
    paper_base = Path(paper_path).stem
    full_paper_path = str(Path(data_dir) / paper_path)
    parse_result = await parser.aparse(full_paper_path)
    image_path = str(Path(out_image_dir) / paper_base)
    await parse_result.asave_all_images(image_path)
    paper_dicts[paper_path] = {
        "paper_path": full_paper_path,
        "json_dicts": parse_result.pages,
        "image_path": image_path,
    }

Started parsing the file under job_id 64dd3d6c-71da-4d62-bb06-b60884dd78d5
Started parsing the file under job_id 77a481b0-f36d-4338-9599-0e2c11663478
Started parsing the file under job_id f5e8293b-b5c8-4e75-8bcc-accbb64929a8

Getting the text nodes

Convert the dictionaries above into TextNode objects so they can be put into a vector store.

import re

from llama_index.core.schema import TextNode

# Utility function that extracts the page number from an image file name


def get_page_number(file_name):
    match = re.search(r"page_(\d+)\.(?:jpg|png)", str(file_name))
    match_2 = re.search(r"img_p(\d+)\_(\d+)\.(?:jpg|png)", str(file_name))
    if match:
        return int(match.group(1))
    elif match_2:
        return int(match_2.group(1))
    return 0


def _get_sorted_image_files(image_dir):
    """Group the image files in image_dir by zero-based page index."""
    raw_files = [f for f in list(Path(image_dir).iterdir()) if f.is_file()]
    images_by_page = {}

    for image_file in raw_files:
        page_num = get_page_number(image_file)
        idx = page_num - 1
        if idx not in images_by_page:
            images_by_page[idx] = []
        images_by_page[idx].append(str(image_file))
    return images_by_page
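To see what the grouping produces, here is a standalone sketch that applies the same two file-name patterns to a plain list of names instead of a directory. The names page_number_of and group_by_page are hypothetical stand-ins for the functions above, and notes.txt is a made-up non-matching file.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def page_number_of(file_name: str) -> int:
    """Extract the 1-based page number from page_N.* or img_pN_M.* names, else 0."""
    match = re.search(r"page_(\d+)\.(?:jpg|png)", file_name) or re.search(
        r"img_p(\d+)_\d+\.(?:jpg|png)", file_name
    )
    return int(match.group(1)) if match else 0


def group_by_page(file_names: Iterable[str]) -> Dict[int, List[str]]:
    """Group file names under a zero-based page index."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for name in file_names:
        grouped[page_number_of(name) - 1].append(name)
    return dict(grouped)


names = ["page_4.jpg", "img_p4_1.png", "page_1.jpg", "notes.txt"]
print(group_by_page(names))
# {3: ['page_4.jpg', 'img_p4_1.png'], 0: ['page_1.jpg'], -1: ['notes.txt']}
```

Note that a name matching neither pattern ends up under key -1, which no page index ever looks up, so such files are silently ignored.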

from copy import deepcopy
from pathlib import Path

# Build one TextNode per page and attach the image paths as metadata
def get_text_nodes(json_dicts, paper_path, image_dir=None):
    """Create one TextNode per parsed page, with the page markdown and image paths as metadata."""
    nodes = []
    sorted_image_files = (
        _get_sorted_image_files(image_dir) if image_dir is not None else None
    )
    md_texts = [d.md for d in json_dicts]

    for idx, md_text in enumerate(md_texts):
        chunk_metadata = {
            "page_num": idx + 1,
            "parsed_text_markdown": md_text,
            "paper_path": paper_path,
        }
        if sorted_image_files is not None:
            # Use .get so a page without any saved image does not raise KeyError
            chunk_metadata["image_paths"] = sorted_image_files.get(idx, [])
        node = TextNode(
            text="",
            metadata=chunk_metadata,
        )
        nodes.append(node)

    return nodes

# Gather the nodes of every paper into a single list
all_text_nodes = []
text_nodes_dict = {}
for paper_path, paper_dict in paper_dicts.items():
    json_dicts = paper_dict["json_dicts"]
    text_nodes = get_text_nodes(
        json_dicts, paper_dict["paper_path"], image_dir=paper_dict["image_path"]
    )
    all_text_nodes.extend(text_nodes)
    text_nodes_dict[paper_path] = text_nodes

These nodes can be saved if needed.

# SAVE
import pickle

# Save the dictionary of text nodes
with open("iclr_text_nodes.pkl", "wb") as f:
    pickle.dump(text_nodes_dict, f)

LOAD: If you have already saved the nodes, load them from the existing file with the cell below.

# LOAD
import pickle

# Load the saved dictionary of text nodes
with open("iclr_text_nodes.pkl", "rb") as f:
    text_nodes_dict = pickle.load(f)

# Gather the nodes of every paper into a single list
all_text_nodes = []
for paper_path, text_nodes in text_nodes_dict.items():
    all_text_nodes.extend(text_nodes)

# Example: print the content of the fourth node, including its metadata
print(all_text_nodes[3].get_content(metadata_mode="all"))

Result

page_num: 4
parsed_text_markdown: 

Preprint

<table>
<tr>
<td colspan="2">

<table>
<tr>
<th>Name</th>
<td>Alex</td>
<th rowspan="4">Agent Profile</th>
</tr>
<tr>
<th>Profile</th>
<td>Engineer</td>
</tr>
<tr>
<th>Goal</th>
<td>Write elegant, readable,extensible, efficient code</td>
</tr>
<tr>
<th>Constraint</th>
<td>The code you write should conform to code standard like PEP8, be modular,easy to read and maintain</td>
</tr>
</table>

</td>
</tr>
<tr>
<td>

<table>
<tr>
<td>Architect<br>diagram tool</td>
<td>Project Manager<br>diagram tool</td>
</tr>
<tr>
<td>
msgA<br>
msgB<br>
msgC<br>
msgD
</td>
<td>
content: {Architect: Implementation appro...}<br>
instruct_content: "Data structures and in..."<br>
cause_by: WriteTasks<br>
sent_from: ProjectManager<br>
send_to: Engineer
</td>
</tr>
<tr>
<td colspan="2" align="center">Shared Message Pool</td>
</tr>
<tr>
<td>Product Manager<br>web search tool</td>
<td>QA Engineer</td>
</tr>
<tr>
<td colspan="2">Tools: web search tool debugging tool diagram tool</td>
</tr>
</table>

</td>
<td>

<table>
<tr>
<td>Engineer</td>
<td>Memory</td>
<td>Memory Retrieval</td>
</tr>
<tr>
<td colspan="3">Iterative Programming</td>
</tr>
<tr>
<td>
Write<br>
game.py
</td>
<td colspan="2">
## Code: game.py<br>
## game.py<br>
import random<br>
class Game:<br>
  def __init__(self, size=4):<br>
    self.size = size<br>
    self.score = 0<br>
    self.high_score = 0<br>
    self.board = [[0]*size for _ in range(size)]<br>
    self.game_over = False<br>
    self.start()
</td>
</tr>
<tr>
<td>Feedback</td>
<td colspan="2">
History Message<br>
PRD Document<br>
System Design<br>
Code
</td>
</tr>
<tr>
<td>Debug</td>
<td>Execution</td>
<td>Add Code</td>
</tr>
<tr>
<td colspan="3">Executable Feedback</td>
</tr>
<tr>
<td colspan="3">Structured Message(...)</td>
</tr>
</table>

</td>
</tr>
</table>

Figure 2: An example of the communication protocol (left) and iterative programming with executable feedback (right). Left: Agents use a shared message pool to publish structured messages. They can also subscribe to relevant messages based on their profiles. Right: After generating the initial code, the Engineer agent runs and checks for errors. If errors occur, the agent checks past messages stored in memory and compares them with the PRD, system design, and code files.

## 3 METAGPT: A META-PROGRAMMING FRAMEWORK

MetaGPT is a meta-programming framework for LLM-based multi-agent systems. Sec. 3.1 provides an explanation of role specialization, workflow and structured communication in this framework, and illustrates how to organize a multi-agent system within the context of SOPs. Sec. 3.2 presents a communication protocol that enhances role communication efficiency. We also implement structured communication interfaces and an effective publish-subscribe mechanism. These methods enable agents to obtain directional information from other roles and public information from the environment. Finally, we introduce executable feedback—a self-correction mechanism for further enhancing code generation quality during run-time in Sec. 3.3.

### 3.1 AGENTS IN STANDARD OPERATING PROCEDURES

**Specialization of Roles** Unambiguous role specialization enables the breakdown of complex work into smaller and more specific tasks. Solving complex tasks or problems often requires the collaboration of agents with diverse skills and expertise, each contributing specialized outputs tailored to specific issues.

In a software company, a Product Manager typically conducts business-oriented analysis and derives insights, while a software engineer is responsible for programming. We define five roles in our software company: Product Manager, Architect, Project Manager, Engineer, and QA Engineer, as shown in Figure 1. In MetaGPT, we specify the agent's profile, which includes their name, profile, goal, and constraints for each role. We also initialize the specific context and skills for each role. For instance, a Product Manager can use web search tools, while an Engineer can execute code, as shown in Figure 2. All agents adhere to the React-style behavior as described in Yao et al. (2022).

Every agent monitors the environment (i.e., the message pool in MetaGPT) to spot important observations (e.g., messages from other agents). These messages can either directly trigger actions or assist in finishing the job.

**Workflow across Agents** By defining the agents' roles and operational skills, we can establish basic workflows. In our work, we follow SOP in software development, which enables all agents to work in a sequential manner.
paper_path: iclr_docs/metagpt.pdf
image_paths: ['out_iclr_images/metagpt/page_4.jpg', 'out_iclr_images/metagpt/img_p4_1.png']

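Building the index

Once the text nodes are ready, we feed them into the vector store index abstraction, which indexes them into an in-memory vector store (LlamaIndex also offers more than 40 vector store integrations).

In addition to the vector index, we also store a mapping from each paper path to a summary index. This enables document-level retrieval: relevant chunks are found first, and the papers they belong to are then retrieved as a whole.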
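For intuition about what an in-memory vector store does at query time, here is a toy sketch of top-k retrieval by cosine similarity. It is an illustration under simplifying assumptions, not LlamaIndex's implementation: the two-dimensional vectors and node IDs are made up, and real embeddings from text-embedding-3-large have far more dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float], store: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k stored node IDs most similar to the query, best first."""
    scored = [(node_id, cosine(query_vec, vec)) for node_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 2-D "embeddings" keyed by hypothetical node IDs
store = {
    "metagpt_p4": [1.0, 0.0],
    "longlora_p2": [0.0, 1.0],
    "loftq_p1": [0.7, 0.7],
}
print(top_k([1.0, 0.1], store, k=2))
```

The similarity_top_k argument used by the retrievers later in this post plays the role of k here.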

import os
from llama_index.core import (
    StorageContext,
    SummaryIndex,
    VectorStoreIndex,
    load_index_from_storage,
)

# Build the vector index and persist it
if not os.path.exists("storage_nodes_papers"):
    index = VectorStoreIndex(all_text_nodes)
    # Save the index to disk
    index.set_index_id("vector_index")
    index.storage_context.persist("./storage_nodes_papers")
else:
    # Rebuild the storage context
    storage_context = StorageContext.from_defaults(persist_dir="storage_nodes_papers")
    # Load the index
    index = load_index_from_storage(storage_context, index_id="vector_index")

# Dictionary of summary indexes: one summary index per paper path
paper_summary_indexes = {
    paper_path: SummaryIndex(text_nodes_dict[paper_path]) for paper_path in papers
}

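Defining the tools

We define the two tools the agent uses: chunk-level retrieval and document-level retrieval.

from operator import itemgetter
from pathlib import Path
from typing import List

from llama_index.core.schema import NodeWithScore
from llama_index.core.tools import FunctionTool

# Tool functions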

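def chunk_retriever_fn(query: str) -> List[NodeWithScore]:
    """Retrieve a small set of relevant document chunks from the corpus.

    ONLY use for research questions that look up specific facts in the knowledge corpus.
    Recommended when entire documents are not needed.
    """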
    retriever = index.as_retriever(similarity_top_k=5)
    nodes = retriever.retrieve(query)
    return nodes


def _get_document_nodes(
    nodes: List[NodeWithScore], top_n: int = 2
) -> List[NodeWithScore]:
    """Get document nodes from a set of chunk nodes.

    "De-reference" the chunk nodes into their source papers, ordering the papers
    with a simple weighting function (cumulative sum of chunk scores).
    Cut off at top_n papers.
    """
    paper_paths = {n.metadata["paper_path"] for n in nodes}
    paper_path_scores = {f: 0 for f in paper_paths}
    for n in nodes:
        paper_path_scores[n.metadata["paper_path"]] += n.score

    # Sort paper_path_scores by score, descending
    sorted_paper_paths = sorted(
        paper_path_scores.items(), key=itemgetter(1), reverse=True
    )
    # Take the top_n paper paths
    top_paper_paths = [path for path, score in sorted_paper_paths[:top_n]]

    # Use the summary index to fetch every node of each selected paper
    all_nodes = []
    for paper_path in top_paper_paths:
        # NOTE: the input to the retriever can be empty
        all_nodes.extend(
            paper_summary_indexes[Path(paper_path).name].as_retriever().retrieve("")
        )

    return all_nodes


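def doc_retriever_fn(query: str) -> List[NodeWithScore]:
    """Document retriever that retrieves entire documents from the corpus.

    ONLY use for research questions that require searching over entire documents.
    Slower and more expensive than chunk-level retrieval, but sometimes necessary.
    """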
    retriever = index.as_retriever(similarity_top_k=5)
    nodes = retriever.retrieve(query)
    return _get_document_nodes(nodes)


chunk_retriever_tool = FunctionTool.from_defaults(fn=chunk_retriever_fn)
doc_retriever_tool = FunctionTool.from_defaults(fn=doc_retriever_fn)
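The document-level ranking used by the document retriever can be tried in isolation. The sketch below reproduces only the cumulative-sum scoring step on plain (paper_path, score) pairs; rank_papers is a hypothetical name, and the scores are made-up values rather than real retrieval output.

```python
from collections import defaultdict
from operator import itemgetter
from typing import Dict, Iterable, List, Tuple


def rank_papers(chunk_hits: Iterable[Tuple[str, float]], top_n: int = 2) -> List[str]:
    """Sum chunk scores per paper and return the top_n paper paths, best first."""
    totals: Dict[str, float] = defaultdict(float)
    for paper_path, score in chunk_hits:
        totals[paper_path] += score
    ranked = sorted(totals.items(), key=itemgetter(1), reverse=True)
    return [path for path, _ in ranked[:top_n]]


# Made-up similarity scores for four retrieved chunks
hits = [
    ("iclr_docs/metagpt.pdf", 0.5),
    ("iclr_docs/longlora.pdf", 0.75),
    ("iclr_docs/metagpt.pdf", 0.5),
    ("iclr_docs/loftq.pdf", 0.25),
]
print(rank_papers(hits))
# ['iclr_docs/metagpt.pdf', 'iclr_docs/longlora.pdf']
```

One consequence of summing is visible in the example: a paper with two moderately scored chunks outranks a paper with a single higher-scored chunk.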

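Building the workflow

Now that the indexes are ready, we build the report generation workflow.

The main flow of the workflow:

Research gathering: the agent reasons about which tool (chunk or document retrieval) to use and gathers information. The gathered information is collected in a shared dictionary that every step can access. Once enough information has been gathered, the workflow moves to the next phase.
Report generation: generate the report from the gathered research. For now this uses a summary index to pack as much information as possible into the context window.

This implementation is based on the Function Calling Agent workflow.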
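Stripped of the LLM and the event classes, the control flow described above is a loop that keeps retrieving until no further tool is requested, then writes the report from everything gathered. The following is a rough sketch of that loop under stated assumptions: run_report_loop, decide, and write_report are hypothetical names, and the toy policy stands in for the LLM's tool-calling decision.

```python
from typing import Callable, Dict, List, Optional


def run_report_loop(
    query: str,
    decide: Callable[[List[str]], Optional[str]],
    tools: Dict[str, Callable[[str], List[str]]],
    write_report: Callable[[str, List[str]], str],
    max_steps: int = 5,
) -> str:
    """Research until `decide` returns None (or max_steps is hit), then write the report."""
    stored_chunks: List[str] = []
    history: List[str] = [query]
    for _ in range(max_steps):
        tool_name = decide(history)
        if tool_name is None:
            break  # no more research requested: move on to report generation
        chunks = tools[tool_name](query)
        stored_chunks.extend(chunks)
        history.append(f"{tool_name} returned {len(chunks)} chunk(s)")
    return write_report(query, stored_chunks)


def decide(history: List[str]) -> Optional[str]:
    """Toy policy standing in for the LLM: retrieve chunks once, then stop."""
    return "chunk_retriever" if len(history) == 1 else None


tools = {"chunk_retriever": lambda q: [f"chunk about {q}"]}
report = run_report_loop(
    "MetaGPT", decide, tools, lambda q, chunks: f"{q}: {len(chunks)} chunk(s)"
)
print(report)  # MetaGPT: 1 chunk(s)
```

The max_steps cap is an addition for this sketch so that a policy which never stops cannot loop forever; the workflow below relies on its timeout setting instead.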

from llama_index.llms.openai import OpenAI

from pydantic import BaseModel, Field
from typing import List
from IPython.display import display, Markdown, Image


class TextBlock(BaseModel):
    """Text block."""
    text: str = Field(..., description="The text for this block.")


class ImageBlock(BaseModel):
    """Image block (prefer images whose file name contains 'img')."""
    file_path: str = Field(..., description="File path to the image.")


class ReportOutput(BaseModel):
    """Data model for a report.
    Can contain a mix of text and image blocks. MUST contain at least one image block.
    """
    blocks: List[TextBlock | ImageBlock] = Field(
        ..., description="A list of interleaved text and image blocks."
    )

    def render(self) -> None:
        """Render as HTML on the page."""
        for b in self.blocks:
            if isinstance(b, TextBlock):
                display(Markdown(b.text))
            else:
                display(Image(filename=b.file_path))

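report_gen_system_prompt = """\
You are an assistant that generates a formatted report from parsed context.

You will be given parsed text context from one or more reports.

You are responsible for generating a report in which text blocks and image blocks are interleaved.
For image blocks, write the file path of the image.

To decide which images to use, check the metadata of each context chunk, which contains the image file paths.
Include ONLY images from chunks with heavy visual elements, for example chunks with many tables.
You MUST include at least one image block.

You MUST output your response as a tool call. Do NOT return normal text.
"""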
report_gen_llm = OpenAI(model="gpt-4o", system_prompt=report_gen_system_prompt)
report_gen_sllm = llm.as_structured_llm(output_cls=ReportOutput)

from llama_index.core.workflow import Workflow

from typing import Any, List
from operator import itemgetter

from llama_index.core.llms.function_calling import FunctionCallingLLM
from llama_index.core.llms.structured_llm import StructuredLLM
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.llms import ChatMessage
from llama_index.core.tools.types import BaseTool
from llama_index.core.tools import ToolSelection
from llama_index.core.workflow import Workflow, StartEvent, StopEvent, Context, step
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import CompactAndRefine
from llama_index.core.workflow import Event


class InputEvent(Event):
    input: List[ChatMessage]


class ChunkRetrievalEvent(Event):
    tool_call: ToolSelection


class DocRetrievalEvent(Event):
    tool_call: ToolSelection


class ReportGenerationEvent(Event):
    pass


class ReportGenerationAgent(Workflow):
    """Report generation agent."""

    def __init__(
        self,
        chunk_retriever_tool: BaseTool,
        doc_retriever_tool: BaseTool,
        llm: FunctionCallingLLM | None = None,
        report_gen_sllm: StructuredLLM | None = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(**kwargs)
        self.chunk_retriever_tool = chunk_retriever_tool
        self.doc_retriever_tool = doc_retriever_tool

        self.llm = llm or OpenAI()
        self.summarizer = CompactAndRefine(llm=self.llm)
        assert self.llm.metadata.is_function_calling_model

        self.report_gen_sllm = report_gen_sllm or self.llm.as_structured_llm(
            ReportOutput, system_prompt=report_gen_system_prompt
        )
        self.report_gen_summarizer = CompactAndRefine(llm=self.report_gen_sllm)

        # Use self.llm so the OpenAI() fallback is also used when llm is None
        self.memory = ChatMemoryBuffer.from_defaults(llm=self.llm)
        self.sources = []

    @step
    async def prepare_chat_history(self, ctx: Context, ev: StartEvent) -> InputEvent:
        # Clear the sources
        self.sources = []

        await ctx.store.set("stored_chunks", [])
        await ctx.store.set("query", ev.input)

        # Get the user input
        user_input = ev.input
        user_msg = ChatMessage(role="user", content=user_input)
        self.memory.put(user_msg)

        # Get the chat history
        chat_history = self.memory.get()
        return InputEvent(input=chat_history)

    @step
    async def handle_llm_input(
        self, ctx: Context, ev: InputEvent
    ) -> ChunkRetrievalEvent | DocRetrievalEvent | ReportGenerationEvent | StopEvent:
        chat_history = ev.input

        response = await self.llm.achat_with_tools(
            [self.chunk_retriever_tool, self.doc_retriever_tool],
            chat_history=chat_history,
        )
        self.memory.put(response.message)

        tool_calls = self.llm.get_tool_calls_from_response(
            response, error_on_no_tool_call=False
        )
        if not tool_calls:
            # All retrieved content is already stored in the context, so just pass the input along
            return ReportGenerationEvent(input=ev.input)

        for tool_call in tool_calls:
            if tool_call.tool_name == self.chunk_retriever_tool.metadata.name:
                return ChunkRetrievalEvent(tool_call=tool_call)
            elif tool_call.tool_name == self.doc_retriever_tool.metadata.name:
                return DocRetrievalEvent(tool_call=tool_call)
            else:
                return StopEvent(result={"response": "Invalid tool."})

    @step
    async def handle_retrieval(
        self, ctx: Context, ev: ChunkRetrievalEvent | DocRetrievalEvent
    ) -> InputEvent:
        """Handle retrieval.

        Store the retrieved chunks, then go back to the agent reasoning loop.
        """
        query = ev.tool_call.tool_kwargs["query"]
        if isinstance(ev, ChunkRetrievalEvent):
            retrieved_chunks = self.chunk_retriever_tool(query).raw_output
        else:
            retrieved_chunks = self.doc_retriever_tool(query).raw_output
        stored_chunks = await ctx.store.get("stored_chunks") or []
        stored_chunks.extend(retrieved_chunks)
        await ctx.store.set("stored_chunks", stored_chunks)

        # Synthesize an answer to the query and return it to the LLM.
        response = self.summarizer.synthesize(query, nodes=retrieved_chunks)
        self.memory.put(
            ChatMessage(
                role="tool",
                content=str(response),
                additional_kwargs={
                    "tool_call_id": ev.tool_call.tool_id,
                    "name": ev.tool_call.tool_name,
                },
            )
        )

        # Return an InputEvent with the updated chat history
        return InputEvent(input=self.memory.get())

    @step
    async def generate_report(
        self, ctx: Context, ev: ReportGenerationEvent
    ) -> StopEvent:
        """Generate the report."""
        # Fetch the original query and all chunks stored during the research phase
        query = await ctx.store.get("query")
        nodes = await ctx.store.get("stored_chunks")
        response = self.report_gen_summarizer.synthesize(query, nodes=nodes)

        return StopEvent(result={"response": response})

# Create an instance of the report generation agent
agent = ReportGenerationAgent(
    chunk_retriever_tool,
    doc_retriever_tool,
    llm=report_gen_llm,
    report_gen_sllm=report_gen_sllm,
    verbose=True,
    timeout=120.0,
)

# Ask for a report analyzing MetaGPT's experimental methodology (the request itself is written in Japanese)
ret = await agent.run(
    input="MetaGPTの実験手法を分析するレポートを作成してください"
)

Running step prepare_chat_history
Step prepare_chat_history produced event InputEvent
Running step handle_llm_input
Step handle_llm_input produced event ChunkRetrievalEvent
Running step handle_retrieval
Step handle_retrieval produced event InputEvent
Running step handle_llm_input
Step handle_llm_input produced event ReportGenerationEvent
Running step generate_report
Step generate_report produced event StopEvent

# Display the rendered report
ret["response"].response.render()

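The text taken from the source papers is naturally in English, but the rest of the explanatory text came back in Japanese.

Personally, I think report generation is a promising use case for agents, so I plan to study it further.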

