I Built a UI for Asking a Human with the OpenAI Agents SDK (Gradio Version)

Posted at 2025-04-01 (last updated at 2025-04-01)

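Introduction

The OpenAI Agents SDK was released on March 11, 2025. In this article I follow the official documentation, check the behavior by actually running code, and dig into the points I found especially important or interesting.

The library version used for verification is openai-agents==0.0.6.

Introduction to the OpenAI Agents SDK

This is covered on a separate page.

The Conclusion Up Front

We build a mechanism that lets an AI agent call on a human.

Here is the console output, reformatted for readability. The human's reply (むりです。知らないもん。) means roughly "No can do. I have no idea."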

$ uv run python src/tools_human_with_gradio.py
* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
観測された人間の応答: むりです。知らないもん。
交流のコンテキスト: HumanInteractContext(
    system_to_human_message='あなたに質問です。\n今の東京の天気を教えていただけますか？',
    human_to_system_message='むりです。知らないもん。',
    status='human_responded',
    event=<asyncio.locks.Event object at 0x781e753d33d0 [set]>
)
処理終了です。Ctrl-C で抜けてください

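Purpose

Expectations for AI agents keep rising, but handing every decision over to them is difficult. For decisions that could change someone's life, or decisions that involve money, we tend to want a human in the loop.

This article presents a sample in which an AI agent calls on a human, and the human who is called exchanges messages with the agent through a Gradio UI. This is what is commonly called Human-in-the-Loop. The human makes judgments and supplies information so that the AI agent behaves as desired.

The OpenAI Agents SDK has a Tool feature: with the function_tool decorator or the FunctionTool class, an agent can run arbitrary functions. When the Agent uses a function, and with which arguments, is decided through the LLM's Function Calling capability.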
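
To make the Function Calling part concrete, here is a toy sketch that uses only the standard library. It is not the SDK's implementation: describe_tool and the sample tool-call payload are hypothetical names made up for illustration, and the real schema generation is more elaborate. The idea is that the LLM is shown a description of each function, answers with a function name plus JSON arguments, and the runtime then calls the matching Python function.

```python
import inspect
import json
from typing import Any, Callable, Dict


def describe_tool(func: Callable[..., Any]) -> Dict[str, Any]:
    """Hypothetical helper: derive a tool description from a function."""
    doc = inspect.getdoc(func) or ""
    properties = {
        name: {"type": "string" if param.annotation is str else "object"}
        for name, param in inspect.signature(func).parameters.items()
    }
    return {
        "name": func.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": list(properties),
        },
    }


def ask_to_human(question: str) -> str:
    """Ask a human a question and return the answer."""
    return f"(a human would be asked: {question})"


# The description the model would be shown for this tool.
schema = describe_tool(ask_to_human)

# A made-up tool call, shaped like a model's reply: name plus JSON arguments.
tool_call = {
    "name": "ask_to_human",
    "arguments": json.dumps({"question": "Is it raining in Tokyo?"}),
}

# The runtime looks up the function by name and calls it with the arguments.
tools = {schema["name"]: ask_to_human}
result = tools[tool_call["name"]](**json.loads(tool_call["arguments"]))
print(json.dumps(schema, indent=2))
print(result)
```

In the sample in this article, the function_tool decorator takes care of both halves: describing the function to the model and dispatching the call.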

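LangChain also has a feature for using a human from an AI agent. Its official documentation, under "Human as a tool", says roughly: "Humans are AGI, so they can certainly be used as a tool to help out an AI agent when it is confused." (loose translation)

It is interesting that both treat the human as a kind of tool, in the sense of "something that takes input and produces output". From the point of view of an autonomous AI agent, a human may look like just another kind of LLM.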

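Design

The flow of operations is drawn as something like a sequence diagram. The HumanInteractManager class mediates between the Agent and Gradio (the human side).

This is the state transition of the conversation status. HumanInteractContext holds the status.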
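
The transitions, as they appear in the sample code, can be sketched as a small table plus a guard. This is only an illustration: StatusTracker and ALLOWED_TRANSITIONS are hypothetical names, and the sample itself assigns the status directly without checking.

```python
from dataclasses import dataclass
from typing import Dict, Literal, Set

Status = Literal["idle", "waiting_for_human", "human_responded"]

# Edges read off the sample code: a question is sent, the human answers
# (or the wait times out), and the tool returns the status to idle.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "idle": {"waiting_for_human"},
    "waiting_for_human": {"human_responded", "idle"},  # idle on timeout
    "human_responded": {"idle"},
}


@dataclass
class StatusTracker:
    """Hypothetical helper that rejects transitions not in the table."""

    status: Status = "idle"

    def move_to(self, new_status: Status) -> None:
        if new_status not in ALLOWED_TRANSITIONS[self.status]:
            raise ValueError(
                f"illegal transition: {self.status} -> {new_status}"
            )
        self.status = new_status


tracker = StatusTracker()
tracker.move_to("waiting_for_human")  # the tool sent a question
tracker.move_to("human_responded")    # the human pressed the send button
tracker.move_to("idle")               # the tool consumed the answer
print(tracker.status)
```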

Implementation

Sample code

src/tools_human_with_gradio.py

import asyncio
import functools
from dataclasses import dataclass, field
from typing import Any, Dict, Literal, Optional

import gradio as gr
from agents import (
    Agent,
    ModelSettings,
    RunContextWrapper,
    Runner,
    function_tool,
)

HUMAN_INPUT_TIMEOUT = 300


@dataclass
class HumanInteractContext:
    """
    Context object for the exchange between a human and the system.
    Holds the messages and status of a System -> Human -> System round trip.
    """

    system_to_human_message: Optional[str] = None
    human_to_system_message: Optional[str] = None
    status: Literal["idle", "waiting_for_human", "human_responded"] = "idle"
    event: asyncio.Event = field(default_factory=asyncio.Event)


class HumanInteractManager:
    """
    Manages how the human and the system exchange messages, and the
    status of the conversation.
    """

    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self.context: HumanInteractContext = HumanInteractContext()
        self._loop: asyncio.AbstractEventLoop = loop

    def send_system_to_human(self, message: str) -> None:
        """Send a message from the system to the human."""
        self.context.system_to_human_message = message
        self.context.status = "waiting_for_human"

    def receive_system_to_human(self) -> str:
        """Receive the message sent from the system to the human."""
        return self.context.system_to_human_message

    def send_human_to_system(self, message: str) -> None:
        """Send a message from the human to the system."""
        self.context.human_to_system_message = message
        self.context.status = "human_responded"
        if self._loop and self._loop.is_running():
            # This may run on another thread (the Gradio side), so hand
            # set() over to the agent's event loop
            self._loop.call_soon_threadsafe(self.context.event.set)

    def receive_human_to_system(self) -> str:
        """Receive the message sent from the human to the system."""
        message = self.context.human_to_system_message
        self.set_status_idle()
        return message

    def set_status_idle(self) -> None:
        """Set the conversation status back to idle."""
        self.context.status = "idle"


class HumanToSystemTimeoutException(Exception):
    """Raised when the human's input times out."""

    def __init__(self, message: str = ""):
        super().__init__(message)


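# Tool description sent to the LLM (taken from the docstring): "Function
# that asks a human a question on behalf of the system and gets the
# result. question: the question text from the system to the human."
@function_tool
async def ask_to_human(run_ctx: RunContextWrapper[Any], question: str) -> str:
    """システムから人間へ問い合わせを行い結果を取得する関数

    Args:
        question: システムから人間への質問の文字列
    """
    interact_manager: HumanInteractManager = run_ctx.context
    interact_manager.context.event.clear()
    # Message shown to the human: "A question for you." plus the question
    interact_manager.send_system_to_human(f"あなたに質問です。\n{question}")
    try:
        await asyncio.wait_for(
            interact_manager.context.event.wait(), timeout=HUMAN_INPUT_TIMEOUT
        )
        response = interact_manager.receive_human_to_system()
    except asyncio.TimeoutError:
        # Log message: "No response from the human: Timeout N seconds"
        print(
            "人間の応答がありませんでした: Timeout "
            f"{HUMAN_INPUT_TIMEOUT} 秒"
        )
        interact_manager.set_status_idle()
        # function_tool appears to absorb exceptions, so raise a dedicated one
        raise HumanToSystemTimeoutException("人間の応答がありませんでした")
    return response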


class GradioUserInterface:
    """Gradio UI definition."""

    def __init__(self):
        # Get the running event loop (so this must be created in a coroutine)
        self.loop = asyncio.get_running_loop()

        # Create the manager that holds the human-input context
        self.interact_manager = HumanInteractManager(self.loop)

    def _update_ui_components(self, ctx: HumanInteractContext):
        """Return UI components that reflect the current context."""
        is_waiting = ctx.status == "waiting_for_human"

        # Settings for the input Textbox
        input_textbox_kwargs = {
            "interactive": is_waiting,
            # Placeholder text: "Please enter your answer"
            "placeholder": "回答を入力してください" if is_waiting else "",
        }
        if not is_waiting:
            input_textbox_kwargs["value"] = ctx.human_to_system_message

        # Build and return the UI components
        return (
            gr.Textbox(value=ctx.system_to_human_message, interactive=False),
            gr.Textbox(**input_textbox_kwargs),
            gr.Button(interactive=is_waiting),
            gr.Textbox(value=ctx.status, interactive=False),
        )

    def set_components(self) -> gr.Blocks:
        """Define the Gradio UI and wire up the callbacks."""

        async def check_interact_update():
            """Check the context for updates."""
            return self._update_ui_components(self.interact_manager.context)

        async def submit_message(human_to_system_message: str):
            """Send the human's input to the system."""

            if (
                self.interact_manager.context.status == "waiting_for_human"
                and str(human_to_system_message) != ""
            ):
                # Pass the human's input on to the tool
                self.interact_manager.send_human_to_system(
                    human_to_system_message
                )

            return self._update_ui_components(self.interact_manager.context)

        # Define the Gradio UI
        # Labels: "System -> Human", "Human -> System", "Send",
        # "Context status"
        with gr.Blocks() as demo:
            output_text = gr.Textbox(label="システム -> 人間")
            input_text = gr.Textbox(label="人間 -> システム")
            submit_button = gr.Button("送信")
            status_text = gr.Textbox(label="コンテキストのステイタス")

            outputs = [output_text, input_text, submit_button, status_text]

            # Ideally gr.State() would watch the context, but the reference
            # to the event is not carried over, so we resort to polling
            gr.Timer(value=0.5).tick(
                fn=check_interact_update,
                inputs=[],
                outputs=outputs,
            )

            # Fires when the send button is pressed
            submit_button.click(
                submit_message,
                inputs=[input_text],
                outputs=outputs,
            )

        return demo

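    def run_background(
        self, launch_config: Optional[Dict[str, Any]] = None
    ) -> None:
        """Start the Gradio backend in the background."""
        # Avoid a mutable default argument; fall back to share=False
        if launch_config is None:
            launch_config = {"share": False}

        # Place the Gradio components
        # Auto-reload is disabled because demo is not reachable as a global
        demo = self.set_components()

        # Launch Gradio in the background; the config is unpacked so that
        # its entries reach launch() as keyword arguments
        self.loop.run_in_executor(
            None, functools.partial(demo.launch, **launch_config)
        )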


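async def main():
    # Set up the Gradio components
    gradio_user_interface = GradioUserInterface()

    # Run Gradio in the background
    gradio_user_interface.run_background()

    # Sleep briefly while Gradio starts up (a simplistic approach)
    await asyncio.sleep(3)

    # Define the agent
    # name: "Agent that asks a human"
    # instructions: "Ask the human for their view on the question. The
    # other party has feelings, so ask as politely as you can."
    human_agent = Agent(
        name="人間に質問するエージェント",
        instructions="人間に質問の見解を問います。"
        "相手には感情があるのでできる限り丁寧に聞いてください。",
        tools=[ask_to_human],
        tool_use_behavior="stop_on_first_tool",
        model_settings=ModelSettings(tool_choice="required"),
    )

    # Run the agent that asks the human
    # input (deliberately rude): "Spit out the current weather in Tokyo?"
    interact_manager = gradio_user_interface.interact_manager
    result = await Runner.run(
        human_agent,
        input="今の東京の天気を言いやがれ？",
        context=interact_manager,
    )

    # Show the result
    # Printed labels: "Observed human response", "Interaction context",
    # "Done. Press Ctrl-C to exit"
    print(f"観測された人間の応答: {result.final_output}")
    print(f"交流のコンテキスト: {interact_manager.context}")
    print("処理終了です。Ctrl-C で抜けてください")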


if __name__ == "__main__":
    asyncio.run(main())

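Implementation Points

Conversation state management (HumanInteractManager, HumanInteractContext)
- HumanInteractContext
  - In-memory store for the messages exchanged between the Agent and the human, and for the conversation status
- HumanInteractManager
  - Holds the HumanInteractContext instance
  - Interface between the Tool and Gradio
  - Uses asyncio.Event to synchronize the conversation
    - wait() makes the tool wait until the human responds
    - set() releases the wait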
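
The wait()/set() handshake can be tried in isolation. This is a minimal sketch, assuming the human side runs on a different thread, as a Gradio callback would; threading.Timer stands in for the human pressing the send button, and wait_for_answer is a hypothetical name. Because asyncio.Event is not thread-safe, set() is handed to the event loop with call_soon_threadsafe, the same call the sample uses.

```python
import asyncio
import threading
from typing import Dict


async def wait_for_answer() -> str:
    loop = asyncio.get_running_loop()
    event = asyncio.Event()
    answer: Dict[str, str] = {}

    def human_side() -> None:
        # Runs on another thread, standing in for the Gradio callback.
        answer["text"] = "hello from the human"
        # Schedule event.set() on the loop's own thread.
        loop.call_soon_threadsafe(event.set)

    threading.Timer(0.1, human_side).start()
    # Suspends only this coroutine; the event loop itself keeps running.
    await event.wait()
    return answer["text"]


print(asyncio.run(wait_for_answer()))
```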
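Tool called by the Agent (ask_to_human)
- The function_tool decorator turns a Python function into a Tool
- Through RunContextWrapper, it reaches the HumanInteractManager held by GradioUserInterface, and through that the HumanInteractContext
- The timeout handling raises an Exception when there is no response
  - Exceptions raised inside a function_tool seem to get absorbed and reported as something like a generic error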
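
The timeout pattern on its own looks like the sketch below. It assumes nothing beyond asyncio: HumanTimeout is a hypothetical stand-in for HumanToSystemTimeoutException, and the event is deliberately never set so that the timeout path runs.

```python
import asyncio


class HumanTimeout(Exception):
    """Hypothetical stand-in for HumanToSystemTimeoutException."""


async def wait_for_human(event: asyncio.Event, timeout: float) -> str:
    try:
        # wait_for cancels the inner wait once the timeout expires.
        await asyncio.wait_for(event.wait(), timeout=timeout)
    except asyncio.TimeoutError:
        raise HumanTimeout(f"no response within {timeout} seconds") from None
    return "responded"


async def demo() -> str:
    event = asyncio.Event()  # never set, so the wait has to time out
    try:
        return await wait_for_human(event, timeout=0.05)
    except HumanTimeout as exc:
        return f"timed out: {exc}"


print(asyncio.run(demo()))
```

Converting asyncio.TimeoutError into a domain-specific exception keeps the reason for the failure visible to whoever catches it.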
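Gradio UI (GradioUserInterface)
- gradio.Blocks defines the UI components
- gradio.Timer polls the conversation status
  - When the human has been called, the input box and the send button are enabled
- run_background feels like a workaround of last resort
  - AutoReload stops working when the app is started with the gradio command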
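
One detail of running a blocking call in the background is worth a sketch: loop.run_in_executor forwards positional arguments only, so keyword arguments such as share need to be bound with functools.partial first. In the sketch below, fake_launch is a hypothetical stand-in for a blocking launch function and is not Gradio's API.

```python
import asyncio
import functools
import threading
from typing import Any, Dict, Optional


def fake_launch(
    inline: Optional[bool] = None, share: bool = False
) -> Dict[str, Any]:
    """Hypothetical blocking launcher that reports what it received."""
    return {
        "inline": inline,
        "share": share,
        "thread": threading.current_thread().name,
    }


async def start(config: Dict[str, Any]) -> Dict[str, Any]:
    loop = asyncio.get_running_loop()
    # Unpack the config so each entry arrives as a keyword argument.
    launcher = functools.partial(fake_launch, **config)
    # None selects the loop's default thread pool executor.
    return await loop.run_in_executor(None, launcher)


result = asyncio.run(start({"share": True}))
print(result["share"], result["inline"])
```

Passing the dict itself as a positional argument would bind it to the first parameter (inline here) instead of to share.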
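Main processing (main)
- Defined as a coroutine so that the asynchronous processing can be managed
- Runs the Gradio UI in the background
- Defines the Agent and runs it with Runner.run
  - The conversation context is passed to the Agent at run time
  - The initial "question" given to the Agent is rudely worded (今の東京の天気を言いやがれ？, roughly "Spit out the current weather in Tokyo?"), but the LLM rewords it appropriately
- Displays the result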

How to Run

Install the packages and run the script.

$ pip install openai-agents gradio
$ export OPENAI_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
$ python src/tools_human_with_gradio.py

Result

It is the same as the first image. Please savor it once more, this time with the internals in mind.

This is the standard output.

$ python src/tools_human_with_gradio.py
* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
観測された人間の応答: むりです。知らないもん。
交流のコンテキスト: HumanInteractContext(
    system_to_human_message='あなたに質問です。\n今の東京の天気を教えていただけますか？',
    human_to_system_message='むりです。知らないもん。', 
    status='human_responded', 
    event=<asyncio.locks.Event object at 0x781e753d33d0 [set]>)
処理終了です。Ctrl-C で抜けてください

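Discussion

We were able to build an AI agent that implements Human-in-the-Loop.

It reminded me that this mechanism can be applied to many cases that cannot be left to AI alone.

Approval workflows: the agent asks a human for approval before performing an important operation
Filling in information: the human supplies information when the agent does not have enough
Resolving ambiguity: when a human's instructions or actions are ambiguous, the agent confirms the intent with the human
Collaboration: the agent and the human write documents and the like while talking to each other
Debugging and monitoring: a human interactively checks the agent's internal state and the reasons for its decisions

The part that makes Gradio and the Agent talk to each other was heavier than I expected. For a more serious setup, it may be better to run Gradio and the Agent as separate processes and have them communicate over sockets.
Multi-user or high-load environments would probably need some more work.

Incidentally, if the conversation goes through standard input/output instead of Gradio, the code becomes very simple.

Sample code for the standard I/O version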

import asyncio

from agents import Agent, ModelSettings, Runner, function_tool


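# Tool description sent to the LLM (taken from the docstring): "Function
# that asks a human a question on behalf of the system and gets the
# result. question: the question text from the system to the human."
@function_tool
async def ask_to_human(question: str) -> str:
    """システムから人間へ問い合わせを行い結果を取得する関数

    Args:
        question: システムから人間への質問の文字列
    """
    return input(question)


async def main():
    # Define the agent
    # name: "Agent that asks a human"
    # instructions: "Ask the human for their view on the question. The
    # other party has feelings, so ask as politely as you can."
    human_agent = Agent(
        name="人間に質問するエージェント",
        instructions="人間に質問の見解を問います。"
        "相手には感情があるのでできる限り丁寧に聞いてください。",
        tools=[ask_to_human],
        tool_use_behavior="stop_on_first_tool",
        model_settings=ModelSettings(tool_choice="required"),
    )

    # Run the agent that asks the human
    # input (deliberately rude): "Spit out the current weather in Tokyo?"
    result = await Runner.run(
        human_agent,
        input="今の東京の天気を言いやがれ？",
    )

    # Show the result ("Observed human response")
    print(f"観測された人間の応答: {result.final_output}")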


if __name__ == "__main__":
    asyncio.run(main())
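
One caveat with the standard I/O version: input() blocks the thread that runs the event loop, which is harmless here because nothing else is running. If other tasks had to keep going while the human types, one option is asyncio.to_thread (Python 3.9+). The following is a minimal sketch; ask and the reader parameter are hypothetical names, and a fake reader replaces input() so the sketch runs without a terminal.

```python
import asyncio
from typing import Callable


async def ask(question: str, reader: Callable[[str], str] = input) -> str:
    # Run the blocking reader on a worker thread so the event loop
    # stays free for other tasks while waiting for the answer.
    return await asyncio.to_thread(reader, question)


async def demo() -> str:
    # The lambda stands in for a human typing an answer.
    return await ask(
        "What is the weather in Tokyo? ", reader=lambda _q: "No idea."
    )


print(asyncio.run(demo()))
```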

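Summary

There was a lot packed into this: conversation state management, asynchronous coordination, calling a human through a Tool, the Gradio UI, and more. Connecting Gradio and the Agent was hard work, but the result is a natural Human-in-the-Loop flow through a Web UI.

Even as LLMs become more accurate, I think there will still be cases where ethical, business, or legal constraints call for human judgment. Asking a human through a suitable UI is likely to remain in demand.

I hope this article serves as a Human-in-the-Loop sample that sparks ideas of your own. Please try it out yourself.

Thank you for reading to the end!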

References

Official sample code collection

Looking through all of it should teach you a lot.

Official documentation

It is reasonably thorough.
