Passing Pydantic models to OpenAI's Structured Outputs

Last updated at 2024-12-09 / Posted at 2024-08-12

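On August 6, 2024, OpenAI released a new feature called Structured Outputs. It lets you reliably structure model output, including with Function calling, which makes integration with programs more dependable. Pydantic models can now be passed in and returned directly, bringing better readability, better type safety, and automatic validation. It also reduces the amount of prompt text needed to specify the format of the generated text.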

Azure OpenAI Service's `gpt-4o (2024-08-06)` and `gpt-4o-mini (2024-07-18)` are also supported

As of November 21, Structured Outputs is generally available (GA). API support was first added in API version 2024-08-01-preview, and the feature is available in the latest preview API and in the latest GA API, 2024-10-21.

Using it with chat.completions

from openai import OpenAI
from pydantic import BaseModel

# Assumes OPENAI_API_KEY is set in the environment
client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
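        # System prompt: "Extract the event information."
        {"role": "system", "content": "イベント情報を抽出する"},
        # User message: "Alice and Bob are going to a science fair on Friday."
        {"role": "user", "content": "アリスとボブは金曜日に科学フェアに行く。"},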
    ],
    response_format=CalendarEvent,
)

event = completion.choices[0].message.parsed

CalendarEvent(name='科学フェア', date='金曜日', participants=['アリス', 'ボブ'])

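This entity-extraction code, which also appears in the OpenAI docs, uses the newly added client.beta.chat.completions.parse method. When you pass a Pydantic model to this method, it automatically converts the model to a JSON schema, sends it to the API, and parses the response content back into the specified model.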

The parse method is a wrapper around client.chat.completions.create() and returns a ParsedChatCompletion object, which is a subclass of the standard ChatCompletion class.
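As a rough mental model of what happens on the response side, the sketch below decodes a JSON message body and rebuilds a typed object from it. It uses a standard-library dataclass as a stand-in for the Pydantic model and a hard-coded string as a stand-in for the API response. `parse_content` is a hypothetical helper for illustration, not part of the OpenAI SDK.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class CalendarEvent:
    # Stand-in for the Pydantic model used above
    name: str
    date: str
    participants: list[str]


def parse_content(content: str, model: type):
    """Hypothetical helper: decode JSON text and build `model` from it."""
    data = json.loads(content)
    expected = {f.name for f in fields(model)}
    if set(data) != expected:
        # Reject both missing and extra keys
        raise ValueError(f"unexpected keys: {sorted(set(data) ^ expected)}")
    return model(**data)


# Hard-coded stand-in for completion.choices[0].message.content
content = '{"name": "科学フェア", "date": "金曜日", "participants": ["アリス", "ボブ"]}'
event = parse_content(content, CalendarEvent)
print(event)
```

The real SDK does this with Pydantic validation, so field types are checked as well as key names.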

When you use the OpenAI Python SDK, the Pydantic model is internally processed by to_strict_json_schema into a dedicated schema before it is sent.

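import json

# Private SDK module; the import path may change between openai versions
from openai.lib._pydantic import to_strict_json_schema

# Convert to a strict JSON schema
strict_schema = to_strict_json_schema(CalendarEvent)
print(json.dumps(strict_schema, indent=4))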

{
    "properties": {
        "name": {
            "title": "Name",
            "type": "string"
        },
        "date": {
            "title": "Date",
            "type": "string"
        },
        "participants": {
            "items": {
                "type": "string"
            },
            "title": "Participants",
            "type": "array"
        }
    },
    "required": [
        "name",
        "date",
        "participants"
    ],
    "title": "CalendarEvent",
    "type": "object",
    "additionalProperties": false
}

A Pydantic model can be converted to a JSON schema with CalendarEvent.model_json_schema(), but the strict version additionally contains things like "additionalProperties": false.
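To make the difference concrete, here is a minimal sketch of that kind of post-processing using only the standard library. `strictify` is a hypothetical function written for this illustration; it is not the SDK's to_strict_json_schema, which handles more cases. The input dict is a hand-written stand-in for the output of CalendarEvent.model_json_schema().

```python
import copy
import json


def strictify(schema: dict) -> dict:
    """Illustrative only: close every object schema and require all its properties."""
    out = copy.deepcopy(schema)

    def walk(node):
        if isinstance(node, dict):
            if node.get("type") == "object" and "properties" in node:
                node["additionalProperties"] = False
                node["required"] = list(node["properties"])
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(out)
    return out


# Hand-written stand-in for CalendarEvent.model_json_schema()
plain_schema = {
    "properties": {
        "name": {"title": "Name", "type": "string"},
        "date": {"title": "Date", "type": "string"},
        "participants": {"items": {"type": "string"}, "title": "Participants", "type": "array"},
    },
    "required": ["name", "date", "participants"],
    "title": "CalendarEvent",
    "type": "object",
}

strict_schema = strictify(plain_schema)
print(json.dumps(strict_schema, indent=4))
```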

A Pydantic model for Chain of Thought (CoT)

class Step(BaseModel):
    explanation: str
    output: str

class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str

The OpenAI docs include an example that outputs a Chain of Thought (CoT) through Structured Outputs, which I found very interesting.

{
    "steps": [
        {
            "explanation": "The first step is to isolate the term with the variable on one side of the equation. We do this by subtracting 7 from both sides of the equation.",
            "output": "8x + 7 - 7 = -23 - 7"
        },
        {
            "explanation": "The 7 on the left side cancels out, simplifying to 8x on the left-hand side. On the right-hand side, calculate -23 - 7.",
            "output": "8x = -30"
        },
        {
            "explanation": "Next, solve for x by dividing both sides of the equation by 8 to isolate x.",
            "output": "8x / 8 = -30 / 8"
        },
        {
            "explanation": "This simplifies to x = -30 / 8. The fraction can be further simplified by dividing both the numerator and the denominator by their greatest common divisor, which is 2.",
            "output": "x = -15 / 4",
        }
    ],
    "final_answer": "x = -\\frac{15}{4}"
}

This is cool. The reasoning steps could be displayed in a separate part of the UI.
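A minimal sketch of that idea: split the structured response into a "reasoning" panel and an "answer" panel. The JSON here is a shortened, hand-written stand-in for the response above, and `render_steps` is a hypothetical helper.

```python
import json

# Shortened stand-in for the model's structured response
raw = """
{
    "steps": [
        {"explanation": "Subtract 7 from both sides.", "output": "8x = -30"},
        {"explanation": "Divide both sides by 8.", "output": "x = -15 / 4"}
    ],
    "final_answer": "x = -15/4"
}
"""


def render_steps(response: dict) -> list[str]:
    """Format each reasoning step as one numbered line."""
    return [
        f"{i}. {step['explanation']} => {step['output']}"
        for i, step in enumerate(response["steps"], start=1)
    ]


response = json.loads(raw)
reasoning_panel = render_steps(response)   # e.g. shown in a collapsible area
answer_panel = response["final_answer"]    # e.g. shown as the main reply

print("\n".join(reasoning_panel))
print("Answer:", answer_panel)
```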

Using it with Function calling

Structured Outputs can also be used with Function calling. I previously designed a business-trip request assistant in an article on agent development with the Assistants API, so let's express the tools I built for it in Pydantic.

JSON schema

These are the function definitions for a restaurant search API and a hotel search API.

{
    "type": "function",
    "function": {
        "name": "search_hotpepper_shops",
        "description": "ホットペッパーグルメAPIを利用し、キーワードや個室の有無などのオプションフィルターで飲食店を検索できます。",
        "parameters": {
            "type": "object",
            "properties": {
                "keyword": {
                    "type": "string",
                    "description": "飲食店を検索するためのキーワード。店名、住所、駅名、お店ジャンルなどを指定できる。ユーザーメッセージから検索キーワードとなる文字を抽出して検索クエリーにしてください。例: ###大阪駅 和食###"
                },
                "private_room": {
                    "type": "integer",
                    "description": "個室ありの店舗のみを検索, 0:絞り込まない, 1:絞り込む。オプション",
                    "enum": [0, 1]
                }
            },
            "required": ["keyword"]
        }
    }
},
{
    "type": "function",
    "function": {
        "name": "search_vacant_hotels",
        "description": "楽天トラベルのAPIを使って、場所、チェックイン日、チェックアウト日、予算、大人の人数など、さまざまなフィルターで空室ホテルを検索できます。",
        "parameters": {
            "type": "object",
            "properties": {
                "latitude": {
                    "type": "number",
                    "description": "ホテル検索場所の緯度(WGS84), ex:35.6065914"
                },
                "longitude": {
                    "type": "number",
                    "description": "ホテル検索場所の経度(WGS84), ex:139.7513225"
                },
                "searchRadius": {
                    "type": "number",
                    "description": "緯度経度検索時の検索半径(単位km), 0.1 to 3.0"
                },
                "checkinDate": {
                    "type": "string",
                    "description": "yyyy-MM-dd 形式のチェックイン日。年の指定がない場合は2024年を指定する。"
                },
                "checkoutDate": {
                    "type": "string",
                    "description": "yyyy-MM-dd 形式のチェックアウト日。年の指定がない場合は2024年を指定する。"
                },
                "maxCharge": {
                    "type": "integer",
                    "description": "上限金額, int 0 to 999999999"
                },
                "adultNum": {
                    "type": "integer",
                    "description": "宿泊者数, int 1 to 99"
                }
            },
            "required": ["latitude", "longitude", "searchRadius", "checkinDate", "checkoutDate"]
        }
    }
}

Converting to Pydantic models

So much more readable...

from pydantic import BaseModel, Field
from typing import Literal

class SearchHotpepperShopsModel(BaseModel):
    """
    ホットペッパーグルメAPIを利用し、キーワードや個室の有無などのオプションフィルターで飲食店を検索できます。
    """
    keyword: str = Field(..., description="飲食店を検索するためのキーワード。店名、住所、駅名、お店ジャンルなどを指定できる。ユーザーメッセージから検索キーワードとなる文字を抽出して検索クエリーにしてください。例: ###大阪駅 和食###")
    private_room: Literal[0, 1] = Field(..., description="個室ありの店舗のみを検索, 0:絞り込まない, 1:絞り込む。オプション")

class SearchVacantHotelsModel(BaseModel):
    """
    楽天トラベルのAPIを使って、場所、チェックイン日、チェックアウト日、予算、大人の人数など、さまざまなフィルターで空室ホテルを検索できます。
    """
    latitude: float = Field(..., description="ホテル検索場所の緯度(WGS84), ex:35.6065914")
    longitude: float = Field(..., description="ホテル検索場所の経度(WGS84), ex:139.7513225")
    searchRadius: float = Field(..., description="緯度経度検索時の検索半径(単位km), 0.1 to 3.0")
    checkinDate: str = Field(..., description="yyyy-MM-dd 形式のチェックイン日。年の指定がない場合は2024年を指定する。")
    checkoutDate: str = Field(..., description="yyyy-MM-dd 形式のチェックアウト日。年の指定がない場合は2024年を指定する。")
    maxCharge: int = Field(..., description="上限金額, int 0 to 999999999")
    adultNum: int = Field(..., description="宿泊者数, int 1 to 99")

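I defined SearchHotpepperShopsModel and SearchVacantHotelsModel. The string written directly under the class statement (the docstring) is treated as the description of the class. According to OpenAI, giving functions and parameters intuitive names and detailed descriptions helps the model select them correctly. Note that every field is declared as required here with Field(...), including the ones the JSON schema above left optional, because Structured Outputs requires all fields to be required.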
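The way a class name, docstring, and fields map onto a function definition can be sketched with the standard library alone. This is an illustration under assumptions: a plain annotated class stands in for the Pydantic model, `build_tool` and `JSON_TYPES` are hypothetical names, and the type mapping covers only a few primitive types.

```python
import inspect
import json


class SearchShops:
    """Search restaurants by keyword, optionally filtering by private room."""
    keyword: str
    private_room: int


# Minimal mapping from Python annotations to JSON schema types (assumption)
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def build_tool(cls: type) -> dict:
    """Hypothetical helper: derive a function definition from a class."""
    properties = {
        name: {"type": JSON_TYPES[annotation]}
        for name, annotation in cls.__annotations__.items()
    }
    return {
        "type": "function",
        "function": {
            "name": cls.__name__,
            # The docstring becomes the function description
            "description": inspect.getdoc(cls),
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(properties),
                "additionalProperties": False,
            },
        },
    }


tool = build_tool(SearchShops)
print(json.dumps(tool, indent=4))
```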

Running Function calling

Pass the Pydantic models in tools, as in openai.pydantic_function_tool(SearchHotpepperShopsModel), and send the request with the parse method.

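import openai

# System prompt (kept in Japanese because the assistant serves Japanese-speaking users)
instructions="""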
あなたは Contoso 社の社員の出張を支援するためのアシスタントです。あなたは以下の業務を遂行します。
 - 旅程を作成します
 - ホテルを検索したり予約します
 - 交通機関を検索します
 - 出張で行くべきレストランや居酒屋を提案します
 - 出張にかかる概算費用を計算します

#制約事項
 - ユーザーからのメッセージは日本語で入力されます
 - ユーザーからのメッセージから忠実に情報を抽出し、それに基づいて応答を生成します。
 - ユーザーからのメッセージに勝手に情報を追加したり、不要な改行文字を追加してはいけません
"""

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "system",
            "content": instructions,
        },
        {
            "role": "user",
            "content": "有楽町駅近辺でイタリアンのレストランを探しています。メンバーは二人で個室ありでお願い。",
        },
    ],
    tools=[
        openai.pydantic_function_tool(SearchHotpepperShopsModel),
        openai.pydantic_function_tool(SearchVacantHotelsModel)
    ]
)

parsed = completion.choices[0].message.tool_calls[0].function
print(parsed)

ParsedFunction(arguments='{"keyword":"有楽町駅イタリアン","private_room":1}', name='SearchHotpepperShopsModel', parsed_arguments=SearchHotpepperShopsModel(keyword='有楽町駅イタリアン', private_room=1))

When the parse method is used, a ParsedFunction object is returned, giving you both arguments and parsed_arguments.

SearchHotpepperShopsModel(keyword='有楽町駅イタリアン', private_room=1)

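We got a SearchHotpepperShopsModel back! Because the result is returned as a Pydantic model, the structure of the data is clearly defined and the values have been validated against the declared types.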
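Once a tool call comes back, it still has to be routed to real code. The sketch below shows one way to do that from the tool name and the raw arguments string. The handler body is a placeholder, and `HANDLERS` and `dispatch` are hypothetical names for this illustration.

```python
import json


def search_hotpepper_shops(keyword: str, private_room: int) -> str:
    # Placeholder body; a real handler would call the restaurant search API
    return f"searching '{keyword}' (private_room={private_room})"


# Tool name -> handler; the key matches the Pydantic class name in the output above
HANDLERS = {"SearchHotpepperShopsModel": search_hotpepper_shops}


def dispatch(name: str, arguments: str) -> str:
    """Route one tool call to its handler using the raw JSON arguments."""
    handler = HANDLERS.get(name)
    if handler is None:
        raise KeyError(f"no handler registered for tool {name!r}")
    return handler(**json.loads(arguments))


result = dispatch(
    "SearchHotpepperShopsModel",
    '{"keyword":"有楽町駅イタリアン","private_room":1}',
)
print(result)
```

With the parse method you could pass the fields of parsed_arguments to the handler instead of decoding arguments again.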

Use enums for function arguments where possible

According to OpenAI, using enums to restrict the possible values of an argument can reduce hallucinations. The private_room: Literal[0, 1] property of SearchHotpepperShopsModel is an example of such a restriction. to_strict_json_schema converts private_room as follows.

"private_room": {
    "description": "個室ありの店舗のみを検索, 0:絞り込まない, 1:絞り込む。オプション",
    "enum": [
        0,
        1
    ],
    "title": "Private Room",
    "type": "integer"
}
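What the enum constraint expresses can be shown with a tiny check against the schema fragment above. `check_enum` is a hypothetical helper for illustration.

```python
def check_enum(schema: dict, value) -> bool:
    """Return True when `value` is allowed by the schema's enum (or no enum is set)."""
    allowed = schema.get("enum")
    return allowed is None or value in allowed


# Same constraint as the private_room fragment above (description omitted)
private_room_schema = {"enum": [0, 1], "title": "Private Room", "type": "integer"}

print(check_enum(private_room_schema, 1))
print(check_enum(private_room_schema, 2))
```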

