Organizing What I Found About OpenAI's Structured Outputs

Posted at 2024-08-22

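Introduction

OpenAI, the company known for ChatGPT, has announced "Structured Outputs", a feature that makes API responses conform to a schema defined on the request side. (Official article)

I looked into what Structured Outputs actually is and what benefits it offers, and summarized the findings here as a way of catching up.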

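What are Structured Outputs?

To begin with, when building an application that embeds generative AI such as the GPT models, there has been a standing problem as I understand it: the less stable the output for a given request is (in ChatGPT terms, asking the same question twice can yield slightly different answers), the harder it is for the application that receives the response to process it (transform and format it).

OpenAI's API already had JSON mode, which lets you ask for a JSON response at request time. That gave some consistency in the response format, but it did not guarantee that the response would match the structure you wanted.

With the newly introduced Structured Outputs, the side sending the request (that is, the developer) specifies a JSON Schema in the request, which makes it possible to strictly control the format of the output.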
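To make the difference concrete: when the shape of the response is not enforced, the application has to check it by itself. The following is a minimal sketch of such a manual check using only the standard library. The helper `validate_math_response` and both sample payloads are hypothetical and written for illustration; they are not taken from the OpenAI documentation.

```python
import json


def validate_math_response(raw: str) -> dict:
    """Parse a JSON string and check that it has the shape we expect.

    Hypothetical helper: this is the kind of manual check an application
    needs when the response structure is not enforced by the API.
    """
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    if not isinstance(data.get("final_answer"), str):
        raise ValueError("'final_answer' must be a string")
    steps = data.get("steps")
    if not isinstance(steps, list):
        raise ValueError("'steps' must be a list")
    for step in steps:
        if not isinstance(step, dict):
            raise ValueError("each step must be an object")
        for key in ("explanation", "output"):
            if not isinstance(step.get(key), str):
                raise ValueError(f"each step needs a string '{key}'")
    return data


# Valid JSON that matches the expected shape.
good = (
    '{"steps": [{"explanation": "Subtract 31.", "output": "8x = -29"}],'
    ' "final_answer": "x = -29/8"}'
)
# Also valid JSON, but with a different key name than the application expects.
bad = '{"solution": "x = -29/8"}'

validated = validate_math_response(good)

try:
    validate_math_response(bad)
    bad_rejected = False
except ValueError:
    bad_rejected = True

print(validated["final_answer"], bad_rejected)
```

Both payloads are valid JSON, so a "valid JSON" guarantee alone would not tell them apart. With a schema enforced on the API side, much of this checking code should become unnecessary.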

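Generating structured output (for example, extracting specific elements) from unstructured input (for example, free-format text) is a staple use of generative AI, and this looks like a powerful feature aimed squarely at that use case.

For example, there seem to be plenty of uses, such as building a productivity tool for sales staff who process and respond to large volumes of email in messy formats every day.

Now that I have a rough idea of what the feature does, I'd like to write a sample program to deepen my understanding of its behavior.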

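Trying it out

Text alone only goes so far for understanding the behavior, so I'll check the output with a simple program.

Environment

To verify with as little effort as possible, I use Google Colaboratory.

Beforehand, you need to create an OpenAI account, purchase credits (say, 10 dollars' worth to start), and issue an API key.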

Code

There appear to be two ways to use Structured Outputs:

Function calling
The response format parameter (response_format)

Following the sample on the official page, I'll try the response_format approach first.
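For reference, here is a rough sketch of what the two options look like as plain request payloads, without the Pydantic helper. It only builds the dictionaries and does not call the API. The names `record_math_solution` and `math_response` are hypothetical, and the structure reflects my understanding of the announcement, so please check it against the official documentation before relying on it.

```python
import json

# JSON Schema equivalent to the MathResponse model used in this article.
# As I understand it, strict mode expects every property to be listed in
# "required" and "additionalProperties" to be false on each object.
math_schema = {
    "type": "object",
    "properties": {
        "steps": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "explanation": {"type": "string"},
                    "output": {"type": "string"},
                },
                "required": ["explanation", "output"],
                "additionalProperties": False,
            },
        },
        "final_answer": {"type": "string"},
    },
    "required": ["steps", "final_answer"],
    "additionalProperties": False,
}

# Option 1: function calling, with strict mode enabled on the function definition.
tools = [
    {
        "type": "function",
        "function": {
            "name": "record_math_solution",  # hypothetical function name
            "description": "Record the worked solution of a math problem.",
            "strict": True,
            "parameters": math_schema,
        },
    }
]

# Option 2: the response_format parameter, with a JSON Schema attached.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "math_response",  # hypothetical schema name
        "strict": True,
        "schema": math_schema,
    },
}

print(json.dumps(response_format, indent=2))
```

Either dictionary would be passed as the `tools` or `response_format` argument of a chat completions request. Passing a Pydantic model, as in the sample that follows, lets the SDK generate a schema like this for you.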

First, install the openai library

!pip install openai

Next, set the API key in an environment variable (the client reads OPENAI_API_KEY by default, so use that name)

import os
os.environ["OPENAI_API_KEY"] = "paste the API key you issued here"
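Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. As one possible alternative, the sketch below reads the key from the environment and fails with a clear message when it is missing. The helper `load_api_key` is hypothetical and only illustrates the idea.

```python
import os


def load_api_key(env=None) -> str:
    """Return the OpenAI API key from the environment, or fail loudly.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key


# Usage with a stand-in mapping so the sketch runs without a real key.
demo_key = load_api_key({"OPENAI_API_KEY": "sk-demo-not-a-real-key"})

try:
    load_api_key({})
    missing_detected = False
except RuntimeError:
    missing_detected = True

print(demo_key, missing_detected)
```

In a notebook, the standard library's `getpass.getpass()` is one way to type the key in interactively instead of saving it in a cell.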

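Then send a request with response_format specified and print the result

from pydantic import BaseModel
from openai import OpenAI


# One step of the worked solution.
class Step(BaseModel):
    explanation: str
    output: str


# The overall response schema: ordered steps plus the final answer.
class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str


# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

# Passing the Pydantic model as response_format asks the API to follow that schema.
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor."},
        {"role": "user", "content": "solve 8x + 31 = 2"},
    ],
    response_format=MathResponse,
)

message = completion.choices[0].message
if message.parsed:
    # parsed is a MathResponse instance built from the response.
    print(message.parsed.steps)
    print(message.parsed.final_answer)
else:
    # If the model refuses the request, the reason is in refusal.
    print(message.refusal)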

Result

Here is the output of the response

[Step(explanation='Subtract 31 from both sides to isolate terms with x on one side.', output='8x + 31 - 31 = 2 - 31'), Step(explanation='This simplifies to 8x = -29.', output='8x = -29'), Step(explanation='Divide both sides by 8 to solve for x.', output='x = \\frac{-29}{8}')]
x = \frac{-29}{8}

Reformatting the output for readability (it is the printed Pydantic objects, not raw JSON)

[
  Step(explanation='Subtract 31 from both sides to isolate terms with x on one side.', output='8x + 31 - 31 = 2 - 31'),
  Step(explanation='This simplifies to 8x = -29.', output='8x = -29'),
  Step(explanation='Divide both sides by 8 to solve for x.', output='x = \\frac{-29}{8}')
]
x = \frac{-29}{8}

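Following the schema defined in the Python code, the response expresses the calculation process as a list of Step objects and then presents the final answer. (\frac denotes a fraction, so the result x = -29/8 is also correct.)

If we know the response always comes back in this shape, the calculation steps could be presented nicely in a UI.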
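As a small illustration of that idea, the sketch below turns the structured result into numbered text lines. It uses standard-library dataclasses as stand-ins for the Pydantic models so that it runs without the API; `render_solution` is a hypothetical helper, and the sample data is copied from the output above.

```python
from dataclasses import dataclass


@dataclass
class Step:
    explanation: str
    output: str


@dataclass
class MathResponse:
    steps: list[Step]
    final_answer: str


def render_solution(response: MathResponse) -> str:
    """Render the steps as numbered lines followed by the final answer."""
    lines = []
    for number, step in enumerate(response.steps, start=1):
        lines.append(f"Step {number}: {step.explanation}")
        lines.append(f"    {step.output}")
    lines.append(f"Answer: {response.final_answer}")
    return "\n".join(lines)


# Sample data copied from the response shown above.
sample = MathResponse(
    steps=[
        Step(
            "Subtract 31 from both sides to isolate terms with x on one side.",
            "8x + 31 - 31 = 2 - 31",
        ),
        Step("This simplifies to 8x = -29.", "8x = -29"),
        Step("Divide both sides by 8 to solve for x.", r"x = \frac{-29}{8}"),
    ],
    final_answer=r"x = \frac{-29}{8}",
)

rendered = render_solution(sample)
print(rendered)
```

Because every field has a known name and type, the rendering code needs no text parsing at all. The same loop could just as well emit HTML or feed a UI component.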

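What happens without response_format?

I remove the response_format parameter from the previous code, send the same request, and look at the result. (I commented out the unneeded parts and slightly changed the code that prints the result.)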

#from pydantic import BaseModel
from openai import OpenAI

#class Step(BaseModel):
#  explanation: str
#  output: str

#class MathResponse(BaseModel):
#  steps: list[Step]
#  final_answer: str

client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role":"system", "content":"You are a helpful math tutor."},
        {"role":"user", "content":"solve 8x + 31 = 2"},
    ],
#    response_format=MathResponse,
)

message = completion.choices[0].message
print(message)

Result

The result looks like this

ParsedChatCompletionMessage[NoneType](content='To solve the equation \\(8x + 31 = 2\\), you need to isolate the variable \\(x\\). Here are the steps:\n\n1. Subtract 31 from both sides of the equation to get rid of the constant term on the left side:\n\n   \\[\n   8x + 31 - 31 = 2 - 31\n   \\]\n\n   Simplifying both sides gives:\n\n   \\[\n   8x = -29\n   \\]\n\n2. Now, divide both sides by 8 to solve for \\(x\\):\n\n   \\[\n   \\frac{8x}{8} = \\frac{-29}{8}\n   \\]\n\n   \\[\n   x = -\\frac{29}{8}\n   \\]\n\nSo, the solution to the equation is \\(x = -\\frac{29}{8}\\).', refusal=None, role='assistant', function_call=None, tool_calls=[], parsed=None)

Narrowing it down to the content part and formatting it:

To solve the equation \(8x + 31 = 2\), you need to isolate the variable \(x\). Here are the steps:

1. Subtract 31 from both sides of the equation to get rid of the constant term on the left side:

   \[
   8x + 31 - 31 = 2 - 31
   \]

   Simplifying both sides gives:

   \[
   8x = -29
   \]

2. Now, divide both sides by 8 to solve for \(x\):

   \[
   \frac{8x}{8} = \frac{-29}{8}
   \]

   \[
   x = -\frac{29}{8}
   \]

So, the solution to the equation is \(x = -\frac{29}{8}\).

The calculation process is explained here too, but because the output is prose, the application receiving it would need more work to process it than with the structured version.
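To give a feel for that extra work, here is a minimal sketch that tries to pull the final answer out of the free-text reply with a regular expression. The helper `extract_final_answer` is hypothetical, and the approach is deliberately naive: it assumes the answer is the last inline-math span in the text, which nothing guarantees.

```python
import re

# Final sentence of the free-text reply shown above.
content = "So, the solution to the equation is \\(x = -\\frac{29}{8}\\)."


def extract_final_answer(text: str):
    """Hypothetical, fragile extraction: return the last inline-math span."""
    # Matches a backslash-paren opener, the shortest possible body,
    # and a backslash-paren closer.
    matches = re.findall(r"\\\((.+?)\\\)", text)
    return matches[-1] if matches else None


answer = extract_final_answer(content)

# The same helper finds nothing if the model phrases the answer differently.
missed = extract_final_answer("The answer is x = -29/8.")

print(answer, missed)
```

A small change in wording is enough to break this kind of extraction, whereas the structured response exposes the answer directly as final_answer.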

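Conclusion

Structured Outputs is a new OpenAI feature that makes the model answer according to a schema defined in application code, and comparing the sample outputs shows clearly that it makes the response easier for the application to use downstream.

Next, I'd like to use this feature to actually build a generative AI application with an LLM embedded.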

References

Introducing Structured Outputs in the API
