Retrieving OpenAI File Search Results

Posted at 2025-04-09, last updated at 2025-04-10

Purpose

Inspect the results returned by File search in the Responses API.

1. Data used

I use the sample file URL given in the OpenAI docs.

2. Overview of the code

Technology used
- Python
Code flow
- Upload the file
- Get the file ID after the upload
- Create a vector store and add the file to it
- Send a search query
- Retrieve the results

3. The actual code

- The data is about "deep research", so I'll ask a question about it

Example: Python code

import requests
from io import BytesIO
from openai import OpenAI

client = OpenAI()

def create_file(client, file_path):
    if file_path.startswith("http://") or file_path.startswith("https://"):
        # Download the file content from the URL
        response = requests.get(file_path)
        file_content = BytesIO(response.content)
        file_name = file_path.split("/")[-1]
        file_tuple = (file_name, file_content)
        result = client.files.create(
            file=file_tuple,
            purpose="assistants"
        )
    else:
        # Handle local file path
        with open(file_path, "rb") as file_content:
            result = client.files.create(
                file=file_content,
                purpose="assistants"
            )
    return result.id

# Replace with your own file path or URL
file_id = create_file(client, "https://cdn.openai.com/API/docs/deep_research_blog.pdf")

vector_store = client.vector_stores.create(
    name="knowledge_base"
)

client.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=file_id
)

result = client.vector_stores.files.list(
    vector_store_id=vector_store.id
)

print(result)

vector_store_id = vector_store.id

print(vector_store_id)

response = client.responses.create(
    model="gpt-4o-mini-2024-07-18",
    input="""Deepリサーチについて日本語で簡単に説明して""",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store_id]
    }],
    include=["file_search_call.results"]
)

print(response)
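
One thing the code above does not do is wait for the uploaded file to finish indexing before it sends the query. If the search runs too early it may come back with no results. Below is a minimal sketch of a polling helper, assuming each entry returned by `client.vector_stores.files.list` exposes a `status` attribute that becomes `"completed"` once indexing is done. The helper name `wait_until_indexed` and the timeout and interval values are my own assumptions and are not part of the original code.

```python
import time


def wait_until_indexed(list_files, timeout=60.0, interval=1.0):
    """Poll until every file in the vector store reports status "completed".

    list_files: zero-argument callable returning an iterable of objects with
    a `status` attribute (a hypothetical wrapper around
    client.vector_stores.files.list).
    Returns True when all files are completed, False on timeout.
    Raises RuntimeError if any file reports "failed".
    """
    deadline = time.monotonic() + timeout
    while True:
        statuses = [f.status for f in list_files()]
        if any(s == "failed" for s in statuses):
            raise RuntimeError("indexing failed for at least one file")
        # An empty listing is treated as "not ready yet"
        if statuses and all(s == "completed" for s in statuses):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With the variables from the code above, it could be called as `wait_until_indexed(lambda: client.vector_stores.files.list(vector_store_id=vector_store.id))` just before `client.responses.create`.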

Results obtained (queries, search results, answer)

queries=['Deepリサーチについて', 'Deepリサーチ 日本語 説明', 'Deepリサーチ 概要'],

Result(attributes={}, file_id='file-BRZt9Bc3Bdke4uY4etax4w', filename='deep_research_blog.pdf', score=0.7112, text="Introducing deep research | OpenAI\n\n\nFebruary 2, 2025 Release\n\nIntroducing deep research\nAn agent that uses reasoning to synthesize large amounts of\nonline information and complete multi-step research tasks\nfor you. Available to Pro users today, Plus and Team next.\n\nTry on ChatGPT\n\nListen to article 8:18 Share\n\n21/02/2025, 19:58 Introducing deep research | OpenAI\n\nhttps://openai.com/index/introducing-deep-research/ 1/38\n\nhttps://openai.com/research/index/release/\nhttps://chatgpt.com/\nhttps://openai.com/\n\n\nToday we’re launching deep research in ChatGPT, a new agentic capability that\nconducts multi-step research on the internet for complex tasks. It accomplishes in\ntens of minutes what would take a human many hours....

Answer (in Japanese, as requested in the prompt)
text='**Deepリサーチ**は、OpenAIが提供する新しいエージェント機能で、大量のオンライン情報を分析・統合し、複雑な研究課題に対してマルチステップのリサーチを実施することができます。この機能は、ユーザーが与えたプロンプトに基づいて、数百のソースを調査し、研究アナリストレベルの包括的な報告を生成します。\n\n### 主な特長\n- **多角的な情報検索**: Deepリサーチは、インターネット上で豊富な情報を探し出し、理解・解析する能力を持っています。\n- **自動化されたレポート作成**: 複数のオンラインソースからの情報を基に、文書化された出力を生成します。\n- **分野特化型リサーチ**: 金融、科学、政策、エンジニアリングなどの専門的な知識作業を行う人々に特に有用です。\n- **引用機能**: すべての出力は明確な引用が付与されており、情報の確認が容易です。\n\nこの技術は、特に時間を節約したいディスカッショントピックや意思決定プロセスを効率化するために設計されています。例えば、時間のかかるオンライン調査をたった一つのクエリで迅速に進めることが可能です。
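
Rather than reading the queries, search hits, and answer out of the printed object by eye, they can be pulled out of `response.output` directly. The following is a minimal sketch, assuming the output list contains items of type `"file_search_call"` (with `queries` and `results`) and `"message"` (with `output_text` content parts), which matches the fields visible in the output above. The function name `summarize_file_search` and the 80-character preview length are my own choices.

```python
def summarize_file_search(response):
    """Collect queries, search hits, and answer text from a Responses API result.

    Assumes `response.output` is a list of items where file search calls have
    type "file_search_call" and the answer is a "message" item whose content
    parts of type "output_text" carry `text`.
    """
    summary = {"queries": [], "hits": [], "answer": ""}
    for item in response.output:
        if item.type == "file_search_call":
            summary["queries"].extend(item.queries)
            # `results` is only filled in when the request sets
            # include=["file_search_call.results"]; otherwise it may be None
            for hit in item.results or []:
                summary["hits"].append((hit.filename, hit.score, hit.text[:80]))
        elif item.type == "message":
            for part in item.content:
                if part.type == "output_text":
                    summary["answer"] += part.text
    return summary
```

Calling `summarize_file_search(response)` on the response from the code above should give a dictionary with the same three pieces shown in this section: the generated queries, one tuple per search hit, and the answer text.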

4. Bonus

Output format
- The raw output is hard to read, so I printed it as JSON

Additional code

import json

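# If an object cannot be serialized, return its __dict__
def custom_encoder(o):
    if hasattr(o, '__dict__'):
        return o.__dict__
    # Fall back to the string representation when there is no __dict__
    return str(o)

# Convert the Response object to a dictionary
response_dict = vars(response)

# Use the default parameter to recursively convert custom objects to dictionaries, then print the result as a JSON string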
print(json.dumps(response_dict, indent=4, ensure_ascii=False, default=custom_encoder))

Results

File search results
"results": [
     {
         "attributes": {},
         "file_id": "file-X5mEbjcshJiMMT3nKD1U5g",
         "filename": "deep_research_blog.pdf",
         "score": 0.6838,
         "text": "Deep research\n\nHelp me find iOS and android adoption rates,...}

Answer
        "text": "Deep Researchの主要なポイントは以下の通りです：\n\n1. **モデルのトレーニング**: Deep Researchは、強化学習を用いて多様なドメインの難解なブラウジングと推論タスクを学習。必要なデータを見つけるための多段階の計画と実行が可能。\n\n2. **実世界問題での評価**: このモデルは、その他のAIモデルと比較しても高い精度を示し、特に人文学や社会科学、数学において顕著な成果を上げている。\n\n3. **GAIAベンチマークでの成功**: Deep Researchは、GAIAという公的なベンチマークにおいて新たな最先端（SOTA）を達成。タスクは推論やマルチモーダルの流暢さ、ツール使用能力を要求し、特に高い合格率を記録。\n\n4. **多機能性**: ユーザーがアップロードしたファイルをブラウズし、グラフを描画・反復する機能に加え、特定の文や段落を引用する能力を備えている。\n\nこれらの特徴により、Deep Researchは高度なニーズに応えるために設計されており、様々な分野での応用が期待されています。",
        "type": "output_text"}
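
To see why the `default=` fallback in the additional code reaches nested objects, it helps to try it on small stand-in classes. The sketch below uses two hypothetical classes, `Hit` and `Call`, that only imitate the nesting of the real response; they are not SDK types. `json.dumps` calls `custom_encoder` each time it meets an object it cannot serialize, and then continues serializing the dictionary that comes back, so the conversion is applied level by level.

```python
import json


class Hit:
    """Stand-in for a search result object (hypothetical, for illustration)."""
    def __init__(self, filename, score):
        self.filename = filename
        self.score = score


class Call:
    """Stand-in for an output item that holds nested result objects."""
    def __init__(self, queries, results):
        self.queries = queries
        self.results = results


def custom_encoder(o):
    # Same fallback as in the additional code: use __dict__ when available
    if hasattr(o, "__dict__"):
        return o.__dict__
    return str(o)


call = Call(["Deepリサーチ 概要"], [Hit("deep_research_blog.pdf", 0.7112)])
# vars() converts only the top level; the nested Hit goes through custom_encoder
encoded = json.dumps(vars(call), indent=4, ensure_ascii=False, default=custom_encoder)
print(encoded)
```

Because `ensure_ascii=False` is set, the Japanese query text is written as-is instead of as `\u` escapes. As far as I know the SDK's response objects are Pydantic models, so `response.model_dump_json(indent=4)` may be a simpler alternative, but that is worth checking against the installed SDK version.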

5. Summary

I confirmed that the results retrieved by File search in the Responses API can be inspected, so I'd like to put this to use in RAG.
