Evaluating PDF chart image recognition accuracy with Fable 5 on Amazon Bedrock

Posted at 2026-06-15

Introduction

When Opus 4.7 was released in April 2026, I wrote an article on evaluating PDF chart image recognition accuracy with Claude models on Amazon Bedrock. That article compared three models: Sonnet 4.6, Opus 4.6, and Opus 4.7. Opus 4.8 followed in May 2026 and Claude Fable 5 in June, so I ran the same evaluation on these newer models.

The AWS News Blog post "Anthropic Claude Fable 5 on AWS: Mythos-class capabilities with built-in safeguards now available" introduces the model as follows, which suggests it should do well at recognizing figures and charts inside PDFs.

Advanced vision capabilities — Claude Fable 5 understands diagrams, charts, and tables nested in files and PDFs. This opens up research and document-heavy work in finance, legal, analytics, architecture, and gaming.

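To allow a direct comparison with the previous article, I use the same PDF file. In the scripts, I changed only the part that invokes the models and the part that sets the DPI.

On June 12, 2026, Anthropic suspended access to Claude Fable 5 because of U.S. government export control regulations, so I could not run a thorough evaluation. This article compares the few runs of data I had collected with the results from the previous article.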

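References

What this article evaluates

This article looks at the following questions.

How much detail can Claude Opus 4.8 and Claude Fable 5 read from a chart image inside a PDF, compared with Opus 4.7?
How do the results differ between passing the whole PDF to the LLM (Approach 1) and rasterizing the pages and building a RAG pipeline (Approach 2)?
For Approach 2, how much does accuracy change between DPI 300, used in the previous article, and DPI 500, added in this article?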

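Evaluation target

The target PDF is the same one used in the previous article: the 12 pages of Part I, Chapter 1, Section 2, "AIの爆発的な進展の動向" (trends in the explosive progress of AI), from the PDF edition of the Ministry of Internal Affairs and Communications' 情報通信白書令和7年版 (the 2025 information and communications white paper). The main chart is 図表Ⅰ-1-2-4 "AI活力ランキング上位10カ国（2023年）" (top 10 countries in the AI vibrancy ranking, 2023). It is a horizontal stacked bar chart made up of eight evaluation categories (R&D, Responsible AI, Economy, Education, Diversity, Policy and Governance, Public Opinion, Infrastructure), so answering well requires reading fine detail.

The results in this article come from single runs. LLM output is stochastic, so a rerun may give different results. From Opus 4.7 onward, temperature (along with top_p and top_k) is no longer supported, and output is controlled with thinking and effort instead (see the Claude Opus 4.7 page in the Amazon Bedrock documentation). For Fable 5, temperature may only be 1.0 or unset (top_p must be at least 0.99 and below 1.0, or unset; top_k is not supported; see the Claude Fable 5 page in the Amazon Bedrock documentation). This article therefore leaves temperature unset and sets thinking and effort.

Figure Ⅰ-1-2-4: Top 10 countries in the AI vibrancy ranking (2023)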

Environment

Services and models

Item	Value
Region	us-east-1
Inference models	`us.anthropic.claude-opus-4-7`, `us.anthropic.claude-opus-4-8`, `us.anthropic.claude-fable-5`
Embedding model	`amazon.titan-embed-text-v2:0`
Vector store	Amazon S3 Vectors

Script layout

Bedrock-Claude/
├── data/
│   └── n1120000.pdf  # PDF edition of the 情報通信白書令和7年版 white paper
├── scripts/
│   ├── common.py
│   ├── questions.json
│   ├── approach1_pdf_direct.py
│   ├── approach2_ingest.py
│   ├── approach2_query.py
│   └── build_summary.py
├── results/
│   ├── approach1-pdf-direct.{md,json}
│   ├── approach2-rag_ingest-*_answer-*.{md,json}
│   └── summary.md
├── requirements.txt
└── README.md

Python environment

requirements.txt

boto3>=1.35
pdf2image>=1.17
Pillow>=10.0

pdf2image requires poppler-utils as a system dependency.

sudo apt install poppler-utils
uv venv
uv pip install -r requirements.txt

Creating the S3 Vectors bucket

As preparation for Approach 2, create one S3 Vectors vector bucket in advance. The indexes are created and deleted inside the Python script.

aws s3vectors create-vector-bucket \
  --vector-bucket-name bedrock-claude-chart \
  --profile YOUR_PROFILE --region us-east-1

Enabling the Data Retention API

Before invoking the Fable 5 model, you need to use the Data Retention API to set provider_data_share and enable data sharing. In Bedrock, open API Keys and click Generate Short-Term API Key to issue a short-term API key, then call the Data Retention API with that key as follows.

curl -X PUT https://bedrock.us-east-1.amazonaws.com/data-retention \
  -H "Authorization: Bearer YOUR_SHORT_TERM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "mode": "provider_data_share" }'

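Evaluation question

I limit the evaluation to a single question and judge the correctness and granularity of each answer by hand. To evaluate with more questions, add entries with ids such as q02, q03, and so on.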

questions.json

[
  {
    "id": "q01",
    "question": "AI活力ランキングで5位の国はどこですか? そのほかの順位も教えてください。また、それぞれの国はどのような分野で優れているかも教えてください。",
    "notes": "AI活力ランキングの基本読み取り"
  }
]

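Shared code

scripts/common.py collects the initialization of the Bedrock and S3 Vectors clients, along with the utilities for Converse calls, embedding, and chunking.
The line DPI = 300 sets the DPI used when rasterizing the PDF. To rasterize at DPI 500, change it to DPI = 500.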

scripts/common.py

"""Shared utilities for the Bedrock-Claude PDF chart recognition evaluation."""
import json
import time
from pathlib import Path

import boto3

REGION = "us-east-1"
PROFILE = "YOUR_PROFILE"

MODELS = {
    "opus-4-7":   "us.anthropic.claude-opus-4-7",
    "opus-4-8":   "us.anthropic.claude-opus-4-8",
    "fable-5":    "us.anthropic.claude-fable-5",
}

EMBED_MODEL_ID = "amazon.titan-embed-text-v2:0"
EMBED_DIM = 1024
CHUNK_SIZE = 1024
CHUNK_OVERLAP = CHUNK_SIZE // 10  # 10% overlap keeps context across chunk boundaries
DPI = 300

ROOT = Path(__file__).resolve().parent.parent
PDF_PATH = ROOT / "data" / "n1120000.pdf"
QUESTIONS_PATH = ROOT / "scripts" / "questions.json"
RESULTS_DIR = ROOT / "results"


def session():
    """Return a session that uses the AWS SSO profile."""
    return boto3.Session(profile_name=PROFILE, region_name=REGION)


def bedrock_runtime():
    """Return a Bedrock Runtime client."""
    return session().client("bedrock-runtime")


def s3vectors():
    """Return an S3 Vectors client."""
    return session().client("s3vectors")


def load_questions():
    """Load the list of evaluation questions from questions.json."""
    return json.loads(QUESTIONS_PATH.read_text(encoding="utf-8"))


def load_pdf_bytes():
    """Return the target PDF as bytes."""
    return PDF_PATH.read_bytes()


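def converse(client, model_id, content, max_tokens=2048):
    """Call the Bedrock Converse API and return the text answer and token usage.

    Opus 4.7 and later models (including Fable 5) are configured with thinking
    and effort through additionalModelRequestFields instead of temperature.
    Older models use temperature=0 to reduce output randomness.
    On failure, the call is attempted up to 3 times with exponential backoff.
    """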
    inference_config = {"maxTokens": max_tokens}
    extra = {}
    if any(key in model_id for key in ("opus-4-8", "opus-4-7", "fable-5")):
        # Opus 4.7 and later (including Fable 5): leave temperature, top_p, and top_k unset; control output with thinking and effort
        extra["additionalModelRequestFields"] = {
            "thinking": {"type": "adaptive"},
            "output_config": {"effort": "high"},
        }
    else:
        # Models older than Opus 4.7 (including Sonnet 4.6): temperature=0 reduces output randomness
        inference_config["temperature"] = 0

    started = time.monotonic()
    for attempt in range(3):
        try:
            resp = client.converse(
                modelId=model_id,
                messages=[{"role": "user", "content": content}],
                inferenceConfig=inference_config,
                **extra,
            )
            break
        except Exception:
            if attempt == 2:
                raise
            time.sleep(2 ** attempt)  # exponential backoff: 1s, then 2s
    blocks = resp["output"]["message"]["content"]
    text = next(b["text"] for b in blocks if "text" in b)
    usage = resp.get("usage", {})
    return text, {
        "input_tokens": usage.get("inputTokens", 0),
        "output_tokens": usage.get("outputTokens", 0),
        "latency_sec": round(time.monotonic() - started, 2),
    }


def embed(client, text):
    """Convert text into a 1024-dimensional vector with Titan Embed Text v2."""
    resp = client.invoke_model(
        modelId=EMBED_MODEL_ID,
        body=json.dumps({"inputText": text, "dimensions": EMBED_DIM, "normalize": True}),
    )
    return json.loads(resp["body"].read())["embedding"]


def chunk_text(text):
    """Split text into CHUNK_SIZE-character chunks that overlap by CHUNK_OVERLAP characters."""
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]


def write_results(name, md, payload):
    """Write the results to the results/ directory in two formats, Markdown and JSON."""
    RESULTS_DIR.mkdir(exist_ok=True)
    (RESULTS_DIR / f"{name}.md").write_text(md, encoding="utf-8")
    (RESULTS_DIR / f"{name}.json").write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
    )
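
To make the overlap in chunk_text concrete, the sketch below reproduces the same window arithmetic and reports the character span covered by each chunk. chunk_spans is a hypothetical helper added only for illustration; it is not part of scripts/common.py.

```python
CHUNK_SIZE = 1024
CHUNK_OVERLAP = CHUNK_SIZE // 10  # 102 characters


def chunk_text(text):
    """Same sliding-window logic as chunk_text in scripts/common.py."""
    step = CHUNK_SIZE - CHUNK_OVERLAP  # 922 characters between chunk starts
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]


def chunk_spans(length):
    """Hypothetical helper: (start, end) offsets of each chunk for a text of `length` characters."""
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [(start, min(start + CHUNK_SIZE, length)) for start in range(0, length, step)]


if __name__ == "__main__":
    for span in chunk_spans(2000):
        print(span)
```

With these constants, a 2,000-character page yields the spans (0, 1024), (922, 1946), and (1844, 2000), so each chunk repeats the last 102 characters of the previous one. One side effect of this arithmetic is that a text of exactly 1,024 characters produces a second, 102-character chunk that is entirely contained in the first.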

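Approach 1: Reading the PDF directly

This is the minimal setup: the PDF binary is passed directly in a document block of the Bedrock Converse API together with the question.

scripts/approach1_pdf_direct.py

"""Approach 1: have the model read the PDF directly through a Bedrock Converse API document block."""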
import common


def ask(client, model_id, pdf_bytes, question):
    """Pass the PDF binary in a document block and ask the model the question."""
    content = [
        # Pass the PDF directly as a document block (Bedrock supports up to 4.5 MB; Claude models support up to 100 pages)
        {"document": {"format": "pdf", "name": "target", "source": {"bytes": pdf_bytes}}},
        {"text": question},
    ]
    return common.converse(client, model_id, content)


def to_markdown(results):
    """Combine the answers from all models into a single Markdown report."""
    lines = ["# Approach 1: PDF直接読み取り", ""]
    any_key = next(iter(results))
    for i, row in enumerate(results[any_key]):
        q = row["question"]
        lines += [f"## {q['id']}: {q['question']}", "", f"観点: {q.get('notes', '')}", ""]
        for key, rows in results.items():
            r = rows[i]
            header = f"### {key} (in/out={r['input_tokens']}/{r['output_tokens']}, {r['latency_sec']}s)"
            lines += [header, "", r["answer"], ""]
    return "\n".join(lines)


def main():
    """Run direct PDF reading for 3 models x all questions and save the results."""
    client = common.bedrock_runtime()
    pdf_bytes = common.load_pdf_bytes()
    questions = common.load_questions()

    results = {}
    for key, model_id in common.MODELS.items():
        print(f"== {key} ==")
        rows = []
        for q in questions:
            print(f"  {q['id']}")
            answer, m = ask(client, model_id, pdf_bytes, q["question"])
            rows.append({"question": q, "answer": answer, **m})
        results[key] = rows

    common.write_results("approach1-pdf-direct", to_markdown(results), results)
    print("results/approach1-pdf-direct.md")


if __name__ == "__main__":
    main()

Run it.

uv run python scripts/approach1_pdf_direct.py

Approach 1 results

Model	input tokens	output tokens	latency
Opus 4.7	77,200	1,555	50.72s
Opus 4.8	77,195	1,241	29.39s
Fable 5	77,195	1,471	37.38s

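Compared with the Approach 1 results in the previous article, the Opus 4.7 input tokens were exactly the same at 77,200. Output tokens were about 1.5 times higher this time, which I attribute to variation in the output. Opus 4.8 and Fable 5 showed similar figures. All three models appear to process the page images at high resolution, and that is reflected in the input tokens.

In terms of answer quality, Opus 4.7 analyzed the chart by category for five of the ranked countries, whereas Opus 4.8 and Fable 5 gave only partial readings: Opus 4.8 commented on the United States alone, and Fable 5 covered the United States and China and summarized the countries ranked third and below in a single line.

Opus 4.7 result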

# AI活力ランキング（2023年）について

## 5位の国
**アラブ首長国連邦（United Arab Emirates / UAE）** です。

## 上位10カ国の順位
スタンフォード大学HAI（Human-Centered Artificial Intelligence）が2024年11月に発表した2023年AI活力ランキングは以下のとおりです（図表Ⅰ-1-2-4）。

| 順位 | 国 |
|------|-----|
| 1位 | 米国（United States） |
| 2位 | 中国（China） |
| 3位 | 英国（United Kingdom） |
| 4位 | インド（India） |
| 5位 | アラブ首長国連邦（United Arab Emirates） |
| 6位 | フランス（France） |
| 7位 | 韓国（South Korea） |
| 8位 | ドイツ（Germany） |
| 9位 | 日本（Japan） |
| 10位 | シンガポール（Singapore） |

## 評価分野について

このランキングは、以下の8つの分野（指標）の合計で評価されています。

- **R&D**（研究開発）
- **Responsible AI**（責任あるAI）
- **Economy**（経済）
- **Education**（教育）
- **Diversity**（多様性）
- **Policy and Governance**（政策とガバナンス）
- **Public Opinion**（世論）
- **Infrastructure**（インフラ）

## 各国が優れている分野（図表より読み取れる傾向）

文書中のグラフから読み取れる範囲でお伝えします（細かい数値は本文には記載されていません）。

- **米国（1位）**：すべての指標で突出しており、特に **R&D（研究開発）、Economy（経済）、Infrastructure（インフラ）** の比重が大きく、圧倒的なリードを示しています。
- **中国（2位）**：**R&D、Economy** の領域で特に強さを発揮しており、米国と並ぶAI大国です。
- **英国（3位）**：R&D、Economy、Policy and Governanceなどでバランスの取れた高評価。
- **インド（4位）**：R&DやEconomyに加え、**Public Opinion（世論）** の面でも強みを見せています。
- **アラブ首長国連邦（5位）**：**Policy and Governance（政策とガバナンス）** や **Infrastructure** の比重が大きく、国家戦略としてAIに注力している姿勢が反映されていると考えられます。
- **日本（9位）**：米国・中国・英国といった上位国からは水をあけられている、と本文で指摘されています。

## 補足

なお、AI研究力（論文数等）でランキングしている **AIRankings** では、近年の上位は米国・中国・英国・ドイツの順で、日本は11〜12位で推移しているとされています。

※ 各国の指標別の詳細な数値や順位については、本文中には記載されていないため、グラフからの読み取りに基づく傾向の説明となります。正確な値は出典のスタンフォード大学HAIの「Global AI Vibrancy Tool」を参照する必要があります。

Opus 4.8 result

# AI活力ランキング（2023年）について

図表Ⅰ-1-2-4（スタンフォード大学HAI「Global AI Vibrancy Tool」）によると、**5位はアラブ首長国連邦（United Arab Emirates）**です。

## 上位10カ国の順位

| 順位 | 国名 |
|------|------|
| 1位 | アメリカ（United States） |
| 2位 | 中国（China） |
| 3位 | イギリス（United Kingdom） |
| 4位 | インド（India） |
| **5位** | **アラブ首長国連邦（United Arab Emirates）** |
| 6位 | フランス（France） |
| 7位 | 韓国（South Korea） |
| 8位 | ドイツ（Germany） |
| 9位 | 日本（Japan） |
| 10位 | シンガポール（Singapore） |

## 各国が優れている分野について

この図表は、以下の8つの評価分野（カテゴリー）ごとの値を積み上げた棒グラフで表現されています。

- R&D（研究開発）
- Responsible AI（責任あるAI）
- Economy（経済）
- Education（教育）
- Diversity（多様性）
- Policy and Governance（政策・ガバナンス）
- Public Opinion（世論）
- Infrastructure（インフラ）

ただし、**ご提示の資料（図表）からは、各国がどの分野で具体的にどれだけ優れているかという数値は読み取れません**。グラフ上では各カテゴリーが色分けされて積み上げられていますが、文章中には個別分野の詳細な評価結果についての記述はなく、視覚的な棒グラフのみで示されています。

おおまかにグラフから見て取れる傾向としては、**1位の米国が全分野にわたって突出して長い棒（特にInfrastructure＝インフラや R&D＝研究開発の部分が大きい）**を持ち、各国を大きく引き離していることが特徴的です。

正確な分野別の評価値が必要な場合は、出典元であるスタンフォード大学HAIの「Global AI Vibrancy Tool」（https://hai.stanford.edu/ai-index/global-vibrancy-tool）をご参照いただくことをおすすめします。

なお、本文では、日本は総合9位で米国・中国・英国などから「水をあけられている」とされ、また論文数を基にした別のランキング（AIRankings）では日本は11〜12位で推移していると記載されています。

Fable 5 result

## AI活力ランキング（2023年）上位10カ国

スタンフォード大学のHAI（Human-Centered Artificial Intelligence）が2024年11月に発表した、2023年のAI活力ランキング（図表Ⅰ-1-2-4）によると、**5位はアラブ首長国連邦（UAE）** です。

全体の順位は以下のとおりです。

| 順位 | 国名 |
|---|---|
| 1位 | 米国 |
| 2位 | 中国 |
| 3位 | 英国 |
| 4位 | インド |
| 5位 | アラブ首長国連邦（UAE） |
| 6位 | フランス |
| 7位 | 韓国 |
| 8位 | ドイツ |
| 9位 | 日本 |
| 10位 | シンガポール |

なお、日本は総合9位に位置付けられており、白書では「米国、中国、英国といった国から水をあけられている」と評価されています。

## 各国が優れている分野について

このランキングは、以下の**8つの評価分野**のスコアを合算して算出されています。

- R&D（研究開発）
- Responsible AI（責任あるAI）
- Economy（経済）
- Education（教育）
- Diversity（多様性）
- Policy and Governance（政策・ガバナンス）
- Public Opinion（世論）
- Infrastructure（インフラ）

ただし、本資料（白書）には各国がどの分野で何点を獲得したかという具体的な数値の記載はありません。図表のグラフ（積み上げ棒グラフ）から読み取れる範囲では、以下のような傾向が見られます。

- **米国**：全分野にわたってバランスよく高いスコアを獲得しており、特にR&D、経済、政策・ガバナンス、インフラの分野で他国を大きく引き離しています。
- **中国**：R&D（研究開発）とインフラの分野のスコアが大きく、これらが総合2位を支えています。
- **3位以下の国々**：米中と比較するとスコアの差が大きく、R&Dや政策・ガバナンスなどを中心に得点している様子がうかがえます。

より詳細な分野別スコアについては、出典であるStanford Universityの「Global AI Vibrancy Tool」で確認できます。

Approach 2: RAG

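Approach 2 is split into two steps, ingest and query.

ingest: rasterize each page, convert it to Markdown with Claude, embed the result with Titan Embed v2, and store it in S3 Vectors
query: embed the question, retrieve the top 5 chunks from S3 Vectors, and have Claude answer from that context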
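
As an illustration of what the query step's top-5 search means, here is a brute-force sketch that ranks stored chunks by cosine similarity in plain Python. In the actual scripts the search runs inside S3 Vectors, so this is not how the service is implemented; the record shape (`vector`, `text`) and the toy vectors are assumptions made only for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, records, k=5):
    """Return the texts of the k records most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, r["vector"]), r["text"]) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


if __name__ == "__main__":
    # Toy 2-dimensional vectors; real Titan Embed v2 vectors have 1024 dimensions here.
    records = [
        {"vector": [1.0, 0.0], "text": "chunk a"},
        {"vector": [0.0, 1.0], "text": "chunk b"},
        {"vector": [1.0, 1.0], "text": "chunk c"},
    ]
    print(top_k([1.0, 0.0], records, k=2))
```

The chunks returned by this ranking are what the answer prompt receives as context, so a chart table that is not among the top results never reaches the answering model.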

Ingest side

scripts/approach2_ingest.py

"""Approach 2 / Step 1: vectorize the PDF and register it in S3 Vectors."""
import argparse
import io
import json
import uuid

from pdf2image import convert_from_bytes

import common

IMAGE_PROMPT = """\
Extract the content from an image page and output in Markdown syntax. Enclose the content in the <markdown></markdown> tag and do not use code blocks. If the image is empty then output a <markdown></markdown> without anything in it.

Follow these steps:

1. Examine the provided page carefully.

2. Identify all elements present in the page, including headers, body text, footnotes, tables, images, captions, and page numbers, etc.

3. Use markdown syntax to format your output:
    - Headings: # for main, ## for sections, ### for subsections, etc.
    - Lists: * or - for bulleted, 1. 2. 3. for numbered
    - Do not repeat yourself

4. If the element is an image (not table)
    - If the information in the image can be represented by a table, generate the table containing the information of the image
    - Otherwise provide a detailed description about the information in image
    - For charts specifically, work through the following steps in order before writing the final table:
        - Step 1 - Structure: Identify the chart type (bar, stacked bar, line, pie, etc.), read the title, axis labels with units, and list all legend categories with their colors or patterns
        - Step 2 - Scale: Note the numerical range of each axis (min, max, major gridlines) so you can derive estimates
        - Step 3 - Per-item extraction: For every item (row/country/data point), read or estimate the value of EACH series separately. Do not skip any item
        - Step 4 - Estimate unlabeled values: When no label is printed, estimate by comparing bar length to the axis scale. Mark each estimate with a leading ~. Provide numeric estimates (e.g., ~85) rather than qualitative labels (e.g., "large")
        - Step 5 - Table: Output a table with one row per item and one column per series. Add a total column when totals are visible. Do not omit any item or series
    - Classify the element as one of: Chart, Diagram, Logo, Icon, Natural Image, Screenshot, Other. Enclose the class in <figure_type></figure_type>
    - Enclose <figure_type></figure_type>, the table or description, and the figure title or caption (if available), in <figure></figure> tags
    - Do not transcribe text in the image after providing the table or description

5. If the element is a table
    - Create a markdown table, ensuring every row has the same number of columns
    - Maintain cell alignment as closely as possible
    - Do not split a table into multiple tables
    - If a merged cell spans multiple rows or columns, place the text in the top-left cell and output ' ' for other
    - Use | for column separators, |-|-| for header row separators
    - If a cell has multiple items, list them in separate rows
    - If the table contains sub-headers, separate the sub-headers from the headers in another row

6. If the element is a paragraph
    - Transcribe each text element precisely as it appears

7. If the element is a header, footer, footnote, page number
    - Transcribe each text element precisely as it appears

Output Example:
<markdown>
<figure>
<figure_type>Chart</figure_type>
Figure 3: This chart shows annual sales in millions. The year 2020 was significantly down due to the COVID-19 pandemic.
A bar chart showing annual sales figures, with the y-axis labeled "Sales ($Million)" and the x-axis labeled "Year". The chart has bars for 2018 ($12M), 2019 ($18M), 2020 ($8M), and 2021 ($22M).
</figure>

<figure>
<figure_type>Chart</figure_type>
Figure 3: This chart shows annual sales in millions. The year 2020 was significantly down due to the COVID-19 pandemic.
| Year | Sales ($Million) |
|-|-|
| 2018 | $12M |
| 2019 | $18M |
| 2020 | $8M |
| 2021 | $22M |
</figure>

# Annual Report

## Financial Highlights

<figure>
<figure_type>Logo</figure_type>
The logo of Apple Inc.
</figure>

* Revenue: $40M
* Profit: $12M
* EPS: $1.25

| | Year Ended December 31, | |
| | 2021 | 2022 |
|-|-|-|
| Cash provided by (used in): | | |
| Operating activities | $ 46,327 | $ 46,752 |
| Investing activities | (58,154) | (37,601) |
| Financing activities | 6,291 | 9,718 |

</markdown>
"""


def render_pages(pdf_bytes, dpi=common.DPI):
    """Convert the PDF into one PNG image per page. A higher DPI preserves finer detail."""
    pages = []
    for img in convert_from_bytes(pdf_bytes, dpi=dpi):
        buf = io.BytesIO()
        img.convert("RGB").save(buf, format="PNG")
        pages.append(buf.getvalue())
    return pages


def image_to_markdown(client, model_id, png_bytes):
    """Pass a PNG image to Claude and convert it to Markdown text. Also returns token usage."""
    content = [
        {"image": {"format": "png", "source": {"bytes": png_bytes}}},
        {"text": IMAGE_PROMPT},
    ]
    md, usage = common.converse(client, model_id, content, max_tokens=3500)
    return md, usage


def main():
    """Rasterize every PDF page, convert it to Markdown, and register it in an S3 Vectors index."""
    ap = argparse.ArgumentParser()
    ap.add_argument("--model", required=True, choices=list(common.MODELS))
    ap.add_argument("--bucket", required=True)
    args = ap.parse_args()

    bedrock = common.bedrock_runtime()
    vectors = common.s3vectors()
    pdf_bytes = common.load_pdf_bytes()

    # Rasterize every page, convert it to Markdown with Claude, and split it into chunks
    chunks = []
    page_usages = []
    for i, png in enumerate(render_pages(pdf_bytes)):
        print(f"  page {i}: image->markdown")
        md, usage = image_to_markdown(bedrock, common.MODELS[args.model], png)
        page_usages.append(usage)
        chunks.extend(common.chunk_text(md))

    # create_index is not idempotent in S3 Vectors, so delete any existing index before creating it on reruns
    index_name = f"chart-rag-{args.model}"
    try:
        vectors.delete_index(vectorBucketName=args.bucket, indexName=index_name)
    except vectors.exceptions.NotFoundException:
        pass
    vectors.create_index(
        vectorBucketName=args.bucket,
        indexName=index_name,
        dataType="float32",
        dimension=common.EMBED_DIM,
        distanceMetric="cosine",
        metadataConfiguration={"nonFilterableMetadataKeys": ["source_text"]},
    )

    # Convert chunks to embedding vectors and upload them in batches of 500 (the API limit)
    records = [
        {
            "key": str(uuid.uuid4()),
            "data": {"float32": common.embed(bedrock, c)},
            "metadata": {"source_text": c[:4000]},
        }
        for c in chunks
    ]
    for i in range(0, len(records), 500):
        vectors.put_vectors(
            vectorBucketName=args.bucket,
            indexName=index_name,
            vectors=records[i:i + 500],
        )
    print(f"indexed {len(records)} vectors into {index_name}")

    # Aggregate per-page token usage and save it as JSON
    total_input = sum(u["input_tokens"] for u in page_usages)
    total_output = sum(u["output_tokens"] for u in page_usages)
    n = len(page_usages)
    ingest_stats = {
        "model": args.model,
        "pages": n,
        "total_input_tokens": total_input,
        "total_output_tokens": total_output,
        "avg_input_tokens_per_page": round(total_input / n),
        "avg_output_tokens_per_page": round(total_output / n),
        "per_page": page_usages,
    }
    out_path = common.RESULTS_DIR / f"approach2-rag_ingest-{args.model}.json"
    common.RESULTS_DIR.mkdir(exist_ok=True)
    out_path.write_text(json.dumps(ingest_stats, ensure_ascii=False, indent=2), encoding="utf-8")
    print(f"ingest stats -> {out_path}")
    print(f"total input: {total_input}, total output: {total_output}, avg/page: {total_input//n}/{total_output//n}")


if __name__ == "__main__":
    main()

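IMAGE_PROMPT is based on instruction_for_image_parsing() from the original article, "Amazon S3 Vectors を使って図表を含む PDF ファイルでシンプルな RAG を構築" (building a simple RAG over PDF files containing figures with Amazon S3 Vectors). The image-handling part of step 4 is extended for charts, and after comparison experiments it settled into the Step 1 to Step 5 procedural form.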
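
IMAGE_PROMPT asks the model to wrap its output in <markdown></markdown> tags, while the ingest script passes the model output to chunk_text as-is. If you prefer to index only the content between the tags, a minimal sketch of such a step follows. strip_markdown_tags is a hypothetical helper that is not part of the original scripts, and the results in this article were produced without it.

```python
import re

# Non-greedy match so only the first <markdown>...</markdown> pair is captured.
_MARKDOWN_TAG = re.compile(r"<markdown>(.*?)</markdown>", re.DOTALL)


def strip_markdown_tags(model_output):
    """Return the text inside <markdown></markdown>, or the trimmed input if the tags are absent."""
    match = _MARKDOWN_TAG.search(model_output)
    if match is None:
        return model_output.strip()
    return match.group(1).strip()


if __name__ == "__main__":
    print(strip_markdown_tags("<markdown>\n# Annual Report\n</markdown>"))
```

An empty page, which the prompt says should produce <markdown></markdown>, comes back as an empty string, so the caller can skip it before chunking.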

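create_index in S3 Vectors returns an error when an index with the same name already exists, so the script tries delete_index first to allow reruns. put_vectors accepts at most 500 vectors per request, so the records are uploaded in batches.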
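
The batching described above is the same slicing that the ingest script performs inline. As a standalone sketch, with batched as a hypothetical helper name and plain integers standing in for the vector records:

```python
def batched(items, size=500):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    records = list(range(1201))  # stand-in for the list of vector records
    print([len(batch) for batch in batched(records)])
```

Each yielded slice would then be passed as the vectors argument of one put_vectors call, as in the loop in approach2_ingest.py.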

Query side

scripts/approach2_query.py

"""Approach 2 / Step 2: query S3 Vectors and answer with Claude."""
import argparse

import common

ANSWER_PROMPT = """\
あなたはPDFドキュメントのアナリストです。以下の検索結果のみを根拠に質問に回答してください。
検索結果に情報がない場合は「資料内に該当する情報がありません」と答えてください。

## 検索結果
{context}

## 質問
{question}
"""


def retrieve(vectors, bucket, index, query_vec, top_k=5):
    """Search S3 Vectors with the question vector and return the text of the top_k chunks."""
    resp = vectors.query_vectors(
        vectorBucketName=bucket,
        indexName=index,
        topK=top_k,
        queryVector={"float32": query_vec},
        returnMetadata=True,
    )
    return [v["metadata"]["source_text"] for v in resp.get("vectors", [])]


def answer(client, model_id, question, contexts):
    """Have Claude answer using a prompt that embeds the retrieved context."""
    # Join the chunks with "---" into a single context block
    prompt = ANSWER_PROMPT.format(context="\n---\n".join(contexts), question=question)
    return common.converse(client, model_id, [{"text": prompt}])


def to_markdown(args, rows):
    """Convert the answers into a Markdown report."""
    lines = [f"# Approach 2: RAG (ingest={args.ingest_model}, answer={args.answer_model})", ""]
    for r in rows:
        q = r["question"]
        lines += [
            f"## {q['id']}: {q['question']}",
            "",
            f"観点: {q.get('notes', '')}",
            "",
            f"- in/out tokens: {r['input_tokens']}/{r['output_tokens']}",
            f"- latency: {r['latency_sec']}s",
            "",
            "### 回答",
            r["answer"],
            "",
            "### 検索コンテキスト（先頭120文字）",
        ]
        for i, c in enumerate(r["contexts"]):
            lines.append(f"- [{i}] {c[:120]}...")
        lines.append("")
    return "\n".join(lines)


def main():
    """Run the RAG query: embed the question, search the vectors, then have Claude answer."""
    ap = argparse.ArgumentParser()
    ap.add_argument("--ingest-model", required=True, choices=list(common.MODELS))
    ap.add_argument("--answer-model", required=True, choices=list(common.MODELS))
    ap.add_argument("--bucket", required=True)
    args = ap.parse_args()

    bedrock = common.bedrock_runtime()
    vectors = common.s3vectors()
    # Use the index that corresponds to the model used at ingest time
    index_name = f"chart-rag-{args.ingest_model}"

    rows = []
    for q in common.load_questions():
        print(f"  {q['id']}")
        qv = common.embed(bedrock, q["question"])          # embed the question
        contexts = retrieve(vectors, args.bucket, index_name, qv)  # fetch similar chunks
        text, m = answer(bedrock, common.MODELS[args.answer_model], q["question"], contexts)
        rows.append({"question": q, "contexts": contexts, "answer": text, **m})

    name = f"approach2-rag_ingest-{args.ingest_model}_answer-{args.answer_model}"
    common.write_results(name, to_markdown(args, rows), rows)
    print(f"results/{name}.md")


if __name__ == "__main__":
    main()

--ingest-model and --answer-model can be specified independently. In this article, both are set to the same model.

Commands

for m in opus-4-7 opus-4-8 fable-5; do
  uv run python scripts/approach2_ingest.py \
    --model $m --bucket bedrock-claude-chart
done

for m in opus-4-7 opus-4-8 fable-5; do
  uv run python scripts/approach2_query.py \
    --ingest-model $m --answer-model $m \
    --bucket bedrock-claude-chart
done

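Comparing model and DPI combinations

This section compares Approach 2 accuracy across combinations of model and DPI. On the ingest side, every PDF page is rasterized, converted to Markdown by each model, and split into chunks. On the query side, the question is embedded, S3 Vectors is searched with top_k = 5, and the model answers. The DPI used to rasterize the PDF is compared at two settings, 300 and 500.

The previous article compared DPI 150 and 300. This time, I compare DPI 300 and 500 on Opus 4.7 and later models.

Answers for each combination

Model	DPI	input tokens	output tokens	latency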
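
With the scripts as written, switching between DPI 300 and 500 means editing DPI in scripts/common.py before each ingest run, and because the index is named chart-rag-{model}, ingesting at a second DPI replaces the index built at the first. The following is a minimal sketch of one way to make this less manual. The CHART_DPI environment variable and the DPI suffix on the index name are both assumptions for illustration and are not part of the original scripts.

```python
import os

DEFAULT_DPI = 300  # same default as DPI in scripts/common.py


def resolve_dpi(environ=None, default=DEFAULT_DPI):
    """Return the rendering DPI, preferring the hypothetical CHART_DPI variable."""
    environ = os.environ if environ is None else environ
    raw = environ.get("CHART_DPI", "").strip()
    if not raw:
        return default
    dpi = int(raw)  # raises ValueError for non-numeric input
    if dpi <= 0:
        raise ValueError(f"CHART_DPI must be positive, got {dpi}")
    return dpi


def index_name(model_key, dpi):
    """Build a hypothetical per-DPI index name so runs do not overwrite each other."""
    return f"chart-rag-{model_key}-dpi{dpi}"


if __name__ == "__main__":
    dpi = resolve_dpi()
    print(dpi, index_name("opus-4-8", dpi))
```

If index names were changed this way, the query script and the cleanup commands would need to use the same naming.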
Opus 4.7	300	3,903	770	10.25s
Opus 4.8	300	3,856	1,478	18.6s
Fable 5	300	4,258	1,341	19.61s
Opus 4.7	500	4,172	984	22.57s
Opus 4.8	500	4,150	1,162	20.86s
Fable 5	500	4,212	861	14.72s

Opus 4.7 + DPI=300 answer

# AI活力ランキング(2023年)について

## 5位の国
**アラブ首長国連邦（United Arab Emirates）** が5位にランクインしています。

## 順位一覧と各国の特徴

| 順位 | 国 | 特に優れている分野 |
|------|-----|-------------------|
| 1位 | アメリカ合衆国 | R&D（研究開発）が非常に高い水準。Responsible AI、Economy（経済）、Infrastructure（インフラ）も高水準 |
| 2位 | 中国 | R&D、Economy（経済）、Infrastructure（インフラ）が高水準 |
| 3位 | イギリス | R&D、Economy、Policy and Governance（政策・ガバナンス）、Infrastructureが中程度 |
| 4位 | インド | R&D、Economy、Infrastructureが中程度 |
| 5位 | アラブ首長国連邦 | Policy and Governance（政策・ガバナンス）、Infrastructureが中程度 |
| 6位 | フランス | Policy and Governance、Infrastructureが中程度 |
| 7位 | 韓国 | Policy and Governance、Infrastructureが中程度 |
| 8位 | ドイツ | Infrastructureが中程度（その他は低水準） |
| 9位 | 日本 | Infrastructureが中程度（その他は低水準） |
| 10位 | シンガポール | Infrastructureが中程度（その他は低水準） |

## 補足
このランキングは、スタンフォード大学HAI（Human-Centered Artificial Intelligence）が2024年11月に発表した2023年のAI活力ランキングに基づいており、以下の8つのカテゴリで評価されています：
- R&D（研究開発）
- Responsible AI（責任あるAI）
- Economy（経済）
- Education（教育）
- Diversity（多様性）
- Policy and Governance（政策・ガバナンス）
- Public Opinion（世論）
- Infrastructure（インフラ）

総合的に見ると、**米国が突出して高い評価**を受けており、中国がそれに次ぐ位置にあります。日本は総合9位で、米国・中国・英国などのリード国から水をあけられている状況です。

Opus 4.8 + DPI=300 answer

# AI活力ランキング（2023年）について

## 5位の国
AI活力ランキングで**5位はアラブ首長国連邦（UAE）**です（相対スコア約34）。

## 全体の順位（上位10カ国）

| 順位 | 国名 | 相対スコア |
|---|---|---|
| 1 | 米国（United States） | 約100 |
| 2 | 中国（China） | 約62 |
| 3 | 英国（United Kingdom） | 約40 |
| 4 | インド（India） | 約38 |
| 5 | アラブ首長国連邦（United Arab Emirates） | 約34 |
| 6 | フランス（France） | 約30 |
| 7 | 韓国（South Korea） | 約28 |
| 8 | ドイツ（Germany） | 約26 |
| 9 | 日本（Japan） | 約24 |
| 10 | シンガポール（Singapore） | 約24 |

## 各国が優れている分野
8つの評価カテゴリ（R&D、責任あるAI、経済、教育、多様性、政策・ガバナンス、世論、インフラ）のスコアをもとに、相対的に評価が高い分野は以下のとおりです。

- **米国（1位）**：R&D（非常に高い）、インフラ（非常に高い）が特に突出。責任あるAI、経済、政策・ガバナンスも高い水準。
- **中国（2位）**：R&D（高い）、経済（高い）が強み。政策・ガバナンス、インフラは中程度。
- **英国（3位）**：R&D、経済、政策・ガバナンス、インフラが中程度で、バランス型。
- **インド（4位）**：R&D、経済、政策・ガバナンスが中程度。
- **アラブ首長国連邦（5位）**：政策・ガバナンス、インフラが中程度。その他は低めの水準。
- **フランス（6位）**：政策・ガバナンス、インフラが中程度。
- **韓国（7位）**：政策・ガバナンス、インフラが中程度。
- **ドイツ（8位）**：政策・ガバナンスが中程度で、他は低めの水準。
- **日本（9位）**：政策・ガバナンスが中程度で、他は低めの水準。
- **シンガポール（10位）**：政策・ガバナンスが中程度で、他は低めの水準。

なお、上位国の中でも特に米国・中国が他国を大きく引き離しており、上位の差が顕著である点が特徴です。

（出典）Stanford University Human-Centered Artificial Intelligence（2024）「Global AI Vibrancy Tool」

※スコアは図表上の相対値・程度の表記に基づく概算値です。

Fable 5 + DPI=300 answer

## 回答

### 5位の国
2023年のAI活力ランキング（スタンフォード大学HAI「Global AI Vibrancy Tool」、2024年11月発表）で**5位はアラブ首長国連邦（UAE）**です。

### 上位10カ国の順位

| 順位 | 国名 |
|-|-|
| 1 | 米国（United States） |
| 2 | 中国（China） |
| 3 | 英国（United Kingdom） |
| 4 | インド（India） |
| 5 | アラブ首長国連邦（UAE） |
| 6 | フランス（France） |
| 7 | 韓国（South Korea） |
| 8 | ドイツ（Germany） |
| 9 | 日本（Japan） |
| 10 | シンガポール（Singapore） |

### 各国の優れている分野（指標別スコアより）

※グラフに軸の数値ラベルがないため、米国の合計を100とした相対値での推定値に基づきます。

- **米国**：R&D（~27）とInfrastructure（~26）が突出して高く、Economy（~19）、Policy and Governance（~14）も高水準
- **中国**：R&D（~20）とInfrastructure（~13）が強み
- **英国**：Policy and Governance（~13）が特に高く、Infrastructure（~9）も比較的高い
- **インド**：R&D（~13）が強く、Economy（~7）、Policy and Governance（~7）も比較的高い
- **UAE**：Economy（~8）、Infrastructure（~6）、Education（~5）が比較的高い
- **フランス**：Policy and Governance（~10）が強く、Infrastructure（~7）、Diversity（~4）も比較的高い
- **韓国**：R&D（~7）、Policy and Governance（~7）が比較的高い
- **ドイツ**：Economy（~5）、Infrastructure（~5）が比較的高い
- **日本**：Policy and Governance（~7）が比較的高く、Economy（~5）、Infrastructure（~5）が続く
- **シンガポール**：Economy（~7）が比較的高く、Policy and Governance（~5）が続く

なお、資料では「日本は総合9位に位置付けられており、米国、中国、英国といった国から水をあけられている」と記載されています。

Opus 4.7 + DPI=500 answer

# AI活力ランキング（2023年）回答

## 5位の国
**アラブ首長国連邦（United Arab Emirates）** が5位です。

## 全順位と各国の優れている分野

スタンフォード大学HAIが発表した2023年のAI活力ランキング上位10カ国は以下のとおりです。各国のスコアから比較的強い分野を示します。

| 順位 | 国名 | 特に優れている分野（スコアが高い分野） |
|------|------|------------------------------------------|
| 1位 | 米国（United States） | R&D（研究開発、~30）、Economy（経済、~25）、Infrastructure（インフラ、~20）など、ほぼ全分野で圧倒的にリード |
| 2位 | 中国（China） | R&D（~18）、Economy（~10）、Infrastructure（~10）が強み |
| 3位 | 英国（United Kingdom） | R&D（~6）、Policy and Governance（政策・ガバナンス、~6）、Infrastructure（~6）がバランス良く高い |
| 4位 | インド（India） | Infrastructure（~7）、Policy and Governance（~6）、R&D・Economy（~5）が強い |
| 5位 | アラブ首長国連邦（United Arab Emirates） | Infrastructure（~8）、Policy and Governance（~6）が特に強い |
| 6位 | フランス（France） | Policy and Governance（~5）、Infrastructure（~5）が中心 |
| 7位 | 韓国（South Korea） | Policy and Governance（~7）が突出して高い、Infrastructure（~5）も強い |
| 8位 | ドイツ（Germany） | Infrastructure（~5）、Policy and Governance（~4）が中心 |
| 9位 | 日本（Japan） | Infrastructure（~6）、Policy and Governance（~4）が比較的高い |
| 10位 | シンガポール（Singapore） | Infrastructure（~6）、Policy and Governance（~4）が中心 |

## 補足
- ランキングの評価カテゴリは8分野：**R&D（研究開発）、Responsible AI、Economy（経済）、Education（教育）、Diversity（多様性）、Policy and Governance（政策・ガバナンス）、Public Opinion（世論）、Infrastructure（インフラ）** です。
- 日本は総合9位で、米国、中国、英国といった上位国から水をあけられている状況であると、本資料では指摘されています。

（出典：Stanford University Human-Centered Artificial Intelligence（2024）「Global AI Vibrancy Tool」、図表Ⅰ-1-2-4）

Opus 4.8 + DPI=500 answer

# AI活力ランキング（2023年）について

## 5位の国
AI活力ランキングで**5位はアラブ首長国連邦（United Arab Emirates）**です。

## 全体の順位と各国の優れた分野

検索結果（図表Ⅰ-1-2-4、スタンフォード大学HAI「Global AI Vibrancy Tool」）に基づき、上位10カ国と各国でスコアが高い（優れている）分野を以下にまとめます。

| 順位 | 国 | 特に優れている分野（スコアが高い項目） |
|-|-|-|
| 1位 | アメリカ（United States） | R&D（約22）、Economy（経済、約20）、Infrastructure（インフラ、約16）など総合的に突出 |
| 2位 | 中国（China） | R&D（約14）、Economy（約9）、Policy and Governance（約7） |
| 3位 | イギリス（United Kingdom） | R&D（約5）、Policy and Governance（約5）、Infrastructure（約5） |
| 4位 | インド（India） | R&D（約6）、Infrastructure（約6）、Policy and Governance（約5） |
| 5位 | アラブ首長国連邦（United Arab Emirates） | Infrastructure（約7）、Policy and Governance（約5） |
| 6位 | フランス（France） | Infrastructure（約5）、Policy and Governance（約4） |
| 7位 | 韓国（South Korea） | Policy and Governance（約7） |
| 8位 | ドイツ（Germany） | R&D（約3）、Economy（約3） |
| 9位 | 日本（Japan） | R&D（約3）、Economy（約3）、Infrastructure（約3） |
| 10位 | シンガポール（Singapore） | 各分野が比較的バランス（Policy and Governance、Infrastructure 約3など） |

## 補足
- 評価項目は、R&D（研究開発）、Responsible AI（責任あるAI）、Economy（経済）、Education（教育）、Diversity（多様性）、Policy and Governance（政策・ガバナンス）、Public Opinion（世論）、Infrastructure（インフラ）の8つです。
- 日本は総合9位に位置付けられており、米国・中国・英国といった国から水をあけられている、と本文に記載されています。

※各分野のスコアは図表上の概数値（「～」付き）に基づくものです。

Fable 5 + DPI=500 answer

# AI活力ランキング（2023年）について

## 5位の国

**5位はアラブ首長国連邦（United Arab Emirates）**です。

## 上位10カ国の順位一覧

スタンフォード大学HAIが2024年11月に発表した2023年のAI活力ランキング（図表Ⅰ-1-2-4）によると、順位は以下のとおりです。

| 順位 | 国 | 合計スコア（米国を~100とした相対値） |
|-|-|-|
| 1位 | 米国（United States） | ~100 |
| 2位 | 中国（China） | ~48 |
| 3位 | 英国（United Kingdom） | ~32 |
| 4位 | インド（India） | ~30 |
| **5位** | **アラブ首長国連邦（United Arab Emirates）** | ~28 |
| 6位 | フランス（France） | ~26 |
| 7位 | 韓国（South Korea） | ~24 |
| 8位 | ドイツ（Germany） | ~22 |
| 9位 | 日本（Japan） | ~21 |
| 10位 | シンガポール（Singapore） | ~21 |

## 各国が優れている分野（スコアの高い指標）

※軸の数値ラベルが表示されていないため、推定値に基づく相対的な傾向です。

- **米国**：R&D（~25）とInfrastructure（~24）が突出して高く、Economy（~18）も高い。全指標でバランスよく首位
- **中国**：R&D（~17）とInfrastructure（~12）が強み
- **英国**：Policy and Governance（~11）が特に高い
- **インド**：R&D（~7）、Economy（~6）、Policy and Governance（~7）が比較的高い
- **アラブ首長国連邦**：Infrastructure（~7）、Education（~5）、Policy and Governance（~5）が強み
- **フランス**：Infrastructure（~8）が最も高い
- **韓国**：Infrastructure（~5）、Policy and Governance（~5）、Economy（~4）が比較的高い
- **ドイツ**：Economy（~4）、Policy and Governance（~4）、Infrastructure（~4）が比較的均等
- **日本**：Policy and Governance（~6）が最も高い
- **シンガポール**：Economy（~5）とPolicy and Governance（~5）が強み

なお、検索結果によれば、日本は総合9位に位置付けられており、米国、中国、英国といった国から水をあけられている状況です。

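In the previous article, the Opus 4.7 + DPI=300 answer included estimated total scores, but in this run it did not. With Opus 4.8, relative scores appeared in the overall ranking table, and with Fable 5, relative scores appeared for the categories in which each country excels. At DPI=500, all three models answered with chart-based score estimates, although the presentation differed.

Although the output is subject to stochastic variation, Opus 4.8 and Fable 5 appear to have improved at reading chart content.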

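Summary

This article evaluated how much detail three models on Amazon Bedrock, Claude Opus 4.7, Opus 4.8, and Fable 5, can read from a chart image inside a PDF. The results are organized below along the three questions raised at the start.

How much detail Opus 4.8 and Fable 5 can read from the chart compared with Opus 4.7

With Approach 1 (passing the PDF directly), Opus 4.7 described category-level strengths for five of the ranked countries, whereas Opus 4.8 and Fable 5 gave the ranking table but only a partial category-level reading (Opus 4.8 for the United States, Fable 5 for the United States and China). I had expected newer models to read more detail, but at least in these single runs that expectation did not hold. With Approach 2 (RAG), by contrast, all three models described all ten countries by category, and at DPI 500 all three gave estimated scores. How the PDF is fed to the model influenced the results more than differences between the models themselves.

Passing the whole PDF (Approach 1) versus rasterizing and building RAG (Approach 2)

Approach 1 passes the entire PDF at once, so input tokens are large at about 77,000, and the answers drew on the context of all pages. Approach 2 turns the chart into a structured table at the stage where each page is rasterized and converted to Markdown. As a result, the later answer step could draw out details such as per-category scores for every country. When you want the model to read a chart in fine detail, Approach 2, with its rasterization and structured conversion step, is easier to work with.

Effect of DPI in Approach 2

At DPI 300, whether the answer included estimated total scores or per-category scores varied by model, but at DPI 500 all three models gave answers with estimated scores, although the presentation differed. In these runs, raising the resolution, a relatively low-cost adjustment, appeared to help the stability of chart reading.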

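This evaluation has several limitations. All results come from single runs, and because temperature cannot be controlled from Opus 4.7 onward, a rerun may produce different answers. In addition, access to Fable 5 was suspended on June 12, 2026, so the Fable 5 comparison rests on only a few runs of data. The results are useful as an indication of trends, but follow-up runs are needed before treating them as firm conclusions.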
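
For follow-up runs, one simple way to compare repeated answers is to count how many of them contain numeric estimates at all. The sketch below is a rough heuristic under stated assumptions: it treats a number preceded by "~" or "約" as an estimate, which matches the notation seen in the answers above but is not a validated metric, and the function names are hypothetical.

```python
import re

# A number preceded by "~" or "約" (about), e.g. "~27" or "約34".
_ESTIMATE = re.compile(r"(?:~|約)\s*\d+")


def count_estimates(answer):
    """Count numeric estimates in one answer text."""
    return len(_ESTIMATE.findall(answer))


def share_with_estimates(answers, minimum=1):
    """Fraction of answers that contain at least `minimum` numeric estimates."""
    if not answers:
        return 0.0
    hits = sum(1 for a in answers if count_estimates(a) >= minimum)
    return hits / len(answers)


if __name__ == "__main__":
    runs = [
        "R&D（~27）とInfrastructure（~26）が突出",
        "R&D が非常に高い水準",
        "相対スコア約34",
    ]
    print(share_with_estimates(runs))
```

The count says nothing about whether the estimates are correct, so it complements rather than replaces the manual judgment used in this article.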

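From a practical standpoint, the following points stand out.

A newer model is not always better at a specific task, so validating on your own use case is essential
When handling PDFs that contain many charts, rasterizing at a higher DPI and building RAG through a prompt that asks for structured output makes it easier to secure fine-detail reading accuracy than passing the PDF directly

Cleanup

After the evaluation, delete the S3 Vectors indexes and the vector bucket.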

for m in opus-4-7 opus-4-8 fable-5; do
  aws s3vectors delete-index \
    --vector-bucket-name bedrock-claude-chart \
    --index-name chart-rag-$m \
    --profile YOUR_PROFILE --region us-east-1
done

aws s3vectors delete-vector-bucket \
  --vector-bucket-name bedrock-claude-chart \
  --profile YOUR_PROFILE --region us-east-1
