How well does the Bedrock Converse API read structured PDFs?

Posted at 2024-07-08

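At some point the Bedrock Converse API gained the ability to accept various files (documents and so on) as binary data, so I want to check how well it actually reads a structured PDF (one that includes tables).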

Prior work

Test file

I use a deliberately nasty PDF file like the one below. It was created in Excel and then exported to PDF.

Test program

import boto3
import streamlit as st

use_model = st.selectbox("Select the model to use", ("Sonnet", "Haiku"))
pdf_file = st.file_uploader("Upload a PDF file", type='pdf')
user_prompt = st.text_area("User prompt")
send_button = st.button("Send")

# Run only when Send is pressed and both a PDF and a prompt are provided
if send_button and pdf_file and user_prompt:

    # Map the selected model to its Bedrock model ID
    if use_model == "Sonnet":
        modelId = "anthropic.claude-3-sonnet-20240229-v1:0"
    elif use_model == "Haiku":
        modelId = "anthropic.claude-3-haiku-20240307-v1:0"

    bedrock = boto3.client('bedrock-runtime')

    # Build the parameters with the format hard-coded to PDF for this test
    # ("string" is just a placeholder document name)
    document = {"document": {"format": "pdf", "name": "string", "source": {"bytes": pdf_file.read()}}}
    messages = [{"role": "user", "content": [{"text": user_prompt}, document]}]
    inferenceConfig = {"maxTokens": 4000}

    # Call Bedrock
    response = bedrock.converse(modelId=modelId, messages=messages, inferenceConfig=inferenceConfig)

    # Extract the answer text from the Bedrock response
    answer = response["output"]["message"]["content"][0]["text"]

    # Display the result
    st.write(answer)
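
As a side note, the request and response handling above can be pulled into small helpers that are testable without calling AWS. This is only a sketch: the function names are my own, and the name clean-up reflects my understanding that the Converse API restricts document names to alphanumerics, single spaces, hyphens, parentheses, and square brackets. Please check the current API reference before relying on it.

```python
import re


def sanitize_document_name(filename: str) -> str:
    """Reduce a file name to characters assumed to be valid in a document name."""
    # Drop the extension (everything after the last dot), if there is one.
    stem = filename.rsplit(".", 1)[0]
    # Replace anything other than alphanumerics, spaces, hyphens,
    # parentheses, and square brackets with a space.
    cleaned = re.sub(r"[^A-Za-z0-9 \-()\[\]]", " ", stem)
    # Collapse runs of whitespace into a single space.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned or "document"


def build_pdf_messages(pdf_bytes: bytes, prompt: str, filename: str = "document.pdf") -> list:
    """Build a Converse-style messages list with a prompt and one PDF document block."""
    document = {
        "document": {
            "format": "pdf",
            "name": sanitize_document_name(filename),
            "source": {"bytes": pdf_bytes},
        }
    }
    return [{"role": "user", "content": [{"text": prompt}, document]}]


def extract_text(response: dict) -> str:
    """Join every text block in a Converse-style response, not just the first one."""
    blocks = response["output"]["message"]["content"]
    return "\n".join(block["text"] for block in blocks if "text" in block)
```

With these, the call in the test program would become `bedrock.converse(modelId=modelId, messages=build_pdf_messages(pdf_file.read(), user_prompt, pdf_file.name), ...)` followed by `extract_text(response)`.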

Results

Ah, I see...
It doesn't look like the structural information is being preserved, so let me ask in more detail.

Well, that's about what you'd expect.

As expected.

For comparison: Anthropic's own Claude app

The Claude app doesn't seem to do any better.

Thoughts

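If you want to preserve a PDF's structural information, it seems better to do a little preprocessing beforehand rather than have the model read the PDF directly.
If you only need to read plain text, it's easy to use.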
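
One possible form that preprocessing could take (this is an assumption on my part, not something I verified in this test) is to pass the table to the model as Markdown text instead of as a PDF. The sketch below assumes the cell values have already been extracted into a list of rows, for example from the original Excel file. `rows_to_markdown` is a hypothetical helper name.

```python
def rows_to_markdown(rows):
    """Render a list of rows (first row = header) as a Markdown table string."""
    if not rows:
        return ""
    # Pad every row to the width of the widest one so the columns line up.
    width = max(len(row) for row in rows)

    def fmt(cell):
        # Empty cells become empty strings; pipes and newlines would break the table.
        text = "" if cell is None else str(cell)
        return text.replace("|", "\\|").replace("\n", " ")

    def line(row):
        cells = [fmt(cell) for cell in row] + [""] * (width - len(row))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    out = [line(header), "| " + " | ".join(["---"] * width) + " |"]
    out.extend(line(row) for row in body)
    return "\n".join(out)
```

The resulting string can be sent as an ordinary `{"text": ...}` content block alongside the prompt. Merged cells still need a decision, such as repeating the value in each covered cell, which this sketch leaves to the caller.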
