
A simple image upload and description app with Azure OpenAI + gpt-4v + streamlit

Posted at 2023-12-19, last updated at 2023-12-19

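Introduction

To put GPT-4V through its paces, I built an app that connects streamlit's file upload feature to GPT-4V on Azure OpenAI (Azure OAI). It is deliberately simple: when you upload an image, the app sends a request to gpt-4v and has it describe the image.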

Screen layout

Dragging and dropping an image onto the sidebar displays that image and gpt-4v's description of it in the main area. The prompt can also be changed as needed.

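Code

The code is very simple. It uses streamlit widgets and, because I want a file path for the uploaded image, saves the upload to a temp directory first. After that, it assembles the request as described in the previous article and sends it. Finally, it displays the image and the result.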

import streamlit as st
import os
import requests
import base64
from PIL import Image
import tempfile

# Configuration
GPT4V_KEY = ""
GPT4V_ENDPOINT = "https://{RESOURCE_NAME}.openai.azure.com/openai/deployments/{DEPLOYMENT_NAME}/chat/completions?api-version=2023-07-01-preview"


# Title of the Streamlit application
st.title("gpt4v multimodal tester")


# Prompt (editable in the sidebar)
prompt = st.sidebar.text_area("prompt", value="describe this picture in Japanese.")

# File upload
uploaded_file = st.sidebar.file_uploader("Please upload a file", type=["png", "jpg", "jpeg"])




if uploaded_file is not None:
    
    temp_dir = tempfile.mkdtemp()
    path = os.path.join(temp_dir, uploaded_file.name)
    with open(path, "wb") as f:
        f.write(uploaded_file.getvalue())
        
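    # Read the saved file back and base64-encode it; the with-block closes the file handle
    with open(path, "rb") as f:
        encoded_image = base64.b64encode(f.read()).decode("ascii")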

    headers = {
        "Content-Type": "application/json",
        "api-key": GPT4V_KEY,
    }

    # Payload for the request
    payload = {
        "messages": [
            {
            "role": "system",
            "content": [
                {
                "type": "text",
                "text": "You are an AI assistant that helps people find information."
                }
            ]
            },
            {
            "role": "user",
            "content": [
                {
                "type": "text",
                "text": prompt
                },
                {
                "type": "image_url",
                "image_url": {
                    # Use the MIME type reported by Streamlit so PNG uploads are labeled correctly
                    "url": f"data:{uploaded_file.type};base64,{encoded_image}"
                }
                }
            ]
            },
        ],
        "temperature": 0.7,
        "top_p": 0.95,
        "max_tokens": 800
    }

    #print(payload)

    # Send request
    try:
        response = requests.post(GPT4V_ENDPOINT, headers=headers, json=payload, timeout=120)
        response.raise_for_status()  # Raises an HTTPError for a 4xx/5xx status code
    except requests.RequestException as e:
        # Show the error in the app and stop this script run
        st.error(f"Failed to make the request. Error: {e}")
        st.stop()

    # Handle the response as needed (e.g., print or process)
    print(response.json())
    
    # Show the uploaded image and the model's answer
    st.image(Image.open(path))
    st.write(response.json()['choices'][0]['message']['content'])
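
The data URL sent to the model should carry a MIME type that matches the uploaded file, since the uploader accepts both PNG and JPEG. As a minimal sketch, assuming only the raw bytes and the file name are available, the type can be guessed from the extension with the standard library. The helper name `build_data_url` and the JPEG fallback are assumptions for illustration, not part of the original code.

```python
import base64
import mimetypes


def build_data_url(image_bytes: bytes, filename: str) -> str:
    """Return a base64 data URL whose MIME type is guessed from the file name."""
    # guess_type returns (type, encoding); fall back to JPEG when the type is unknown
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = "image/jpeg"
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

In the app this could be called as `build_data_url(uploaded_file.getvalue(), uploaded_file.name)`, which would also make the temp file unnecessary for the encoding step.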

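Prompt variations

Change the prompt to a code generation instruction.

As a test, save a screenshot of the Google home page and upload it.

As shown here, the model can be made to generate code equivalent to the screenshot by following the prompt.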
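
The exact prompt used for this experiment is not shown in the article. The following is a sketch only: `CODE_GEN_PROMPT` is a hypothetical wording and `build_payload` is a hypothetical helper that mirrors the payload structure of the listing, so that switching from description to code generation only means passing a different prompt string.

```python
# Hypothetical prompt; the article does not show the wording that was actually used
CODE_GEN_PROMPT = "Write HTML and CSS that reproduce the page shown in this screenshot."


def build_payload(prompt: str, data_url: str, max_tokens: int = 800) -> dict:
    """Build a chat-completions payload with one text part and one image part."""
    return {
        "messages": [
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": "You are an AI assistant that helps people find information.",
                    }
                ],
            },
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            },
        ],
        "temperature": 0.7,
        "top_p": 0.95,
        "max_tokens": max_tokens,
    }


# Placeholder data URL; in the app this would be the encoded upload
payload = build_payload(CODE_GEN_PROMPT, "data:image/png;base64,AAAA")
```

Generated code tends to be longer than a description, so raising `max_tokens` may be worth trying if the output is cut off.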

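Notes

Occasionally the image is not described properly (perhaps when the image is small?).
This is an issue with the response from gpt-4v on AOAI (Azure OpenAI).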
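
Since a usable description does not always come back, the display step can be made defensive. This is a minimal sketch, assuming the response body has the `choices[0].message.content` shape used in the listing. `extract_description` is a hypothetical helper, and treating empty text as a failure is an assumption rather than documented service behavior.

```python
from typing import Optional


def extract_description(body: dict) -> Optional[str]:
    """Return the assistant text from a chat-completions response, or None."""
    try:
        content = body["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        # The response did not have the expected structure
        return None
    # Treat missing or whitespace-only text as "no description"
    if not isinstance(content, str) or not content.strip():
        return None
    return content
```

In the app, a `None` result could be reported with `st.warning(...)` so the user knows to retry, for example with a larger image.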
