gpt-4v (multimodal gpt-4-vision) is now available on Azure OpenAI

Posted at 2023-12-18

gpt-4v has finally arrived on Azure OpenAI

The Azure version of gpt-4v (multimodal), which I had long been waiting for, is now available, so I tested it right away.

Available regions

As of 2023/12/18, it is reportedly available only in the regions below. I expect the list to grow over time.

Australia East
Sweden Central
Switzerland North
West US

Deployment

In one of the available regions, create a new deployment of the gpt-4 model and select "vision-preview" as the version.

Note the key, the endpoint, and the deployment name.
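
As a rough sketch of how the endpoint and deployment name fit together, the helper below assembles a request URL using the usual Azure OpenAI layout (`/openai/deployments/<deployment>/chat/completions?api-version=...`). The function name, the resource name, the deployment name, and the api-version value are all assumptions for illustration; the URL that the playground shows for your own deployment is the one to trust.

```python
from urllib.parse import urlencode


def build_chat_completions_url(resource_endpoint: str, deployment: str, api_version: str) -> str:
    """Assemble a chat completions URL for an Azure OpenAI deployment."""
    base = resource_endpoint.rstrip("/")  # tolerate a trailing slash on the endpoint
    query = urlencode({"api-version": api_version})
    return f"{base}/openai/deployments/{deployment}/chat/completions?{query}"


# Hypothetical values for illustration only; substitute your own.
url = build_chat_completions_url(
    "https://my-resource.openai.azure.com/",
    "my-gpt4v-deployment",
    "2023-12-01-preview",
)
print(url)
```

The resulting string can be used wherever the script further down expects the full endpoint URL.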

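Calling the API

For now only the structure of the POST request is documented, so I used the page below as a reference.
(However, my from-scratch attempt did not work, so the code here is based on the source that worked in the playground.)
https://learn.microsoft.com/ja-jp/azure/ai-services/openai/how-to/gpt-with-vision

Base64-encode the image file and convert the result to an ASCII string. (This step was probably what went wrong in my from-scratch attempt.)
Then build the JSON and embed the image. (The documentation linked above passes it as "image", but the playground passed it as image_url, as shown below.)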
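
The encoding step can be isolated into a small helper, sketched here under the assumption that the image is passed as a `data:` URL inside an `image_url` content part, as the playground did. The names `to_data_url` and `image_part` are hypothetical, and the three sample bytes are just the start of a JPEG header rather than a real image.

```python
import base64
import mimetypes


def to_data_url(image_bytes: bytes, filename: str) -> str:
    """Base64-encode raw image bytes and wrap them in a data URL."""
    # Guess the MIME type from the file extension (e.g. .jpeg -> image/jpeg).
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"{filename} does not look like an image file")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_part(data_url: str) -> dict:
    """Build the image_url content part used in the user message."""
    return {"type": "image_url", "image_url": {"url": data_url}}


# Stand-in bytes for illustration; in practice read them from the image file.
sample = to_data_url(b"\xff\xd8\xff", "cat.jpeg")
print(image_part(sample))
```

Deriving the MIME type from the file name keeps the `data:` prefix consistent with the actual file when something other than a JPEG is used.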

The image used is cat.jpeg.

import os
import requests
import base64

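# Configuration
# The environment variable names below are placeholders; set them to the key
# and the full request URL noted at deployment time.
GPT4V_KEY = os.environ["GPT4V_KEY"]
GPT4V_ENDPOINT = os.environ["GPT4V_ENDPOINT"]

IMAGE_PATH = "./cat.jpeg"
# Read the image and base64-encode it into an ASCII string
with open(IMAGE_PATH, "rb") as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode("ascii")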
headers = {
    "Content-Type": "application/json",
    "api-key": GPT4V_KEY,
}

# Payload for the request
payload = {
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "You are an AI assistant that helps people find information."
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "describe this picture in Japanese."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/jpeg;base64,{encoded_image}"
          }
        }
      ]
    },
  ],
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 800
}


# Send request
try:
    response = requests.post(GPT4V_ENDPOINT, headers=headers, json=payload)
    response.raise_for_status()  # Will raise an HTTPError if the HTTP request returned an unsuccessful status code
except requests.RequestException as e:
    raise SystemExit(f"Failed to make the request. Error: {e}")

# Handle the response as needed (e.g., print or process)
print(response.json())

{'id': 'chatcmpl-8WzK4mptElPuTMAtWDNOpa16vpTIg', 'object': 'chat.completion', 'created': 1702872912, 'model': 'gpt-4', 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}], 'choices': [{'finish_details': {'type': 'stop', 'stop': '<|fim_suffix|>'}, 'index': 0, 'message': {'role': 'assistant', 'content': 'この写真は、柔らかい光の中でリラックスしている猫を捉えています。猫は床に横たわり、前足を伸ばしています。その毛皮は茶色がかったベージュ色で、縞模様の尾が見えます。大きな緑色の目が特徴的で、穏やかな表情をしています。背景はぼんやりしていて、猫が主役のこの写真にやさしい雰囲気を与えています。'}, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}], 'usage': {'prompt_tokens': 1473, 'completion_tokens': 158, 'total_tokens': 1631}}

To extract only the response text, do the following.

print(response.json()['choices'][0]['message']['content'])

この写真は、柔らかい光の中でリラックスしている猫を捉えています。猫は床に横たわり、前足を伸ばしています。その毛皮は茶色がかったベージュ色で、縞模様の尾が見えます。大きな緑色の目が特徴的で、穏やかな表情をしています。背景はぼんやりしていて、猫が主役のこの写真にやさしい雰囲気を与えています。
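
Direct indexing raises a KeyError or IndexError if the body has no usable choice, for example when the service returns an error object instead. A slightly more defensive variant might look like the sketch below; `extract_reply` is a hypothetical helper, and `sample_body` is a trimmed stand-in for the response shape shown above (its message text is invented, the usage figures are the ones from the real response).

```python
from typing import Optional


def extract_reply(body: dict) -> Optional[str]:
    """Return the assistant text from a chat completion body, or None if absent."""
    choices = body.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message") or {}
    return message.get("content")


# Trimmed stand-in for a response body, for illustration only.
sample_body = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "A cat lying on the floor."}}
    ],
    "usage": {"prompt_tokens": 1473, "completion_tokens": 158, "total_tokens": 1631},
}

print(extract_reply(sample_body))
print(extract_reply({"error": {"message": "something went wrong"}}))
```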

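Thoughts

LangChain already supported Gemini, but at the time of writing it does not appear to support this on AOAI yet. (It will probably come soon.)
Now I can test multimodal features without holding back, and I expect the amount of available information to grow explosively!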
