More than 1 year has passed since last update.

OpenAIの新しいAPIをStreamlitで使ってみた

Last updated at 2024-02-10Posted at 2023-11-10

はじめに

2023年11月6日にサンフランシスコで開催されたOpenAI Dev Day 2023。

GPTsなど、ノーコードで利用できるサービスなどもありましたが、今回は「New modalities in the API」で紹介された新しいAPIのうち以下をStreamlitで実装してみました。（できる限り最小構成にしています。）

GPT-4 Turbo with 128K context(チャット)
GPT-4 Turbo with vision(画像入力)
DALL·E3(画像生成)
Text to speech(音声合成)

※openaiのPythonライブラリも変更が入っているようで、リクエストに対するレスポンスも依然と少し変わっていたので対応しています。

各機能の実装内容

公式のドキュメントをベースに実装しています。

コード全体はGitHubを参照ください。

GPT-4 Turbo with 128K context(チャット)

ここはモデルが強化されていますが基本的な構成は従来と同じです。
レスポンスの形式が変わっているので、「標準のチャットUIでストリーム表示する場合」として実装しています。

    if mode == "チャット":
        # GPT-4 Turbo Example
        st.header("GPT-4 Turbo チャット")
        user_prompt = st.chat_input("user:")
        assistant_text = ""

        for text_info in st.session_state.all_text:
            with st.chat_message(text_info["role"], avatar=text_info["role"]):
                st.write(text_info["role"] + ":\n\n" + text_info["content"])

        if user_prompt:
            with st.chat_message("user", avatar="user"):
                st.write("user" + ":\n\n" + user_prompt)

            st.session_state.all_text.append({"role": "user", "content": user_prompt})

            if len(st.session_state.all_text) > 10:
                st.session_state.all_text.pop(1)

            response = openai.chat.completions.create(
                model="gpt-4-1106-preview",
                messages=st.session_state.all_text,
                stream=True,
            )
            with st.chat_message("assistant", avatar="assistant"):
                place = st.empty()
                for chunk in response:
                    content = chunk.choices[0].delta.content
                    if content:
                        assistant_text += content
                        place.write("assistant" + ":\n\n" + assistant_text)

            st.session_state.all_text.append(
                {"role": "assistant", "content": assistant_text}
            )

GPT-4 Turbo with vision(画像入力)

ここだけ、ライブラリが未対応？のようでrequestsを使って直接リクエストしています。
（チャットとまとめたかったんですが、ストリームをうまく扱えず、、）
実際には複数の画像入力にも対応できるようですが、以下の例は1枚のみです。

    if mode == "画像入力":
        # Image Input Example
        st.header("画像入力")
        uploaded_file = st.file_uploader(
            "Upload an image to analyze", type=["jpg", "jpeg", "png"]
        )
        prompt = st.text_input("Enter your prompt:")
        if uploaded_file:
            st.image(uploaded_file)
            payload = {
                "model": "gpt-4-vision-preview",
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            {"type": "text", "text": prompt},
                            {
                                "type": "image_url",
                                "image_url": {
                                    "url": f"data:image/jpeg;base64,{base64.b64encode(uploaded_file.getvalue()).decode()}"
                                },
                            },
                        ],
                    }
                ],
                "max_tokens": 300,
            }
        if st.button("Analyze Image"):
            if uploaded_file:
                response = requests.post(
                    "https://api.openai.com/v1/chat/completions",
                    headers={"Authorization": f"Bearer {openai.api_key}"},
                    json=payload,
                ).json()
                st.write(response["choices"][0]["message"]["content"])

DALL·E3(画像生成)

    if mode == "画像生成":
        # Image Generation Example
        st.header("画像生成")
        image_prompt = st.text_input("Enter your prompt:")
        hight = st.number_input("hight", value=512, min_value=128)
        width = st.number_input("width", value=512, min_value=128)
        if st.button("Generate Image"):
            if image_prompt:
                response = openai.images.generate(
                    prompt=image_prompt,
                    size=f"{hight}x{width}",
                    quality="standard",
                    n=1,
                )
                image_url = response.data[0].url
                st.image(image_url)

Text to speech(音声合成)

    if mode == "音声合成":
        st.header("音声合成")
        audio_prompt = st.text_input(
            "Enter your prompt:", value="エンジニアにとって再利用性・汎用性の高い情報が集まる場をつくろう"
        )
        model = st.selectbox("Model", options=["tts-1", "tts-1-hd"])
        voice = st.selectbox(
            "Voice", options=["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
        )

        if audio_prompt:
            response = openai.audio.speech.create(
                model=model,
                voice=voice,
                input=audio_prompt,
            )

            # Convert the binary response content to a byte stream
            byte_stream = io.BytesIO(response.content)

            st.audio(byte_stream)

おわりに

簡単な例ですが使ってみました。
今回、AssistantAPIは試していませんが、今後APIが強化されていけばLangChainなどの周辺ライブラリも不要になってくるのかも？？

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up