
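A chat app where you make a rough request to Claude 3, and it fills in the gaps and draws the picture with Stable Diffusion

Last updated at 2024-03-14 / Posted at 2024-03-10

Using the clever Claude 3, I'll build an app where you ask Claude 3 for the picture you want. It asks back about any missing information, and once you feel "that's probably enough," it fills in the gaps and turns your request into an image with Stable Diffusion.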

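Below are what I asked for and the pictures it drew.

A jidori (free-range) chicken taking a jidori (selfie) in a Japanese office
In the style of Japanese anime

The selfie part is there, sort of, maybe.

A photo of a beautiful K-pop-style woman dancing

It came out a bit gaudy.

A new character for Dr. Slump. In landscape orientation.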

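Prerequisites

Create a SessionTable in DynamoDB and grant the Python runtime environment access to it
Enable model access to Claude 3 and Stable Diffusion in Amazon Bedrock, and grant the Python runtime environment permission to invoke them
Install the required libraries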
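For the first two items, here is a minimal sketch of the AWS side. It assumes LangChain's `DynamoDBChatMessageHistory` defaults of a string partition key named `SessionId` and a TTL attribute named `expireAt` (check these against your installed version). The helper names and the IAM policy are hypothetical, and the policy only covers the calls this app appears to make. Note that the `ttl=3600` passed in the app only expires items if TTL is enabled on the table for that attribute.

```python
"""Hypothetical setup helpers for the prerequisites (not part of the original article)."""

TABLE_NAME = "SessionTable"
# Assumption: DynamoDBChatMessageHistory defaults to a "SessionId" partition key
# and an "expireAt" TTL attribute; change these if you pass other names.
PARTITION_KEY = "SessionId"
TTL_ATTRIBUTE = "expireAt"


def create_table_kwargs(table_name: str = TABLE_NAME) -> dict:
    """Arguments for DynamoDB create_table (on-demand billing, string partition key)."""
    return {
        "TableName": table_name,
        "KeySchema": [{"AttributeName": PARTITION_KEY, "KeyType": "HASH"}],
        "AttributeDefinitions": [{"AttributeName": PARTITION_KEY, "AttributeType": "S"}],
        "BillingMode": "PAY_PER_REQUEST",
    }


def ttl_kwargs(table_name: str = TABLE_NAME) -> dict:
    """Arguments for update_time_to_live so that items written with ttl=3600 actually expire."""
    return {
        "TableName": table_name,
        "TimeToLiveSpecification": {"Enabled": True, "AttributeName": TTL_ATTRIBUTE},
    }


def iam_policy(region: str, account_id: str, table_name: str = TABLE_NAME) -> dict:
    """A minimal IAM policy sketch for the app's runtime role (hypothetical; adjust to your setup)."""
    models = [
        "anthropic.claude-3-sonnet-20240229-v1:0",
        "stability.stable-diffusion-xl-v1",
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["bedrock:InvokeModel"],
                "Resource": [f"arn:aws:bedrock:{region}::foundation-model/{m}" for m in models],
            },
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem"],
                "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}",
            },
        ],
    }


def create_session_table(table_name: str = TABLE_NAME) -> None:
    """Create the table, wait until it exists, then enable TTL (requires AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3 installed

    client = boto3.client("dynamodb")
    client.create_table(**create_table_kwargs(table_name))
    client.get_waiter("table_exists").wait(TableName=table_name)
    client.update_time_to_live(**ttl_kwargs(table_name))
```

Calling `create_session_table()` once with AWS credentials configured creates the table. The dict returned by `iam_policy("us-east-1", "<account-id>")` can be serialized with `json.dumps` and attached to the role the app runs under.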

pip install -U boto3 langchain langchain-community streamlit streamlit_authenticator numpy

The chat with Claude 3 in the middle of the flow is implemented with LangChain.

What I built

It got a bit long.

hogehoge.py

from langchain.globals import set_debug
set_debug(False) # set to True when debugging

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage
from langchain_community.chat_message_histories import DynamoDBChatMessageHistory
from langchain_community.chat_models import BedrockChat
import streamlit as st
import streamlit_authenticator as sa
import boto3
import json
import re
import io
import base64
from PIL import Image
import numpy as np

# Build the credentials
authenticator = sa.Authenticate(
    credentials={"usernames":{
        "user1":{"name":"user1","password":"pass"},
        "user2":{"name":"user2","password":"pass"},
        "user3":{"name":"user3","password":"pass"}}},
    cookie_name="streamlit_cookie",
    key="signature_key",
    cookie_expiry_days=1
)

# Render the login form
authenticator.login()

if st.session_state["authentication_status"] is False: # login failed
    st.error('Username/password is incorrect')
elif st.session_state["authentication_status"] is None: # nothing entered yet
    st.warning('Please enter your username and password')
else: # login succeeded

    authenticator.logout(location="sidebar") # show the logout button in the sidebar

    st.title("Claude3 Sonnet draws pictures with StableDiffusionXL 1.0")

    # Clear past messages when it looks like we have just logged in
    if "FormSubmitter:Login-Login" in st.session_state:
        if "messages" in st.session_state:
            st.session_state["messages"].clear()

    # Use the username as the session ID key
    session_id = st.session_state["username"]

    # Conversation history in DynamoDB (table name "SessionTable", TTL = 3600 seconds)
    message_history = DynamoDBChatMessageHistory(table_name="SessionTable", session_id=session_id, ttl=3600)

    # Build the prompt template
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system","""
                Based on the human's request, you devise an English prompt suited to StableDiffusionXL1.0.
                Keep asking the human questions until you think a unique image matching what they want can be produced.
                Once you think a unique image can be produced, output an English prompt in StableDiffusionXL1.0 format inside <prompt></prompt> tags. Split the StableDiffusionXL1.0 prompt finely by image feature, separated by commas (,). It does not need to be written as sentences.
                Before outputting the prompt, always ask about the style.
                Only the following can be specified as the style: 3d-model, analog-film, anime, cinematic, comic-book, digital-art, enhance, fantasy-art, isometric, line-art, low-poly, modeling-compound, neon-punk, origami, photographic, pixel-art, tile-texture.
                When asking about the style, also explain what each option means.
                Output the style inside <style></style> tags. Whenever you output a <prompt> tag, always output a <style> tag as well. Only the following can be specified in the <style> tag: 3d-model, analog-film, anime, cinematic, comic-book, digital-art, enhance, fantasy-art, isometric, line-art, low-poly, modeling-compound, neon-punk, origami, photographic, pixel-art, tile-texture.
                If the human has a preference about size, have them choose from the following: 1024x1024, 1152x896, 1216x832, 1344x768, 1536x640, 640x1536, 768x1344, 832x1216, 896x1152.
                Output the size inside <size></size> tags. Only the following can be specified in the <size> tag: 1024x1024, 1152x896, 1216x832, 1344x768, 1536x640, 640x1536, 768x1344, 832x1216, 896x1152. Do not output a <size> tag unless the human has expressed a preference about size.
            """),
            MessagesPlaceholder(variable_name="messages"),  # the conversation history fetched from DynamoDB goes here
            MessagesPlaceholder(variable_name="human_message")
        ]
    )

    # Define the LLM
    LLM = BedrockChat(
        model_id="anthropic.claude-3-sonnet-20240229-v1:0",
        model_kwargs={"max_tokens": 1000},
    )

    # Define the chain
    chain = prompt | LLM

    # Initialize the on-screen chat history stored in the session
    if "messages" not in st.session_state:
        st.session_state["messages"] = []

    # Display the whole chat history so far from the session
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

    # Ask for input
    if input_text := st.chat_input("What kind of picture do you need?"):

        # Display the user's input
        with st.chat_message("user"):
            st.write(input_text)

        # Assemble the HumanMessage
        content = []

        # Add the input text to the content list
        content_text = {"type": "text", "text": input_text}
        content.append(content_text)

        # Run the chain
        llm_result = chain.invoke({"messages": message_history.messages, "human_message": [HumanMessage(content=content)]})

        # Update the chat history in the session
        st.session_state.messages.append({"role": "user", "content": input_text})
        st.session_state.messages.append({"role": "assistant", "content": llm_result.content})

        # Display the chatbot's reply
        with st.chat_message("assistant"):
            st.write(llm_result.content)

        # Check whether a <prompt> tag was output (newlines become spaces so words are not glued together)
        llm_result_str = llm_result.content.replace("\n"," ")
        sdxl_prompt = re.search(r'<prompt>(.+?)</prompt>',llm_result_str)

        # If a <prompt> tag was output, call Stable Diffusion
        if sdxl_prompt:

            # Get the style_preset (None if Claude left out the <style> tag)
            style_match = re.search(r'<style>([a-z0-9\-]+)</style>',llm_result_str)
            style_preset = style_match.group(1) if style_match else None

            # Get the size
            width=1024  # default
            height=1024  # default
            size = re.search(r'<size>([a-z0-9\-]+)</size>',llm_result_str)
            if size:
                size = size.group(1)
                width = int(size[:size.index("x")])
                height = int(size[size.index("x")+1:])

            # Create the Bedrock runtime client and the StableDiffusionXL request body
            bedrock = boto3.client(service_name='bedrock-runtime')
            request = {
                "text_prompts": [
                    {
                        "text": sdxl_prompt.group(1)
                    }
                ],
                "cfg_scale": 35,
                "width": width,
                "height": height,
                "seed": 0,
                "steps": 50,
                "samples" : 1
            }
            # Send style_preset only when Claude actually returned one
            if style_preset:
                request["style_preset"] = style_preset
            body = json.dumps(request)

            # Call StableDiffusionXL
            SDXL_response = bedrock.invoke_model(body=body, modelId="stability.stable-diffusion-xl-v1")
            SDXL_response_body = json.loads(SDXL_response.get("body").read())
            SDXL_response_image = Image.open(io.BytesIO(base64.decodebytes(bytes(SDXL_response_body["artifacts"][0].get("base64"), "utf-8"))))
            img_array = np.array(SDXL_response_image)
            st.image(img_array) # display the image

            # Reply after showing the image
            with st.chat_message("assistant"):
                st.write("How about this?")

            # Update the chat history in the session
            st.session_state.messages.append({"role": "assistant", "content": "How about this?"})

        # Update the chat history in DynamoDB (images are not stored)
        message_history.add_user_message(input_text)
        message_history.add_ai_message(llm_result.content)
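The Bedrock request and response handling near the end of the script can be pulled out into pure functions that are easy to test without calling AWS. This is a sketch that assumes the same request fields the script uses; `build_sdxl_body` and `decode_first_artifact` are hypothetical names, and leaving out `style_preset` when it is empty is this sketch's own choice.

```python
"""A hypothetical refactor of the SDXL call in hogehoge.py into testable pure functions."""
import base64
import json
from typing import Optional


def build_sdxl_body(prompt: str, style_preset: Optional[str] = None,
                    width: int = 1024, height: int = 1024, seed: int = 0,
                    cfg_scale: float = 35, steps: int = 50) -> str:
    """Return the JSON request body sent to stability.stable-diffusion-xl-v1."""
    request = {
        "text_prompts": [{"text": prompt}],
        "cfg_scale": cfg_scale,
        "width": width,
        "height": height,
        "seed": seed,
        "steps": steps,
        "samples": 1,
    }
    # Only include style_preset when Claude actually produced one.
    if style_preset:
        request["style_preset"] = style_preset
    return json.dumps(request)


def decode_first_artifact(response_body: dict) -> bytes:
    """Return the decoded image bytes of the first artifact in a parsed SDXL response body."""
    return base64.b64decode(response_body["artifacts"][0]["base64"])
```

In the app, the bytes from `decode_first_artifact(json.loads(SDXL_response["body"].read()))` can be opened with `Image.open(io.BytesIO(...))`, as the script already does.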

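The repetitive parts of the system prompt are additions I made whenever Claude 3 didn't follow the instructions well.
If the app crashes, try adding more instructions to the prompt.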
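One common cause of such crashes is calling `.group(1)` on the `None` that `re.search` returns when Claude leaves out a tag. Instead of only growing the prompt, the reply can also be validated in code. This is a minimal sketch that assumes the same tags and allowed values as the system prompt; `parse_reply` and `SdxlRequest` are hypothetical names. Unknown styles fall back to no style preset, and unknown or missing sizes fall back to 1024x1024.

```python
"""A defensive parser for Claude's <prompt>/<style>/<size> tags (hypothetical helper)."""
import re
from typing import NamedTuple, Optional

ALLOWED_STYLES = {
    "3d-model", "analog-film", "anime", "cinematic", "comic-book", "digital-art",
    "enhance", "fantasy-art", "isometric", "line-art", "low-poly",
    "modeling-compound", "neon-punk", "origami", "photographic", "pixel-art",
    "tile-texture",
}
ALLOWED_SIZES = {
    "1024x1024", "1152x896", "1216x832", "1344x768", "1536x640",
    "640x1536", "768x1344", "832x1216", "896x1152",
}


class SdxlRequest(NamedTuple):
    prompt: str
    style: Optional[str]  # None when the tag is missing or not an allowed preset
    width: int
    height: int


def _tag(text: str, name: str) -> Optional[str]:
    """Return the stripped contents of the first <name>...</name>, or None."""
    match = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
    return match.group(1).strip() if match else None


def parse_reply(text: str) -> Optional[SdxlRequest]:
    """Return None while Claude is still asking questions (no <prompt> tag)."""
    prompt = _tag(text, "prompt")
    if not prompt:
        return None
    # Collapse newlines and repeated whitespace inside the prompt into single spaces.
    prompt = " ".join(prompt.split())
    style = _tag(text, "style")
    if style not in ALLOWED_STYLES:
        style = None
    width, height = 1024, 1024  # default when no valid <size> tag is given
    size = _tag(text, "size")
    if size in ALLOWED_SIZES:
        width, height = (int(v) for v in size.split("x"))
    return SdxlRequest(prompt, style, width, height)
```

With this, the app would call Stable Diffusion only when `parse_reply(llm_result.content)` returns a value, and an unexpected tag value degrades to a default instead of raising.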

To use Haiku instead, change the model ID to anthropic.claude-3-haiku-20240307-v1:0.
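To switch between Sonnet and Haiku without editing the file, the model ID could come from an environment variable. This is a small sketch; `CLAUDE_MODEL` and `claude_model_id` are hypothetical names, and the two model IDs are the ones given in this article.

```python
"""Hypothetical helper for switching between Claude 3 Sonnet and Haiku without editing code."""
import os

CLAUDE3_MODEL_IDS = {
    "sonnet": "anthropic.claude-3-sonnet-20240229-v1:0",
    "haiku": "anthropic.claude-3-haiku-20240307-v1:0",
}


def claude_model_id(default: str = "sonnet") -> str:
    """Read CLAUDE_MODEL (hypothetical env var) and map it to a Bedrock model ID."""
    name = os.environ.get("CLAUDE_MODEL", default).strip().lower()
    if name not in CLAUDE3_MODEL_IDS:
        raise ValueError(f"unknown model {name!r}; choose from {sorted(CLAUDE3_MODEL_IDS)}")
    return CLAUDE3_MODEL_IDS[name]
```

`BedrockChat(model_id=claude_model_id(), model_kwargs={"max_tokens": 1000})` would replace the hard-coded ID, and `CLAUDE_MODEL=haiku python -m streamlit run hogehoge.py` would start the app on Haiku.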

Launching

python -m streamlit run hogehoge.py

Log in as user1/pass, user2/pass, or user3/pass.

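Trying it out

Initial state

I say just "a picture of a cat" and then answer the few questions it asks back.
Note: the red human icon is me, and the yellow robot icon is Claude 3.

Something came out.

The generated prompt looks like this: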

1 cat, cute, big eyes, anime style, detailed face, sitting, looking at viewer, white fur with black spots, environment is a cozy room with window showing cherry blossom trees outside, warm lighting

It filled in the details on its own.

Next, I make a small request to change the picture.

1 scottish fold cat, cute, big round eyes, anime style, highly detailed face and fur, sitting on plush couch, looking at viewer, gray folded ears, soft blue-gray fur, environment is a luxurious japanese apartment interior with large window showing cityscape outside, warm ambient lighting, panoramic composition

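Because the system prompt asks Claude to split the SDXL prompt into comma-separated features, two iterations can be compared tag by tag to see what Claude changed in response to a request. This is an illustrative sketch; `tags` and `diff_tags` are hypothetical helpers, not part of the app, and the normalization (trim and lowercase) is an assumption.

```python
"""Compare two comma-separated SDXL prompts tag by tag (illustrative helper, not from the article)."""


def tags(prompt: str) -> list[str]:
    """Split a comma-separated prompt into normalized, non-empty tags, keeping their order."""
    return [t.strip().lower() for t in prompt.split(",") if t.strip()]


def diff_tags(before: str, after: str) -> tuple[list[str], list[str]]:
    """Return (added, removed) tags between two prompt iterations."""
    old, new = tags(before), tags(after)
    added = [t for t in new if t not in old]
    removed = [t for t in old if t not in new]
    return added, removed
```

Applied to the first and second cat prompts, this surfaces changes such as `1 cat` being replaced by `1 scottish fold cat` and the newly added `panoramic composition`, while `cute`, `anime style`, and `looking at viewer` stay unchanged.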

I ask for another change to the picture.

scottish fold cat, photorealistic, highly detailed fur and features, big round eyes, sitting on plush velvet couch, looking directly at viewer, gray folded ears, soft blue-gray fur, luxurious japanese apartment interior, large window showing modern cityscape outside, warm ambient lighting streaming in

Then I have it draw a different picture.

Illustration of a bustling izakaya alley in Shimbashi, Tokyo, narrow street lined with small Japanese pubs and restaurants, some with tables and chairs set up on the street, groups of salary workers and friends drinking and socializing outside, warm inviting lighting from shop interiors spilling onto the alley, capturing the lively atmosphere of a Tokyo night out

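In this way, Claude 3 now fills in the gaps on its own and draws pictures. As someone with no artistic talent, I find this helpful, but the pictures still have some quirks. I think that putting more detailed guidance on how to write Stable Diffusion prompts into the system prompt would make it draw better pictures.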
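One way to act on this without the style and size lists drifting apart (each currently appears twice in the system prompt) is to build the system prompt from shared constants and append prompt-writing tips as a separate section. This is a sketch under that assumption: `build_system_prompt` is a hypothetical name, its wording is a condensed paraphrase of the article's system prompt, and the example tip in the usage below is only a placeholder, not tested guidance.

```python
"""Sketch: assemble the system prompt from shared constants (hypothetical restructuring)."""
from typing import List, Optional

STYLES = ["3d-model", "analog-film", "anime", "cinematic", "comic-book", "digital-art",
          "enhance", "fantasy-art", "isometric", "line-art", "low-poly",
          "modeling-compound", "neon-punk", "origami", "photographic", "pixel-art",
          "tile-texture"]
SIZES = ["1024x1024", "1152x896", "1216x832", "1344x768", "1536x640",
         "640x1536", "768x1344", "832x1216", "896x1152"]


def build_system_prompt(extra_guidance: Optional[List[str]] = None) -> str:
    """Join the base instructions with any extra prompt-writing tips you supply."""
    styles, sizes = ", ".join(STYLES), ", ".join(SIZES)
    lines = [
        "Based on the human's request, write an English prompt suited to StableDiffusionXL1.0.",
        "Keep asking questions until you think a unique image matching their wishes can be produced.",
        "Then output the prompt inside <prompt></prompt> tags as comma-separated image features.",
        f"Before outputting the prompt, always ask about the style and explain each option: {styles}.",
        f"Output the style inside <style></style> tags; only these values are allowed: {styles}.",
        f"Only if the human asks for a size, output one of {sizes} inside <size></size> tags.",
    ]
    if extra_guidance:
        lines.append("Guidelines for writing the prompt:")
        lines.extend(f"- {tip}" for tip in extra_guidance)
    return "\n".join(lines)
```

The returned string, e.g. `build_system_prompt(["Put the main subject first."])`, could be used as the `("system", ...)` message in `ChatPromptTemplate.from_messages`. Because LangChain treats `{` and `}` in templates as variables, any braces in the tips would need to be doubled.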

If you want to publish it on the internet

Do so at your own risk; see the following.
