
Building a Stable Diffusion XL Image Generation API on Compute Engine

Posted at 2024-04-23

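Background

In a previous article, "Building a Stable Diffusion image generation API with Hugging Face Diffusers on Compute Engine", I built an API with stable-diffusion-v1.5. This time I build one with stable-diffusion-XL.
As before, I use Compel so the API can handle prompts longer than 77 tokens, which is the limit of the CLIP text encoders.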
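CLIP text encoders accept at most 77 tokens per pass: 75 content tokens plus a start token and an end token. The usual workaround for longer prompts is to split the token IDs into 77-token windows, encode each window separately, and concatenate the embeddings along the sequence axis. The sketch below shows only the splitting step in plain Python. It is a conceptual illustration under that assumption, not Compel's actual code, and `chunk_token_ids` is a hypothetical name. The IDs 49406 and 49407 are CLIP's start-of-text and end-of-text tokens. Padding with the end token is an assumption made for the example.

```python
from typing import List

CLIP_BOS = 49406  # <|startoftext|> in the CLIP vocabulary
CLIP_EOS = 49407  # <|endoftext|> in the CLIP vocabulary


def chunk_token_ids(ids: List[int], max_length: int = 77,
                    bos: int = CLIP_BOS, eos: int = CLIP_EOS,
                    pad: int = CLIP_EOS) -> List[List[int]]:
    """Split content token IDs (without BOS/EOS) into fixed-size windows.

    Each window holds up to max_length - 2 content tokens, is wrapped in
    BOS/EOS, and is right-padded to exactly max_length tokens.
    """
    body = max_length - 2  # room left after BOS and EOS
    chunks = []
    # Always emit at least one window, even for an empty prompt
    for start in range(0, max(len(ids), 1), body):
        window = [bos] + ids[start:start + body] + [eos]
        window += [pad] * (max_length - len(window))
        chunks.append(window)
    return chunks


if __name__ == "__main__":
    # 160 dummy content tokens -> 3 windows of 77 tokens (75 + 75 + 10 content tokens)
    chunks = chunk_token_ids(list(range(160)))
    print(len(chunks), [len(c) for c in chunks])
```

Each window then goes through the text encoder on its own. This is why a long prompt costs one encoder pass per 75 tokens.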

Environment

Compute Engine
Machine type: n1-highmem-8
GPU: 1 x NVIDIA T4
Boot image: Google, Deep Learning VM with CUDA 11.8, M116, Debian 11, Python 3.10. With CUDA 11.8 preinstalled
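Before you install anything, you can confirm that the Deep Learning VM image actually exposes the T4 to PyTorch. The following is a minimal check that assumes PyTorch is preinstalled on the image (`describe_gpu` is a hypothetical helper name). It returns a dict instead of raising, so it also runs on a machine without a GPU.

```python
import importlib.util


def describe_gpu() -> dict:
    """Report whether PyTorch is installed and whether it can see a CUDA GPU."""
    info = {"torch_installed": False, "cuda_available": False,
            "device_name": None, "cuda_version": None}
    if importlib.util.find_spec("torch") is None:
        return info
    import torch

    info["torch_installed"] = True
    info["cuda_version"] = torch.version.cuda  # CUDA version PyTorch was built with (None for CPU builds)
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        # On this VM the name should be something like "Tesla T4"
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    print(describe_gpu())
```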

Implementation

Installation

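# Port 80 is a privileged port, so install and run everything with sudo
# Quote the version pin: unquoted, the shell treats ">=0.2.0" as an output redirect
sudo pip install -q diffusers transformers accelerate "invisible-watermark>=0.2.0" compel flask gunicorn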
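To confirm that the packages and the invisible-watermark>=0.2.0 pin actually took effect, here is a small stdlib-only sketch that uses `importlib.metadata`. The helper names are hypothetical. The simple numeric parser is an assumption for illustration only. It ignores pre-release tags and treats "0.2" as older than "0.2.0", so `packaging.version` is more robust if it is available.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the pip command; only invisible-watermark has a minimum there
REQUIREMENTS = {
    "diffusers": None,
    "transformers": None,
    "accelerate": None,
    "invisible-watermark": "0.2.0",
    "compel": None,
}


def parse_version(text: str) -> tuple:
    """Turn '0.2.0' or '4.40.1' into a tuple of ints, stopping at non-numeric parts."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(reqs: dict) -> dict:
    """Map each package to 'ok', 'missing', or 'too old (<installed>)'."""
    report = {}
    for name, minimum in reqs.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum and parse_version(installed) < parse_version(minimum):
            report[name] = f"too old ({installed})"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for pkg, status in check_requirements(REQUIREMENTS).items():
        print(f"{pkg}: {status}")
```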

Server code

main.py

from flask import Flask, request
import base64
import io
from compel import Compel, ReturnedEmbeddingsType
import torch
from diffusers import DiffusionPipeline


# Load the SDXL base model once, in fp16, onto the GPU
pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", variant="fp16", use_safetensors=True, torch_dtype=torch.float16).to("cuda")
# SDXL has two tokenizers/text encoders; only the second one supplies the pooled embedding.
# truncate_long_prompts=False keeps tokens past the 77-token limit instead of dropping them.
compel = Compel(
    tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2],
    text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[False, True],
    truncate_long_prompts=False,
)

app = Flask(__name__)

@app.route("/create", methods=['POST'])
def create():
    data_dict = request.get_json()
    prompt = data_dict['prompt']

    # Negative prompt
    negative_prompt = "YOUR_NEGATIVE_PROMPT"
    # Classifier-free guidance strength. The SDXL pipeline takes a single guidance_scale;
    # the negative prompt acts through this same scale (there is no separate negative scale).
    guidance_scale = 7.5

    # Build the conditioning tensors
    conditioning, pooled = compel(prompt)
    negative_conditioning, pooled_negative_conditioning = compel(negative_prompt)
    # Without truncation the two embeddings can differ in length, so pad them to match
    [conditioning, negative_conditioning] = compel.pad_conditioning_tensors_to_same_length([conditioning, negative_conditioning])

    # Generate the image from the positive and negative embeddings
    image = pipeline(
        prompt_embeds=conditioning,
        pooled_prompt_embeds=pooled,
        negative_prompt_embeds=negative_conditioning,
        negative_pooled_prompt_embeds=pooled_negative_conditioning,
        width=1024, height=1024,
        num_inference_steps=20,
        guidance_scale=guidance_scale,
    ).images[0]

    # Encode the PNG in memory instead of via a shared tmp.png, which concurrent requests could overwrite
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    encoded_string = base64.b64encode(buffer.getvalue()).decode('utf-8')
    return {"b64_json": encoded_string}

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80)
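The SDXL pipeline in Diffusers applies classifier-free guidance with a single `guidance_scale`. At every denoising step it predicts the noise twice: once with the prompt embeddings and once with the negative-prompt embeddings (the "unconditional" branch). It then extrapolates away from the negative prediction. Here is a plain-Python sketch of that combination step on toy vectors. The function name and numbers are illustrative, not Diffusers code, and real predictions are latent tensors, not lists.

```python
from typing import List


def apply_cfg(noise_negative: List[float], noise_positive: List[float],
              guidance_scale: float) -> List[float]:
    """Classifier-free guidance: neg + scale * (pos - neg), element-wise."""
    return [n + guidance_scale * (p - n)
            for n, p in zip(noise_negative, noise_positive)]


if __name__ == "__main__":
    neg = [0.0, 1.0, 2.0]
    pos = [1.0, 1.0, 0.0]
    # Scale 1.0 returns the positive prediction; larger scales push further away from neg
    print(apply_cfg(neg, pos, 1.0))
    print(apply_cfg(neg, pos, 7.5))
```

Because the negative prediction appears only inside this one formula, its influence is controlled by the same scale as the prompt's. Where both predictions agree (the middle element), guidance changes nothing.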

Launch

# Each gunicorn worker loads its own copy of the model, so use one worker on a single T4
sudo gunicorn --bind :80 --workers 1 --timeout 0 main:app
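gunicorn can also read its settings from a Python config file, which keeps them alongside the code. The sketch below assumes the file is named gunicorn.conf.py and sits in the same directory as main.py. The bind address and timeout mirror the command line. The worker count is 1 because every worker process imports main.py and therefore loads its own copy of the SDXL pipeline onto the single T4.

```python
# gunicorn.conf.py (assumed file name), launched with:
#   sudo gunicorn -c gunicorn.conf.py main:app

bind = "0.0.0.0:80"  # listen on port 80 on all interfaces
workers = 1  # each worker holds a full SDXL pipeline in GPU memory
timeout = 0  # 0 disables the worker timeout, since a 1024x1024 generation can take a while
```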

Testing

The image generated from the prompt is saved to hoge.png.

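import requests
import base64
import json

prompt = """
Image of a cat in the park.
"""

r = requests.post(
    'http://IP_ADDRESS/create',
    headers={'Content-Type': 'application/json'},
    data=json.dumps({'prompt': prompt})
)
# Fail with a clear error if the server returned an error status instead of an image
r.raise_for_status()

# Decode the base64 string returned by the API and write it out as a PNG file
image_bytes = base64.b64decode(r.json()['b64_json'])
with open('hoge.png', 'wb') as file:
    file.write(image_bytes)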
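The client writes whatever it decodes to hoge.png. A slightly more defensive version might check the response shape and the PNG signature before saving. This is a stdlib-only sketch, and `decode_png_response` is a hypothetical helper name. It assumes the server returns the `{"b64_json": ...}` body that main.py produces.

```python
import base64
import json

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_png_response(body: str) -> bytes:
    """Decode the API's {"b64_json": ...} body and check that it is a PNG."""
    payload = json.loads(body)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if "b64_json" not in payload:
        raise ValueError(f"unexpected response keys: {sorted(payload)}")
    image_bytes = base64.b64decode(payload["b64_json"], validate=True)
    if not image_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG")
    return image_bytes


if __name__ == "__main__":
    # Fake response built locally: a PNG signature followed by dummy bytes
    fake_png = PNG_SIGNATURE + b"dummy"
    body = json.dumps({"b64_json": base64.b64encode(fake_png).decode("ascii")})
    print(len(decode_png_response(body)))
```

In the client, `image_bytes = decode_png_response(r.text)` could replace the direct `base64.b64decode` call. An error page from the server then raises immediately instead of producing a broken hoge.png.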

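Summary

Implementing stable-diffusion-XL was not much different from the v1.5 implementation, so it came together quickly.
The one place I got stuck was Compel, whose setup differs between v1.5 and XL: XL needs both tokenizers and both text encoders, and the pooled embeddings must also be passed to the pipeline.
I haven't implemented the refiner or any optimizations yet, so I'll work on those step by step.
Image quality is dramatically better with XL than with v1.5.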
