Running Stability AI's depth-to-image model on Databricks

Posted at 2024-11-26

So it can generate images while preserving depth information.

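The StableDiffusionDepth2ImgPipeline lets you pass a text prompt and an initial image to condition the generation of new images. In addition, you can pass a depth_map to preserve the structure of the image. If no depth_map is provided, the pipeline automatically predicts the depth through its integrated depth-estimation model.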
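As a rough illustration of what "preserving structure through a depth map" involves: to my understanding, the pipeline min-max rescales the depth values into the range [-1, 1] before using them as conditioning. The sketch below shows that rescaling on plain Python lists so it has no dependencies. The function name `normalize_depth` and the handling of a completely flat depth map are my own assumptions for illustration, not part of diffusers, and the real pipeline works on torch tensors.

```python
from typing import List


def normalize_depth(depth: List[List[float]]) -> List[List[float]]:
    """Min-max rescale a 2D depth map to the range [-1, 1].

    Hypothetical helper for illustration only; not a diffusers API.
    """
    flat = [value for row in depth for value in row]
    depth_min, depth_max = min(flat), max(flat)

    # Assumption: a flat depth map carries no structure, so map it to 0.0
    # everywhere instead of dividing by zero.
    if depth_max == depth_min:
        return [[0.0 for _ in row] for row in depth]

    span = depth_max - depth_min
    return [[2.0 * (value - depth_min) / span - 1.0 for value in row] for row in depth]


# The smallest depth value maps to -1.0 and the largest to 1.0.
print(normalize_depth([[0.0, 5.0], [10.0, 2.5]]))
```

Because only relative depth survives this rescaling, the absolute units of the depth estimate should not matter much; what the generation follows is the near/far layout of the scene.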

Apparently it was announced two years ago. I hadn't noticed.

Here is the cluster I used.

%pip install mlflow==2.17.2 diffusers==0.31.0
dbutils.library.restartPython()

# Log in to Hugging Face
from huggingface_hub import login

login(token="<your Hugging Face access token>")

Create an instance of StableDiffusionDepth2ImgPipeline.

import torch
from diffusers import StableDiffusionDepth2ImgPipeline
from diffusers.utils import load_image, make_image_grid

pipeline = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")

The pipeline is now ready to take a prompt. You can also specify a negative_prompt to keep certain terms from guiding the image generation:

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = load_image(url)
prompt = "two tigers"
negative_prompt = "bad, deformed, ugly, bad anatomy"
image = pipeline(prompt=prompt, image=init_image, negative_prompt=negative_prompt, strength=0.7).images[0]
make_image_grid([init_image, image], rows=1, cols=2)

The original is on the left, the generated result on the right.

Let's change the prompt.

prompt = "two dogs"
negative_prompt = "bad, deformed, ugly, bad anatomy"
image = pipeline(prompt=prompt, image=init_image, negative_prompt=negative_prompt, strength=0.7).images[0]
make_image_grid([init_image, image], rows=1, cols=2)

Let's also adjust the parameters and the negative prompt.

prompt = "two dogs, cartoon"
negative_prompt = "bad, deformed, ugly, bad anatomy, realistic"
image = pipeline(prompt=prompt, image=init_image, negative_prompt=negative_prompt, strength=1.0).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
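To get a feel for what changing strength from 0.7 to 1.0 does: in the diffusers image-to-image style pipelines, as far as I understand, strength scales how many of the scheduled denoising steps are actually run on the noised initial image. The sketch below reproduces that arithmetic. The function name `effective_denoising_steps` is hypothetical, and the value of 50 steps is an assumption based on the pipeline's default num_inference_steps, which the calls above do not override.

```python
def effective_denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps actually run for a given strength.

    Hypothetical helper mirroring the usual img2img step calculation;
    not a diffusers API.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return min(int(num_inference_steps * strength), num_inference_steps)


# Assumption: 50 is the default num_inference_steps for this pipeline.
for strength in (0.7, 1.0):
    print(strength, effective_denoising_steps(50, strength))
```

With strength=1.0 every step runs, so the colors and textures of the initial image are essentially discarded and only the depth conditioning ties the result back to the original composition.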

Interesting.

Getting Started with Databricks (はじめてのDatabricks)

Databricks free trial
