Trying out Stable Diffusion 3 Medium

Last updated at 2024-07-11. Posted at 2024-06-27.

What is Stable Diffusion?

There is already a very accessible article on this, so please refer to that.

Stable Diffusion 3 Medium

Stable Diffusion 3 is a new image generation model that uses a Multimodal Diffusion Transformer (MMDiT) and Rectified Flow. Stable Diffusion 3 Medium is its lightweight, freely distributed version.

Stable Diffusion 3 Medium
・Medium: 2 billion parameters
・Output images are 1024×1024 px by default
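
As a rough guide to what "2 billion parameters" means for GPU memory, the weight size can be estimated from the parameter count and the data type. This is only a back-of-the-envelope sketch: it assumes the 2 billion figure covers the diffusion model alone, so the text encoders and VAE that the pipeline also loads, as well as activations during inference, come on top of it. The helper name `weight_size_gib` is made up for illustration.

```python
# Rough estimate of the raw weight size of a 2-billion-parameter model.
PARAMS = 2_000_000_000  # "2 billion parameters" from the list above
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_size_gib(num_params: int, dtype: str) -> float:
    """Return the size of the raw weights in GiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_size_gib(PARAMS, dtype):.2f} GiB")
# float32: 7.45 GiB
# float16: 3.73 GiB
```

Under these assumptions, loading in float16 halves the weight footprint compared with float32.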

License

It can be used freely for non-commercial purposes such as academic research, but commercial use appears to require a separate paid license.

You may not use the Software Products or Derivative Works to enable third parties to use the Software Products or Derivative Works as part of a service you host or via your API.

Usage

Install the libraries

pip install -U diffusers
pip install -U peft
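
After installing, it can be useful to confirm which versions actually ended up in the environment. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name. As far as I know, `StableDiffusion3Pipeline` was added in diffusers 0.29.0, so an older version would explain an import error, but treat that version number as an assumption to check against the release notes.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Packages used in this article (torch is imported by the script further down).
for name in ("diffusers", "peft", "torch"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```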

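Get an access token from Hugging Face

Log in to Hugging Face
Go to the page below
https://huggingface.co/stabilityai/stable-diffusion-3-medium
Request access to this model
Open Settings -> Access Tokens
Click New token
Under Edit Permissions, grant the Repos permissions

After that, use the access token to access the Hugging Face Hub and specify the target model to use it.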

from diffusers import StableDiffusion3Pipeline
from huggingface_hub import login
import torch

# Authenticate with the Hugging Face Hub (replace the placeholder with your own token).
login(token='YOUR_ACCESS_TOKEN', add_to_git_credential=True)

# Load the pipeline in half precision and move it to the GPU.
pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = 'A digital Illustration of the Babel tower, 4k, detailed, trending in artstation, fantasy vivid colors, 8k'

# Generate a single image and save it to disk.
image = pipe(
    prompt,
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save('stable-diffusion_3_medium.jpg')
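
Writing the access token directly into the script makes it easy to leak when the code is shared. One possible alternative, sketched below, is to read it from an environment variable. The helper name `get_hf_token` is hypothetical; `HF_TOKEN` is used here because huggingface_hub itself recognizes that variable, but any name would work with this helper.

```python
import os


def get_hf_token(var_name: str = "HF_TOKEN") -> str:
    """Read the Hugging Face access token from an environment variable."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "export it before running the script."
        )
    return token
```

With this in place, the login call in the script above would become `login(token=get_hf_token(), add_to_git_credential=True)`.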

Examples of generated images

I had Claude 3 Sonnet write the prompts and had Stable Diffusion 3 Medium generate the images. Generation took roughly 5 minutes on an NVIDIA RTX A2000 12GB.

Stunning panoramic landscape photography, sweeping vistas, majestic cliffs, ancient forests, winding rivers, HDR, exquisite colors, lifelike

Photorealistic landscape, 4K resolution, detailed, epic mountains, lush green valleys, crystal clear lakes, dramatic lighting, cinematic atmosphere

Ultra-realistic nature scenery, 8K, highly detailed, snow-capped mountains, misty waterfalls, vibrant wildflowers, golden hour lighting, breathtaking view

Photorealistic portrait, young Japanese woman, delicate facial features, natural makeup, detailed eyes and lips, black silky hair, soft lighting

Photorealistic render of a delicious hamburger, 8K, highly detailed, ray-traced lighting, sharp focus, juicy beef patty, melted cheese, fresh lettuce, tomatoes, detailed sesame seed bun, mouthwatering

A photorealistic image of a dog breed that looks like a calico cat, with patches of orange, black and white fur, highly detailed fur texture, realistic subsurface scattering, 8K resolution, from a front angle, studio lighting
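
To run several prompts like the ones above in one go and see how long each takes, a small loop around the pipeline is enough. The following is a minimal sketch, not the exact script used for this article: `generate_all`, the `outputs` directory, and the file naming are all made up for illustration, and `pipe` is assumed to be the `StableDiffusion3Pipeline` object created earlier. The call arguments mirror the ones used in the script above.

```python
import time
from pathlib import Path


def generate_all(pipe, prompts, out_dir="outputs", steps=28, guidance_scale=7.0):
    """Generate one image per prompt and return a list of (path, seconds) tuples."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = []
    for i, prompt in enumerate(prompts):
        start = time.perf_counter()
        # Same call as in the single-image script, repeated per prompt.
        image = pipe(
            prompt,
            negative_prompt="",
            num_inference_steps=steps,
            guidance_scale=guidance_scale,
        ).images[0]
        elapsed = time.perf_counter() - start
        path = out / f"sd3_medium_{i:02d}.jpg"
        image.save(path)
        results.append((path, elapsed))
    return results
```

A call such as `for path, sec in generate_all(pipe, prompts): print(path, f"{sec:.1f}s")` would then print the per-image generation time, which makes it easier to compare against the roughly 5 minutes mentioned above.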
