
Generating Stable Diffusion images with the Diffusers library

Posted at 2024-01-06

1. Introduction

Verified on Windows 11.
I'm still feeling my way around.

2. Installing CUDA

I created a free NVIDIA Developer account and installed CUDA Toolkit 12.3 Update 2, but Stable Diffusion web UI seems to have installed PyTorch built for CUDA 12.1, so that one appears to be what is actually used.
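
As a quick check, the CUDA version the installed PyTorch build was compiled against can be printed from Python (a minimal sketch; torch.version.cuda reports PyTorch's bundled CUDA version, not the system-wide Toolkit):

import torch

# CUDA version this PyTorch build was compiled against (e.g. '12.1');
# may differ from the CUDA Toolkit installed system-wide.
print(torch.version.cuda)
print(torch.cuda.is_available())      # True if a usable GPU is detected
print(torch.cuda.get_device_name(0))  # GPU model name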

3. Installing the libraries

pip install diffusers transformers accelerate omegaconf pytorch_lightning xformers

Versions used:
・Python 3.11.7
・PyTorch 2.1.2+cu121

How to check the versions:
python --version
pip list | findstr torch

4. Code that downloads the model from Hugging Face (first run only) and generates an image

If DDIMScheduler's steps_offset is left at the default of 0, a warning says the setting is outdated and tells you to set it to 1 for the sake of future versions.
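
Instead of constructing the scheduler by hand as in the code below, it can also be loaded from the model's own scheduler config with just that one field overridden (a minimal sketch, assuming the model repo ships a scheduler config in the usual subfolder):

from diffusers import DDIMScheduler

# Load the scheduler settings shipped with the model and
# override only steps_offset to avoid the deprecation warning.
scheduler = DDIMScheduler.from_pretrained(
    'hakurei/waifu-diffusion',
    subfolder='scheduler',
    steps_offset=1
)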

import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

model_id = 'hakurei/waifu-diffusion'
seed = 46
device = 'cuda'
prompt = 'wdgoodprompt, (new, newest, best quality, extremely detailed, high resolution, anime:1.2), 1 girl, 14 years old, blue eyes, long black hair, shy, embarrassed, open mouth, bedroom, bed, furniture, closed curtain'
negative_prompt = "((((mutated hands and fingers)))), deformed, blurry, bad anatomy, long neck, long_neck, long body, long_body, deformed mutated disfigured, disfigured, poorly drawn face, mutation, mutated, extra limb, ugly, ugly face, poorly drawn hands, poorly_drawn_hands, missing limb, missing_limb, blurry, floating limbs, floating_limbs, disconnected limbs, disconnected_limbs, malformed hands, malformed_hands, blur, out of focus, text, title, flat color, flat shading, bad fingers, liquid fingers, poorly drawn fingers, bad anatomy, missing fingers, signature, watermark, username, artist name, missing legs, extra legs, extra_legs, bad hands, mutated hands, missing arms, extra_arms, bad proportions, extra fingers, extra_fingers, extra digit, fewer digits, huge breasts, animal ears, hair ornaments, white glove, purple nipples, violet nipples, garter belt, deformed eyes, partial face, partial head, bad face, inaccurate limb, cropped, single leg, single arm"

# Scheduler settings commonly used with SD 1.x models;
# steps_offset=1 avoids the "outdated" warning mentioned above.
scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule='scaled_linear',
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1
)

# Downloads the model from Hugging Face on the first run (cached afterwards).
# Note: use_auth_token is deprecated in newer diffusers in favor of token=.
pipeline = StableDiffusionPipeline.from_pretrained(
    model_id,
    scheduler=scheduler,
    use_auth_token=True
).to(device)

# Fixed seed for reproducible output.
generator = torch.Generator(device).manual_seed(seed)
# The positional 512, 512 are height and width.
image = pipeline(prompt, 512, 512, num_inference_steps=20, guidance_scale=7.5, generator=generator, negative_prompt=negative_prompt).images[0]

image.save('C:/stable-diffusion/Diffusers/output/test1.png')

On the first run, the model is downloaded from Hugging Face into the cache directory.
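
Where that cache lives can be checked programmatically (a minimal sketch; huggingface_hub defaults to ~/.cache/huggingface/hub and honors the HF_HOME environment variable):

from huggingface_hub import constants

# Default cache directory used for downloaded models
# (redirectable with the HF_HOME environment variable).
print(constants.HF_HUB_CACHE)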

[Generated image: test1.png]

5. List of Diffusers-compatible models

The hakurei/waifu-diffusion model specified in the code above can be found in this list.

Models - Hugging Face
https://huggingface.co/models?other=stable-diffusion
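
The same catalog can also be queried from Python through the Hub API (a minimal sketch using huggingface_hub's HfApi; the 'stable-diffusion' tag mirrors the filter in the URL above, and the attribute names assume a recent huggingface_hub version):

from huggingface_hub import HfApi

api = HfApi()
# List models tagged 'stable-diffusion', most downloaded first.
for model in api.list_models(filter='stable-diffusion', sort='downloads', limit=10):
    print(model.id)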

6. Using models not in Diffusers format

Place the model file locally and load it from there.
A VAE is used as well.
The files inside the web UI's directory are referenced (the web UI itself is not used).
Additionally, load_safety_checker is set to False to allow NSFW output; a warning message is printed. The alternative of swapping out the safety_checker function doesn't seem to work with AOM3A1_orangemixs: building the pipeline fails with "not has_nsfw for has_nsfw in has_nsfw_concept".
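
For reference, with pipelines loaded via from_pretrained the usual way to disable the checker is to pass safety_checker=None up front rather than patching it afterwards (a minimal sketch, reusing the model from section 4):

from diffusers import StableDiffusionPipeline

# safety_checker=None skips the NSFW filter entirely;
# requires_safety_checker=False suppresses the related warning.
pipeline = StableDiffusionPipeline.from_pretrained(
    'hakurei/waifu-diffusion',
    safety_checker=None,
    requires_safety_checker=False
)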

import torch
from diffusers import DDIMScheduler
from diffusers.models import AutoencoderKL
from diffusers.pipelines.stable_diffusion.convert_from_ckpt import download_from_original_stable_diffusion_ckpt

model_path = 'C:/stable-diffusion/stable-diffusion-webui/models/Stable-diffusion/AOM3A1_orangemixs.safetensors'
vae_path = 'C:/stable-diffusion/stable-diffusion-webui/models/VAE/wd15-beta1-fp32.safetensors'
prompt = 'wdgoodprompt, (new, newest, best quality, extremely detailed, high resolution, anime:1.2), 1 girl, 14 years old, blue eyes, long black hair, shy, embarrassed, open mouth, bedroom, bed, furniture, closed curtain'
negative_prompt = "((((mutated hands and fingers)))), deformed, blurry, bad anatomy, long neck, long_neck, long body, long_body, deformed mutated disfigured, disfigured, poorly drawn face, mutation, mutated, extra limb, ugly, ugly face, poorly drawn hands, poorly_drawn_hands, missing limb, missing_limb, blurry, floating limbs, floating_limbs, disconnected limbs, disconnected_limbs, malformed hands, malformed_hands, blur, out of focus, text, title, flat color, flat shading, bad fingers, liquid fingers, poorly drawn fingers, bad anatomy, missing fingers, signature, watermark, username, artist name, missing legs, extra legs, extra_legs, bad hands, mutated hands, missing arms, extra_arms, bad proportions, extra fingers, extra_fingers, extra digit, fewer digits, huge breasts, animal ears, hair ornaments, white glove, purple nipples, violet nipples, garter belt, deformed eyes, partial face, partial head, bad face, inaccurate limb, cropped, single leg, single arm"
steps = 20
seed = 46
device = 'cuda'

# Load the standalone VAE from a .safetensors file.
vae = AutoencoderKL.from_single_file(vae_path)

scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule='scaled_linear',
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1
)

# Build a Diffusers pipeline directly from a web UI style
# .safetensors checkpoint, with the safety checker disabled.
pipeline = download_from_original_stable_diffusion_ckpt(
    checkpoint_path_or_dict=model_path,
    from_safetensors=True,
    vae=vae,
    local_files_only=True,
    device=device,
    load_safety_checker=False
)
pipeline.scheduler = scheduler
pipeline.to(device)

generator = torch.Generator(device).manual_seed(seed)
image = pipeline(
    prompt=prompt,
    width=512,
    height=512,
    num_inference_steps=steps,
    guidance_scale=7.5,
    generator=generator,
    negative_prompt=negative_prompt
).images[0]

image.save('C:/stable-diffusion/Diffusers/output/test1.png')

[Generated image: test1.png]

7. Random generation

Use seed() to set a random seed.

if seed == -1:
    # No fixed seed: let seed() draw a nondeterministic one.
    generator = torch.Generator(device)
    generator.seed()
else:
    generator = torch.Generator(device).manual_seed(seed)
# Print the seed actually used so the run can be reproduced later.
print(f'initial_seed={generator.initial_seed()}')
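
Because the seed in use is printed, a run you like can be recreated later by feeding that number back in (a minimal sketch; 1234567890 stands in for whatever initial_seed was printed):

import torch

device = 'cuda'
# Reuse the seed printed as initial_seed= in the earlier run.
seed = 1234567890
generator = torch.Generator(device).manual_seed(seed)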

Full code

import torch
from diffusers import DDIMScheduler
from diffusers.models import AutoencoderKL
from diffusers.pipelines.stable_diffusion.convert_from_ckpt import download_from_original_stable_diffusion_ckpt

model_path = 'C:/stable-diffusion/stable-diffusion-webui/models/Stable-diffusion/AOM3A1_orangemixs.safetensors'
vae_path = 'C:/stable-diffusion/stable-diffusion-webui/models/VAE/wd15-beta1-fp32.safetensors'
prompt = 'NSFW, wdgoodprompt, (new, newest, best quality, extremely detailed, high resolution, anime:1.2), 1 girl, 14 years old, blue eyes, long black hair, shy, embarrassed, open mouth, bedroom, bed, furniture, closed curtain'
negative_prompt = "((((mutated hands and fingers)))), deformed, blurry, bad anatomy, long neck, long_neck, long body, long_body, deformed mutated disfigured, disfigured, poorly drawn face, mutation, mutated, extra limb, ugly, ugly face, poorly drawn hands, poorly_drawn_hands, missing limb, missing_limb, blurry, floating limbs, floating_limbs, disconnected limbs, disconnected_limbs, malformed hands, malformed_hands, blur, out of focus, text, title, flat color, flat shading, bad fingers, liquid fingers, poorly drawn fingers, bad anatomy, missing fingers, signature, watermark, username, artist name, missing legs, extra legs, extra_legs, bad hands, mutated hands, missing arms, extra_arms, bad proportions, extra fingers, extra_fingers, extra digit, fewer digits, huge breasts, animal ears, hair ornaments, white glove, purple nipples, violet nipples, garter belt, deformed eyes, partial face, partial head, bad face, inaccurate limb, cropped, single leg, single arm"
steps = 20
seed = -1
device = 'cuda'

vae = AutoencoderKL.from_single_file(vae_path)

scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule='scaled_linear',
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1
)

pipeline = download_from_original_stable_diffusion_ckpt(
    checkpoint_path_or_dict=model_path,
    from_safetensors=True,
    vae=vae,
    local_files_only=True,
    device=device,
    load_safety_checker=False
)
pipeline.scheduler = scheduler
pipeline.to(device)

if seed == -1:
    generator = torch.Generator(device)
    generator.seed()
else:
    generator = torch.Generator(device).manual_seed(seed)
print(f'initial_seed={generator.initial_seed()}')

image = pipeline(
    prompt=prompt,
    width=512,
    height=512,
    num_inference_steps=steps,
    guidance_scale=7.5,
    generator=generator,
    negative_prompt=negative_prompt
).images[0]

image.save('C:/stable-diffusion/Diffusers/output/test1.png')

8. Using embeddings, enabling xFormers, creating a timestamped directory, and looping to save sequentially numbered files

import os

webui_path           = 'C:/stable-diffusion/stable-diffusion-webui/'
embeddings_dir       = os.path.join(webui_path, 'embeddings/')
embedding_files = [
    ('EasyNegativeV2.safetensors', 'EasyNegativeV2'   ),
    ('negative_hand-neg.pt'      , 'negative_hand-neg')
]
negative_prompt = 'EasyNegativeV2, negative_hand-neg, worst quality, (omitted)'
# Register each embedding as a textual inversion token
# so it can be referenced by name in the prompts.
for embedding in embedding_files:
    pipeline.load_textual_inversion(
        pretrained_model_name_or_path=os.path.join(embeddings_dir, embedding[0]),
        token=embedding[1],
        local_files_only=True
    )
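
To confirm the tokens were actually registered, the tokenizer vocabulary can be inspected (a minimal sketch; assumes the pipeline and embedding_files from the full code below):

# Each loaded embedding should now appear in the tokenizer vocabulary.
for _, token in embedding_files:
    print(token, token in pipeline.tokenizer.get_vocab())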

When using xFormers on Windows, the Triton module is not available, so the following message reportedly always appears.

A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'

pipeline.enable_xformers_memory_efficient_attention()
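
The message itself is harmless, but if xFormers is missing entirely the call raises an exception; a guarded version degrades gracefully (a minimal sketch):

# Enable xFormers if available; otherwise keep the default attention.
try:
    pipeline.enable_xformers_memory_efficient_attention()
except Exception as e:
    print(f'xFormers not available, using default attention: {e}')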

Full code

import datetime
import os
import torch
from diffusers import DDIMScheduler
from diffusers.models import AutoencoderKL
from diffusers.pipelines.stable_diffusion.convert_from_ckpt import download_from_original_stable_diffusion_ckpt

my_path              = 'C:/stable-diffusion/Diffusers/'
webui_path           = 'C:/stable-diffusion/stable-diffusion-webui/'

outputs_txt2img_path = os.path.join(my_path, 'outputs/txt2img-images/')

model_path           = os.path.join(webui_path, 'models/Stable-diffusion/AOM3A1_orangemixs.safetensors')
vae_path             = os.path.join(webui_path, 'models/VAE/wd15-beta1-fp32.safetensors')
embeddings_dir       = os.path.join(webui_path, 'embeddings/')

embedding_files = [
    ('EasyNegativeV2.safetensors', 'EasyNegativeV2'   ),
    ('negative_hand-neg.pt'      , 'negative_hand-neg')
]

prompt = '8k, 1 girl, 14 years old, blue eyes, long black hair, large breasts, black (school swimsuit:1.2), black elbow length long gloves, black thigh high socks, shy, embarrassed, bedroom, furniture, curtain'
negative_prompt = 'EasyNegativeV2, negative_hand-neg, worst quality, low quality, poor quality, deformed, blurry, disfigured, ugly, mutation, mutated, blur, flat shading, bad anatomy, long body, bad proportions, bad anatomy, poorly drawn face, long neck, missing limb, extra limb, missing legs, extra legs, missing arms, extra arms, poorly drawn hands, malformed hands, bad hands, mutated hands, bad fingers, missing fingers, extra digit, hair ornaments, ribbons, animal ears, text, name tag, choker, tights, see through thigh high socks, open clothes, white school swimsuit'

steps = 20
seed = -1
loop_count = 5
device = 'cuda'

vae = AutoencoderKL.from_single_file(vae_path)

scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule='scaled_linear',
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1
)

pipeline = download_from_original_stable_diffusion_ckpt(
    checkpoint_path_or_dict=model_path,
    from_safetensors=True,
    vae=vae,
    local_files_only=True,
    device=device,
    load_safety_checker=False
)
# Register the negative embeddings as textual inversion tokens.
for embedding in embedding_files:
    pipeline.load_textual_inversion(
        pretrained_model_name_or_path=os.path.join(embeddings_dir, embedding[0]),
        token=embedding[1],
        local_files_only=True
    )
pipeline.scheduler = scheduler
pipeline.enable_xformers_memory_efficient_attention()
pipeline.to(device)

if seed == -1:
    generator = torch.Generator(device)
    generator.seed()
else:
    generator = torch.Generator(device).manual_seed(seed)
print(f'initial_seed={generator.initial_seed()}')

# Create a per-run output directory named with the current timestamp.
now = datetime.datetime.now()
outputs_dir = os.path.join(outputs_txt2img_path, now.strftime('%Y%m%d%H%M%S'))
os.makedirs(outputs_dir)

# Generate loop_count images, saved as zero-padded sequential files.
for i in range(1, loop_count + 1):
    image = pipeline(
        prompt=prompt,
        width=512,
        height=512,
        num_inference_steps=steps,
        guidance_scale=7.5,
        generator=generator,
        negative_prompt=negative_prompt
    ).images[0]
    image_filename = os.path.join(outputs_dir, f'{i:08}.png')
    image.save(image_filename)
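
One limitation of this loop is that only the first image's seed is known; each later image advances the generator state, so individual images can't be regenerated directly. A variant that draws and records a fresh seed per image would look like this (a minimal sketch; the random upper bound is an arbitrary choice, and pipeline, prompt, etc. are as above):

import random

for i in range(1, loop_count + 1):
    # Draw a fresh seed per image and record it in the filename
    # so any single image can be reproduced later.
    image_seed = random.randint(0, 2**32 - 1)
    generator = torch.Generator(device).manual_seed(image_seed)
    image = pipeline(
        prompt=prompt,
        width=512,
        height=512,
        num_inference_steps=steps,
        guidance_scale=7.5,
        generator=generator,
        negative_prompt=negative_prompt
    ).images[0]
    image.save(os.path.join(outputs_dir, f'{i:08}_seed{image_seed}.png'))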

9. Where Diffusers is installed

Checking with pip show diffusers shows that it is installed in C:\Users\(user name)\AppData\Local\Programs\Python\Python311\Lib\site-packages\.

PS C:\stable-diffusion\Diffusers> pip show diffusers
Name: diffusers
Version: 0.25.0
Summary: State-of-the-art diffusion in PyTorch and JAX.
Home-page: https://github.com/huggingface/diffusers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/diffusers/graphs/contributors)
Author-email: patrick@huggingface.co
License: Apache 2.0 License
Location: C:\Users\(user name)\AppData\Local\Programs\Python\Python311\Lib\site-packages
Requires: filelock, huggingface-hub, importlib-metadata, numpy, Pillow, regex, requests, safetensors
Required-by:
PS C:\stable-diffusion\Diffusers>

Looking at diffusers\pipelines\stable_diffusion\convert_from_ckpt.py, the argument descriptions for download_from_original_stable_diffusion_ckpt() can be found there.
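
The same documentation can be pulled up without opening the file (a minimal sketch using the standard inspect module):

import inspect
from diffusers.pipelines.stable_diffusion.convert_from_ckpt import download_from_original_stable_diffusion_ckpt

# Print the function's docstring, which describes each argument.
print(inspect.getdoc(download_from_original_stable_diffusion_ckpt))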

10. References

Stable Diffusion を Diffusersライブラリで実行する方法 - ガンマソフト
https://gammasoft.jp/blog/stable-diffusion-with-diffusers-library/

Stable Diffusion Pipelineまとめ(1)text2img | 鷹の目週末プログラマー
https://happy-shibusawake.com/sd-pipeline_txt2img/948/

Google Colab ではじめる Stable Diffusion 1.5|npaka
https://note.com/npaka/n/n0c0b2388b893

diffusersで使える Stable Diffusionモデル一覧|npaka
https://note.com/npaka/n/ned44e0242ac0

Stable Diffusionのモデルをローカルに保存 | 鷹の目週末プログラマー
https://happy-shibusawake.com/stable-diffusion_local/934/

Transformersの'from_pretrained'の使い方とリスクを考察
https://zenn.dev/yagiyuki/articles/load_pretrained

HuggingFace Diffusers 0.12 : 使用方法 : パイプライン, モデルとスケジューラのロード – Transformers, Diffusers | ClassCat® Chatbot
https://torch.classcat.com/2023/02/10/huggingface-diffusers-0-12-using-diffusers-loading/

CPU で動かす diffusers 入門 #Python - Qiita
https://qiita.com/7shi/items/b4da43f342f0fe3c189c

diffusersでスケジューラを読み込んで画像生成する方法 #AI - Qiita
https://qiita.com/Limitex/items/13a5593d425416a805ca

Diffusers の Scheduler を試した - my notebook
https://osima.jp/posts/sd-diffusers-schedulers/

diffusersでschedulerを変える方法
https://zenn.dev/kaibaoke/articles/art7_change_diffusers_scheduler

StableDiffusionのスケジューラを切り替える|ワビスケ/ PBMB
https://note.com/wabisuke94/n/ncc1847296685

PyTorch - torch.Generator
https://runebook.dev/ja/docs/pytorch/generated/torch.generator

Stable Diffusionでネガティブプロンプト(Negative Prompt)を指定して画像を生成する方法 | Murasan Lab
https://murasan-net.com/index.php/2023/01/21/stable-diffusion-negative-prompt/

Stable Diffusion checkpointとDiffusersモデルの相互変換スクリプト(SD2.0対応)|Kohya S.
https://note.com/kohya_ss/n/n374f316fe4ad

PyTorchでTensorとモデルのGPU / CPUを指定・切り替え | note.nkmk.me
https://note.nkmk.me/python-pytorch-device-to-cuda-cpu/

DiffusersでNSFW警告からの黒画像を無効にする方法 [Potential NSFW content was detected in one or more images. A black image will be returned instead. Try again with a different prompt and/or seed.] #StableDiffusion - Qiita
https://qiita.com/Limitex/items/10fc8b7f1285d6627fe3

diffusers 使い方関連まとめ
https://zenn.dev/shiro_toy_box/articles/6ddec00f781dff

Stable Diffusion(Diffusers)による画像生成の効率化と、基本的な使い方まとめ|瑚太朗
https://note.com/yossymura/n/n64b421ffd927

【Stable Diffusion】VAEを変更して画質を上げる | ジコログ
https://self-development.info/%E3%80%90stable-diffusion%E3%80%91vae%E3%82%92%E5%A4%89%E6%9B%B4%E3%81%97%E3%81%A6%E7%94%BB%E8%B3%AA%E3%82%92%E4%B8%8A%E3%81%92%E3%82%8B/

pipでインストールしたパッケージの場所を調べる #Python - Qiita
https://qiita.com/t-fuku/items/83c721ed7107ffe5d8ff


【Stable Diffusion】diffusersでembeddingを使用する方法 - こすたろーんエンジニアの試行錯誤部屋
https://technoxs-stacker.hatenablog.com/entry/2023/12/07/000000

【DIffusers】Diffusersで「EasyNegative」を使ってみる - パソコン関連もろもろ
https://touch-sp.hatenablog.com/entry/2023/05/02/123053
