
I Want to Carry Stable Diffusion Around on My Laptop!

Posted at 2022-09-18, last updated at 2022-09-26

I tried to get Stable Diffusion, the AI image generator everyone is talking about,
running with no network connection at all.

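The PC I used:
OS: Windows 10 Home 64-bit
CPU: Intel(R) Core(TM) i7-10750H
GPU: NVIDIA GeForce GTX 1650 Ti
RAM: 16GB
With these specs, the standard setup did not seem to work, for a variety of reasons.
※ It also runs on the CPU alone, although slowly. (Added 9/20)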

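I also set myself the following goals.
●　Do not use StableDiffusionPipeline.
　　For cards such as the GTX 1650 Ti you are told to set the parameter precision="full",
　　but I could not work out how to set it. Using the pipeline also appears to require a network connection.
●　Run everything in Jupyter Notebook, which I am used to.
　　Once the environment is set up, you can run optimized_txt2img from optimizedSD inside Jupyter Notebook (from the stable-diffusion-main folder) with a command such as

!python optimizedSD/optimized_txt2img.py --prompt "{prompt}" --H 512 --W 512 --seed 22 --scale 7.5 --ddim_steps 50 --precision full

　　but there were many things about it that I wanted to change.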

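For the overall procedure I follow

えっつ's article "Stable Diffusion をローカル環境で動かしたかった" (I wanted to run Stable Diffusion locally),

but I swap out some parts of its instructions as follows.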

In the 「Fork stable diffusion」 step,

instead of CompVis/stable-diffusion, use
basujindal/stable-diffusion
and obtain the stable-diffusion-main folder.

「Obtaining the trained model」

Download sd-v1-4.ckpt from
https://huggingface.co/CompVis/stable-diffusion-v-1-4-original

「Creating the runtime environment」

Here, use environment.yaml with the following changes (+ marks added lines and - marks the removed line):

name: ldm
channels:
 - pytorch
 - defaults
dependencies:
+ - git
 - python=3.8.5
 - pip=20.3
 - cudatoolkit=11.3
 - pytorch=1.11.0
 - torchvision=0.12.0
 - numpy=1.19.2
 - pip:
   - albumentations==0.4.3
   - diffusers
   - opencv-python==4.1.2.30
   - pudb==2019.2
   - invisible-watermark
   - imageio==2.9.0
   - imageio-ffmpeg==0.4.2
-    - pytorch-lightning==1.4.2
+    - pytorch-lightning==1.5.0
   - omegaconf==2.1.1
   - test-tube>=0.7.5
   - streamlit>=0.73.1
   - einops==0.3.0
   - torch-fidelity==0.3.0
   - transformers==4.19.2
   - torchmetrics==0.6.0
   - kornia==0.6
   - -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
   - -e git+https://github.com/openai/CLIP.git@main#egg=clip
   - -e .
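To confirm that the ldm environment actually picked up the pinned versions (for example pytorch-lightning 1.5.0 after the change above), a small check like the following can be run in a notebook cell. This is my own sketch, not part of the referenced articles. It only looks at `name==version` pip pins and ignores conda pins such as `python=3.8.5`, `>=` pins and `-e` installs. The environment.yaml path in the example comment assumes the notebook runs in the folder that contains stable-diffusion-main.

```python
# Sketch: compare "name==version" pins in environment.yaml with the packages
# installed in the current environment. Conda pins such as "python=3.8.5",
# ">=" pins and "-e" editable installs are deliberately ignored.
import re
from importlib import metadata

PIN = re.compile(r"^\s*-\s+([A-Za-z0-9_.\-]+)==(\S+)\s*$")


def parse_pins(yaml_text):
    """Return {package: pinned_version} for every '- name==version' line."""
    pins = {}
    for line in yaml_text.splitlines():
        match = PIN.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def installed_version(name):
    """Installed version of a distribution, or None if it is not installed."""
    # Older importlib.metadata may not treat "-" and "_" as equivalent, so try both
    for candidate in (name, name.replace("-", "_")):
        try:
            return metadata.version(candidate)
        except metadata.PackageNotFoundError:
            continue
    return None


def mismatched_pins(pins):
    """List (package, pinned, installed) for every pin that is not satisfied."""
    result = []
    for name, pinned in pins.items():
        installed = installed_version(name)
        if installed != pinned:
            result.append((name, pinned, installed))
    return result


# Example (path assumes the notebook runs in the folder containing stable-diffusion-main):
# with open("stable-diffusion-main/environment.yaml", encoding="utf-8") as f:
#     for name, pinned, installed in mismatched_pins(parse_pins(f.read())):
#         print(f"{name}: pinned {pinned}, installed {installed}")
```

An empty result means every `==` pin matches what is installed.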

At this point I had:
●　Anaconda installed
●　the repository folder stable-diffusion-main
●　the weights file sd-v1-4.ckpt (renamed to model.ckpt)
Next, I arranged the directories as follows.
Because the working folder when Jupyter Notebook runs will be Python, I moved the entire downloaded stable-diffusion-main folder into it,
and placed model.ckpt inside stable-diffusion-main as shown below (+ marks the items I placed).

C:\Users
 +---XXXX
   +---Documents
     +---Source
       +---Python
+      +---stable-diffusion-main
         +---environment.yaml
         +---models
           +---ldm
             +---stable-diffusion-v1
+               +----model.ckpt
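Before going further, it can save time to check that the files are really where the later code expects them. The sketch below is my own; the list of files comes from the layout above and from the config/ckpt paths used later, and the root folder is an assumption you may need to change.

```python
# Sketch: check that the files this article relies on are where the notebook
# expects them. The root is an assumption: "." when the notebook sits in the
# Python folder, ".." if it is saved one level deeper.
from pathlib import Path

EXPECTED_FILES = [
    Path("stable-diffusion-main") / "environment.yaml",
    Path("stable-diffusion-main") / "optimizedSD" / "v1-inference.yaml",
    Path("stable-diffusion-main") / "models" / "ldm" / "stable-diffusion-v1" / "model.ckpt",
]


def missing_files(root="."):
    """Return the expected files (as strings) that do not exist under root."""
    base = Path(root)
    return [str(rel) for rel in EXPECTED_FILES if not (base / rel).is_file()]


for path in missing_files("."):
    print("missing:", path)
```

If nothing is printed, all three files were found.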

「Installing Jupyter Notebook」

Install Jupyter Notebook from Anaconda Navigator.
From the Start menu, launch Anaconda3 (64-bit) → Anaconda Navigator (Anaconda3).

Select ldm from the pull-down menu next to "Applications on".
Jupyter Notebook appears in the list below with an Install button; click it
to install.
(In the figure above the button reads Launch because Jupyter Notebook was already installed.)
After installation, a Start menu entry Anaconda3 (64-bit) → Jupyter Notebook (ldm) is created,
and you can launch it from there.

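「Changing the Jupyter Notebook start directory」

I follow mitama's article "Jupyter Notebookの初期ディレクトリを変更する" (Changing the initial directory of Jupyter Notebook).

Based on the folder layout above, I made the following change: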

jupyter_notebook_config.py

c.NotebookApp.notebook_dir = 'C:/Users/XXXX/Documents/Source/Python'
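If you prefer not to edit the config file by hand, the same setting can be written from Python. This is a sketch of my own, not from mitama's article. It assumes the file was already created with `jupyter notebook --generate-config` (normally in %USERPROFILE%\.jupyter\), and `set_notebook_dir` is a hypothetical helper name.

```python
# Sketch: set notebook_dir in jupyter_notebook_config.py from Python instead of
# editing it by hand. Re-running replaces the earlier value instead of adding
# a second assignment.
from pathlib import Path


def set_notebook_dir(config_path, notebook_dir):
    """Write 'c.NotebookApp.notebook_dir = ...' once, replacing any active assignment."""
    config_path = Path(config_path)
    lines = []
    if config_path.exists():
        lines = config_path.read_text(encoding="utf-8").splitlines()
    # Commented-out defaults ("# c.NotebookApp...") are kept; active lines are replaced
    kept = [ln for ln in lines if not ln.lstrip().startswith("c.NotebookApp.notebook_dir")]
    setting = f"c.NotebookApp.notebook_dir = {notebook_dir!r}"
    kept.append(setting)
    config_path.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return setting


# Example (path is an assumption; replace XXXX with your user name):
# set_notebook_dir(Path.home() / ".jupyter" / "jupyter_notebook_config.py",
#                  "C:/Users/XXXX/Documents/Source/Python")
```

Forward slashes are used in the path, as in the hand-edited line, so no backslash escaping is needed.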

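「Creating a notebook」

Close all windows, then launch Anaconda3 (64-bit) → Jupyter Notebook (ldm)
from the Start menu.
A command prompt opens first, followed by your default browser.

I use dark mode, so my screen may look different from yours; please don't worry about that.
Select Python 3 (ipykernel) from the New pull-down at the top right to create an empty notebook.
It is named Untitled, but you can Rename it after saving.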

For how to use Jupyter Notebook, see

oyan29's article "Jupyter Notebookの使い方(初心者向け)" (How to use Jupyter Notebook, for beginners).

「Checking the environment」

Enter the following code in the first cell of the notebook and run it to check the setup.

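import sys
print("Python = "+sys.version)
import numpy as np
print("Numpy = "+np.__version__)
# nvcc is part of the CUDA Toolkit; if it is not on PATH this line only prints an error
!nvcc --version
import torch
print("Pytorch = "+torch.__version__)
# True when PyTorch can use the GPU through CUDA
print("Pytorch GPU =",torch.cuda.is_available())
# Raises an error when no CUDA GPU is available
print("Pytorch GPU Name = "+torch.cuda.get_device_name())
import matplotlib
print("Matplotlib = "+matplotlib.__version__)
# IPython magic showing the current working folder (Windows cmd has no "pwd" command)
%pwd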

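The output confirmed that PyTorch's CUDA support is working and that the GPU is recognized.
※ If you want to run on the CPU or have no GPU, an error appears here (from torch.cuda.get_device_name()); ignore it for now. (Added 9/20)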
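If you would rather have one check cell that works with or without a GPU, a guarded variant like the following is one option. It is a sketch, not the original notebook: `pick_device`, `report` and the `device` variable are my own names, and the resulting `device` can then be passed as `txt2img(..., device=device)`.

```python
# Sketch: environment check that does not fail on machines without a CUDA GPU.
import sys

try:
    import torch
except ImportError:  # lets the cell run even where PyTorch is not installed
    torch = None


def pick_device():
    """Return 'cuda' when PyTorch can see a CUDA GPU, otherwise 'cpu'."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"


def report():
    print("Python =", sys.version)
    if torch is None:
        print("PyTorch is not installed in this environment")
        return
    print("PyTorch =", torch.__version__)
    print("PyTorch GPU =", torch.cuda.is_available())
    if torch.cuda.is_available():
        # get_device_name() raises when no GPU is visible, so only call it here
        print("PyTorch GPU Name =", torch.cuda.get_device_name(0))


report()
device = pick_device()
print("device =", device)
```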

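「Turning optimized_txt2img into a function」

Because the hardware is underpowered and each image takes a long time to generate, I removed the unnecessary loops.
The image is returned from the function instead of being written to a file.
The functions from external files are gathered into a single cell.
The trained model is loaded once and reused by the cells that follow
(so the weights are loaded only once).
Enter the following code in the second cell and run it to check that it works.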

import argparse, os, re
import torch
import numpy as np
from random import randint
from omegaconf import OmegaConf
from PIL import Image
from tqdm.auto import tqdm, trange
from itertools import islice
from einops import rearrange
from torchvision.utils import make_grid
import time
from pytorch_lightning import seed_everything
from torch import autocast
from contextlib import contextmanager, nullcontext
import importlib

def instantiate_from_config(config):
    if not "target" in config:
        if config == '__is_first_stage__':
            return None
        elif config == "__is_unconditional__":
            return None
        raise KeyError("Expected key `target` to instantiate.")
    return get_obj_from_str(config["target"])(**config.get("params", dict()))

def get_obj_from_str(string, reload=False):
    module, cls = string.rsplit(".", 1)
    if reload:
        module_imp = importlib.import_module(module)
        importlib.reload(module_imp)
    return getattr(importlib.import_module(module, package=None), cls)

def split_weighted_subprompts(text):
    remaining = len(text)
    prompts = []
    weights = []
    while remaining > 0:
        if ":" in text:
            idx = text.index(":") 
            prompt = text[:idx]
            remaining -= idx
            text = text[idx+1:]
            if " " in text:
                idx = text.index(" ")
            else: 
                idx = len(text)
            if idx != 0:
                try:
                    weight = float(text[:idx])
                except ValueError:
                    print(f"Warning: '{text[:idx]}' is not a value, are you missing a space?")
                    weight = 1.0
            else:
                weight = 1.0
            remaining -= idx
            text = text[idx+1:]
            prompts.append(prompt)
            weights.append(weight)
        else:
            if len(text) > 0:
                prompts.append(text)
                weights.append(1.0)
            remaining = 0
    return prompts, weights

def load_model_from_config(ckpt, verbose=False):
    print(f"Loading model from {ckpt}")
    pl_sd = torch.load(ckpt, map_location="cpu")
    if "global_step" in pl_sd:
        print(f"Global Step: {pl_sd['global_step']}")
    sd = pl_sd["state_dict"]
    return sd

config = "../stable-diffusion-main/optimizedSD/v1-inference.yaml"
ckpt = "../stable-diffusion-main/models/ldm/stable-diffusion-v1/model.ckpt"

sd = load_model_from_config(f"{ckpt}")

li, lo = [], []
for key, value in sd.items():
    sp = key.split(".")
    if (sp[0]) == "model":
        if "input_blocks" in sp:
            li.append(key)
        elif "middle_block" in sp:
            li.append(key)
        elif "time_embed" in sp:
            li.append(key)
        else:
            lo.append(key)
for key in li:
    sd["model1." + key[6:]] = sd.pop(key)
for key in lo:
    sd["model2." + key[6:]] = sd.pop(key)

config = OmegaConf.load(f"{config}")

model = instantiate_from_config(config.modelUNet)
_, _ = model.load_state_dict(sd, strict=False)

modelCS = instantiate_from_config(config.modelCondStage)
_, _ = modelCS.load_state_dict(sd, strict=False)

modelFS = instantiate_from_config(config.modelFirstStage)
_, _ = modelFS.load_state_dict(sd, strict=False)

del sd

def txt2img(prompts="",H=512,W=512,C=4,f=8,dim_steps=50,fixed_code=50,ddim_eta=0.0,
            n_rows=0,scale=7.5,device='cuda',seed=None,unet_bs=1,precision='full',format_type='png',sampler='plms'):
    
    tic = time.time()
    if seed is None:
        seed = randint(0, 1000000)
    # Seed the RNGs on every call, not only when a random seed was drawn
    seed_everything(seed)

    model.eval()
    model.unet_bs = unet_bs
    model.cdevice = device
    model.turbo = False

    modelCS.eval()
    modelCS.cond_stage_model.device = device

    modelFS.eval()

    if device != "cpu" and precision == "autocast":
        model.half()
        modelCS.half()

    start_code = None
    if fixed_code:
        start_code = torch.randn([1,C,H // f, W // f], device=device)

    n_rows = n_rows if n_rows > 0 else 1

    if precision == "autocast" and device != "cpu":
        precision_scope = autocast
    else:
        precision_scope = nullcontext

    with torch.no_grad():

        all_samples = list()
        with precision_scope("cuda"):
            modelCS.to(device)
            uc = None
            if scale != 1.0:
                # Unconditional (empty-prompt) embedding for classifier-free guidance
                uc = modelCS.get_learned_conditioning([""])
            # Everything below must run for any scale, so it is outside the "if" above
            subprompts, weights = split_weighted_subprompts(prompts)
            if len(subprompts) > 1:
                c = torch.zeros_like(uc)
                totalWeight = sum(weights)
                for i in range(len(subprompts)):
                    weight = weights[i]
                    # if not skip_normalize:
                    weight = weight / totalWeight
                    c = torch.add(c, modelCS.get_learned_conditioning(subprompts[i]), alpha=weight)
            else:
                c = modelCS.get_learned_conditioning(prompts)

            shape = [1, C, H // f, W // f]

            if device != "cpu":
                # Move the text encoder off the GPU and wait until its memory is released
                mem = torch.cuda.memory_allocated() / 1e6
                modelCS.to("cpu")
                while torch.cuda.memory_allocated() / 1e6 >= mem:
                    time.sleep(1)

            samples_ddim = model.sample(
                S=dim_steps,
                conditioning=c,
                seed=seed,
                shape=shape,
                verbose=False,
                unconditional_guidance_scale=scale,
                unconditional_conditioning=uc,
                eta=ddim_eta,
                x_T=start_code,
                sampler=sampler,
            )

            modelFS.to(device)

            # Decode the latent with the first-stage model into an H x W x 3 uint8 array
            x_samples_ddim = modelFS.decode_first_stage(samples_ddim[0].unsqueeze(0))
            x_sample = torch.clamp((x_samples_ddim + 1.0) / 2.0, min=0.0, max=1.0)
            x_sample = 255.0 * rearrange(x_sample[0].cpu().numpy(), "c h w -> h w c")
            image=x_sample.astype(np.uint8)
                
            if device != "cpu":
                mem = torch.cuda.memory_allocated() / 1e6
                modelFS.to("cpu")
                while torch.cuda.memory_allocated() / 1e6 >= mem:
                    time.sleep(1)
            del samples_ddim
            print("memory_final = ", torch.cuda.memory_allocated() / 1e6)

    toc = time.time()
    time_taken = (toc - tic) / 60.0

    print(f"Samples finished in {time_taken:.2f} minutes {prompts}\nSeeds used = {seed}")
    return image 

print('txt2img defined !!')

The two lines in the middle of the cell,

config = "../stable-diffusion-main/optimizedSD/v1-inference.yaml"
ckpt = "../stable-diffusion-main/models/ldm/stable-diffusion-v1/model.ckpt"

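depend on how you arranged your folders, so adjust them as needed.
The output was as follows.

Messages along the lines of "some parameters were not used" appear, but they caused no problems.
Finally, txt2img defined !! is printed and the cell finishes.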
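The key-renaming loop in the middle of the second cell is the least obvious part. As far as I can tell, the optimized UNet in basujindal's fork holds the network as two sub-modules, model1 (input blocks, middle block and time embedding) and model2 (the remaining output-side layers), so checkpoint keys that start with `model.` are renamed to match before `load_state_dict`. Because that call uses `strict=False`, each of model, modelCS and modelFS loads only the keys it recognizes from the same dict. The sketch below reproduces just the renaming rule on a plain dict, without PyTorch; `split_unet_keys` is my own name.

```python
# Sketch of the key renaming done in the second cell, on a plain dict.
SPLIT_MARKERS = {"input_blocks", "middle_block", "time_embed"}


def split_unet_keys(state_dict):
    """Return a new dict with 'model.*' keys renamed to 'model1.*' or 'model2.*'."""
    renamed = {}
    for key, value in state_dict.items():
        parts = key.split(".")
        if parts[0] != "model":
            renamed[key] = value  # text encoder, VAE, EMA keys etc. stay as they are
        elif SPLIT_MARKERS & set(parts):
            renamed["model1." + key[len("model."):]] = value
        else:
            renamed["model2." + key[len("model."):]] = value
    return renamed


toy = {
    "model.diffusion_model.input_blocks.0.0.weight": 1,
    "model.diffusion_model.out.2.weight": 2,
    "first_stage_model.decoder.conv_in.weight": 3,
}
print(split_unet_keys(toy))
```

Unlike the cell, this version builds a new dict instead of popping keys in place; the original collects the key lists first because a dict cannot change size while it is being iterated.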

「Trying out txt2img」

The function is defined as

def txt2img(prompts="",H=512,W=512,C=4,f=8,dim_steps=50,fixed_code=50,ddim_eta=0.0,n_rows=0,scale=7.5,device='cuda',seed=None,unet_bs=1,precision='full',format_type='png',sampler='plms'):

so any parameter you leave out takes its default value.
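The prompts argument also accepts weighted prompts: split_weighted_subprompts turns "a castle:2 a forest:1" into ["a castle", "a forest"] with weights [2.0, 1.0], and txt2img then normalizes the weights to sum to 1 and adds up the weighted embedding of each sub-prompt. The NumPy sketch below shows only that blending step. The arrays are stand-ins for the tensors returned by modelCS.get_learned_conditioning (assumed shape 1 x 77 x 768), and blend_conditioning is a hypothetical name.

```python
# Sketch of the weighted-prompt blending inside txt2img, with NumPy arrays
# standing in for the text-encoder outputs.
import numpy as np


def blend_conditioning(embeddings, weights):
    """Sum of embeddings scaled by weights normalised to add up to 1."""
    total = float(sum(weights))
    blended = np.zeros_like(embeddings[0], dtype=np.float64)
    for emb, w in zip(embeddings, weights):
        # Same arithmetic as torch.add(c, emb, alpha=w / total) in txt2img
        blended += (w / total) * emb
    return blended


castle = np.full((1, 77, 768), 3.0)  # stand-in for the "a castle" embedding
forest = np.zeros((1, 77, 768))      # stand-in for the "a forest" embedding
mix = blend_conditioning([castle, forest], [2.0, 1.0])
```

With weights 2 and 1, the castle embedding contributes two thirds of the result and the forest embedding one third.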
Enter the following code in the third cell and run it.

import matplotlib.pyplot as plt
%matplotlib inline

prompt="A digital Illustration of the Babel tower, 4k, detailed, trending in artstation, fantasy vivid colors, 8k"

image=txt2img(prompts=prompt,H=512,W=512,seed=2,scale=7.5,dim_steps=50,precision='full')

plt.figure(figsize=(8, 8), dpi=120)
plt.axis('off')
plt.imshow(image)

The output was as follows.

Generating with 50 steps took 4 minutes 10 seconds.
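As a rough rule of thumb, 4 minutes 10 seconds is 250 s, or about 5 s per step at dim_steps=50. The sketch below extrapolates that linearly to other step counts. Linear scaling is my assumption: the measured time also includes fixed costs (text encoding, decoding, moving models between CPU and GPU), so for fewer than 50 steps the real time will be somewhat longer than the estimate, and for more steps somewhat shorter.

```python
# Worked arithmetic for the timing above, assuming run time scales linearly with steps.
MEASURED_SECONDS = 4 * 60 + 10   # 4 min 10 s
MEASURED_STEPS = 50              # dim_steps=50


def estimate_seconds(target_steps, seconds=MEASURED_SECONDS, steps=MEASURED_STEPS):
    """Estimated run time for target_steps, scaling the measurement linearly."""
    return seconds / steps * target_steps


print(f"{MEASURED_SECONDS / MEASURED_STEPS:.1f} s per step")  # 5.0 s per step
for n in (25, 50, 100):
    print(f"{n} steps -> about {estimate_seconds(n) / 60:.1f} min (estimate)")
```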

※ If you want to run on the CPU, add device='cpu' like this:

image=txt2img(prompts=prompt,H=512,W=512,seed=2,scale=7.5,dim_steps=50,precision='full',device='cpu')

(Added 9/20)

In later cells, all you need is

prompt="Some text"

image=txt2img(prompts=prompt,H=512,W=512,seed=2,scale=7.5,dim_steps=50,precision='full')

plt.figure(figsize=(8, 8), dpi=120)
plt.axis('off')
plt.imshow(image)

to generate an image.
(You can of course specify the other parameters too.)
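Because txt2img returns the image as an H x W x 3 uint8 array instead of writing a file, saving it is up to you. One way is Pillow's `Image.fromarray(...).save(...)`. The helper names and the file-naming scheme below are my own, and the seed has to be passed in explicitly because txt2img only prints it.

```python
# Sketch: save the array returned by txt2img. output_filename and save_image
# are hypothetical helpers, not part of the original notebook.
import re


def output_filename(prompt, seed, ext="png", max_len=100):
    """File name built from the prompt (non-alphanumerics -> '_') and the seed."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", prompt).strip("_")[:max_len]
    return f"{slug or 'image'}_seed{seed}.{ext}"


def save_image(image, prompt, seed, ext="png"):
    """Write an H x W x 3 uint8 array to the current folder with Pillow."""
    from PIL import Image  # imported here so output_filename works without Pillow

    path = output_filename(prompt, seed, ext)
    Image.fromarray(image).save(path)
    return path


# Usage in a later cell (the seed must match the one passed to txt2img):
# image = txt2img(prompts=prompt, seed=2)
# print(save_image(image, prompt, seed=2))
```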

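「Summary」

I was able to run Stable Diffusion in Jupyter Notebook on a laptop.
I also confirmed that it works fine with the laptop in airplane mode.
Since everything runs fully locally, the NSFW filter that often comes up in discussions is also disabled.

I also tried implementing the same approach on Google Colaboratory, but there
StableDiffusionPipeline ran faster, probably because it is optimized for powerful GPUs and plenty of RAM.

This article ended up leaving the laborious steps to other authors' posts,
but thank you for reading to the end.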
