
[Procedure notes] Pushing Stable Diffusion to the limit: M1 MBA (8GB RAM) edition

Posted at 2022-08-30, last updated at 2022-08-30

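Introduction

I tried out Stable Diffusion, which everyone is talking about right now.
I got stuck in several places, so I am recording the procedure that finally worked.

This is an outright copy of (with full respect to) a pioneer's article¹.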

Environment

M1 MBA
macOS Monterey 12.4
pyenv 2.3.1-20-g572a8bcf
python 3.10.5
pip 22.0.4
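
To compare your own machine against the list above, a small script can print the same kind of information. This is only a sketch using the standard library; `describe_environment` is a name made up for this note. As far as I know, `platform.machine()` reports `arm64` for a native Apple Silicon Python, so seeing `x86_64` on an M1 would suggest the interpreter is running under Rosetta.

```python
import platform


def describe_environment() -> dict:
    """Collect the facts worth comparing against the environment listed above."""
    return {
        "machine": platform.machine(),        # CPU architecture as seen by this Python
        "system": platform.system(),          # "Darwin" on macOS
        "macos": platform.mac_ver()[0],       # empty string on non-macOS systems
        "python": platform.python_version(),  # e.g. "3.10.5"
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```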

Procedure

Packages that are (apparently) required

brew install cmake rust

If you already have Rust installed, you can of course skip it.
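
Before going further, it may help to confirm that the tools are actually on PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper, and the list of command names (`cmake`, `rustc`, `cargo`, `git`) is my assumption about what the later steps need.

```python
import shutil


def missing_tools(tools):
    """Return the command names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Assumed list: cmake and the Rust toolchain from the brew step, plus git for cloning.
missing = missing_tools(["cmake", "rustc", "cargo", "git"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All tools found.")
```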

Cloning the repositories

Stable Diffusion itself

git clone https://github.com/CompVis/stable-diffusion

Model weights

brew install git-lfs
git lfs install
git clone https://huggingface.co/CompVis/stable-diffusion-v-1-4-original

You need to register a user account at https://huggingface.co.
On https://huggingface.co/CompVis, select the repository named stable-diffusion-v-1-*-original and click Access repository.

cd stable-diffusion
mkdir -p models/ldm/stable-diffusion-v1
cp relative/path/to/sd-v1-4.ckpt models/ldm/stable-diffusion-v1/model.ckpt

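Creating a symbolic link instead of copying should also work (more people seem to do it that way).
In my case, perhaps because git-lfs was not installed properly, the symlinked file was rejected with a Not found error, so I copy the file instead.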
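
If git-lfs did not run properly, the cloned .ckpt can be a tiny Git LFS pointer file instead of the real weights. A pointer file is plain text that starts with `version https://git-lfs.github.com/spec/v1`. The sketch below checks for that, and also reports a missing file, which is what a broken symbolic link looks like to `Path.exists()`. The function names are hypothetical, and whether this was the actual cause of my Not found error is only a guess.

```python
from pathlib import Path

# First bytes of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path) -> bool:
    """True if the file is a Git LFS pointer stub rather than the real payload."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def check_checkpoint(path) -> str:
    """Classify a checkpoint path as 'missing', 'lfs-pointer', or 'ok'."""
    p = Path(path)
    if not p.exists():  # also False for a symlink whose target does not resolve
        return "missing"
    if is_lfs_pointer(p):
        return "lfs-pointer"
    return "ok"
```

For example, `check_checkpoint("models/ldm/stable-diffusion-v1/model.ckpt")` run from the stable-diffusion directory would tell you which case you are in. A relative symlink target is resolved from the directory that contains the link, which is one common way to end up with "missing".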

Setting up the Python environment

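pyenv virtualenv 3.10.5 StableDiffusion
pyenv local 3.10.5/envs/StableDiffusion

This creates a Python 3.10.5 environment named StableDiffusion and uses it for the stable-diffusion directory. The virtualenv subcommand comes from the pyenv-virtualenv plugin.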

pip install torch torchvision torchaudio albumentations opencv-python pudb
pip install invisible-watermark imageio imageio-ffmpeg pytorch-lightning omegaconf
pip install test-tube streamlit einops torch-fidelity transformers torchmetrics
pip install kornia certifi filelock diffusers
pip install -e "git+https://github.com/CompVis/taming-transformers.git@main#egg=taming-transformers"
pip install -e "git+https://github.com/openai/CLIP.git@main#egg=clip"

This installs the required packages.
Note that a package called imWatermark also exists, but it is a different thing from invisible-watermark.
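
After the installs, a quick importability check can catch a package that silently failed to build. This is a minimal sketch: `missing_modules` is a hypothetical helper, and the import names in `ASSUMED_MODULES` are my assumptions, since a pip package name and its import name can differ (for example opencv-python is imported as `cv2`).

```python
import importlib.util

# Assumed import names for some of the pip packages above; adjust as needed.
ASSUMED_MODULES = [
    "torch", "torchvision", "cv2", "imwatermark", "omegaconf",
    "pytorch_lightning", "einops", "transformers", "kornia",
    "taming", "clip",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing:", missing_modules(ASSUMED_MODULES) or "none")
```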

Editing files

scripts/txt2img.py

@@ -60,7 +60,8 @@ def load_model_from_config(config, ckpt, verbose=False):
          print("unexpected keys:")
          print(u)
 
-     model.cuda()
+     # model.cuda()
+     model.to("mps")
      model.eval()
      return model
 
@@ -239,7 +240,8 @@ def main():
      config = OmegaConf.load(f"{opt.config}")
      model = load_model_from_config(config, f"{opt.ckpt}")

-     device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
+     # device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
+     device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
      model = model.to(device)

      if opt.plms:
@@ -279,7 +281,8 @@ def main():

      precision_scope = autocast if opt.precision=="autocast" else nullcontext
      with torch.no_grad():
-         with precision_scope("cuda"):
+         # with precision_scope("cuda"):
+         with nullcontext("mps"):
              with model.ema_scope():
                  tic = time.time()
                  all_samples = list()
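
The changes to scripts/txt2img.py all replace "cuda" with "mps". The same decision can be sketched as a small helper so the fallback to CPU is kept in one place. `choose_device` and `detect_device` are hypothetical names, not part of the Stable Diffusion repository; `torch.backends.mps.is_available()` is the same call used in the diff, and the `getattr` guard is there on the assumption that older torch builds lack the `mps` backend module.

```python
def choose_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick a device name: CUDA first, then Apple's MPS, otherwise CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query torch for available backends; fall back to 'cpu' if torch is absent."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return choose_device(torch.cuda.is_available(), mps_ok)


print(detect_device())
```

With something like this, `model.to(detect_device())` could stand in for the hard-coded `model.to("mps")`, though I have not tried that variant on this setup.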

ldm/models/diffusion/plms.py

@@ -17,8 +17,10 @@ class PLMSSampler(object):

      def register_buffer(self, name, attr):
          if type(attr) == torch.Tensor:
-             if attr.device != torch.device("cuda"):
-                 attr = attr.to(torch.device("cuda"))
+             # if attr.device != torch.device("cuda"):
+             #     attr = attr.to(torch.device("cuda"))
+             if attr.device != torch.device("mps"):
+                 attr = attr.to(torch.float32).to(torch.device("mps")).contiguous()
          setattr(self, name, attr)

      def make_schedule(self, ddim_num_steps, ddim_discretize="uniform", ddim_eta=0., verbose=True):

configs/stable-diffusion/v1-inference.yaml

@@ -68,3 +68,5 @@ model:
 
     cond_stage_config:
       target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
+      params:
+        device: mps
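
Why do these two YAML lines matter? As far as I understand the repository, the config loader builds each object by calling the class named in `target` with the entries of `params` as keyword arguments, and FrozenCLIPEmbedder defaults to `device="cuda"`. The sketch below imitates that mechanism with stand-ins: `FakeEmbedder`, `instantiate`, and the registry dict are hypothetical and only illustrate how `params` overrides a default.

```python
class FakeEmbedder:
    """Hypothetical stand-in for the text encoder; only the device default matters."""

    def __init__(self, device="cuda"):
        self.device = device


def instantiate(config, registry):
    """Build registry[target], passing the optional 'params' mapping as keyword arguments."""
    cls = registry[config["target"]]
    return cls(**config.get("params", {}))


registry = {"demo.FakeEmbedder": FakeEmbedder}

without_params = instantiate({"target": "demo.FakeEmbedder"}, registry)
with_params = instantiate(
    {"target": "demo.FakeEmbedder", "params": {"device": "mps"}}, registry
)
print(without_params.device, with_params.device)  # prints: cuda mps
```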

~/.anyenv/envs/pyenv/versions/3.10.5/envs/StableDiffusion/lib/python3.10/site-packages/torch/nn/functional.py

@@ -2503,7 +2503,7 @@
-     return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
+     return torch.layer_norm(input.contiguous(), normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)

Check the path to functional.py for your own environment.
The path shown above is for Python 3.10.5 managed by pyenv.
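
Rather than guessing the path, the active environment can be asked where a package lives. A minimal sketch, assuming it is run with the same Python that has torch installed; `locate_in_package` is a hypothetical helper. `importlib.util.find_spec` locates a top-level package without importing it.

```python
import importlib.util
from pathlib import Path


def locate_in_package(package: str, relative: str):
    """Return the path of `relative` inside an installed package, or None if absent."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    candidate = Path(list(spec.submodule_search_locations)[0]) / relative
    return candidate if candidate.exists() else None


# Prints the file to edit, or None when torch is not installed in this environment.
print(locate_in_package("torch", "nn/functional.py"))
```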

Running

Run the following:

PYTORCH_ENABLE_MPS_FALLBACK=1 python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --n_samples 1

Reducing --n_samples apparently makes it faster (really?).
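
To check the claim about `--n_samples`, the run can be timed from Python. This is a sketch only: the helper names are hypothetical, it uses `sys.executable` instead of a bare `python`, and it assumes it is started from the stable-diffusion directory. `PYTORCH_ENABLE_MPS_FALLBACK=1` lets operators that MPS does not support fall back to the CPU.

```python
import os
import subprocess
import sys
import time


def build_command(prompt, n_samples=1):
    """Assemble the same txt2img.py invocation as the shell command above."""
    return [
        sys.executable, "scripts/txt2img.py",
        "--prompt", prompt,
        "--plms",
        "--n_samples", str(n_samples),
    ]


def build_env(base=None):
    """Copy the environment and enable the CPU fallback for unsupported MPS ops."""
    env = dict(os.environ if base is None else base)
    env["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
    return env


def run_txt2img(prompt, n_samples=1):
    """Run the script and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(build_command(prompt, n_samples), env=build_env(), check=True)
    return time.perf_counter() - start
```

Calling `run_txt2img("a photograph of an astronaut riding a horse", n_samples=1)` and again with a larger value would give two timings to compare.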

Results

Execution took about 30 minutes.
↓ Generated image

Conclusion

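8GB of RAM is now below the minimum viable spec
- Next time I buy a machine, I want something like 64GB
Are the Apple Silicon GPU and Neural Engine not being used at all?
- An article¹ reported a run of about 3.5 minutes on an M2 MBA with 16GB of RAM, so some setting of mine may be wrong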
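
To put the two timings side by side, here is the arithmetic using only the figures quoted in this article (about 30 minutes here, about 3.5 minutes in the referenced article). `slowdown` is a name made up for this note, and the two runs may not have used identical options, so treat the ratio as a rough indication.

```python
def slowdown(slow_minutes: float, fast_minutes: float) -> float:
    """How many times longer the slow run took than the fast run."""
    return slow_minutes / fast_minutes


ratio = slowdown(30, 3.5)
print(f"{ratio:.1f}x")  # prints: 8.6x
```

The gap is much larger than the 2x difference in RAM, which fits the suspicion above that something other than raw hardware is involved.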

¹ https://zenn.dev/ryoma310/articles/63bc3d20a8746c#%E4%BD%99%E8%AB%87
