Implementation example: a quick walkthrough of fine-tuning Stable Diffusion with DreamBooth

Last updated at 2025-01-01, posted at 2025-01-01

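Introduction

DreamBooth is a fine-tuning method for diffusion models that Google and collaborators presented at CVPR 2023.

[0] N. Ruiz et al., "DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation," CVPR 2023.

It fine-tunes the model while guarding against overfitting to the new training data and taking care not to damage the meaning of the words in the prompt.

I have roughly summarized the logic behind DreamBooth in the following article:
https://qiita.com/masataka46/items/134e08af7fcb4d7a726c

Plenty of explanations are already available, so this post focuses on the places where you are likely to get stuck.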

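1. Setting up the environment

I tried the following two setups.

Building the environment with Docker on my own RTX 3090 PC
Building the environment on Colab

There is already plenty of material on Colab, so this post covers the former.

1.1 Install the basics on Ubuntu

Nvidia driver
GPU-enabled Docker
Other tools such as PyCharm, VS Code, and iTerm
These steps are omitted here.

1.2 Start Docker

For example, use a Dockerfile like the following.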

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04

ENV TZ=Asia/Tokyo
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime
RUN echo $TZ > /etc/timezone

RUN apt-get update
RUN apt-get install -y apt-file
RUN apt-file update

RUN apt-get install -y git
RUN apt-get install -y wget
RUN apt-get install -y curl
RUN apt-get install -y vim

# python
RUN apt-get install -y python3-pip

# alias
RUN echo alias python="python3" >> /root/.bashrc
RUN echo alias pip="pip3" >> /root/.bashrc

# Install python packages
RUN pip install --upgrade pip
RUN pip install opencv-python
RUN pip install scipy
RUN pip install scikit-learn
RUN pip install poetry
# Install VIM
RUN apt-get update
RUN apt-get install -y apt-file
RUN apt-file update
RUN apt-get install -y vim

# install langchain and others
RUN pip install langchain==0.1.14
RUN pip install faiss-gpu tiktoken sentence_transformers
RUN pip install wandb
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
RUN pip install flash_attn
RUN pip install openai==0.28.0
RUN pip install pandas==2.2.2
RUN pip install -U langchain-community
RUN pip install openpyxl==3.1.2
RUN pip install python-docx==1.1.2
RUN pip install pypdf==5.1.0
RUN pip install git+https://github.com/huggingface/diffusers
# quoted so that the shell does not treat ">" as output redirection
RUN pip install "accelerate>=0.16.0"
RUN pip install torchvision==0.20.1
RUN pip install ftfy
RUN pip install tensorboard
RUN pip install peft==0.7.0

WORKDIR /root/share
EXPOSE 8080/TCP

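This Dockerfile includes quite a few packages that are not needed here.
It also installs diffusers.

The container is built and started with a script such as the following DockerRun.sh.

# when using CUDA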
IMAGE_REPOSITORY=dreambooth
IMAGE_TAG=latest
IMAGE_FULLNAME=${IMAGE_REPOSITORY}:${IMAGE_TAG}

# docker build if changed.
docker build -t ${IMAGE_FULLNAME} .

# allow display connection
xhost local:

# run container
docker run \
--interactive \
--tty \
--rm \
--mount=type=bind,src="$(pwd)",dst=/root/share \
--mount=type=bind,src=/etc/group,dst=/etc/group,readonly \
--mount=type=bind,src=/etc/passwd,dst=/etc/passwd,readonly \
--mount=type=bind,src=/tmp/.X11-unix,dst=/tmp/.X11-unix \
$( if [ -e $HOME/.Xauthority ]; then echo "--mount=type=bind,src=$HOME/.Xauthority,dst=/root/.Xauthority"; fi ) \
--gpus=all \
--env=QT_X11_NO_MITSHM=1 \
--env=DISPLAY=${DISPLAY} \
--env=NVIDIA_DRIVER_CAPABILITIES=all \
--net=host \
--publish=8080:8080 \
--publish=8080:8080/udp \
--name=${IMAGE_REPOSITORY}_${IMAGE_TAG}_$(date "+%Y_%m%d_%H%M%S") \
${IMAGE_FULLNAME}

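Place this DockerRun.sh and the Dockerfile in the same directory, then build and run the container.

# in the directory that contains DockerRun.sh and the Dockerfile
bash DockerRun.sh

The key point is that torch_xla must not be installed when training on a GPU. The package appears to be needed for TPUs and similar devices, but it causes an error when training on a GPU.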
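A quick way to confirm this inside the container is a small check script. The following is only a minimal sketch: the function names is_installed and check_environment are my own and not part of any library. It reports whether torch_xla is present and whether PyTorch can see a CUDA device.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(package) is not None


def check_environment() -> list:
    """Collect human-readable warnings about the GPU training environment."""
    problems = []
    if is_installed("torch_xla"):
        problems.append("torch_xla is installed; consider `pip uninstall torch_xla` for GPU training")
    if not is_installed("torch"):
        problems.append("torch is not installed")
    else:
        import torch  # imported only when the package is actually present

        if not torch.cuda.is_available():
            problems.append("torch cannot see a CUDA device")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks fine"]:
        print(line)
```

Running it once right after the container starts should be enough to catch a stray torch_xla before training is launched.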

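2. Configure accelerate

accelerate config

Run the command above to configure the GPU and related settings. This is the biggest pitfall: it asks a series of questions, and training will not run if the answers are wrong.

My setup is:

my own local PC
one GPU

so I answered as follows.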

In which compute environment are you running?
Please select a choice using the arrow or number keys, and selecting with enter
 ➔  This machine
    AWS (Amazon SageMaker)

Press Enter on "This machine".

Which type of machine are you using?
Please select a choice using the arrow or number keys, and selecting with enter
 ➔  No distributed training
    multi-CPU
    multi-XPU
    multi-GPU
    multi-NPU
    multi-MLU
    multi-MUSA
    TPU

I have a single GPU, so I pressed Enter on the top item, "No distributed training".

Do you want to run your training on CPU only (even if a GPU / Apple Silicon / Ascend NPU device is available)? [yes/NO]:

I am using the GPU, so NO.

Do you wish to optimize your script with torch dynamo?[yes/NO]:

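I was not sure what this does, so NO. (TorchDynamo is the graph-capture machinery behind torch.compile and is optional.)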

Do you want to use DeepSpeed? [yes/NO]:

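I was not sure about this either, so NO. (DeepSpeed is a separate library aimed at large-scale and memory-constrained training.)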

What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:

Enter the GPU ID here. In my case, nvidia-smi printed the following:

nvidia-smi
Wed Jan  1 09:20:16 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        On  | 00000000:05:00.0  On |                  N/A |
|  0%   43C    P8              28W / 350W |    310MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

Given this output, I entered 0, the value shown in the GPU column.

Would you like to enable numa efficiency? (Currently only supported on NVIDIA hardware). [yes/NO]:

I was not sure about this one either, so NO.

Do you wish to use mixed precision?
Please select a choice using the arrow or number keys, and selecting with enter
 ➔  no
    fp16
    bf16
    fp8

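This selects the floating-point precision used during training. I intended to train in float32, so I pressed Enter on the top item ("no").
If the configuration succeeds, the following message is displayed:

accelerate configuration saved at /root/.cache/huggingface/accelerate/default_config.yaml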
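To double-check what was saved, the file can be inspected from Python. The sketch below assumes the saved YAML is a flat list of key: value pairs and that the answers above end up under keys such as compute_environment, distributed_type, mixed_precision and use_cpu. Those key names and values are my assumption about the file, so compare them with your own default_config.yaml. The helper names parse_flat_yaml and find_mismatches are hypothetical.

```python
from pathlib import Path

# Values I would expect for the answers given above (assumed; verify against your file).
EXPECTED = {
    "compute_environment": "LOCAL_MACHINE",
    "distributed_type": "NO",
    "mixed_precision": "no",
    "use_cpu": "false",
}


def parse_flat_yaml(text: str) -> dict:
    """Parse top-level 'key: value' lines; comments and nested entries are skipped."""
    config = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#") or ":" not in line:
            continue
        if line[0].isspace():  # indented line, i.e. part of a nested structure
            continue
        key, value = line.split(":", 1)
        config[key.strip()] = value.strip().strip("'\"")
    return config


def find_mismatches(config: dict, expected: dict = EXPECTED) -> dict:
    """Return {key: (found, expected)} for every setting that differs."""
    return {k: (config.get(k), v) for k, v in expected.items() if config.get(k) != v}


if __name__ == "__main__":
    path = Path("/root/.cache/huggingface/accelerate/default_config.yaml")
    if path.exists():
        print(find_mismatches(parse_flat_yaml(path.read_text())))
```

An empty dictionary means the file matches the single-GPU, float32 choices described in this section.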

3. Prepare the training images

Place suitable images in a directory of your choice. I prepared five images of one specific kind of dog.
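Before training, it can help to confirm that the directory really contains the images you expect, since a stray non-image file is an easy mistake to make. The following is a minimal sketch; the function name list_instance_images and the set of accepted extensions are my own choices, not something the training script defines.

```python
from pathlib import Path

# Extensions treated as images here (an assumption; extend as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def list_instance_images(directory: str) -> list:
    """Return the image files directly inside `directory`, sorted by name."""
    root = Path(directory)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

For the setup in this post, len(list_instance_images("PATH/TO/IMAGE_DIR")) should come out as 5.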

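4. Get train_dreambooth.py

Copy diffusers/examples/dreambooth/train_dreambooth.py from the diffusers repository (https://github.com/huggingface/diffusers) and place it wherever is convenient.

5. Create the output directory

Create an output directory (the script may well create it automatically).
For example:

mkdir ../output

6. Set the environment variables

This time I use Stable Diffusion v1-4.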

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="PATH/TO/IMAGE_DIR"
export OUTPUT_DIR="PATH/TO/OUTPUT_DIR"

7. Start training

Start training with a command like the following.

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=400

Set instance_prompt to something suitable for your subject, and adjust the other hyperparameters to fit your purpose.
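When adjusting these values, it is useful to know roughly how often each image is seen. The sketch below assumes that max_train_steps counts optimizer updates on a single GPU, so that the number of samples processed is steps × batch size × accumulation steps. The function name passes_over_dataset is hypothetical.

```python
def passes_over_dataset(max_train_steps: int, train_batch_size: int,
                        gradient_accumulation_steps: int, num_images: int) -> float:
    """Approximate number of times each training image is seen on a single GPU."""
    samples_seen = max_train_steps * train_batch_size * gradient_accumulation_steps
    return samples_seen / num_images


# Values from the command above, with the five dog images from step 3.
print(passes_over_dataset(400, 1, 1, 5))  # 80.0
```

With only a handful of images, even 400 steps means each image is revisited many times, which is why the learning rate and step count are the first things to revisit if the results look overfitted.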

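As shown below, training used close to 19 GB of VRAM. Perhaps choosing a lower-precision option (fp16 or bf16) in the mixed-precision setting above would reduce this?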

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        On  | 00000000:05:00.0  On |                  N/A |
| 43%   61C    P2             303W / 350W |  18953MiB / 24576MiB |     92%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      5384      G   /usr/lib/xorg/Xorg                          168MiB |
|    0   N/A  N/A      5517      G   /usr/bin/gnome-shell                         75MiB |
|    0   N/A  N/A      6653      G   ...2509800,12662543674627072428,262144       55MiB |
|    0   N/A  N/A    762978      C   /usr/bin/python3                          18638MiB |
+---------------------------------------------------------------------------------------+
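The same figures can be read programmatically instead of from the table. This is a minimal sketch that calls nvidia-smi with its CSV query options; the function names parse_memory_line and query_gpu_memory are my own. The example at the bottom simply reuses the numbers from the table above.

```python
import subprocess


def parse_memory_line(line: str) -> tuple:
    """Parse one 'used, total' CSV line (values in MiB) into two integers."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def query_gpu_memory() -> list:
    """Ask nvidia-smi for the used and total memory of every GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_memory_line(line) for line in out.splitlines() if line.strip()]


# Figures from the table above: 18953 MiB used out of 24576 MiB.
used, total = parse_memory_line("18953, 24576")
print(f"{used / total:.0%} of VRAM in use")  # 77% of VRAM in use
```

Calling query_gpu_memory() before and after changing a setting gives a simple way to compare memory use between runs.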

Training finishes in about 10 minutes.

The output directory then contains entries such as the following:

feature_extractor  logs  model_index.json  safety_checker  scheduler  text_encoder  tokenizer  unet  vae
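A small check that these entries exist can save a failed load at inference time. The sketch below uses the listing above as the expected set; leaving out logs is my own choice, on the assumption that it only holds training logs. The function name missing_entries is hypothetical.

```python
from pathlib import Path

# Entries taken from the listing above; `logs` is omitted (assumed to be training logs only).
EXPECTED_ENTRIES = [
    "feature_extractor", "model_index.json", "safety_checker", "scheduler",
    "text_encoder", "tokenizer", "unet", "vae",
]


def missing_entries(output_dir: str) -> list:
    """Return the expected entries that are absent from `output_dir`."""
    root = Path(output_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

An empty list from missing_entries("../output") suggests the pipeline was saved completely.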

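8. Inference with the trained model

Here I run inference following the README.md in diffusers/src/diffusers/pipelines/ of the diffusers repository.

Create infer.py with the following contents:

# load the fine-tuned pipeline from the directory given as OUTPUT_DIR during training
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("../output/")
pipe = pipe.to("cuda")

# use the same prompt that was passed as instance_prompt during training
prompt = "a photo of sks dog"
image = pipe(prompt).images[0]

image.save("tmp_dog.png")

The path passed to from_pretrained at the top is the directory specified as OUTPUT_DIR during training
The prompt is the one specified during training
The name of the saved image can be anything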

python infer.py

The command above runs inference and generates a plausible-looking image.
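Since DreamBooth targets subject-driven generation, a natural next step is to place the learned identifier in new contexts. The following is a sketch under assumptions: build_prompts and generate are hypothetical helper names, the contexts are arbitrary examples, and the seed is fixed only so that runs are comparable. The generation call is left commented out because it needs the trained model and a GPU.

```python
def build_prompts(instance_prompt: str, contexts: list) -> list:
    """Append each context to the instance prompt."""
    return [f"{instance_prompt} {context}" for context in contexts]


def generate(model_dir: str, prompts: list, seed: int = 0) -> None:
    """Generate one image per prompt with a fixed seed and save each as a PNG."""
    import torch  # heavy imports kept local so build_prompts works without them
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_dir).to("cuda")
    for index, prompt in enumerate(prompts):
        generator = torch.Generator("cuda").manual_seed(seed)
        image = pipe(prompt, generator=generator).images[0]
        image.save(f"sample_{index}.png")


prompts = build_prompts("a photo of sks dog", ["on a beach", "in the snow"])
# generate("../output/", prompts)  # uncomment on a machine with the trained model
```

Comparing the outputs for different contexts gives a rough sense of whether the model learned the subject or merely memorized the training photos.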
