[MLX] Trying Stable Diffusion on an M1 Mac

Last updated at 2024-01-01, posted at 2023-12-31

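MLX, a machine learning framework for Apple Silicon, was released recently. In this article I try the Stable Diffusion example from MLX Examples.

Generating images from text with the CLI
Turning it into a GUI application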

Environment

MacBook Pro 14-inch
- Apple M1 Pro
- Memory 32GB
- macOS Sonoma 14.2
pyenv
- 2.3.35
Python
- 3.11.7

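Installation

I run everything in a virtual environment on a Python installed with pyenv. The pyenv installation steps are included here; if you already have it set up, skip ahead to the step that clones MLX Examples.

Installing pyenv

Install pyenv with Homebrew.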

zsh

$ brew update
$ brew install pyenv

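Add pyenv to the PATH. This command only affects the current shell session; to make it permanent, add the same line to a shell startup file such as ~/.zshrc.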

zsh

$ eval "$(pyenv init --path)"

Confirm that pyenv is installed.

zsh

$ pyenv -v
pyenv 2.3.35

Installing Python 3.11 (pyenv)

List the versions available for installation.

zsh

$ pyenv install --list

I install 3.11.7, the latest 3.11 release at the time of writing.

zsh

$ pyenv install 3.11.7

Switch to Python 3.11.7.

zsh

$ pyenv global 3.11.7

Check the version, just to be sure.

zsh

$ python -V
Python 3.11.7
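
The same check can be made from inside Python. The following is a minimal sketch that reports whether the interpreter meets a minimum version and whether it runs natively on Apple Silicon. The helper name `check_environment` and the (3, 11) threshold are illustrative assumptions that mirror the version used in this article; they are not a requirement stated by MLX.

```python
import platform
import sys


def check_environment(min_version=(3, 11)):
    """Return (python_ok, is_apple_silicon) for the running interpreter.

    min_version is a hypothetical threshold mirroring this article's setup.
    """
    python_ok = sys.version_info[:2] >= tuple(min_version)
    # A native (non-Rosetta) Python on Apple Silicon reports Darwin / arm64
    is_apple_silicon = (
        platform.system() == "Darwin" and platform.machine() == "arm64"
    )
    return python_ok, is_apple_silicon


if __name__ == "__main__":
    python_ok, is_apple_silicon = check_environment()
    print(f"Python {platform.python_version()} meets the minimum: {python_ok}")
    print(f"Native Apple Silicon interpreter: {is_apple_silicon}")
```

If the second value is False on an M1 Mac, the interpreter may be running under Rosetta.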

Cloning MLX Examples

zsh

$ git clone https://github.com/ml-explore/mlx-examples.git

Move into the stable_diffusion folder.

zsh

$ cd mlx-examples/stable_diffusion

Setting up the venv environment

Create a venv environment. I named it mlx here, but any name works.

zsh

$ python -m venv mlx

Activate the virtual environment.

zsh

$ source mlx/bin/activate
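
To confirm that the activation took effect, a small sketch like the one below can be run with `python`. It relies on the standard behavior that `sys.prefix` differs from `sys.base_prefix` inside a venv; the function name `in_virtualenv` is only illustrative.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Inside a virtual environment: {in_virtualenv()}")
    print(f"Interpreter path: {sys.executable}")
```

With the environment from this article active, the interpreter path should be under the mlx folder.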

Install the required libraries.

zsh

$ pip install --upgrade pip
$ pip install -r requirements.txt

Generating images from text with the CLI

zsh

$ python txt2image.py "A photo of an astronaut riding a horse on Mars." --n_images 4 --n_rows 2

After about 5 minutes, the four images were written to out.png as a single grid with 2 rows.

mlx-examples/stable_diffusion/out.png
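
The `--n_images 4 --n_rows 2` flags produce one tiled image. As a rough sketch of the arithmetic, the function below assumes the CLI tiles images the same way as the ui.py code later in this article (8 pixels of padding on every side of each image) and assumes 512x512 outputs, which this article does not state. The name `grid_size` and its defaults are hypothetical.

```python
def grid_size(n_images, n_rows, image_px=512, pad_px=8):
    """Return (height, width) in pixels of the tiled output image.

    image_px=512 and pad_px=8 are assumptions made for illustration.
    """
    if n_images <= 0 or n_rows <= 0 or n_images % n_rows != 0:
        raise ValueError("n_images must be a positive multiple of n_rows")
    n_cols = n_images // n_rows
    tile = image_px + 2 * pad_px  # each image is padded on both sides
    return n_rows * tile, n_cols * tile


if __name__ == "__main__":
    print(grid_size(4, 2))  # (1056, 1056)
```

The divisibility check reflects how the grid is built: every row must hold the same number of images.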

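Turning it into a GUI application

I use Gradio to turn this into a GUI application. I based it on the following reference:
MLX Stable Diffusion UI

Creating ui.py

Create ui.py and copy the following code into it.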

mlx-examples/stable_diffusion/ui.py

import gradio as gr
from PIL import Image
import numpy as np
import mlx.core as mx
from stable_diffusion import StableDiffusion

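def generate_images(prompt, n_images=4, steps=50, cfg=7.5, negative_prompt="", n_rows=1):
    # Slider values may arrive as floats, so cast the counts to int
    n_images, steps, n_rows = int(n_images), int(steps), int(n_rows)
    # The grid reshape below requires n_images to be divisible by n_rows
    if n_images % n_rows != 0:
        raise gr.Error("Number of Images must be divisible by Number of Rows")
    sd = StableDiffusion()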

    # Generate the latent vectors using diffusion
    latents = sd.generate_latents(
        prompt,
        n_images=n_images,
        cfg_weight=cfg,
        num_steps=steps,
        negative_text=negative_prompt,
    )
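    # Step through the denoising loop. MLX evaluates lazily, so mx.eval
    # forces each step to be computed; x_t holds the final latents afterwards.
    for x_t in latents:
        mx.simplify(x_t)
        mx.simplify(x_t)
        mx.eval(x_t)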

    # Decode them into images
    decoded = []
    for i in range(0, n_images):
        decoded_img = sd.decode(x_t[i:i+1])
        mx.eval(decoded_img)
        decoded.append(decoded_img)

    # Arrange them on a grid
    x = mx.concatenate(decoded, axis=0)
    x = mx.pad(x, [(0, 0), (8, 8), (8, 8), (0, 0)])
    B, H, W, C = x.shape
    x = x.reshape(n_rows, B // n_rows, H, W, C).transpose(0, 2, 1, 3, 4)
    x = x.reshape(n_rows * H, B // n_rows * W, C)
    x = (x * 255).astype(mx.uint8)

    # Convert to a NumPy array, then to a PIL Image
    return Image.fromarray(np.array(x))

iface = gr.Interface(
    fn=generate_images,
    inputs=[
        gr.Textbox(label="Prompt"),
        gr.Slider(minimum=1, maximum=10, step=1, value=4, label="Number of Images"),
        gr.Slider(minimum=20, maximum=100, step=1, value=50, label="Steps"),
        gr.Slider(minimum=0.0, maximum=10.0, step=0.1, value=7.5, label="CFG Weight"),
        gr.Textbox(label="Negative Prompt"),
        gr.Slider(minimum=1, maximum=10, step=1, value=1, label="Number of Rows")
    ],
    outputs="image",
    title="Stable Diffusion Image Generator",
    description="Generate images from a textual prompt using Stable Diffusion"
)

# Start the web server only when this file is run as a script
if __name__ == "__main__":
    iface.launch()

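To avoid an error, I made the following change from the reference code (removing the `default` argument from the Negative Prompt textbox).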

- gr.Textbox(default="", label="Negative Prompt"),
+ gr.Textbox(label="Negative Prompt"),

Installing Gradio

zsh

$ pip install gradio
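
Before launching, it can help to confirm that the third-party modules imported at the top of ui.py are visible to the active interpreter. The sketch below uses the standard library's `importlib.util.find_spec`; the helper name `missing_modules` is hypothetical, and the module list is taken from the imports in ui.py.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Third-party modules imported by ui.py
    # (stable_diffusion itself is the local package in the cloned folder)
    required = ["gradio", "PIL", "numpy", "mlx"]
    missing = missing_modules(required)
    print("All modules found" if not missing else f"Missing: {missing}")
```

If anything is reported missing, check that the virtual environment is active and rerun the pip commands.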

Launching Gradio

Start it with the following command.

zsh

$ python ui.py
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.

Open http://127.0.0.1:7860 in a browser to confirm that Gradio is running.
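
If the page does not load, one way to tell whether the server is up is to test the TCP port directly from a second terminal. This is a minimal sketch using only the standard library; the function name `is_listening` is hypothetical, and the default host and port are the ones printed in the log above.

```python
import socket


def is_listening(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    print(f"Server reachable: {is_listening()}")
```

This only confirms that something accepts connections on the port, not that it is the Gradio app.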

Usage

Enter text in the Prompt field and press Submit to start image generation.
Note: with the default settings, image generation took about 5 minutes. I plan to experiment with different parameters.
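
When Submit is pressed, Gradio calls the function given as `fn` with the current values of the `inputs` components, passed positionally in list order. The sketch below uses a stand-in with the same signature as `generate_images` (it generates nothing) to show which control feeds which parameter. `pair_inputs_with_parameters` is a hypothetical helper, not part of Gradio.

```python
import inspect


def generate_images(prompt, n_images=4, steps=50, cfg=7.5, negative_prompt="", n_rows=1):
    """Stand-in with the same signature as the function in ui.py."""


# Labels of the input components, in the order they appear in `inputs`
INPUT_LABELS = [
    "Prompt",
    "Number of Images",
    "Steps",
    "CFG Weight",
    "Negative Prompt",
    "Number of Rows",
]


def pair_inputs_with_parameters(fn, labels):
    """Pair each component label with the parameter it is passed to."""
    names = list(inspect.signature(fn).parameters)
    if len(names) != len(labels):
        raise ValueError("number of inputs does not match the function signature")
    return list(zip(labels, names))


if __name__ == "__main__":
    for label, name in pair_inputs_with_parameters(generate_images, INPUT_LABELS):
        print(f"{label} -> {name}")
```

Because the mapping is positional, reordering the `inputs` list without reordering the function parameters would silently feed values to the wrong arguments.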

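Summary

I got Stable Diffusion running locally on Apple Silicon and was able to do the following. Since it runs entirely on the local machine, I expect it to come in handy.

Generating images from text with the CLI
Turning it into a GUI application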
