
Trying Waifu Diffusion with Japanese Prompts

Posted at 2022-11-05, last updated at 2022-11-06

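Someone published an interesting community pipeline on GitHub, so I tried it out.

Installation

As of 2022/11/06, the pipeline has not been merged into the main branch, and version 0.7.1 has quietly been released in the meantime, so this procedure copies the source from the pull request and runs it from there.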

Install the required packages

pip install transformers gradio scipy ftfy "ipywidgets>=7,<8" datasets diffusers[torch] sentencepiece

Note that sentencepiece is a new addition to the package list.
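Before moving on, it can help to confirm that every package from the command above is present in the current environment. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is made up for illustration, and the list simply mirrors the pip command (torch is pulled in through the `diffusers[torch]` extra and is not checked here).

```python
from importlib import metadata

# Distribution names taken from the pip command above.
REQUIRED = [
    "transformers", "gradio", "scipy", "ftfy",
    "ipywidgets", "datasets", "diffusers", "sentencepiece",
]


def missing_packages(names):
    """Return the distribution names that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("missing:", missing_packages(REQUIRED) or "none")
```

If sentencepiece shows up in the output, rerun the pip command before loading the translation tokenizer.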

Copy the pull request source locally

git clone https://github.com/huggingface/diffusers.git
cd diffusers
git fetch origin pull/1142/head:add-multilingual-to-community-pipelines
git checkout add-multilingual-to-community-pipelines

For this article, copying the file under examples/community is all that is needed.

# Return to the directory that contains the cloned diffusers checkout
cd ..
mkdir multilingual_stable_diffusion
cp diffusers/examples/community/multilingual_stable_diffusion.py multilingual_stable_diffusion/pipeline.py

Normally a community pipeline is fetched from the repository, but since this one has not been uploaded yet, the procedure loads it from a local directory instead.
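The `custom_pipeline` argument of `DiffusionPipeline.from_pretrained` also accepts a path to a local directory that contains a file named `pipeline.py`, which is why the file is renamed during the copy. A small sketch for checking that layout before loading follows. The helper name `has_local_pipeline` is hypothetical and not part of diffusers.

```python
from pathlib import Path


def has_local_pipeline(directory):
    """Return True if `directory` contains the pipeline.py a local custom pipeline needs."""
    return (Path(directory) / "pipeline.py").is_file()


if __name__ == "__main__":
    # Directory created by the mkdir/cp step, relative to the working directory.
    print(has_local_pipeline("multilingual_stable_diffusion"))
```

The script has to be started from the directory that contains `multilingual_stable_diffusion`, since the relative path is resolved against the working directory.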

(Reference) If you want to pip install from the checkout

pip install -e .[torch]

To make pip install diffusers[torch] use the local directory instead of the published package, run pip install -e .[torch] from inside the checkout.
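After an editable install, one way to confirm that Python picks up the checkout is to look at where the module would be imported from. This is a minimal sketch using the standard library. The helper name `module_location` is made up for illustration.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # With an editable install this should point into the cloned checkout
    # rather than into site-packages.
    print(module_location("diffusers"))
```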

Running it

Below is a modified version of the Waifu Diffusion sample that runs on Colab.

import gradio as gr
import torch
from torch import autocast
from diffusers import DiffusionPipeline
from transformers import (
    pipeline,
    MBart50TokenizerFast,
    MBartForConditionalGeneration,
)

model_id = "hakurei/waifu-diffusion"
device = "cuda" if torch.cuda.is_available() else "cpu"
device_dict = {"cuda": 0, "cpu": -1}

# Add language detection pipeline
language_detection_model_ckpt = "papluca/xlm-roberta-base-language-detection"
language_detection_pipeline = pipeline("text-classification",
                                       model=language_detection_model_ckpt,
                                       device=device_dict[device])

# Add model for language translation
trans_tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-one-mmt")
trans_model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-one-mmt").to(device)

pipe = DiffusionPipeline.from_pretrained(
    model_id,
    custom_pipeline="multilingual_stable_diffusion",
    detection_pipeline=language_detection_pipeline,
    translation_model=trans_model,
    translation_tokenizer=trans_tokenizer,
    revision="fp16",
    torch_dtype=torch.float16,
).to(device)

block = gr.Blocks(css=".container { max-width: 800px; margin: auto; }")

num_samples = 2

def infer(prompt):
    with autocast("cuda"):
        images = pipe([prompt] * num_samples, guidance_scale=7.5)['images']
    return images

with block as demo:
    gr.Markdown("<h1><center>Waifu Diffusion</center></h1>")
    gr.Markdown(
        "waifu-diffusion is a latent text-to-image diffusion model that has been conditioned on high-quality anime images through fine-tuning."
    )
    with gr.Group():
        with gr.Box():
            with gr.Row().style(mobile_collapse=False, equal_height=True):

                text = gr.Textbox(
                    label="Enter your prompt", show_label=False, max_lines=1
                ).style(
                    border=(True, False, True, True),
                    rounded=(True, False, False, True),
                    container=False,
                )
                btn = gr.Button("Run").style(
                    margin=False,
                    rounded=(False, True, True, False),
                )
        gallery = gr.Gallery(label="Generated images", show_label=False).style(
            grid=[2], height="auto"
        )
        text.submit(infer, inputs=[text], outputs=gallery)
        btn.click(infer, inputs=[text], outputs=gallery)

    gr.Markdown(
        """___
   <p style='text-align: center'>
   Created by https://huggingface.co/hakurei
   <br/>
   </p>"""
    )


demo.launch(debug=True, share=True)
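The script hands a language detection pipeline and a translation model to the custom pipeline. Presumably the pipeline detects the language of each prompt and translates non-English prompts into English before running the diffusion model. The sketch below shows that idea in isolation. It is an assumption about the flow, not the pull request's actual code. The names `to_english`, `label_from_classifier`, `fake_classifier`, and `fake_translate` are hypothetical, and the stand-in functions replace the real models so the snippet runs without any downloads.

```python
from typing import Callable


def to_english(
    prompt: str,
    detect: Callable[[str], str],
    translate: Callable[[str], str],
) -> str:
    """Return `prompt` as-is if it is detected as English, otherwise its translation."""
    if detect(prompt) == "en":
        return prompt
    return translate(prompt)


def label_from_classifier(classifier) -> Callable[[str], str]:
    """Adapt a classifier returning [{"label": ..., "score": ...}] into a detect function."""
    def detect(text: str) -> str:
        return classifier(text)[0]["label"]
    return detect


# Stand-ins for the real models (assumptions for illustration only).
def fake_classifier(text: str):
    label = "en" if text.isascii() else "ja"
    return [{"label": label, "score": 1.0}]


def fake_translate(text: str) -> str:
    return f"<English translation of: {text}>"


if __name__ == "__main__":
    detect = label_from_classifier(fake_classifier)
    print(to_english("a boy eating a donut", detect, fake_translate))
    print(to_english("サッカーのユニフォームを着た男の子", detect, fake_translate))
```

With the real objects from the script, `language_detection_pipeline` would take the place of `fake_classifier`, and `translate` would wrap `trans_tokenizer` and `trans_model` (tokenize, generate, decode). This assumes the detector emits ISO-style labels such as "en" and "ja".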

Results

I am looking forward to seeing where this goes. (No remarks about the results being underwhelming, please.)

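Prompt: サッカーのユニフォームを着た男の子 (a boy wearing a soccer uniform)

Prompt: サッカーのユニフォームを着た男子高校生 (a high school boy wearing a soccer uniform)

Prompt: 金髪碧眼の男の子がレストランでドーナツを食べる (a blond, blue-eyed boy eating a donut in a restaurant)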
