
High-Resolution Image Synthesis with Latent Diffusion Models

Paper introduction

Posted at 2023-09-08

Information
I am not a native Japanese speaker, so please expect some awkward phrasing in these paper summaries.

Motivation

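Saving computational resources:
1. Training Diffusion Models (DMs) takes hundreds of GPU days.
Generating 50k samples takes roughly 5 days on a single A100 GPU.
→ Work in a latent space instead of pixel space.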

2. Conditional generation (text or bounding boxes):
This is achieved by introducing cross-attention layers into the model architecture.
Note: training a diffusion model actually requires a huge amount of data, but the paper does not seem to focus on this point.
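
To make the saving in item 1 concrete, here is a small arithmetic sketch comparing how many values the diffusion model has to process per image in pixel space versus latent space. The image size, the downsampling factor f = 8, and the 4 latent channels are assumed example values chosen for illustration; they are not stated in this article.

```python
# Toy arithmetic: number of values the diffusion model processes per image.
# All sizes below are assumed example values, not figures from this article.
H, W, C = 256, 256, 3          # assumed RGB image in pixel space
f, latent_channels = 8, 4      # assumed downsampling factor and latent channels

pixel_values = H * W * C
latent_values = (H // f) * (W // f) * latent_channels

print(pixel_values)                  # 196608
print(latent_values)                 # 4096
print(pixel_values / latent_values)  # 48.0
```

Under these assumptions the diffusion model sees 48 times fewer values per image, which is where the reduction in training and sampling cost comes from.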

Method

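It may look intimidating at first glance, but let's go through it step by step.
1. First, the image x is passed through the encoder ε, which maps it into the latent space Z.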

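The encoder and decoder here are built on the idea of a GAN (they play the role of the GAN's generator).
The trick, however, lies in the loss function.
It is made up of three terms:
whether the original image is properly reconstructed (MSE(original image, reconstructed image), written Lrec in the paper),
whether the latent space stays close to a normal distribution (KL(latent, normal distribution), written Lreg in the paper),
and the performance of the discriminator (log Dψ(x) and -log[Dψ(D(ε(x)))], written Ladv in the paper; here Dψ is the discriminator and D is the decoder).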
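
The three terms above can be sketched in NumPy as follows. This is a minimal illustration from the autoencoder's point of view, not the paper's implementation: the function names, the weights `w_reg` and `w_adv`, and the toy inputs are all hypothetical, and the discriminator outputs are simply passed in as probabilities instead of being computed by a real network.

```python
import numpy as np

def reconstruction_loss(x, x_rec):
    """Lrec as described in the text: mean squared error between image and reconstruction."""
    return float(np.mean((x - x_rec) ** 2))

def kl_to_standard_normal(mu, logvar):
    """Lreg: KL( N(mu, exp(logvar)) || N(0, 1) ), summed over latent dimensions."""
    return float(0.5 * np.sum(mu ** 2 + np.exp(logvar) - 1.0 - logvar))

def generator_adversarial_loss(d_fake, eps=1e-8):
    """Adversarial term seen by the autoencoder: -log D_psi(D(E(x))).

    d_fake holds discriminator probabilities in (0, 1) for reconstructed images.
    """
    return float(np.mean(-np.log(d_fake + eps)))

def autoencoder_loss(x, x_rec, mu, logvar, d_fake, w_reg=1e-6, w_adv=0.5):
    """Weighted sum of the three terms; w_reg and w_adv are hypothetical weights."""
    l_rec = reconstruction_loss(x, x_rec)
    l_reg = kl_to_standard_normal(mu, logvar)
    l_adv = generator_adversarial_loss(d_fake)
    return l_rec + w_reg * l_reg + w_adv * l_adv

# Toy inputs standing in for real encoder, decoder, and discriminator outputs.
rng = np.random.default_rng(0)
x = rng.random((3, 8, 8))                          # "original image"
x_rec = x + 0.05 * rng.standard_normal(x.shape)    # "reconstruction"
mu = 0.1 * rng.standard_normal((4, 2, 2))          # latent mean
logvar = np.zeros_like(mu)                         # latent log-variance
d_fake = np.full(5, 0.4)                           # discriminator scores on fakes

loss = autoencoder_loss(x, x_rec, mu, logvar, d_fake)
print(loss)
```

The discriminator itself would be trained with the opposite objective (raising log Dψ(x) on real images and lowering its score on reconstructions), which is omitted here to keep the sketch short.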

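2. Next, a DM is trained in the usual way on the latent-space representation (a matrix) of the image x.
3. The result is passed through the decoder to return to pixel space.
4. Conditioning is implemented with cross-attention, which tells the DM which parts of the condition to attend to.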
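
Steps 2 and 4 can be sketched with NumPy as below. This is only a rough illustration under stated assumptions: the linear noise schedule, the latent shape (4, 32, 32), the 7 condition tokens, and the random projection matrices `Wq`, `Wk`, `Wv` are all hypothetical stand-ins for a trained encoder, text encoder, and denoising network. The attention itself follows the usual softmax(QKᵀ/√d)V form, with queries taken from the latent and keys and values taken from the condition.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(z0, t, alpha_bar, rng):
    """Step 2 (forward process): z_t = sqrt(a_t) * z0 + sqrt(1 - a_t) * eps."""
    eps = rng.standard_normal(z0.shape)
    z_t = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return z_t, eps  # the denoiser would be trained to predict eps from z_t

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latent_tokens, cond_tokens, Wq, Wk, Wv):
    """Step 4: queries come from the latent, keys and values from the condition."""
    Q = latent_tokens @ Wq                               # (n_latent, d)
    K = cond_tokens @ Wk                                 # (n_cond, d)
    V = cond_tokens @ Wv                                 # (n_cond, d)
    weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))    # (n_latent, n_cond)
    return weights @ V, weights

# Hypothetical linear beta schedule with T steps.
T = 1000
betas = np.linspace(1e-4, 2e-2, T)
alpha_bar = np.cumprod(1.0 - betas)

# Random stand-in for the encoder output of one image (assumed shape).
z0 = rng.standard_normal((4, 32, 32))
z_t, eps = add_noise(z0, t=500, alpha_bar=alpha_bar, rng=rng)

# Flatten the latent into 32*32 tokens with 4 features each.
latent_tokens = z_t.reshape(4, -1).T                     # (1024, 4)
cond_tokens = rng.standard_normal((7, 16))               # e.g. 7 text-token embeddings

d = 8
Wq = rng.standard_normal((4, d))
Wk = rng.standard_normal((16, d))
Wv = rng.standard_normal((16, d))

out, weights = cross_attention(latent_tokens, cond_tokens, Wq, Wk, Wv)
print(out.shape, weights.shape)  # (1024, 8) (1024, 7)
```

Each row of `weights` sums to 1 and describes how strongly one latent position attends to each condition token, which is the sense in which the condition tells the DM where to look.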

If you are comfortable with English, I recommend this video!
