
Stable Diffusion: I want to break the 512 barrier!

Posted at 2022-11-03, last updated at 2022-11-05

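In the previous article, "Stable Diffusion 画像を作っている途中を見たい！Part 2" ("I want to watch images while they are being generated! Part 2"), I got txt2img/img2img running locally, but the generated images were still 512×512. This time I split an image, regenerate the pieces, and merge them to take on the 512 barrier.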

The PC I used:
OS: Windows10 Home 64bit
CPU: Intel(R) Core(TM) i7-10750H
GPU: NVIDIA GeForce GTX 1650 Ti
RAM: 16GB

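The machine does have an NVIDIA GPU. Since the previous article I have stopped claiming that this "runs on CPU only". It should still run on a CPU alone, but it may take an enormous amount of time.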

The goal this time:
・Generate an image from a prompt with txt2img, split it, regenerate each piece with img2img, and merge the results into an image larger than 512×512.

There is no paper behind this; it is built purely on my own ideas, so opinions may differ, but please bear with me.

「Environment setup」

I use the environment built in "Stable Diffusion 画像を作っている途中を見たい！Part 2".
The folder layout is as follows.

C:\Users
 +---XXXX
   +---Documents
     +---Source
       +---Python  ← this folder is the current directory
       +---stable-diffusion-main
         +---optimizedSD
            +---ddpm.py  ← this file has been edited
            +---
            +---
         +---models
           +---ldm
             +---stable-diffusion-v1
               +----model.ckpt

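ddpm.py was edited in the previous article. (It gained a live display of the image being generated and a GIF animation export. Neither is used this time, though both still work.)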

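From the Start menu, launch Anaconda3 (64-bit) → Jupyter Notebook (ldm) and create a new notebook.
From here on, enter the code into code cells and run them.

Copy and paste the cells from the "Checking the environment" and "Wrapping optimized_txt2img/optimized_img2img as functions" sections of the previous article as they are,
and run them to define txt2img/img2img.
(At this point two cells have been run, and "txt2img defined !!" and "img2img defined !!" are displayed.)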

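「Just try it」

Setting the code aside for a moment: generate an image with txt2img and split it (into four here). Keep the prompt fixed, pass each piece to img2img as init_img, loop four times, and join the output images.
Note: the images below are for illustration and are not at original size.
Sample image (512×512) ⇒ split (256×256, 4 tiles)

img2img results (512×512, 4 tiles) ⇒ joined

Actual size (1024×1024)

As expected, the seams look unnatural.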
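
The naive flow above can be sketched without the model. In this minimal sketch, fake_img2img is a hypothetical stand-in that only enlarges each tile by pixel repetition (the real img2img regenerates the tile with the model), and split_quadrants / join_quadrants are illustrative names that are not part of the notebook.

```python
import numpy as np

def split_quadrants(img):
    """Split an (H, W, C) array into four equal tiles: UL, UR, LL, LR."""
    h, w = img.shape[0] // 2, img.shape[1] // 2
    return [img[:h, :w], img[:h, w:], img[h:, :w], img[h:, w:]]

def fake_img2img(tile, out_size=512):
    """Hypothetical stand-in for img2img: nearest-neighbour upscale only."""
    ry = out_size // tile.shape[0]
    rx = out_size // tile.shape[1]
    return np.repeat(np.repeat(tile, ry, axis=0), rx, axis=1)

def join_quadrants(tiles):
    """Join [UL, UR, LL, LR] back into one image with no blending."""
    top = np.concatenate([tiles[0], tiles[1]], axis=1)
    bottom = np.concatenate([tiles[2], tiles[3]], axis=1)
    return np.concatenate([top, bottom], axis=0)

# Random pixels stand in for the txt2img output.
source = np.random.default_rng(0).integers(0, 256, size=(512, 512, 3), dtype=np.uint8)
tiles = [fake_img2img(t) for t in split_quadrants(source)]
joined = join_quadrants(tiles)
print(joined.shape)  # (1024, 1024, 3)
```

Because each tile is regenerated on its own, nothing ties the pixels on either side of a seam together, which is why the borders stand out.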

「Overlapping the init_img tiles」

The source image was generated with the following code (this is the third code cell).

import matplotlib.pyplot as plt
%matplotlib

prompt="A digital illustration of a medieval town, 4k, detailed, trending in artstation, fantasy"
seed=0

image_org=txt2img(prompts=prompt,H=512,W=512,seed=seed,scale=7.0,ddim_steps=25,precision='full',visible=False)

%matplotlib inline
plt.axis('off')
plt.imshow(image_org)

Next, run the following code (its purpose becomes clear later).

final_image=image_org
print('Copied !')

The splitting code is as follows.

r,c,n=final_image.shape
print(r,c,n)
overlap=0.1
sub_images=[]
sub_images.append(final_image[ :int(r/2+r/2*overlap) ,:int(c/2+c/2*overlap), :])
sub_images.append(final_image[ :int(r/2+r/2*overlap) ,int(c/2-c/2*overlap):, :])
sub_images.append(final_image[ int(r/2-r/2*overlap): ,:int(c/2+c/2*overlap), :])
sub_images.append(final_image[ int(r/2-r/2*overlap): ,int(c/2-c/2*overlap):, :])

row,col=(2,2)
fig = plt.figure(figsize=(5,5))

i=0
for r in range(row):
    for c in range(col):
        ax=fig.add_subplot(row,col,i+1)
        ax.axis('off')
        ax.imshow(sub_images[i])
        print(sub_images[i].shape)
        i+=1

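Tiles cut from the source image are stored in sub_images in the order upper left, upper right, lower left, lower right.
Each tile extends past the centre line according to overlap. With overlap=0.1, each tile extends 10% of the half size beyond the centre, so neighbouring tiles share a strip of the image. Running the cell gives the following.
Sample image (512×512) ⇒ split (281×281, 282×282, etc., 4 tiles)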
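
The tile sizes can be checked by hand. The sketch below reproduces the slice bounds along one axis with the same int() truncation as the slicing code; overlap_bounds is an illustrative helper, not part of the notebook.

```python
def overlap_bounds(size, overlap):
    """Return the (start, stop) pairs of the first and second tile along one axis."""
    half = size / 2
    first = (0, int(half + half * overlap))        # e.g. rows of the upper tiles
    second = (int(half - half * overlap), size)    # e.g. rows of the lower tiles
    return first, second

first, second = overlap_bounds(512, 0.1)
print(first, first[1] - first[0])     # (0, 281) 281
print(second, second[1] - second[0])  # (230, 512) 282
print(first[1] - second[0])           # 51
```

The one-pixel difference between 281 and 282 comes from truncating 281.6 and 230.4, and the two tiles share 51 pixels along each axis.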
　　
Next, each tile is passed to img2img as init_img, looping four times.

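import matplotlib.pyplot as plt
# numpy and PIL are used below; harmless if the earlier cells already imported them
import numpy as np
from PIL import Image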
%matplotlib

image=[]
for i in range(4):
    init_image=Image.fromarray(sub_images[i])
    image_tmp=img2img(prompts=prompt,init_img=init_image,ddim_steps=60,H=512,W=512,strength=0.5,
                      scale=7.5,device='cuda',seed=seed,precision='full')
    image.append(image_tmp)

%matplotlib inline
row,col=(2,2)
fig = plt.figure(figsize=(10,10))
i=0
for r in range(row):
    for c in range(col):
        ax=fig.add_subplot(row,col,i+1)
        ax.axis('off')
        ax.imshow(image[i])
        i+=1

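strength is set to 0.5 here because larger values make img2img increasingly ignore the outlines of the init image. On the machine described above (running on battery) the loop took about 12 minutes 30 seconds. The result is shown below.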
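
As a rough guide to what strength does: in the reference Stable Diffusion img2img script, the init image is noised and then denoised for int(strength * ddim_steps) steps. Assuming the img2img wrapper used here follows the same rule (an assumption, since the wrapper itself is not shown in this article), the step counts work out as below. encode_steps is an illustrative name.

```python
def encode_steps(strength, ddim_steps):
    # Assumption: same rule as the reference img2img script,
    # t_enc = int(strength * ddim_steps).
    return int(strength * ddim_steps)

for strength in (0.25, 0.5, 0.75):
    print(strength, encode_steps(strength, 60))
# 0.25 15
# 0.5 30
# 0.75 45
```

Under that assumption, strength=0.5 with ddim_steps=60 spends 30 steps on each tile, and a higher strength moves the result further from the init image.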

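「Overlapping and merging (part 1)」

The tiles are merged by blending the overlapping regions in the same ratio that was used for splitting.
First, the blending mechanism.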

r,c,n=image[0].shape

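bare_size_r=int(r/(1+2*overlap)) # size of the non-overlapping part (row direction)
bare_size_c=int(c/(1+2*overlap)) # size of the non-overlapping part (col direction)
lap_size_r=r-bare_size_r         # size of the overlapping part (row direction)
lap_size_c=c-bare_size_c         # size of the overlapping part (col direction)

white_screen=np.ones((r,c,3),dtype='float64')  # all-ones array the size of one tile
flat_r=np.ones((bare_size_r))                  # weight 1.0 over the non-overlapping part (row direction)
fade_out_r=np.array([x/lap_size_r for x in reversed(range(lap_size_r))])
                                               # falls from just under 1.0 to 0.0 across the overlap (row direction)
fade_in_r=np.array([x/lap_size_r for x in range(lap_size_r)])
                                               # rises from 0.0 to just under 1.0 across the overlap (row direction)
flat_c=np.ones((bare_size_c))                  # weight 1.0 over the non-overlapping part (col direction)
fade_out_c=np.array([x/lap_size_c for x in reversed(range(lap_size_c))])
                                               # falls from just under 1.0 to 0.0 across the overlap (col direction)
fade_in_c=np.array([x/lap_size_c for x in range(lap_size_c)])
                                               # rises from 0.0 to just under 1.0 across the overlap (col direction)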
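
To make the numbers concrete, the sizes for a 512-pixel tile can be worked out in plain Python. blend_sizes is an illustrative helper that only repeats the arithmetic above.

```python
def blend_sizes(tile, overlap):
    bare = int(tile / (1 + 2 * overlap))  # part with no overlap
    lap = tile - bare                     # part shared with the neighbouring tile
    merged = bare * 2 + lap               # side length of the merged image
    return bare, lap, merged

print(blend_sizes(512, 0.1))  # (426, 86, 938)
```

So two 512-pixel tiles share an 86-pixel band, and the merged image is 938 pixels on a side.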

With that prepared, the mask for the upper-left tile, for example, is built as follows.

#Upper Left
ptrn_r=np.concatenate([flat_r,fade_out_r],0).reshape(r,1,1)
ptrn_c=np.concatenate([flat_c,fade_out_c],0).reshape(1,c,1)
mask=np.multiply(np.multiply(white_screen,ptrn_r),ptrn_c)

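This produces the mask. Rendered in 256 levels,
it looks like this. The upper-right, lower-left, and lower-right masks are built the same way, multiplied into their tiles, shifted into position, and summed.
The complete cell is as follows.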

%matplotlib inline
tile_image=[]
r,c,n=image[0].shape
print(r,c,n)

bare_size_r=int(r/(1+2*overlap))
bare_size_c=int(c/(1+2*overlap))
lap_size_r=r-bare_size_r
lap_size_c=c-bare_size_c

white_screen=np.ones((r,c,3),dtype='float64')
flat_r=np.ones((bare_size_r))
fade_out_r=np.array([x/lap_size_r for x in reversed(range(lap_size_r))])
fade_in_r=np.array([x/lap_size_r for x in range(lap_size_r)])
flat_c=np.ones((bare_size_c))
fade_out_c=np.array([x/lap_size_c for x in reversed(range(lap_size_c))])
fade_in_c=np.array([x/lap_size_c for x in range(lap_size_c)])
#Upper Left
ptrn_r=np.concatenate([flat_r,fade_out_r],0).reshape(r,1,1)
ptrn_c=np.concatenate([flat_c,fade_out_c],0).reshape(1,c,1)
mask=np.multiply(np.multiply(white_screen,ptrn_r),ptrn_c)

base_screen=np.zeros((bare_size_r*2+lap_size_r,bare_size_c*2+lap_size_c,3))
masked_image=mask*image[0].astype('float64')
base_screen[0:bare_size_r+lap_size_r,0:bare_size_c+lap_size_c,:]=masked_image
tile_image.append(base_screen)

#Upper Right
ptrn_r=np.concatenate([flat_r,fade_out_r],0).reshape(r,1,1)
ptrn_c=np.concatenate([fade_in_c,flat_c],0).reshape(1,c,1)
mask=np.multiply(np.multiply(white_screen,ptrn_r),ptrn_c)

base_screen=np.zeros((bare_size_r*2+lap_size_r,bare_size_c*2+lap_size_c,3))
masked_image=mask*image[1].astype('float64')
base_screen[0:bare_size_r+lap_size_r,bare_size_c:,:]=masked_image
tile_image.append(base_screen)

#Lower Left
ptrn_r=np.concatenate([fade_in_r,flat_r],0).reshape(r,1,1)
ptrn_c=np.concatenate([flat_c,fade_out_c],0).reshape(1,c,1)
mask=np.multiply(np.multiply(white_screen,ptrn_r),ptrn_c)

base_screen=np.zeros((bare_size_r*2+lap_size_r,bare_size_c*2+lap_size_c,3))
masked_image=mask*image[2].astype('float64')
base_screen[bare_size_r:,:-bare_size_c,:]=masked_image
tile_image.append(base_screen)

#Lower Right
ptrn_r=np.concatenate([fade_in_r,flat_r],0).reshape(r,1,1)
ptrn_c=np.concatenate([fade_in_c,flat_c],0).reshape(1,c,1)
mask=np.multiply(np.multiply(white_screen,ptrn_r),ptrn_c)

base_screen=np.zeros((bare_size_r*2+lap_size_r,bare_size_c*2+lap_size_c,3))
masked_image=mask*image[3].astype('float64')
base_screen[bare_size_r:,bare_size_c:,:]=masked_image
tile_image.append(base_screen)

final_image=(tile_image[0]+tile_image[1]+tile_image[2]+tile_image[3]).astype('uint8')

fig = plt.figure(figsize=(20,10))
ax1=fig.add_subplot(1,2,1)
ax1.axis('off')
ax1.imshow(final_image)
print('final Size',final_image.shape)
ax2=fig.add_subplot(1,2,2)
ax2.axis('off')
ax2.imshow(image_org)

The code is brute-force and a little embarrassing, but I think it is easy to follow.
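
For reference, the four nearly identical blocks could probably be folded into a loop. The following is a minimal sketch under the same assumptions as the cell above (four same-sized tiles ordered upper left, upper right, lower left, lower right, with linear fades). merge_tiles and linear_ramps are hypothetical names, and this is not the code that produced the results in this article.

```python
import numpy as np

def linear_ramps(lap):
    """Return (fade_in, fade_out) of length lap, as in the list comprehensions above."""
    fade_in = np.arange(lap) / lap  # 0, 1/lap, ..., (lap-1)/lap
    return fade_in, fade_in[::-1]

def merge_tiles(tiles, overlap=0.1):
    """Blend [UL, UR, LL, LR] tiles of identical (r, c, 3) shape into one image."""
    r, c, _ = tiles[0].shape
    bare_r = int(r / (1 + 2 * overlap))
    bare_c = int(c / (1 + 2 * overlap))
    in_r, out_r = linear_ramps(r - bare_r)
    in_c, out_c = linear_ramps(c - bare_c)
    # Index 0: tile on the top/left side (flat, then fade out).
    # Index 1: tile on the bottom/right side (fade in, then flat).
    w_r = [np.concatenate([np.ones(bare_r), out_r]),
           np.concatenate([in_r, np.ones(bare_r)])]
    w_c = [np.concatenate([np.ones(bare_c), out_c]),
           np.concatenate([in_c, np.ones(bare_c)])]
    canvas = np.zeros((bare_r + r, bare_c + c, 3))
    for i, tile in enumerate(tiles):
        row, col = divmod(i, 2)
        # Outer product of the row and column weights, broadcast over RGB.
        mask = w_r[row][:, None, None] * w_c[col][None, :, None]
        y, x = row * bare_r, col * bare_c
        canvas[y:y + r, x:x + c, :] += mask * tile.astype('float64')
    return canvas.astype('uint8')

# Uniform grey tiles stand in for the img2img outputs.
demo = [np.full((512, 512, 3), 200, dtype=np.uint8) for _ in range(4)]
merged = merge_tiles(demo)
print(merged.shape)  # (938, 938, 3)
```

In the notebook this would be called as merge_tiles(image, overlap). Swapping linear_ramps for another ramp function changes the blend without touching the placement logic.
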
The result is as follows.
Actual size (938×938)

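How about that? Not bad, is it?
Feeling encouraged, I ran the loop once more.
This is where the mysterious fourth code cell pays off. Because final_image now holds the merged image, running the fifth code cell again splits the enlarged image and starts a second pass. Running the cells in order produced the following image.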

The contrast became harsh, so I judged this one not quite good enough.

「Trying people, a weak point」

Stable Diffusion seems to be rather weak at drawing people.
prompt="A portrait of Woman,beautiful face,short hair,cute eyes,beautiful composition"
seed=0
With these settings, txt2img generated the following image.

Running the img2img loop above and merging gave the following image. (Actual size 839×839)

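「Overlapping and merging (part 2)」

On a human face, even subtle seams stand out.
So this code-copying enthusiast had an idea: could the sigmoid function used in machine learning work here?
The code is as follows.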

%matplotlib inline

def sigmoid(x):
  return 1.0 / (1.0 + np.exp(-x))

tile_image=[]
r,c,n=image[0].shape
print(r,c,n)

gain=8.0
bare_size_r=int(r/(1+2*overlap))
bare_size_c=int(c/(1+2*overlap))
lap_size_r=r-bare_size_r
lap_size_c=c-bare_size_c

white_screen=np.ones((r,c,3),dtype='float64')
flat_r=np.ones((bare_size_r))
fade_out_r=np.array([sigmoid(gain*(x-lap_size_r/2)/lap_size_r) for x in reversed(range(lap_size_r))])
fade_in_r=np.array([sigmoid(gain*(x-lap_size_r/2)/lap_size_r) for x in range(lap_size_r)])
flat_c=np.ones((bare_size_c))
fade_out_c=np.array([sigmoid(gain*(x-lap_size_c/2)/lap_size_c) for x in reversed(range(lap_size_c))])
fade_in_c=np.array([sigmoid(gain*(x-lap_size_c/2)/lap_size_c) for x in range(lap_size_c)])
#Upper Left
ptrn_r=np.concatenate([flat_r,fade_out_r],0).reshape(r,1,1)
ptrn_c=np.concatenate([flat_c,fade_out_c],0).reshape(1,c,1)
mask=np.multiply(np.multiply(white_screen,ptrn_r),ptrn_c)

base_screen=np.zeros((bare_size_r*2+lap_size_r,bare_size_c*2+lap_size_c,3))
masked_image=mask*image[0].astype('float64')

base_screen[0:bare_size_r+lap_size_r,0:bare_size_c+lap_size_c,:]=masked_image
tile_image.append(base_screen)

#Upper Right
ptrn_r=np.concatenate([flat_r,fade_out_r],0).reshape(r,1,1)
ptrn_c=np.concatenate([fade_in_c,flat_c],0).reshape(1,c,1)
mask=np.multiply(np.multiply(white_screen,ptrn_r),ptrn_c)

base_screen=np.zeros((bare_size_r*2+lap_size_r,bare_size_c*2+lap_size_c,3))
masked_image=mask*image[1].astype('float64')
base_screen[0:bare_size_r+lap_size_r,bare_size_c:,:]=masked_image
tile_image.append(base_screen)

#Lower Left
ptrn_r=np.concatenate([fade_in_r,flat_r],0).reshape(r,1,1)
ptrn_c=np.concatenate([flat_c,fade_out_c],0).reshape(1,c,1)
mask=np.multiply(np.multiply(white_screen,ptrn_r),ptrn_c)

base_screen=np.zeros((bare_size_r*2+lap_size_r,bare_size_c*2+lap_size_c,3))
masked_image=mask*image[2].astype('float64')
base_screen[bare_size_r:,:-bare_size_c,:]=masked_image
tile_image.append(base_screen)

#Lower Right
ptrn_r=np.concatenate([fade_in_r,flat_r],0).reshape(r,1,1)
ptrn_c=np.concatenate([fade_in_c,flat_c],0).reshape(1,c,1)
mask=np.multiply(np.multiply(white_screen,ptrn_r),ptrn_c)

base_screen=np.zeros((bare_size_r*2+lap_size_r,bare_size_c*2+lap_size_c,3))
masked_image=mask*image[3].astype('float64')
base_screen[bare_size_r:,bare_size_c:,:]=masked_image
tile_image.append(base_screen)

final_image_sigmoid=(tile_image[0]+tile_image[1]+tile_image[2]+tile_image[3]).astype('uint8')

fig = plt.figure(figsize=(20,10))
ax1=fig.add_subplot(1,2,1)
ax1.axis('off')
ax1.imshow(final_image_sigmoid)
print('final Size',final_image_sigmoid.shape)
ax2=fig.add_subplot(1,2,2)
ax2.axis('off')
ax2.imshow(final_image)

The resulting masks are as follows.
Linear                    Sigmoid

The merged image is as follows.
Actual size (983×983)

What do you think?
Introducing the sigmoid also adds a gain parameter, which gives more room for trial and error.
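
How gain shapes the mask can be seen from the 1-D ramp alone. Below is a minimal sketch in plain Python. sigmoid_fade_in is an illustrative name that mirrors the fade_in expression in the cell above, and lap=86 assumes 512-pixel tiles with overlap=0.1. It prints where the ramp starts and ends, and the range of fade_in + fade_out across the overlap.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_fade_in(lap, gain):
    """Same expression as fade_in_r / fade_in_c in the sigmoid cell."""
    return [sigmoid(gain * (x - lap / 2) / lap) for x in range(lap)]

lap = 86  # overlap width assumed for 512-pixel tiles with overlap=0.1
for gain in (4.0, 8.0, 16.0):
    ramp = sigmoid_fade_in(lap, gain)
    # fade_out is the reversed ramp, so this is the combined weight of two tiles.
    total = [a + b for a, b in zip(ramp, reversed(ramp))]
    print(f"gain={gain:4.1f} start={ramp[0]:.3f} end={ramp[-1]:.3f} "
          f"min_sum={min(total):.3f} max_sum={max(total):.3f}")
```

With gain=8 the ramp starts at sigmoid(-4), about 0.018, rather than 0. A smaller gain raises that start value, so the tile edge enters the blend more abruptly, while a larger gain makes the transition in the middle of the overlap steeper.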

「A success!」

・prompt="medieval hobbit home, ornate, beautiful, atmosphere, vibe, mist, smoke, chimney, rain, spell - book,wet, pristine, puddles, dripping, waterfall, creek, bridge, forest,wers, concept art illustration,color page, 4 k, tone mapping, doll, akihiko yoshida, james jean, andrei riabovitchev, marc simonetti,yoshitaka amano, digital illustration, greg rutowski, volumetric lighting, sunbeams, particles "
・seed=3128183630,ddim_steps=120
Actual size (983×983)

「Summary」

I tinkered with the code casually, freely, as ideas came to me,
and it turned out better than I expected. Thank you, as always, for reading such a long article.
