
[Introduction to StyleGAN] Playing with style_mixing: "The woman who takes off her glasses" ♬

Posted at 2020-01-04

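On day two with StyleGAN, I explain two of the three ways of generating images with StyleGAN described in reference ① below, and try out several kinds of style-mixing image generation. A good explanation of StyleGAN is available in reference ②; reading it should make this article easier to follow.
[References]
① NVlabs/stylegan
② StyleGAN解説 CVPR2019読み会@DeNA (StyleGAN explained, CVPR 2019 reading group @ DeNA)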

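What I did

・First, what are the two methods?
・Turning them into code
・LatentMixing: mixing in the latent space $z$
・StyleMixing: mixing in the mapped latent space $w$
・StyleMixing_2: generating images by swapping style attributes in the mapped latent space $w$
・StyleMixing_3: generating images by mixing an individual style attribute in the mapped latent space $w$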

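・First, what are the two methods?

Roughly translated, the official README says the following.

There are three ways to use the pre-trained generator:
1. Use Gs.run() for immediate-mode operation where the inputs and outputs are numpy arrays.
Note: this is the method used in the previous article.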

# Pick latent vector.
rnd = np.random.RandomState(5)
latents = rnd.randn(1, Gs.input_shape[1])
# Generate image.
fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
images = Gs.run(latents, None, truncation_psi=0.7, randomize_noise=True, output_transform=fmt)

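The first argument is a batch of latent vectors of shape [num, 512]. The second argument is reserved for class labels (not used by StyleGAN).
The remaining keyword arguments are optional and can be used to further modify the operation (see below). The output is a batch of images, whose format is dictated by the output_transform argument.
Note: for the options referred to by "see below" (truncation_psi=0.7, randomize_noise=True), see reference ①.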

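2. Use Gs.get_output_for() to incorporate the generator as a part of a larger TensorFlow expression:
Note: this method is not used in this article, so it is skipped.
...
3. Look up Gs.components.mapping and Gs.components.synthesis to access individual sub-networks of the generator. Similar to Gs, the sub-networks are represented as independent instances of dnnlib.tflib.Network:
Note: this is the method used for mixing generated images in this article.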

src_latents = np.stack([np.random.RandomState(seed).randn(Gs.input_shape[1]) for seed in src_seeds])
src_dlatents = Gs.components.mapping.run(src_latents, None) # [seed, layer, component]
src_images = Gs.components.synthesis.run(src_dlatents, randomize_noise=False, **synthesis_kwargs)
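
The output_transform dictionary used above asks the generator to post-process its raw output with tflib.convert_images_to_uint8. As a rough illustration only, the NumPy sketch below mimics what that conversion is understood to do, assuming the raw output is a float NCHW batch in the range [-1, 1]. The function name to_uint8_nhwc is hypothetical and is not part of dnnlib.

```python
import numpy as np

def to_uint8_nhwc(images, drange=(-1.0, 1.0)):
    """Hypothetical NumPy stand-in for the uint8 conversion (not dnnlib code)."""
    images = np.asarray(images, dtype=np.float32)
    # NCHW -> NHWC, which is what nchw_to_nhwc=True requests.
    images = images.transpose(0, 2, 3, 1)
    # Map drange linearly onto [0, 255]; adding 0.5 rounds before truncation.
    scale = 255.0 / (drange[1] - drange[0])
    images = images * scale + (0.5 - drange[0] * scale)
    return np.clip(images, 0, 255).astype(np.uint8)

# Toy batch: one 3-channel 4x4 "image" with values in [-1, 1] (made-up data).
raw = np.random.RandomState(0).uniform(-1, 1, size=(1, 3, 4, 4))
out = to_uint8_nhwc(raw)
print(out.shape, out.dtype)
```

An array in this layout (height, width, channel, uint8) is what PIL.Image.fromarray and plt.imshow expect, which is why the later code can pass images[0] to them directly.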

・Turning them into code

Using the methods above, the simplest working code can be written as follows.

import os
import pickle
import numpy as np
import PIL.Image
import dnnlib
import dnnlib.tflib as tflib
import config
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt
import tensorflow as tf

synthesis_kwargs = dict(output_transform=dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True), minibatch_size=8)

def main():
    # Initialize TensorFlow.
    tflib.init_tf()
    fpath = './weight_files/tensorflow/karras2019stylegan-ffhq-1024x1024.pkl'
    with open(fpath, mode='rb') as f:
        _G, _D, Gs = pickle.load(f)
    # Method 1: use Gs.run() for immediate-mode operation where inputs and outputs are numpy arrays.
    # Pick latent vector.
    rnd = np.random.RandomState(5)
    latents1 = rnd.randn(1, Gs.input_shape[1])
    # Generate image.
    fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
    images = Gs.run(latents1, None, truncation_psi=0.7, randomize_noise=True, output_transform=fmt)
    plt.imshow(images.reshape(1024,1024,3))
    plt.pause(1)
    plt.savefig("./results/simple1_.png")
    plt.close()
    # Method 3: look up Gs.components.mapping and Gs.components.synthesis to access
    # the individual sub-networks of the generator. Like Gs, the sub-networks are
    # represented as independent instances of dnnlib.tflib.Network.
    src_seeds = [5]
    src_latents = np.stack([np.random.RandomState(seed).randn(Gs.input_shape[1]) for seed in src_seeds])
    src_dlatents = Gs.components.mapping.run(src_latents, None) # [seed, layer, component]
    src_images = Gs.components.synthesis.run(src_dlatents, randomize_noise=False, **synthesis_kwargs)
    plt.imshow(src_images[0].reshape(1024,1024,3))
    plt.pause(1)
    plt.savefig("./results/simple3_.png")
    plt.close()
    
if __name__ == "__main__":
    main()

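With this code you might expect both methods to generate the same image, but when actually run the results differed slightly. The latent tensors involved are summarized below.

	Method 1	Method 3
Latent tensor	z=latents1	z=src_latents, w=src_dlatents
size	(1,512)	(1,512), (1,18,512)

These latent tensors correspond to $z$ and $w$ in the StyleGAN generator architecture figure.
In other words, the latent $z$ is a vector with 512 parameters, and the tensor $w$ in the mapped latent space $W$ has shape (18,512).
That is, the synthesis network has 18 inputs A (see reference ③), and the tensor $w$ is what each of those styles is derived from.
[References]
③ Style-mixingをやってみる@StyleGANの学習済みモデルでサクッと遊んでみる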
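
The shapes above can be illustrated without the pretrained model. The sketch below uses a toy random MLP as a stand-in for the mapping network (an assumption for illustration only; the real network has trained weights) and broadcasts its 512-dimensional output to one copy per synthesis layer. For 1024x1024 output the layer count is 2*log2(1024) - 2 = 18.

```python
import numpy as np

RESOLUTION = 1024
LATENT_SIZE = 512
# Two style inputs per resolution from 4x4 up to 1024x1024.
NUM_LAYERS = 2 * int(np.log2(RESOLUTION)) - 2

def toy_mapping(z, depth=8, seed=0):
    """Toy stand-in for the mapping network: random MLP, then per-layer broadcast."""
    rnd = np.random.RandomState(seed)
    # Normalize each latent vector to unit RMS.
    x = z / np.sqrt(np.mean(z ** 2, axis=1, keepdims=True) + 1e-8)
    for _ in range(depth):
        weight = rnd.randn(LATENT_SIZE, LATENT_SIZE) / np.sqrt(LATENT_SIZE)
        x = x @ weight
        x = np.where(x > 0, x, 0.2 * x)  # leaky ReLU
    # One copy of w per synthesis layer -> [batch, layer, component].
    return np.tile(x[:, np.newaxis, :], (1, NUM_LAYERS, 1))

z = np.random.RandomState(5).randn(1, LATENT_SIZE)
w = toy_mapping(z)
print(z.shape, w.shape)
```

Because all 18 rows of this tensor start out identical, style mixing amounts to overwriting some of those rows with rows computed from a different $z$.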

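In other words, Methods 1 and 3 above can be restated as follows.

Method 1: generates the image directly from the latent vector $z$.
Method 3: first maps the latent vector $z$ to the tensor $w$ in the mapped latent space, then feeds $w$ to the corresponding inputs $A$ of the synthesis network; the mapping and synthesis networks are run as independent networks to generate the image.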

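・LatentMixing: mixing in the latent space $z$

This is the same as in the previous article, but reflecting the above, the two latent vectors $z$ are obtained in the two different ways.
The main part of the code is as follows.

simple_method1.py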

def main():
    # Initialize TensorFlow.
    tflib.init_tf()
    fpath = './weight_files/tensorflow/karras2019stylegan-ffhq-1024x1024.pkl'
    with open(fpath, mode='rb') as f:
        _G, _D, Gs = pickle.load(f)

    # Pick latent vector.
    rnd = np.random.RandomState(5) #5
    latents1 = rnd.randn(1, Gs.input_shape[1])
    print(latents1.shape)
    
    # Generate image.
    fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
    images = Gs.run(latents1, None, truncation_psi=1, randomize_noise=False, output_transform=fmt)
    # Pick latent vector2
    src_seeds=[6]
    src_latents = np.stack([np.random.RandomState(seed).randn(Gs.input_shape[1]) for seed in src_seeds])
    # Generate image2
    src_dlatents = Gs.components.mapping.run(src_latents, None) # [seed, layer, component]
    src_images = Gs.components.synthesis.run(src_dlatents, randomize_noise=False, **synthesis_kwargs)
    
    for i in range(1,101,4):
        # Mix latent vectors 1 and 2.
        latents = i/100*latents1+(1-i/100)*src_latents[0].reshape(1,512)
        # Generate image for mixing vector by method1.
        fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
        images = Gs.run(latents, None, truncation_psi=1, randomize_noise=False, output_transform=fmt)
        # Save image.
        os.makedirs(config.result_dir, exist_ok=True)
        png_filename = os.path.join(config.result_dir, 'example{}.png'.format(i))
        PIL.Image.fromarray(images[0], 'RGB').save(png_filename)

The result is as follows.

Latent z mixing

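The two outputs differed in the "Turning them into code" section; this was due to the truncation_psi and randomize_noise parameters. To make the results reproducible, they are set to 1 and False respectively here.
Impression: watching this animation, I feel like I can see the face of the couple's child, which is a little scary...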
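
For reference, the truncation trick controlled by truncation_psi pulls $w$ toward the average latent $\bar{w}$: $w' = \bar{w} + \psi (w - \bar{w})$. The sketch below shows this on random stand-in vectors (both w_avg and w are made-up data, not values from the pretrained model) and confirms that $\psi = 1$ leaves $w$ unchanged, which is consistent with truncation_psi=1 removing that source of difference.

```python
import numpy as np

def truncate(w, w_avg, psi):
    """Truncation trick: move from the average latent toward w by factor psi."""
    return w_avg + psi * (w - w_avg)

rnd = np.random.RandomState(0)
w_avg = rnd.randn(512)      # stand-in for the average w (made-up data)
w = rnd.randn(1, 18, 512)   # stand-in for a mapped latent (made-up data)

w_psi07 = truncate(w, w_avg, 0.7)
w_psi10 = truncate(w, w_avg, 1.0)

# psi = 1 reproduces w; psi = 0.7 moves it toward the average.
print(np.allclose(w_psi10, w))
print(np.abs(w_psi07 - w).max() > 0)
```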

・StyleMixing: mixing in the mapped latent space $w$

This time the same mixing is done as above, but in the mapped latent space $w$.
The main part of the code is as follows.

simple_method2.py

synthesis_kwargs = dict(output_transform=dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True), minibatch_size=8)

def main():
    # Initialize TensorFlow.
    tflib.init_tf()
    fpath = './weight_files/tensorflow/karras2019stylegan-ffhq-1024x1024.pkl'
    with open(fpath, mode='rb') as f:
        _G, _D, Gs = pickle.load(f)

    # Pick latent vector.
    rnd = np.random.RandomState(5) #5
    latents1 = rnd.randn(1, Gs.input_shape[1])
    
    # Generate image.
    dlatents1 = Gs.components.mapping.run(latents1, None) # [seed, layer, component]
    images = Gs.components.synthesis.run(dlatents1, randomize_noise=False, **synthesis_kwargs)
    
    src_seeds=[6]
    src_latents = np.stack([np.random.RandomState(seed).randn(Gs.input_shape[1]) for seed in src_seeds])
    src_dlatents = Gs.components.mapping.run(src_latents, None) # [seed, layer, component]
    src_images = Gs.components.synthesis.run(src_dlatents, randomize_noise=False, **synthesis_kwargs)
    
    for i in range(1,101,4):
        dlatents = i/100*dlatents1+(1-i/100)*src_dlatents
        # Generate image.
        images = Gs.components.synthesis.run(dlatents, randomize_noise=False, **synthesis_kwargs)
        # Save image.
        os.makedirs(config.result_dir, exist_ok=True)
        png_filename = os.path.join(config.result_dir, 'example{}.png'.format(i))
        PIL.Image.fromarray(images[0], 'RGB').save(png_filename)

The result is as follows.
At a glance, it differs from the previous result.

Style mixing in projected space

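It is only natural that linear interpolation of the input latent vectors $z$ differs from linear interpolation of the style vectors $w$ in the space obtained by the nonlinear (multi-layer MLP) mapping.
As far as I (ウワン) can tell, linear interpolation of the style vectors in the mapped space seems preferable, in the sense that the glasses last longer.
In the following sections, we will see that even this linear interpolation is still rather coarse as an interpolation.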
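
The reason the two interpolations differ can be checked with a toy example. Below, toy_mapping is a hypothetical nonlinear function standing in for the mapping network; it is not the StyleGAN network, and the weights are random. Interpolating the inputs and then mapping gives a different result from mapping first and then interpolating, except at the endpoints.

```python
import numpy as np

rnd = np.random.RandomState(0)
W1 = rnd.randn(8, 8)
W2 = rnd.randn(8, 8)

def toy_mapping(z):
    """Hypothetical two-layer nonlinear map (stand-in for the mapping MLP)."""
    return np.tanh(z @ W1) @ W2

def lerp(a, b, t):
    """Linear interpolation: t=1 returns a, t=0 returns b (same convention as i/100)."""
    return t * a + (1 - t) * b

z1 = rnd.randn(1, 8)
z2 = rnd.randn(1, 8)

for t in (0.0, 0.5, 1.0):
    via_z = toy_mapping(lerp(z1, z2, t))               # mix in z, then map
    via_w = lerp(toy_mapping(z1), toy_mapping(z2), t)  # map, then mix in w
    print(t, np.abs(via_z - via_w).max())
```

The gap is zero at t=0 and t=1 and nonzero in between, so the two animations share their first and last frames but take different paths.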

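・StyleMixing_2: generating images by swapping style attributes in the mapped latent space $w$

This technique also appears in the paper and is the best-known example of image manipulation.
Here is the code right away. It is based on the code in reference ③.
Note: the function structure has been changed, so almost the whole file is shown.

ordinary_style_mixising.py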

import os
import pickle
import numpy as np
import PIL.Image
import dnnlib
import dnnlib.tflib as tflib
import config
import matplotlib.pyplot as plt

synthesis_kwargs = dict(output_transform=dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True), minibatch_size=8)

def load_Gs():
    fpath = './weight_files/tensorflow/karras2019stylegan-ffhq-1024x1024.pkl'
    with open(fpath, mode='rb') as f:
        _G, _D, Gs = pickle.load(f)
    return Gs

def draw_style_mixing_figure(png, Gs, w, h, src_seeds, dst_seeds, style_ranges):
    print(png)
    src_latents = np.stack([np.random.RandomState(seed).randn(Gs.input_shape[1]) for seed in src_seeds])
    src_dlatents = Gs.components.mapping.run(src_latents, None) # [seed, layer, component]

    # Pick latent vector.
    rnd = np.random.RandomState(5) #5
    latents1 = rnd.randn(1, Gs.input_shape[1])
    print(latents1.shape)
    
    # Generate image.
    dlatents1 = Gs.components.mapping.run(latents1, None) # [seed, layer, component]
    images = Gs.components.synthesis.run(dlatents1, randomize_noise=False, **synthesis_kwargs)

    dst_dlatents = np.zeros((6,18,512))
    for j in range(6):
        dst_dlatents[j] = dlatents1

    src_images = Gs.components.synthesis.run(src_dlatents, randomize_noise=False, **synthesis_kwargs)
    dst_images = Gs.components.synthesis.run(dst_dlatents, randomize_noise=False, **synthesis_kwargs)
    print(dst_images.shape)

    canvas = PIL.Image.new('RGB', (w * (len(src_seeds) + 1), h * (len(dst_seeds) + 1)), 'white')
    for col, src_image in enumerate(list(src_images)):
        canvas.paste(PIL.Image.fromarray(src_image, 'RGB'), ((col + 1) * w, 0))
    for row, dst_image in enumerate(list(dst_images)):
        canvas.paste(PIL.Image.fromarray(dst_image, 'RGB'), (0, (row + 1) * h))
        row_dlatents = np.stack([dst_dlatents[row]] * len(src_seeds))
        row_dlatents[:, style_ranges[row]] = src_dlatents[:, style_ranges[row]]
        row_images = Gs.components.synthesis.run(row_dlatents, randomize_noise=False, **synthesis_kwargs)
        for col, image in enumerate(list(row_images)):
            canvas.paste(PIL.Image.fromarray(image, 'RGB'), ((col + 1) * w, (row + 1) * h))
    canvas.save(png)

def main():
    tflib.init_tf()
    os.makedirs(config.result_dir, exist_ok=True)
    draw_style_mixing_figure(os.path.join(config.result_dir, 'style-mixing.png'), 
                             load_Gs(), w=1024, h=1024, src_seeds=[6,701,687,615,2268], dst_seeds=[0,0,0,0,0,0],
                             style_ranges=[range(0,8)]+[range(1,8)]+[range(2,8)]+[range(1,18)]+[range(4,18)]+[range(5,18)])

if __name__ == "__main__":
    main()

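The result is as follows.
Note: the 1024x1024 output above has been reduced to 256x256 for posting.

From the code, these images are generated by the following style replacement.
style_ranges=[range(0,8)]+[range(1,8)]+[range(2,8)]+[range(1,18)]+[range(4,18)]+[range(5,18)]
row_dlatents[:, style_ranges[row]] = src_dlatents[:, style_ranges[row]]
That is, of the 18 styles (indices 0 to 17), replacing only the following ranges, row by row from the top, is enough to produce this much change in the image.
Note: seen from the man's side, the ranges below are replaced with the woman's styles.
To keep the man recognizable, at least range(0,4) needs to be kept.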


Row	Range replaced with the woman's styles
1	range(0,8)
2	range(1,8)
3	range(2,8)
4	range(1,18)
5	range(4,18)
6	range(5,18)
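Conversely, from the woman's side, rows 2 to 4 show that merely lacking style 0 removes the glasses and shifts her appearance toward a rather boyish look. In row 4 in particular, every style except style 0 is the woman's, yet the image changes considerably.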
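
The indexing in row_dlatents[:, style_ranges[row]] = src_dlatents[:, style_ranges[row]] can be tried on small stand-in arrays. In the sketch below the destination latent is filled with 0 and the source with 1 (made-up values used only to make the layers traceable), so each printed row shows which of the 18 layers came from the source. The helper name mix_styles is hypothetical.

```python
import numpy as np

NUM_LAYERS, COMPONENTS = 18, 4  # 4 components instead of 512 to keep the toy small

dst_dlatent = np.zeros((NUM_LAYERS, COMPONENTS))     # stand-in for the destination w
src_dlatents = np.ones((1, NUM_LAYERS, COMPONENTS))  # stand-in for one source w

style_ranges = [range(0, 8), range(1, 8), range(2, 8),
                range(1, 18), range(4, 18), range(5, 18)]

def mix_styles(dst, src, style_range):
    """Return copies of dst in which the layers in style_range are taken from src."""
    mixed = np.stack([dst] * len(src))
    mixed[:, style_range] = src[:, style_range]
    return mixed

for style_range in style_ranges:
    mixed = mix_styles(dst_dlatent, src_dlatents, style_range)
    # 1 = layer taken from the source, 0 = layer kept from the destination.
    print(style_range, mixed[0, :, 0].astype(int))
```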

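・StyleMixing_3: generating images by mixing an individual style attribute in the mapped latent space $w$

So, using the earlier code, let us look at this change by mixing just style 0.
This can be done by replacing the corresponding part of simple_method2.py with the following.

individual_mixing_style.py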

    for i in range(1,26,1):
        # Copy so that src_dlatents itself is not overwritten on each iteration.
        dlatents = src_dlatents.copy()
        dlatents[0][0] = i/100*dlatents1[0][0]+(1-i/100)*src_dlatents[0][0]

Individual style mixing in projected space

Although not shown here, this approach can mix any parameter of the style space, which allows much finer-grained mixing.
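
As a sketch of that finer-grained mixing, the helper below blends only selected layers of two mapped latents and leaves the rest untouched. mix_layers is a hypothetical helper, and the arrays are made-up stand-ins for dlatents1 and src_dlatents. It works on a copy so that the source latent is not modified between iterations.

```python
import numpy as np

def mix_layers(dlatents_a, dlatents_b, layers, t):
    """Blend only the given layers (t=1 takes them from a, t=0 keeps b); others stay as b."""
    mixed = dlatents_b.copy()  # copy so dlatents_b itself is never overwritten
    mixed[:, layers] = t * dlatents_a[:, layers] + (1 - t) * dlatents_b[:, layers]
    return mixed

rnd = np.random.RandomState(0)
dlatents1 = rnd.randn(1, 18, 512)     # stand-in for the first mapped latent
src_dlatents = rnd.randn(1, 18, 512)  # stand-in for the second mapped latent

for i in range(1, 26):
    dlatents = mix_layers(dlatents1, src_dlatents, [0], i / 100)
    # dlatents would then be passed to Gs.components.synthesis.run(...)

# Every layer except layer 0 is still identical to the source latent.
print(np.allclose(dlatents[:, 1:], src_dlatents[:, 1:]))
```

Passing a longer list such as [0, 1, 2] as layers would blend several styles at once with the same weight.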

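Summary

・Tried out "the woman who takes off her glasses"
・Mixing can now be targeted at individual characteristics
・The same procedure can also be applied to images given as npy files

・Next, I would like to train on my own images and generate my own style images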
