
Autoencoder Variations ②: Deep Autoencoder and Convolutional Autoencoder

Posted at 2020-12-06 (last updated at 2020-12-06)

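Introduction

This is the second article in a series summarizing autoencoder techniques. The first is Autoencoder Variations ①.

As in the previous article, this one starts from fully connected layers and covers two models: a Deep Autoencoder, which stacks more layers, and a Convolutional Autoencoder, which uses convolutions. Because the models are moderately more complex, the handwritten digits are reconstructed more accurately.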

Deep Autoencoder

The previous article only dealt with a single-layer model. Naturally, we want to add layers and make it "deep".

import numpy as np
from keras.datasets import mnist
from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import Adam

# Size of the encoded representation:
# 32 floats -> compression factor of 24.5, assuming the input is 784 floats
encoding_dim = 32

(x_train, _), (x_test, _) = mnist.load_data()

# Scale pixel values to [0, 1]
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
print(x_train.shape)
print(x_test.shape)

print(x_train.shape[1:])

# Flatten each 28x28 image into a 784-dimensional vector
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
print(x_train.shape)
print(x_test.shape)

# Encoder: 784 -> 128 -> 64 -> 32
input_img = Input(shape=(784,))
encoded = Dense(128, activation='relu')(input_img)
encoded = Dense(64, activation='relu')(encoded)
encoded = Dense(encoding_dim, activation='relu')(encoded)

# Decoder: 32 -> 64 -> 128 -> 784
decoded = Dense(64, activation='relu')(encoded)
decoded = Dense(128, activation='relu')(decoded)
decoded = Dense(784, activation='sigmoid')(decoded)

autoencoder = Model(input_img, decoded)

# `learning_rate` replaces the deprecated `lr` argument in recent Keras versions
adam = Adam(learning_rate=0.001)
# The loss function is mean squared error (mse)
autoencoder.compile(optimizer=adam, loss='mse')

history = autoencoder.fit(x_train, x_train,
                epochs=100,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))

The model structure is visualized with plot_model.

from keras.utils import plot_model
autoencoder.summary()
plot_model(autoencoder, show_shapes=True)
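As a sanity check on what `autoencoder.summary()` prints, the parameter count of each Dense layer can be worked out by hand as inputs x units + units (weights plus biases). The short sketch below does this for the layer sizes used above. The helper name `dense_params` is made up for this illustration and is not part of Keras.

```python
# Layer widths of the Deep Autoencoder, from input to output.
layer_sizes = [784, 128, 64, 32, 64, 128, 784]


def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one fully connected layer."""
    return n_in * n_out + n_out


per_layer = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)     # [100480, 8256, 2080, 2112, 8320, 101136]
print(total_params)  # 222384
```

Most of the parameters sit in the first and last layers, which connect to the 784-pixel input and output.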

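The loss decreased over the epochs as shown below.

Here again, we compare the original handwritten digits (top) with the digits reconstructed by the autoencoder (bottom).

Compared with the results of the simple autoencoder shown in the previous article, Autoencoder Variations ①, the reconstruction looks slightly better. As before, we also check the samples with the largest reconstruction error.

Hmm, the input digits themselves are messy to begin with. Low reconstruction accuracy on these is hard to avoid.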
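The article does not show how the poorly reconstructed samples were selected. One plausible way, assumed here, is to compute the mean squared error per image and sort in descending order. The function name `worst_reconstructions` and the tiny demo arrays are hypothetical stand-ins, not MNIST data or the author's code.

```python
import numpy as np


def worst_reconstructions(x, x_hat, k=10):
    """Return indices of the k samples with the largest per-sample MSE, worst first."""
    x = np.asarray(x, dtype=np.float64)
    x_hat = np.asarray(x_hat, dtype=np.float64)
    # Flatten everything except the sample axis, so (N, 784) and (N, 28, 28, 1) both work.
    diff = (x - x_hat).reshape(len(x), -1)
    per_sample_mse = np.mean(diff ** 2, axis=1)
    order = np.argsort(per_sample_mse)[::-1][:k]
    return order, per_sample_mse


# Tiny stand-in arrays (not MNIST) just to show the call.
x_demo = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
x_hat_demo = np.array([[0.0, 0.1], [0.0, 0.0], [0.5, 0.5]])
worst_idx, errors = worst_reconstructions(x_demo, x_hat_demo, k=2)
print(worst_idx)  # [1 0]
```

With a trained model, the same function could be called as `worst_reconstructions(x_test, autoencoder.predict(x_test))`.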

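What Is a Convolutional Autoencoder?

Since MNIST (handwritten digits) is image data, it seems reasonable to expect that a network built from convolutional layers would compress it into a low-dimensional space better than one built from fully connected layers. Let's try it. First come the necessary Keras imports and loading MNIST. To apply the 2D convolution Conv2D, np.reshape(x_train, [-1,image_size, image_size, 1 ]) converts the arrays to the appropriate shape.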

from keras.layers import Input, Dense
from keras.layers import Conv2D, Flatten, Reshape, Conv2DTranspose
from keras.models import Model
from keras.utils import plot_model
from keras import backend as K

import numpy as np
import matplotlib.pyplot as plt

# Load the MNIST data
from keras.datasets import mnist

(x_train, _), (x_test, _) = mnist.load_data()

image_size = x_train.shape[1]
print(image_size)

# Reshape the NumPy arrays to (N, height, width, 1) for Conv2D
x_train = np.reshape(x_train, [-1,image_size, image_size, 1 ])
x_test = np.reshape(x_test, [-1,image_size, image_size, 1 ])

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
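To make the difference between the two input layouts concrete, the sketch below uses a small dummy array in place of MNIST (an assumption made only to keep the example light). It shows the flattened `(N, 784)` shape used by the Deep Autoencoder next to the `(N, 28, 28, 1)` shape expected by `Conv2D`.

```python
import numpy as np

# Dummy stand-in for MNIST: 5 grayscale images of 28x28 pixels.
images = np.arange(5 * 28 * 28, dtype=np.float32).reshape(5, 28, 28)
image_size = images.shape[1]

# Fully connected layers take one flat vector per image.
flat = images.reshape((len(images), np.prod(images.shape[1:])))

# Conv2D takes (height, width, channels); grayscale means a single channel.
with_channel = np.reshape(images, [-1, image_size, image_size, 1])

print(flat.shape)          # (5, 784)
print(with_channel.shape)  # (5, 28, 28, 1)

# Both hold the same pixels: dropping the channel axis restores the original.
assert np.array_equal(with_channel[..., 0], images)
```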

Next, we build the encoder.

# Network parameters
input_shape = (image_size, image_size, 1)
batch_size = 128
kernel_size = 3
latent_dim = 32

epochs = 100

# Number of filters in each CNN layer of the encoder and decoder
layer_filters = [32, 64]

# Build the autoencoder model

# Start with the encoder
inputs = Input(shape=input_shape, name='encoder_input')
x = inputs

# Stack Conv2D layers with 32 and 64 filters
for filters in layer_filters:
    x = Conv2D(filters=filters, kernel_size=kernel_size, activation='relu', strides=2, padding='same')(x)

# Get the tensor shape via the backend; the decoder needs it later
shape = K.int_shape(x)

x = Flatten()(x)
latent = Dense(latent_dim, name='latent_vector')(x)

# Instantiate the encoder
encoder = Model(inputs, latent, name='encoder')
encoder.summary()
plot_model(encoder, show_shapes=True)

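This is the network of the encoder model. It convolves the 28x28 input down to 7x7 with 64 filters, then uses Flatten and a Dense layer to convert it into a latent-space vector.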
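The 28x28 to 7x7 reduction follows from the two stride-2 convolutions: with `padding='same'`, the spatial output size of a convolution is ceil(input / stride), independent of the kernel size. A minimal sketch of that arithmetic is below. The helper name `same_conv_out` is hypothetical and not a Keras function.

```python
import math


def same_conv_out(size, stride):
    """Spatial output size of a convolution with padding='same'."""
    return math.ceil(size / stride)


size = 28
layer_filters = [32, 64]
shapes = []
for filters in layer_filters:
    size = same_conv_out(size, stride=2)
    shapes.append((size, size, filters))

# Number of values that Flatten hands to the Dense layer.
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]

print(shapes)     # [(14, 14, 32), (7, 7, 64)]
print(flattened)  # 3136
```

So Flatten produces 3136 values, which the final Dense layer maps to the 32-dimensional latent vector.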

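Next is the decoder. The image was compressed to 7x7 by the convolutions, so the decoder has to bring it back to the original resolution. Conv2DTranspose is used for this. "Transpose" refers to the matrix transpose; the article "Up-sampling with Transposed Convolution" was a helpful reference on why the layer is named this way. The implementation is as follows.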

# Build the decoder (second half)
latent_inputs = Input(shape=(latent_dim,), name='decoder_input')
x = Dense(shape[1] * shape[2] * shape[3])(latent_inputs)
# Reshape to the (height, width, channels) layout expected by Conv2DTranspose
x = Reshape((shape[1], shape[2], shape[3]))(x)

# Stack Conv2DTranspose layers with 64 and 32 filters.
# Upsampling is needed here, so Conv2DTranspose is used instead of Conv2D
for filters in layer_filters[::-1]:
    x = Conv2DTranspose(filters=filters, kernel_size=kernel_size, activation='relu', strides=2, padding='same')(x)

# Reconstruct the input
outputs = Conv2DTranspose(filters=1, kernel_size=kernel_size, activation='sigmoid', padding='same', name='decoder_outputs')(x)

# Instantiate the decoder
decoder = Model(latent_inputs, outputs, name='decoder')
decoder.summary()
plot_model(decoder, show_shapes=True)

This is the network of the decoder model. Conv2DTranspose performs the upsampling, restoring the original 28x28 resolution.
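To illustrate where the word "transpose" comes from, here is a 1-D toy version in NumPy. It is a sketch for intuition, not the Keras implementation. A strided convolution can be written as a matrix `C` that shrinks the input, and multiplying by `C.T` maps a short vector back to the longer length, which gives the same result as stamping a scaled copy of the kernel at each stride position. The function names `conv_matrix` and `transposed_conv1d` are made up for this example.

```python
import numpy as np


def conv_matrix(kernel, n_in, stride=1):
    """Matrix C such that C @ x is a 1-D 'valid' convolution (cross-correlation) of x."""
    k = len(kernel)
    n_out = (n_in - k) // stride + 1
    C = np.zeros((n_out, n_in))
    for i in range(n_out):
        C[i, i * stride:i * stride + k] = kernel
    return C


def transposed_conv1d(y, kernel, stride=1):
    """Transposed convolution by scatter-add: each input value stamps a scaled kernel copy."""
    k = len(kernel)
    out = np.zeros((len(y) - 1) * stride + k)
    for i, value in enumerate(y):
        out[i * stride:i * stride + k] += value * np.asarray(kernel)
    return out


kernel = [1.0, 2.0, 3.0]
C = conv_matrix(kernel, n_in=7, stride=2)  # maps length 7 -> length 3
y = np.array([1.0, 0.0, -1.0])

up_matrix = C.T @ y                         # length 3 -> length 7
up_scatter = transposed_conv1d(y, kernel, stride=2)

print(C.shape)                              # (3, 7)
print(np.allclose(up_matrix, up_scatter))   # True
```

The transposed operation restores the shape, not the original values. In Keras, `Conv2DTranspose` with `strides=2` and `padding='same'` produces an output exactly twice the input size, which is how the decoder goes from 7 to 14 to 28.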

The encoder and decoder are combined to define a single autoencoder.

# Define the whole model as an autoencoder
autoencoder = Model(inputs, decoder(encoder(inputs)), name='autoencoder')
autoencoder.summary()
plot_model(autoencoder, show_shapes=True)

This autoencoder model is compiled and then trained.

from keras.optimizers import Adam
# `learning_rate` replaces the deprecated `lr` argument in recent Keras versions
adam = Adam(learning_rate=0.001)
# The loss function is mean squared error (mse)
autoencoder.compile(optimizer=adam, loss='mse')

history = autoencoder.fit(x_train, x_train,
                epochs=epochs,
                batch_size=batch_size,
                shuffle=True,
                validation_data=(x_test, x_test))

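The loss per epoch is shown below; it ended up considerably smaller than with the Deep Autoencoder.

For the Convolutional Autoencoder, we again compare the original handwritten digits (top) with the digits reconstructed by the autoencoder (bottom).

The digits are reconstructed noticeably better than with the Deep Autoencoder.

Here too, we check the images that were reconstructed poorly.

The original images contain noise. It is understandable that features appearing in only a few images are hard to compress and reconstruct.

That's all for this time. Next, I would like to try a colorization autoencoder.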
