
Image Classification by Fine-tuning InceptionV3

Last updated at 2018-11-21, posted at 2018-11-15

Goals

Learn Keras
Deepen my understanding of neural networks
Fine-tune the pretrained Keras InceptionV3 model on CIFAR-10 to build a classification model

Transfer learning
Use an existing pretrained model as a feature extractor without changing its weights.

Fine-tuning
Use an existing pretrained model as a feature extractor while retraining part of its weights.

Overview

Dataset: CIFAR-10
Network: InceptionV3
Environment: Google Colaboratory (GPU)

InceptionV3
A model trained on images drawn from ImageNet (1000 classes)
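A deep convolutional neural network (the Keras implementation without its top contains 311 layers, counting every convolution, batch-normalization, activation and pooling layer)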

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%config InlineBackend.figure_formats = {'png', 'retina'}

import os, cv2
from keras.applications.inception_v3 import InceptionV3
from keras.models import Model, load_model
from keras.layers.core import Dense
from keras.layers.pooling import GlobalAveragePooling2D
from keras.optimizers import Adam, RMSprop, SGD
from keras.utils.np_utils import to_categorical
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau, TensorBoard
from keras.preprocessing.image import ImageDataGenerator
from keras.datasets import cifar10
from sklearn.model_selection import train_test_split

Loading the data

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 37s 0us/step
(50000, 32, 32, 3) (50000, 1) (10000, 32, 32, 3) (10000, 1)

Subsetting the data

X_train = X_train[:15000]
X_test = X_test[:3000]
print(X_train.shape, X_test.shape)

(15000, 32, 32, 3) (3000, 32, 32, 3)

Resizing
Convert from (32, 32, 3) to (139, 139, 3)
Images smaller than 139, the minimum input size of InceptionV3 in the Keras version used here, must be resized

input_size = 139
num=len(X_train)
zeros = np.zeros((num,input_size,input_size,3))
for i, img in enumerate(X_train):
    zeros[i] = cv2.resize(
        img,
        dsize = (input_size,input_size)
    )
X_train = zeros
del zeros
X_train.shape

(15000, 139, 139, 3)

num=len(X_test)
zeros = np.zeros((num,input_size,input_size,3))
for i, img in enumerate(X_test):
    zeros[i] = cv2.resize(
        img,
        dsize = (input_size,input_size)
    )
X_test = zeros
del zeros
X_test.shape

(3000, 139, 139, 3)
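np.zeros allocates float64 by default, so the resized arrays built above are large. The following is a rough sketch of the arithmetic only; the helper name array_nbytes is my own and does not come from the article's code.

```python
import numpy as np

def array_nbytes(shape, dtype):
    """Return how many bytes a dense ndarray of this shape and dtype occupies."""
    n_elements = int(np.prod(shape, dtype=np.int64))
    return n_elements * np.dtype(dtype).itemsize

# Shape of the resized training subset used in this article
train_shape = (15000, 139, 139, 3)

for dtype in ("float64", "float32", "uint8"):
    gib = array_nbytes(train_shape, dtype) / 2**30
    print(f"{dtype}: {gib:.2f} GiB")
```

If memory on Colab becomes a problem, passing dtype='float32' to np.zeros is one option worth trying, since the data is cast to float32 afterwards anyway.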

# Convert the dtype and normalize to [0, 1]
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255

# One-hot encode the labels
num_classes = 10
y_train = to_categorical(y_train, num_classes = num_classes)[:15000]
y_test = to_categorical(y_test, num_classes = num_classes)[:3000]
print(y_train.shape, y_test.shape)

(15000, 10) (3000, 10)
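To make the one-hot step concrete, here is a minimal NumPy sketch of the same transformation. The function one_hot is a hypothetical stand-in written for illustration, not the Keras implementation of to_categorical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class ids of shape (n,) or (n, 1) into an (n, num_classes) matrix."""
    labels = np.asarray(labels).ravel()
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype="float32")[labels]

# CIFAR-10 labels arrive with shape (n, 1)
y = np.array([[3], [0], [9]])
encoded = one_hot(y, num_classes=10)
print(encoded.shape)
print(encoded[0])
```

np.argmax(encoded, axis=1) recovers the original class ids, which is how the true labels are read back in the prediction section.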

Splitting the data

Split off validation data from the training data

X_train, X_valid, y_train, y_valid = train_test_split(
    X_train,
    y_train,
    random_state = 0,
    stratify =y_train,
    test_size = 0.2
)
print(X_train.shape, y_train.shape, X_valid.shape, y_valid.shape)

(12000, 139, 139, 3) (12000, 10) (3000, 139, 139, 3) (3000, 10)
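The stratify argument keeps the class proportions the same in both parts of the split. The sketch below illustrates that idea with plain NumPy; it is a simplified stand-in, not scikit-learn's algorithm, and stratified_split_indices is a name I made up.

```python
import numpy as np

def stratified_split_indices(class_ids, test_size, seed=0):
    """Split sample indices so each class contributes the same fraction to validation."""
    rng = np.random.default_rng(seed)
    class_ids = np.asarray(class_ids)
    train_idx, valid_idx = [], []
    for c in np.unique(class_ids):
        # Shuffle the indices of this class, then cut off the validation share
        idx = rng.permutation(np.flatnonzero(class_ids == c))
        n_valid = int(round(len(idx) * test_size))
        valid_idx.extend(idx[:n_valid])
        train_idx.extend(idx[n_valid:])
    return np.array(train_idx), np.array(valid_idx)

# Toy labels: 10 classes with 100 samples each
class_ids = np.repeat(np.arange(10), 100)
train_idx, valid_idx = stratified_split_indices(class_ids, test_size=0.2)
print(len(train_idx), len(valid_idx))
print(np.bincount(class_ids[valid_idx]))
```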

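Loading InceptionV3

include_top: whether to include the fully connected layer on the output side of the network (default: True)
If True, the input size is 299x299
weights: whether to use weights pretrained on ImageNet (default: imagenet)
input_shape: shape tuple (default: None)
Can be specified only when include_top is False (3 channels, with width and height of at least 139)

Removing the fully connected layer

Remove the fully connected layer on the output side of the network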

base_model = InceptionV3(
    include_top = False,
    weights = "imagenet",
    input_shape = None
)

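Rebuilding the model

Build a new fully connected head

GlobalAveragePooling2D
Averages over the rows and columns of each channel, collapsing every channel to a single value
(None, 8, 8, 2048) for a 299x299 input → mean of each (8, 8) plane → compressed to (None, 2048); smaller inputs give a smaller spatial grid but the same 2048 channels
The result can then be fed to the fully connected layers that follow.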
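The pooling step is easy to reproduce with NumPy. This is a minimal sketch on random data, assuming the channels-last layout (batch, rows, cols, channels) that Keras uses by default with the TensorFlow backend.

```python
import numpy as np

def global_average_pooling_2d(feature_maps):
    """(batch, rows, cols, channels) -> (batch, channels) by averaging each plane."""
    return feature_maps.mean(axis=(1, 2))

# Random stand-in for the feature maps coming out of the base model
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8, 2048))

pooled = global_average_pooling_2d(x)
print(x.shape, "->", pooled.shape)
```

Because the spatial axes are averaged away, the output width depends only on the channel count, which is why the same head works for any input size the base model accepts.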

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

Data Augmentation

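Data augmentation pads out the training data by applying transformations to the images. Because the model rarely sees exactly the same image twice, overfitting tends to be reduced and generalization tends to improve.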

datagen = ImageDataGenerator(
    featurewise_center = False,
    samplewise_center = False,
    featurewise_std_normalization = False,
    samplewise_std_normalization = False,
    zca_whitening = False,
    rotation_range = 0,
    width_shift_range = 0.1,
    height_shift_range = 0.1,
    horizontal_flip = True,
    vertical_flip = False
)

# Compute the statistics (mean, standard deviation, principal components)
# Needed only when featurewise_center, featurewise_std_normalization or zca_whitening is enabled above
# datagen.fit(np.concatenate((X_train, X_valid), axis = 0))
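To show what the enabled options do to a single image, here is a rough NumPy sketch of a random horizontal flip plus a shift of up to 10%. It is an illustration under my own assumptions (integer pixel shifts, edge padding), not the ImageDataGenerator implementation, and random_flip_and_shift is a hypothetical helper.

```python
import numpy as np

def random_flip_and_shift(img, rng, max_shift_frac=0.1):
    """Randomly mirror an (H, W, C) image and shift it by up to max_shift_frac."""
    h, w, _ = img.shape
    if rng.random() < 0.5:
        img = img[:, ::-1, :]  # horizontal flip
    max_dy, max_dx = int(h * max_shift_frac), int(w * max_shift_frac)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    # Pad by repeating edge pixels, then crop a window offset by (dy, dx)
    padded = np.pad(img, ((max_dy, max_dy), (max_dx, max_dx), (0, 0)), mode="edge")
    top, left = max_dy - dy, max_dx - dx
    return padded[top:top + h, left:left + w, :]

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
out = random_flip_and_shift(img, rng)
print(img.shape, out.shape)
```

Each call yields a slightly different view of the same image, which is the effect datagen.flow relies on during training.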

Callback

# EarlyStopping
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=10,
    verbose=1
)

# ModelCheckpoint
weights_dir='./weights/'
os.makedirs(weights_dir, exist_ok=True)
model_checkpoint = ModelCheckpoint(
    weights_dir + "val_loss{val_loss:.3f}.hdf5",
    monitor='val_loss',
    verbose=1,
    save_best_only=True,
    save_weights_only=True,
    period=3
)

# reduce learning rate
reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.1,
    patience=3,
    verbose=1
)

# log for TensorBoard
logging = TensorBoard(log_dir="log/")
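The interplay of factor and patience in ReduceLROnPlateau is easier to see in a small simulation. This is a simplified sketch of the idea only: it ignores the min_delta, cooldown and min_lr options of the real callback, and simulate_reduce_lr is a name I invented. The starting rate of 0.001 is the Keras default for Adam.

```python
def simulate_reduce_lr(val_losses, initial_lr=0.001, factor=0.1, patience=3):
    """Return the learning rate used in each epoch for a given val_loss sequence."""
    lr = initial_lr
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        history.append(lr)  # rate in effect during this epoch
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr *= factor  # no improvement for `patience` epochs
                wait = 0
    return history

# Toy validation losses that stop improving after the second epoch
losses = [1.0, 0.9, 0.95, 0.96, 0.97, 0.98, 0.99]
for epoch, lr in enumerate(simulate_reduce_lr(losses), start=1):
    print(epoch, lr)
```

With patience=3 and factor=0.1 the rate drops tenfold after three epochs without a new best val_loss, which matches the pattern of reductions in the training log further down.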

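Training the model

Fine-tune the pretrained Keras InceptionV3 model on CIFAR-10.
In a CNN, shallow layers tend to extract generic features such as colors, edges and blobs, while deeper layers tend to extract features specific to the training data. The generic feature extractors in the shallow layers are therefore kept fixed (frozen), and only the weights of the deeper layers are retrained for the task.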

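Of the 314 layers in total, freeze the first 249 (on the input side)
However, keep the Batch Normalization layers unfrozen.
Apart from those, train the weights of the convolutional and fully connected layers in the remaining layers, roughly 20% of the network.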
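The freezing rule can be sketched without Keras as a function over layer names. The layer names below are hypothetical and only mimic the Keras naming style; plan_trainable is my own helper, and its n_frozen argument plays the role of the slice bound 249 in the model code.

```python
def plan_trainable(layer_names, n_frozen=249, bn_prefix="batch_normalization"):
    """Return one trainable flag per layer: frozen head, except batch normalization."""
    flags = []
    for index, name in enumerate(layer_names):
        in_frozen_part = index < n_frozen
        is_batch_norm = name.startswith(bn_prefix)
        # Trainable if past the frozen part, or if it is a batch-normalization layer
        flags.append((not in_frozen_part) or is_batch_norm)
    return flags

# Hypothetical six-layer network, freezing the first four layers
layer_names = [
    "input_1", "conv2d_1", "batch_normalization_1",
    "activation_1", "conv2d_2", "dense_1",
]
flags = plan_trainable(layer_names, n_frozen=4)
for name, flag in zip(layer_names, flags):
    print(f"{name}: trainable={flag}")
```

On a real model, the same check can be made by printing layer.name and layer.trainable for every entry of model.layers before compiling.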

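Blob analysis
A blob is a connected region of pixels. Blob analysis is image processing that examines shape features of objects, such as the presence, number, area, position, length and orientation of blobs.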

# Define the network
model = Model(inputs = base_model.input, outputs = predictions)

# Freeze the first 249 layers so that training starts from layer 250
for layer in model.layers[:249]:
    layer.trainable = False

    # Keep Batch Normalization layers unfrozen
    if layer.name.startswith('batch_normalization'):
        layer.trainable = True

for layer in model.layers[249:]:
    layer.trainable = True

# Always compile after setting layer.trainable
model.compile(
    optimizer = Adam(),
    loss = 'categorical_crossentropy',
    metrics = ["accuracy"]
)

%%time
hist = model.fit_generator(
    datagen.flow(X_train, y_train, batch_size = 32),
    steps_per_epoch = X_train.shape[0] // 32,
    epochs = 50,
    validation_data = (X_valid, y_valid),
    callbacks = [early_stopping, reduce_lr, logging],
    shuffle = True,
    verbose = 1
)

Epoch 1/50
375/375 [==============================] - 150s 400ms/step - loss: 0.8197 - acc: 0.7242 - val_loss: 0.5808 - val_acc: 0.8133
Epoch 2/50
375/375 [==============================] - 131s 348ms/step - loss: 0.5118 - acc: 0.8283 - val_loss: 0.4892 - val_acc: 0.8347
Epoch 3/50
375/375 [==============================] - 129s 343ms/step - loss: 0.3995 - acc: 0.8640 - val_loss: 0.4583 - val_acc: 0.8580

(omitted)

Epoch 00016: ReduceLROnPlateau reducing learning rate to 1.0000000656873453e-06.
Epoch 17/50
375/375 [==============================] - 130s 346ms/step - loss: 0.0543 - acc: 0.9815 - val_loss: 0.3247 - val_acc: 0.9113
Epoch 18/50
375/375 [==============================] - 132s 352ms/step - loss: 0.0549 - acc: 0.9816 - val_loss: 0.3249 - val_acc: 0.9087
Epoch 19/50
375/375 [==============================] - 131s 350ms/step - loss: 0.0549 - acc: 0.9816 - val_loss: 0.3248 - val_acc: 0.9103

Epoch 00019: ReduceLROnPlateau reducing learning rate to 1.0000001111620805e-07.
Epoch 20/50
375/375 [==============================] - 134s 358ms/step - loss: 0.0587 - acc: 0.9798 - val_loss: 0.3231 - val_acc: 0.9110
Epoch 00020: early stopping
CPU times: user 1h 10min 1s, sys: 8min 15s, total: 1h 18min 17s
Wall time: 44min 27s
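The 375/375 in the log follows directly from the split sizes. A tiny sketch of that arithmetic, using the 12000 training samples and batch size 32 from above (the function name is mine):

```python
def steps_per_epoch(n_samples, batch_size):
    """Number of full batches in one pass over the data."""
    return n_samples // batch_size

n_train, batch_size = 12000, 32
steps = steps_per_epoch(n_train, batch_size)
# Samples left over when the last partial batch is skipped
dropped = n_train - steps * batch_size
print(steps, dropped)
```

Here the batch size divides the training set evenly, so no samples are skipped in an epoch.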

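Saving the model

model_dir = './model/'
os.makedirs(model_dir, exist_ok=True)

model.save(model_dir + 'model.hdf5')

# Save a lighter model without the optimizer state (prediction works; it must be compiled again before training or evaluation)
model.save(model_dir + 'model-opt.hdf5', include_optimizer = False)

# Save the weights only (these are the weights at the end of training, not a best-epoch checkpoint)
model.save_weights(weights_dir + 'model_weight.hdf5')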

Plotting the learning curves

plt.figure(figsize = (18,6))

# accuracy
plt.subplot(1, 2, 1)
plt.plot(hist.history["acc"], label = "acc", marker = "o")
plt.plot(hist.history["val_acc"], label = "val_acc", marker = "o")
# plt.xticks(np.arange())
# plt.yticks(np.arange())
plt.xlabel("epoch")
plt.ylabel("accuracy")
# plt.title("")
plt.legend(loc = "best")
plt.grid(color = 'gray', alpha = 0.2)

# loss
plt.subplot(1, 2, 2)
plt.plot(hist.history["loss"], label = "loss", marker = "o")
plt.plot(hist.history["val_loss"], label = "val_loss", marker = "o")
# plt.xticks(np.arange())
# plt.yticks(np.arange())
plt.xlabel("epoch")
plt.ylabel("loss")
# plt.title("")
plt.legend(loc = "best")
plt.grid(color = 'gray', alpha = 0.2)

plt.show()

Evaluating the model

score = model.evaluate(X_test, y_test, verbose=1)
print("evaluate loss: {0[0]}".format(score))
print("evaluate acc: {0[1]}".format(score))

3000/3000 [==============================] - 10s 3ms/step
evaluate loss: 0.3342444945573807
evaluate acc: 0.9063333333333333
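For reference, the two numbers reported by evaluate can be written out in NumPy. This is a minimal sketch on a made-up three-sample batch; the clipping constant eps is my assumption to avoid log(0), and the values are not the article's data.

```python
import numpy as np

def accuracy(y_true_onehot, y_prob):
    """Fraction of samples whose most probable class matches the true class."""
    return float(np.mean(np.argmax(y_true_onehot, axis=1) == np.argmax(y_prob, axis=1)))

def categorical_crossentropy(y_true_onehot, y_prob, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true_onehot * np.log(y_prob), axis=1)))

# Made-up labels and softmax outputs for three samples and three classes
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
y_prob = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3]])

acc = accuracy(y_true, y_prob)
loss = categorical_crossentropy(y_true, y_prob)
print(acc, loss)
```

The loss penalizes low confidence in the true class even when the prediction is correct, so accuracy and loss can tell different stories about the same model.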

Loading the model

model = load_model(model_dir + 'model.hdf5')

# For the model saved without the optimizer (usable for prediction only)
model = load_model(model_dir + 'model-opt.hdf5', compile = False)

Prediction

labels = np.array([
   'airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck'
])

Display the first 30 test images with their true labels

# True labels of the first 30 test samples
true_classes = np.argmax(y_test[0:30], axis = 1)

# Plot the first 30 test images with their true labels
plt.figure(figsize = (16, 6))
for i in range(30):
    plt.subplot(3, 10, i + 1)
    plt.axis("off")
    plt.title(labels[true_classes[i]])
    plt.imshow(X_test[i])
plt.show()

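Display the first 30 test images with their predicted labels and predicted probabilities

# Predict once and reuse the class probabilities
probs = model.predict(X_test[0:30])

# Predicted labels of the first 30 test samples
pred_classes = np.argmax(probs, axis=1)

# Predicted probabilities of the first 30 test samples
pred_probs = np.max(probs, axis=1)
pred_probs = ['{:.4f}'.format(i) for i in pred_probs]

# Plot the first 30 test images with predicted label and probability (misclassifications in red)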
plt.figure(figsize = (16, 6))
for i in range(30):
    plt.subplot(3, 10, i + 1)
    plt.axis("off")
    if pred_classes[i] == true_classes[i]:
        plt.title(labels[pred_classes[i]] + '\n' + pred_probs[i])
    else:
        plt.title(labels[pred_classes[i]] + '\n' + pred_probs[i], color = "red")
    plt.imshow(X_test[i])
plt.show()

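Evaluation on the test data gave an accuracy of 90% and a loss of 0.33. Compared with the scores from my earlier CNN implementation in Keras, accuracy has improved, and the model classifies with high predicted probabilities even though only 30% of the full training set was used (admittedly with data augmentation). The loss, however, is far from good: training loss fell to about 0.05 while validation loss plateaued around 0.32, which points to overfitting. There seems to be room to tune and test the number of frozen layers, the optimizer, the augmentation settings and so on.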

Next, I would like to try fine-tuning pretrained models other than InceptionV3, and datasets other than MNIST and CIFAR-10.
