While comparing the CapsNet, AveragePooling, and SpatialPyramidPooling versions...
running each of them separately gradually became a chore.
So I decided to run the three models at the same time.
It came together well enough, so here is a write-up.
Program structure
The main model is as follows.
It is keras/examples/cifar10_cnn_capsule.py with a few extra layers; the original it builds on appears to be by the author of the repository below.
The Capsule implementation is from https://github.com/bojone/Capsule/
from keras.layers import (Input, Conv2D, BatchNormalization, Dropout,
                          AveragePooling2D)

def model_cifar(input_image=Input(shape=(None, None, 3))):
    # A common Conv2D backbone; returns the feature map and its input tensor
    x = Conv2D(64, (3, 3), activation='relu', padding='same')(input_image)
    x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = BatchNormalization(axis=3)(x)
    x = Dropout(0.5)(x)
    x = AveragePooling2D((2, 2))(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = BatchNormalization(axis=3)(x)
    x = Dropout(0.5)(x)
    x = AveragePooling2D((2, 2))(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
    # x = BatchNormalization(axis=3)(x)
    x = Dropout(0.5)(x)
    return x, input_image
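Because the backbone is fully convolutional, the shape=(None, None, 3) variants accept inputs of any spatial size; only the Flatten-plus-Dense head needs a fixed 32x32 input. A quick sketch to confirm this (my addition, not part of the original script):

import numpy as np
from keras.models import Model

# Each call to model_cifar builds a fresh, independent set of layers.
feat, inp = model_cifar(input_image=Input(shape=(None, None, 3)))
backbone = Model(inputs=inp, outputs=feat)

# The two AveragePooling2D((2, 2)) stages halve the spatial size twice.
print(backbone.predict(np.zeros((1, 32, 32, 3))).shape)  # (1, 8, 8, 256)
print(backbone.predict(np.zeros((1, 64, 64, 3))).shape)  # (1, 16, 16, 256)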
Calling the model: three kinds of output head
# SPP
x1, input_image1 = model_cifar(input_image=Input(shape=(None, None, 3)))
x1 = SpatialPyramidPooling([1, 2, 4])(x1)  # pyramid levels; matches the 5376-dim summary below
output1 = Dense(num_classes, activation='softmax')(x1)
# AveragePooling
x2, input_image2 = model_cifar(input_image=Input(shape=(32, 32, 3)))
x2 = AveragePooling2D(pool_size=(2, 2), strides=None, padding='valid',
                      data_format='channels_last')(x2)
x2 = Flatten()(x2)
output2 = Dense(num_classes, activation='softmax')(x2)
# Capsule
x3, input_image3 = model_cifar(input_image=Input(shape=(None, None, 3)))
x3 = Reshape((-1, 128))(x3)  # flatten the feature map into 128-dim input capsules
capsule = Capsule(10, 96, 3, True)(x3)  # 10 capsules of dim 96, 3 routings, shared weights
output3 = Lambda(lambda z: K.sqrt(K.sum(K.square(z), 2)))(capsule)  # capsule lengths as class scores
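As a sanity check on the head sizes (my own sketch, not in the script): SPP with levels [1, 2, 4] pools each channel into 1 + 4 + 16 = 21 bins, and the shared-weight Capsule layer has a single 128 x (10 x 96) transform, which reproduces two of the numbers in the summaries below.

# Sketch: reproduce two numbers from the model summaries below.
levels, channels = [1, 2, 4], 256
bins = sum(n * n for n in levels)    # 1 + 4 + 16 = 21 bins per channel
print(channels * bins)               # 5376  -> spatial_pyramid_pooling_1 output

in_dim, num_caps, caps_dim = 128, 10, 96
print(in_dim * num_caps * caps_dim)  # 122880 -> capsule_1 params (share_weights=True)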
Declaring the models
model1 = Model(inputs=input_image1, outputs=output1)
model2 = Model(inputs=input_image2, outputs=output2)
model3 = Model(inputs=input_image3, outputs=output3)
# we use a margin loss
model1.compile(loss=margin_loss, optimizer='adam', metrics=['accuracy'])
model2.compile(loss=margin_loss, optimizer='adam', metrics=['accuracy'])
model3.compile(loss=margin_loss, optimizer='adam', metrics=['accuracy'])
# model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model1.summary()
model2.summary()
model3.summary()
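margin_loss is not defined in the snippet above; here is a sketch of it, following the margin loss used in the keras cifar10_cnn_capsule.py example (the 0.5/0.6 constants are that example's defaults, so treat them as assumptions):

from keras import backend as K

def margin_loss(y_true, y_pred):
    # Penalize the true class's capsule length for falling below 1 - margin,
    # and the other classes' lengths for exceeding margin.
    lamb, margin = 0.5, 0.6
    return K.sum(y_true * K.square(K.relu(1 - margin - y_pred)) +
                 lamb * (1 - y_true) * K.square(K.relu(y_pred - margin)),
                 axis=-1)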
Running this prints the following model summaries.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, None, None, 3) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, None, None, 64) 1792
_________________________________________________________________
conv2d_2 (Conv2D) (None, None, None, 64) 36928
_________________________________________________________________
batch_normalization_1 (Batch (None, None, None, 64) 256
_________________________________________________________________
dropout_1 (Dropout) (None, None, None, 64) 0
_________________________________________________________________
average_pooling2d_1 (Average (None, None, None, 64) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, None, None, 128) 73856
_________________________________________________________________
conv2d_4 (Conv2D) (None, None, None, 128) 147584
_________________________________________________________________
batch_normalization_2 (Batch (None, None, None, 128) 512
_________________________________________________________________
dropout_2 (Dropout) (None, None, None, 128) 0
_________________________________________________________________
average_pooling2d_2 (Average (None, None, None, 128) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, None, None, 256) 295168
_________________________________________________________________
conv2d_6 (Conv2D) (None, None, None, 256) 590080
_________________________________________________________________
dropout_3 (Dropout) (None, None, None, 256) 0
_________________________________________________________________
spatial_pyramid_pooling_1 (S (None, 5376) 0
_________________________________________________________________
dense_1 (Dense) (None, 10) 53770
=================================================================
Total params: 1,199,946
Trainable params: 1,199,562
Non-trainable params: 384
_________________________________________________________________
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) (None, 32, 32, 3) 0
_________________________________________________________________
conv2d_7 (Conv2D) (None, 32, 32, 64) 1792
_________________________________________________________________
conv2d_8 (Conv2D) (None, 32, 32, 64) 36928
_________________________________________________________________
batch_normalization_3 (Batch (None, 32, 32, 64) 256
_________________________________________________________________
dropout_4 (Dropout) (None, 32, 32, 64) 0
_________________________________________________________________
average_pooling2d_3 (Average (None, 16, 16, 64) 0
_________________________________________________________________
conv2d_9 (Conv2D) (None, 16, 16, 128) 73856
_________________________________________________________________
conv2d_10 (Conv2D) (None, 16, 16, 128) 147584
_________________________________________________________________
batch_normalization_4 (Batch (None, 16, 16, 128) 512
_________________________________________________________________
dropout_5 (Dropout) (None, 16, 16, 128) 0
_________________________________________________________________
average_pooling2d_4 (Average (None, 8, 8, 128) 0
_________________________________________________________________
conv2d_11 (Conv2D) (None, 8, 8, 256) 295168
_________________________________________________________________
conv2d_12 (Conv2D) (None, 8, 8, 256) 590080
_________________________________________________________________
dropout_6 (Dropout) (None, 8, 8, 256) 0
_________________________________________________________________
average_pooling2d_5 (Average (None, 4, 4, 256) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 4096) 0
_________________________________________________________________
dense_2 (Dense) (None, 10) 40970
=================================================================
Total params: 1,187,146
Trainable params: 1,186,762
Non-trainable params: 384
_________________________________________________________________
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer) (None, None, None, 3) 0
_________________________________________________________________
conv2d_13 (Conv2D) (None, None, None, 64) 1792
_________________________________________________________________
conv2d_14 (Conv2D) (None, None, None, 64) 36928
_________________________________________________________________
batch_normalization_5 (Batch (None, None, None, 64) 256
_________________________________________________________________
dropout_7 (Dropout) (None, None, None, 64) 0
_________________________________________________________________
average_pooling2d_6 (Average (None, None, None, 64) 0
_________________________________________________________________
conv2d_15 (Conv2D) (None, None, None, 128) 73856
_________________________________________________________________
conv2d_16 (Conv2D) (None, None, None, 128) 147584
_________________________________________________________________
batch_normalization_6 (Batch (None, None, None, 128) 512
_________________________________________________________________
dropout_8 (Dropout) (None, None, None, 128) 0
_________________________________________________________________
average_pooling2d_7 (Average (None, None, None, 128) 0
_________________________________________________________________
conv2d_17 (Conv2D) (None, None, None, 256) 295168
_________________________________________________________________
conv2d_18 (Conv2D) (None, None, None, 256) 590080
_________________________________________________________________
dropout_9 (Dropout) (None, None, None, 256) 0
_________________________________________________________________
reshape_1 (Reshape) (None, None, 128) 0
_________________________________________________________________
capsule_1 (Capsule) (None, 10, 96) 122880
_________________________________________________________________
lambda_1 (Lambda) (None, 10) 0
=================================================================
Total params: 1,269,056
Trainable params: 1,268,672
Non-trainable params: 384
_________________________________________________________________
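The training loop itself is not shown above; judging from the logs, each iteration j fits all three models for one epoch. A minimal sketch of such a loop (the actual loop is in cifar10_cnn_capsule_alt.py, so treat this as an assumption):

# Assumed training loop: one epoch per model per iteration j,
# with x_train/y_train/batch_size from the usual CIFAR-10 setup.
for j in range(100):
    print('*****j=', j)
    for model in (model1, model2, model3):
        model.fit(x_train, y_train,
                  batch_size=batch_size,
                  epochs=1,
                  validation_data=(x_test, y_test),
                  shuffle=True)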
Results
Not using data augmentation.
*****j= 0
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 35s 694us/step - loss: 0.3911 - acc: 0.5035 - val_loss: 0.3821 - val_acc: 0.5240
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 29s 589us/step - loss: 0.4079 - acc: 0.4789 - val_loss: 0.3943 - val_acc: 0.5071
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 38s 764us/step - loss: 0.4780 - acc: 0.3260 - val_loss: 0.4251 - val_acc: 0.3842
*****j= 1
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 32s 646us/step - loss: 0.2779 - acc: 0.6537 - val_loss: 0.2571 - val_acc: 0.6793
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 29s 578us/step - loss: 0.2888 - acc: 0.6391 - val_loss: 0.3097 - val_acc: 0.6087
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 38s 754us/step - loss: 0.3588 - acc: 0.4833 - val_loss: 0.3383 - val_acc: 0.5223
*****j= 2
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 32s 649us/step - loss: 0.2315 - acc: 0.7109 - val_loss: 0.2631 - val_acc: 0.6706
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 29s 583us/step - loss: 0.2414 - acc: 0.6993 - val_loss: 0.2923 - val_acc: 0.6395
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 38s 751us/step - loss: 0.3023 - acc: 0.5718 - val_loss: 0.3411 - val_acc: 0.5316
*****j= 3
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 32s 649us/step - loss: 0.2044 - acc: 0.7456 - val_loss: 0.2445 - val_acc: 0.7010
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 29s 575us/step - loss: 0.2099 - acc: 0.7408 - val_loss: 0.2086 - val_acc: 0.7432
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 38s 752us/step - loss: 0.2595 - acc: 0.6397 - val_loss: 0.2517 - val_acc: 0.6451
Incidentally, if you write it like this:
# SPP
x, input_image1 = model_cifar(input_image=Input(shape=(32, 32, 3)))
x1 = SpatialPyramidPooling([1, 2, 4])(x)  # same levels as above
output1 = Dense(num_classes, activation='softmax')(x1)
# AveragePooling
# x2, input_image2 = model_cifar(input_image=Input(shape=(32, 32, 3)))
x2 = AveragePooling2D(pool_size=(2, 2), strides=None, padding='valid',
                      data_format='channels_last')(x)
x2 = Flatten()(x2)
output2 = Dense(num_classes, activation='softmax')(x2)
# Capsule
# x3, input_image3 = model_cifar(input_image=Input(shape=(None, None, 3)))
x3 = Reshape((-1, 128))(x)
capsule = Capsule(10, 96, 3, True)(x3)
output3 = Lambda(lambda z: K.sqrt(K.sum(K.square(z), 2)))(capsule)
model1 = Model(inputs=input_image1, outputs=output1)
model2 = Model(inputs=input_image1, outputs=output2)
model3 = Model(inputs=input_image1, outputs=output3)
that is, replacing the three model_cifar calls with a single shared one, the result was as follows.
Not using data augmentation.
*****j= 0
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 35s 690us/step - loss: 0.4076 - acc: 0.4796 - val_loss: 0.4171 - val_acc: 0.4970
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 29s 586us/step - loss: 0.3161 - acc: 0.6046 - val_loss: 0.2713 - val_acc: 0.6659
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 38s 756us/step - loss: nan - acc: 0.1011 - val_loss: nan - val_acc: 0.1000
*****j= 1
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 32s 647us/step - loss: 0.6400 - acc: 0.0982 - val_loss: 0.6400 - val_acc: 0.1000
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 29s 573us/step - loss: 0.6400 - acc: 0.0977 - val_loss: 0.6400 - val_acc: 0.1000
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 37s 745us/step - loss: nan - acc: 0.1000 - val_loss: nan - val_acc: 0.1000
*****j= 2
This seems to be because the layer weights end up shared across the three models, and the CapsNet head then fails to converge (the loss goes to nan).
So, to compare the three models at once, I'll stick with the separate-call form written out above.
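The difference is easy to check: three separate model_cifar calls create disjoint layer objects, whereas the single shared call reuses the same ones. A sketch of such a check (my addition):

# With separate model_cifar calls, the backbones share no weights;
# in the single-call variant this intersection is non-empty.
w1 = {id(w) for w in model1.trainable_weights}
w2 = {id(w) for w in model2.trainable_weights}
print(w1 & w2)  # set() when the models are built from separate calls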
I put the code here:
cifar10_cnn_capsule_alt.py
To run it, you also need Capsule.py and SpatialPyramidPooling.py, which are in the same place.
And here are the results after 100 iterations (the 391 steps per epoch suggest this run trains from a generator with batch size 128, i.e. with data augmentation):
*****j= 90
Epoch 1/1
391/391 [==============================] - 32s 81ms/step - loss: 0.0736 - acc: 0.9123 - val_loss: 0.1039 - val_acc: 0.8839
Epoch 1/1
391/391 [==============================] - 28s 72ms/step - loss: 0.0682 - acc: 0.9174 - val_loss: 0.0971 - val_acc: 0.8917
Epoch 1/1
391/391 [==============================] - 37s 94ms/step - loss: 0.0652 - acc: 0.9255 - val_loss: 0.0896 - val_acc: 0.8961
*****j= 97
Epoch 1/1
391/391 [==============================] - 32s 81ms/step - loss: 0.0705 - acc: 0.9149 - val_loss: 0.0991 - val_acc: 0.8866
Epoch 1/1
391/391 [==============================] - 28s 72ms/step - loss: 0.0630 - acc: 0.9235 - val_loss: 0.1006 - val_acc: 0.8872
Epoch 1/1
391/391 [==============================] - 37s 94ms/step - loss: 0.0630 - acc: 0.9291 - val_loss: 0.0934 - val_acc: 0.8945
*****j= 98
Epoch 1/1
391/391 [==============================] - 32s 81ms/step - loss: 0.0693 - acc: 0.9163 - val_loss: 0.1010 - val_acc: 0.8856
Epoch 1/1
391/391 [==============================] - 28s 73ms/step - loss: 0.0637 - acc: 0.9232 - val_loss: 0.1153 - val_acc: 0.8740
Epoch 1/1
391/391 [==============================] - 37s 94ms/step - loss: 0.0633 - acc: 0.9286 - val_loss: 0.0858 - val_acc: 0.9019
*****j= 99
Epoch 1/1
391/391 [==============================] - 32s 82ms/step - loss: 0.0676 - acc: 0.9191 - val_loss: 0.1062 - val_acc: 0.8799
Epoch 1/1
391/391 [==============================] - 28s 72ms/step - loss: 0.0629 - acc: 0.9252 - val_loss: 0.1087 - val_acc: 0.8794
Epoch 1/1
391/391 [==============================] - 37s 95ms/step - loss: 0.0630 - acc: 0.9298 - val_loss: 0.0926 - val_acc: 0.8959
In short, after 100 epochs the CapsNet model came out on top with a best accuracy of 90.19%, SPP[1,2,4] peaked at 88.66%, and the plain AveragePooling version at 89.17%.
That clears the 90% mark reported in the paper, so CapsNet does seem to offer some benefit.