While comparing the CapsNet version, the AveragePooling version, and the SpatialPyramidPooling version...

running each of them one at a time gradually became a chore.

So I decided to run the three models at the same time.

It came together well enough, so here is a write-up.

Program structure

The main model is shown below.
It is the model from keras/examples/cifar10_cnn_capsule.py with a few extra layers added, and the original it builds on appears to be by the Chinese developer below.
The Capsule implementation is from https://github.com/bojone/Capsule/

# Capsule.py and SpatialPyramidPooling.py (linked at the end) must sit next to this script
from Capsule import Capsule
from SpatialPyramidPooling import SpatialPyramidPooling
from keras import backend as K
from keras.layers import (Input, Conv2D, BatchNormalization, Dropout,
                          AveragePooling2D, Flatten, Dense, Reshape, Lambda)
from keras.models import Model

def model_cifar(input_image=Input(shape=(None, None, 3))):
    # A common Conv2D backbone; returns the feature map and its input tensor
    x = Conv2D(64, (3, 3), activation='relu', padding='same')(input_image)
    x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = BatchNormalization(axis=3)(x)
    x = Dropout(0.5)(x)
    x = AveragePooling2D((2, 2))(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = BatchNormalization(axis=3)(x)
    x = Dropout(0.5)(x)
    x = AveragePooling2D((2, 2))(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
    #x = BatchNormalization(axis=3)(x)
    x = Dropout(0.5)(x)
    return x, input_image
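
The snippets that follow also assume the CIFAR-10 data and num_classes are already prepared, along the lines of the original keras example. A minimal sketch (the variable names x_train, y_train, batch_size and so on are my assumptions, matched to the logs below):

from keras.datasets import cifar10
from keras.utils import to_categorical

batch_size = 128
num_classes = 10

# Load CIFAR-10, scale pixels to [0, 1], one-hot encode the labels
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)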

Calling the model: three kinds of output head

# SPP
x1, input_image1 = model_cifar(input_image=Input(shape=(None, None, 3)))
x1 = SpatialPyramidPooling([1])(x1)    # the summary and results below match levels [1, 2, 4] (5376 = 256 * 21 features)
output1 = Dense(num_classes, activation='softmax')(x1)

# AveragePooling
x2, input_image2 = model_cifar(input_image=Input(shape=(32, 32, 3)))
x2 = AveragePooling2D(pool_size=(2, 2), strides=None, padding='valid',
                      data_format='channels_last')(x2)  # Keras 2 names for border_mode / dim_ordering
x2 = Flatten()(x2)
output2 = Dense(num_classes, activation='softmax')(x2)

# Capsule
x3, input_image3 = model_cifar(input_image=Input(shape=(None, None, 3)))
x3 = Reshape((-1, 128))(x3)             # 8x8x256 feature map -> 128 capsules of dimension 128
capsule = Capsule(10, 96, 3, True)(x3)  # 10 output capsules of dimension 96 (the keras example uses 16), 3 routings, shared weights
output3 = Lambda(lambda z: K.sqrt(K.sum(K.square(z), 2)))(capsule)  # capsule lengths as class scores

Model declaration

model1 = Model(inputs=input_image1, outputs=output1)
model2 = Model(inputs=input_image2, outputs=output2)
model3 = Model(inputs=input_image3, outputs=output3)

# we use a margin loss
model1.compile(loss=margin_loss, optimizer='adam', metrics=['accuracy'])
model2.compile(loss=margin_loss, optimizer='adam', metrics=['accuracy'])
model3.compile(loss=margin_loss, optimizer='adam', metrics=['accuracy'])
#model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model1.summary()
model2.summary()
model3.summary()
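
Note that margin_loss must be defined beforehand. For reference, here is a sketch along the lines of the definition in keras/examples/cifar10_cnn_capsule.py (double-check against that file):

def margin_loss(y_true, y_pred):
    # Penalize class capsules whose length falls on the wrong side of the margin
    lamb, margin = 0.5, 0.6
    return K.sum(y_true * K.square(K.relu(1 - margin - y_pred)) +
                 lamb * (1 - y_true) * K.square(K.relu(y_pred - margin)),
                 axis=-1)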

Running this prints model structures like the following:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         (None, None, None, 3)     0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, None, None, 64)    1792
_________________________________________________________________
conv2d_2 (Conv2D)            (None, None, None, 64)    36928
_________________________________________________________________
batch_normalization_1 (Batch (None, None, None, 64)    256
_________________________________________________________________
dropout_1 (Dropout)          (None, None, None, 64)    0
_________________________________________________________________
average_pooling2d_1 (Average (None, None, None, 64)    0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, None, None, 128)   73856
_________________________________________________________________
conv2d_4 (Conv2D)            (None, None, None, 128)   147584
_________________________________________________________________
batch_normalization_2 (Batch (None, None, None, 128)   512
_________________________________________________________________
dropout_2 (Dropout)          (None, None, None, 128)   0
_________________________________________________________________
average_pooling2d_2 (Average (None, None, None, 128)   0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, None, None, 256)   295168
_________________________________________________________________
conv2d_6 (Conv2D)            (None, None, None, 256)   590080
_________________________________________________________________
dropout_3 (Dropout)          (None, None, None, 256)   0
_________________________________________________________________
spatial_pyramid_pooling_1 (S (None, 5376)              0
_________________________________________________________________
dense_1 (Dense)              (None, 10)                53770
=================================================================
Total params: 1,199,946
Trainable params: 1,199,562
Non-trainable params: 384
_________________________________________________________________
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_3 (InputLayer)         (None, 32, 32, 3)         0
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 32, 32, 64)        1792
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 32, 32, 64)        36928
_________________________________________________________________
batch_normalization_3 (Batch (None, 32, 32, 64)        256
_________________________________________________________________
dropout_4 (Dropout)          (None, 32, 32, 64)        0
_________________________________________________________________
average_pooling2d_3 (Average (None, 16, 16, 64)        0
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 16, 16, 128)       73856
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 16, 16, 128)       147584
_________________________________________________________________
batch_normalization_4 (Batch (None, 16, 16, 128)       512
_________________________________________________________________
dropout_5 (Dropout)          (None, 16, 16, 128)       0
_________________________________________________________________
average_pooling2d_4 (Average (None, 8, 8, 128)         0
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 8, 8, 256)         295168
_________________________________________________________________
conv2d_12 (Conv2D)           (None, 8, 8, 256)         590080
_________________________________________________________________
dropout_6 (Dropout)          (None, 8, 8, 256)         0
_________________________________________________________________
average_pooling2d_5 (Average (None, 4, 4, 256)         0
_________________________________________________________________
flatten_1 (Flatten)          (None, 4096)              0
_________________________________________________________________
dense_2 (Dense)              (None, 10)                40970
=================================================================
Total params: 1,187,146
Trainable params: 1,186,762
Non-trainable params: 384
_________________________________________________________________
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_4 (InputLayer)         (None, None, None, 3)     0
_________________________________________________________________
conv2d_13 (Conv2D)           (None, None, None, 64)    1792
_________________________________________________________________
conv2d_14 (Conv2D)           (None, None, None, 64)    36928
_________________________________________________________________
batch_normalization_5 (Batch (None, None, None, 64)    256
_________________________________________________________________
dropout_7 (Dropout)          (None, None, None, 64)    0
_________________________________________________________________
average_pooling2d_6 (Average (None, None, None, 64)    0
_________________________________________________________________
conv2d_15 (Conv2D)           (None, None, None, 128)   73856
_________________________________________________________________
conv2d_16 (Conv2D)           (None, None, None, 128)   147584
_________________________________________________________________
batch_normalization_6 (Batch (None, None, None, 128)   512
_________________________________________________________________
dropout_8 (Dropout)          (None, None, None, 128)   0
_________________________________________________________________
average_pooling2d_7 (Average (None, None, None, 128)   0
_________________________________________________________________
conv2d_17 (Conv2D)           (None, None, None, 256)   295168
_________________________________________________________________
conv2d_18 (Conv2D)           (None, None, None, 256)   590080
_________________________________________________________________
dropout_9 (Dropout)          (None, None, None, 256)   0
_________________________________________________________________
reshape_1 (Reshape)          (None, None, 128)         0
_________________________________________________________________
capsule_1 (Capsule)          (None, 10, 96)            122880
_________________________________________________________________
lambda_1 (Lambda)            (None, 10)                0
=================================================================
Total params: 1,269,056
Trainable params: 1,268,672
Non-trainable params: 384
_________________________________________________________________

Results

Not using data augmentation.
*****j=  0
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 35s 694us/step - loss: 0.3911 - acc: 0.5035 - val_loss: 0.3821 - val_acc: 0.5240
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 29s 589us/step - loss: 0.4079 - acc: 0.4789 - val_loss: 0.3943 - val_acc: 0.5071
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 38s 764us/step - loss: 0.4780 - acc: 0.3260 - val_loss: 0.4251 - val_acc: 0.3842
*****j=  1
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 32s 646us/step - loss: 0.2779 - acc: 0.6537 - val_loss: 0.2571 - val_acc: 0.6793
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 29s 578us/step - loss: 0.2888 - acc: 0.6391 - val_loss: 0.3097 - val_acc: 0.6087
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 38s 754us/step - loss: 0.3588 - acc: 0.4833 - val_loss: 0.3383 - val_acc: 0.5223
*****j=  2
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 32s 649us/step - loss: 0.2315 - acc: 0.7109 - val_loss: 0.2631 - val_acc: 0.6706
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 29s 583us/step - loss: 0.2414 - acc: 0.6993 - val_loss: 0.2923 - val_acc: 0.6395
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 38s 751us/step - loss: 0.3023 - acc: 0.5718 - val_loss: 0.3411 - val_acc: 0.5316
*****j=  3
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 32s 649us/step - loss: 0.2044 - acc: 0.7456 - val_loss: 0.2445 - val_acc: 0.7010
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 29s 575us/step - loss: 0.2099 - acc: 0.7408 - val_loss: 0.2086 - val_acc: 0.7432
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 38s 752us/step - loss: 0.2595 - acc: 0.6397 - val_loss: 0.2517 - val_acc: 0.6451
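
For reference, a sketch of the driver loop behind these logs (the j loop and per-model fit calls are inferred from the output; training one epoch per model per iteration keeps the three models in lockstep):

for j in range(100):
    print('*****j= ', j)
    for model in (model1, model2, model3):
        model.fit(x_train, y_train,
                  batch_size=batch_size,
                  epochs=1,
                  validation_data=(x_test, y_test),
                  shuffle=True)

The 100-iteration run shown later logs 391 steps per epoch (50000 / 128 ≈ 391), which suggests that run went through a data-augmentation generator path instead of fit on the raw arrays.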

Incidentally,

# SPP
x, input_image1 = model_cifar(input_image=Input(shape=(32, 32, 3)))
x1 = SpatialPyramidPooling([1])(x)    # [1, 2, 4]
output1 = Dense(num_classes, activation='softmax')(x1)

# AveragePooling
# x2, input_image2 = model_cifar(input_image=Input(shape=(32, 32, 3)))
x2 = AveragePooling2D(pool_size=(2, 2), strides=None, padding='valid',
                      data_format='channels_last')(x)
x2 = Flatten()(x2)
output2 = Dense(num_classes, activation='softmax')(x2)

# Capsule
# x3, input_image3 = model_cifar(input_image=Input(shape=(None, None, 3)))
x3 = Reshape((-1, 128))(x)
capsule = Capsule(10, 96, 3, True)(x3)  #16
output3 = Lambda(lambda z: K.sqrt(K.sum(K.square(z), 2)))(capsule)

model1 = Model(inputs=input_image1, outputs=output1)
model2 = Model(inputs=input_image1, outputs=output2)
model3 = Model(inputs=input_image1, outputs=output3)

when the three calls to model_cifar are collapsed into a single shared call like this, the result was as follows.

Not using data augmentation.
*****j=  0
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 35s 690us/step - loss: 0.4076 - acc: 0.4796 - val_loss: 0.4171 - val_acc: 0.4970
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 29s 586us/step - loss: 0.3161 - acc: 0.6046 - val_loss: 0.2713 - val_acc: 0.6659
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 38s 756us/step - loss: nan - acc: 0.1011 - val_loss: nan - val_acc: 0.1000
*****j=  1
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 32s 647us/step - loss: 0.6400 - acc: 0.0982 - val_loss: 0.6400 - val_acc: 0.1000
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 29s 573us/step - loss: 0.6400 - acc: 0.0977 - val_loss: 0.6400 - val_acc: 0.1000
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 37s 745us/step - loss: nan - acc: 0.1000 - val_loss: nan - val_acc: 0.1000
*****j=  2

This seems to be because the backbone weights end up shared by all three models, and the CapsNet head then fails to converge.
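
A quick way to see the sharing (a hypothetical check, not part of the original script): with a single model_cifar call, the three Model objects reference the very same backbone weight variables, so training any one of them updates the others as well.

# Count weight variables common to model1 and model2:
# non-zero with one shared model_cifar call, zero with three separate calls
shared = set(map(id, model1.weights)) & set(map(id, model2.weights))
print(len(shared))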
So I will compare the three models side by side using the earlier form, where each model is built separately.

The code is available here:
cifar10_cnn_capsule_alt.py
To run it you also need Capsule.py and SpatialPyramidPooling.py, which are in the same place.

And the results after 100 iterations:

*****j=  90
Epoch 1/1
391/391 [==============================] - 32s 81ms/step - loss: 0.0736 - acc: 0.9123 - val_loss: 0.1039 - val_acc: 0.8839
Epoch 1/1
391/391 [==============================] - 28s 72ms/step - loss: 0.0682 - acc: 0.9174 - val_loss: 0.0971 - val_acc: 0.8917
Epoch 1/1
391/391 [==============================] - 37s 94ms/step - loss: 0.0652 - acc: 0.9255 - val_loss: 0.0896 - val_acc: 0.8961
*****j=  97
Epoch 1/1
391/391 [==============================] - 32s 81ms/step - loss: 0.0705 - acc: 0.9149 - val_loss: 0.0991 - val_acc: 0.8866
Epoch 1/1
391/391 [==============================] - 28s 72ms/step - loss: 0.0630 - acc: 0.9235 - val_loss: 0.1006 - val_acc: 0.8872
Epoch 1/1
391/391 [==============================] - 37s 94ms/step - loss: 0.0630 - acc: 0.9291 - val_loss: 0.0934 - val_acc: 0.8945
*****j=  98
Epoch 1/1
391/391 [==============================] - 32s 81ms/step - loss: 0.0693 - acc: 0.9163 - val_loss: 0.1010 - val_acc: 0.8856
Epoch 1/1
391/391 [==============================] - 28s 73ms/step - loss: 0.0637 - acc: 0.9232 - val_loss: 0.1153 - val_acc: 0.8740
Epoch 1/1
391/391 [==============================] - 37s 94ms/step - loss: 0.0633 - acc: 0.9286 - val_loss: 0.0858 - val_acc: 0.9019
*****j=  99
Epoch 1/1
391/391 [==============================] - 32s 82ms/step - loss: 0.0676 - acc: 0.9191 - val_loss: 0.1062 - val_acc: 0.8799
Epoch 1/1
391/391 [==============================] - 28s 72ms/step - loss: 0.0629 - acc: 0.9252 - val_loss: 0.1087 - val_acc: 0.8794
Epoch 1/1
391/391 [==============================] - 37s 95ms/step - loss: 0.0630 - acc: 0.9298 - val_loss: 0.0926 - val_acc: 0.8959

In short, after 100 epochs the CapsNet model came out on top with a best accuracy of 90.19%, the SPP [1,2,4] version peaked at 88.66%, and the plain AveragePooling version at 89.17%.

With that, the 90%-plus mark from the paper has been reached, so it looks like CapsNet does have some effect after all.
