Following up on Cifar10 last time, I tried object recognition on Cifar100.
The Cifar100 results on Kaggle from two years ago were not very good: as shown in reference ①, the top score was only about 0.72360.
More recent results, on the other hand, are listed in reference ②, where the top entry reaches 91.3%, a remarkable leap forward.
In this post I show that the accuracy changes substantially just by changing the input image size.
With an image size of (160,160,3) I obtained an accuracy of over 78%, which I report below.
【References】
・① CIFAR-100: object recognition @ Kaggle
・② CIFAR-100: Classify 32x32 colour images into 100 categories. @ Benchmarks.AI
About Cifar100
The categories are as follows.
※ The text below is quoted from the cited source.
The images and labels are all taken from the CIFAR-100 dataset which was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. The 100 object class labels are
beaver, dolphin, otter, seal, whale,
aquarium fish, flatfish, ray, shark, trout,
orchids, poppies, roses, sunflowers, tulips,
bottles, bowls, cans, cups, plates,
apples, mushrooms, oranges, pears, sweet peppers,
clock, computer keyboard, lamp, telephone, television,
bed, chair, couch, table, wardrobe,
bee, beetle, butterfly, caterpillar, cockroach,
bear, leopard, lion, tiger, wolf,
bridge, castle, house, road, skyscraper,
cloud, forest, mountain, plain, sea,
camel, cattle, chimpanzee, elephant, kangaroo,
fox, porcupine, possum, raccoon, skunk,
crab, lobster, snail, spider, worm,
baby, boy, girl, man, woman,
crocodile, dinosaur, lizard, snake, turtle,
hamster, mouse, rabbit, shrew, squirrel,
maple, oak, palm, pine, willow,
bicycle, bus, motorcycle, pickup truck, train,
lawn-mower, rocket, streetcar, tank, tractor
These 100 classes are further grouped, five at a time, into 20 superclasses, and each class contains 600 images.
※ The text below is quoted from the cited source.
This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).
Here is the list of classes in the CIFAR-100:
Superclass | Classes |
---|---|
aquatic mammals | beaver, dolphin, otter, seal, whale |
fish | aquarium fish, flatfish, ray, shark, trout |
flowers | orchids, poppies, roses, sunflowers, tulips |
food containers | bottles, bowls, cans, cups, plates |
fruit and vegetables | apples, mushrooms, oranges, pears, sweet peppers |
household electrical devices | clock, computer keyboard, lamp, telephone, television |
household furniture | bed, chair, couch, table, wardrobe |
insects | bee, beetle, butterfly, caterpillar, cockroach |
large carnivores | bear, leopard, lion, tiger, wolf |
large man-made outdoor things | bridge, castle, house, road, skyscraper |
large natural outdoor scenes | cloud, forest, mountain, plain, sea |
large omnivores and herbivores | camel, cattle, chimpanzee, elephant, kangaroo |
medium-sized mammals | fox, porcupine, possum, raccoon, skunk |
non-insect invertebrates | crab, lobster, snail, spider, worm |
people | baby, boy, girl, man, woman |
reptiles | crocodile, dinosaur, lizard, snake, turtle |
small mammals | hamster, mouse, rabbit, shrew, squirrel |
trees | maple, oak, palm, pine, willow |
vehicles 1 | bicycle, bus, motorcycle, pickup truck, train |
vehicles 2 | lawn-mower, rocket, streetcar, tank, tractor |
Training with (32,32,3)
This is the accuracy for the original input image size: 0.6436.
Tuning the hyperparameters a bit more would probably help a little, but a large improvement looks difficult.
The loss curves also show val_loss turning upward around epoch 40; val_acc still rises slightly after that, but overfitting has likely set in.
(32,32,3)
epoch  loss      acc       val_loss  val_acc
95     0.213846  0.940702  2.231077  0.637400
96     0.213215  0.941643  2.191432  0.636500
97     0.201016  0.943844  2.167689  0.643600
98     0.196744  0.945065  2.363087  0.629900
99     0.194216  0.946366  2.288370  0.625900
Epoch 100/100
1562/1562 [==============================] - 91s 58ms/step - loss: 0.1943 - acc: 0.9464 - val_loss: 2.2884 - val_acc: 0.6259
i, ir= 0 1e-05
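For reference, the model summaries in the Bonus section at the end show a VGG16 convolutional base followed by a small Sequential head (sequential_1), whose parameter counts (157,028 at 32x32 input, 550,244 at 64x64) are consistent with Flatten → Dense(256) → Dense(100). The sketch below rebuilds that structure under those assumptions; the activations, dropout, weight initialization, and the optimizer/learning-rate schedule hinted at by the `i, ir= 0 1e-05` line are not spelled out in this post, so they are assumptions here.

```python
from keras.applications.vgg16 import VGG16
from keras.models import Model, Sequential
from keras.layers import Flatten, Dense, Dropout
from keras.optimizers import SGD

def build_model(input_shape=(32, 32, 3), num_classes=100):
    # VGG16 convolutional base (all layers trainable, as in the appendix summary).
    # weights=None (training from scratch) is an assumption.
    base = VGG16(include_top=False, weights=None, input_shape=input_shape)

    # Classifier head: Flatten -> Dense(256) -> Dense(100).
    # This matches the sequential_1 parameter counts in the appendix
    # (157,028 at 32x32 input, 550,244 at 64x64); activation and dropout are assumptions.
    top = Sequential([
        Flatten(input_shape=base.output_shape[1:]),
        Dense(256, activation='relu'),
        Dropout(0.5),
        Dense(num_classes, activation='softmax'),
    ])

    model = Model(inputs=base.input, outputs=top(base.output))
    model.compile(optimizer=SGD(lr=1e-2, momentum=0.9),   # optimizer and schedule are assumptions
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = build_model((32, 32, 3))
model.summary()
```

The same function can be called with (64, 64, 3) or (160, 160, 3) to reproduce the other two summaries.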
Training with (64,64,3)
The behaviour here is much the same. The best accuracy was 0.7428.
(64,64,3)
epoch  loss      acc       val_loss  val_acc
95     0.063527  0.983509  1.792958  0.736600
96     0.063124  0.983129  1.746459  0.739500
97     0.060453  0.983650  1.787388  0.732500
98     0.066721  0.982554  1.741670  0.742800
99     0.058414  0.984625  1.781311  0.736700
Epoch 100/100
1562/1562 [==============================] - 144s 92ms/step - loss: 0.0584 - acc: 0.9846 - val_loss: 1.7813 - val_acc: 0.7367
i, ir= 0 1e-05
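Training at (64,64,3) or larger requires upscaling the 32x32 images first. The post does not show that code, so here is a minimal sketch, assuming OpenCV's cv2.resize with bicubic interpolation and scaling to [0, 1] (both assumptions):

```python
import numpy as np
import cv2
from keras.datasets import cifar100

def upscale(images, size):
    """Resize a batch of (N, 32, 32, 3) uint8 images to (N, size, size, 3) floats in [0, 1]."""
    out = np.zeros((len(images), size, size, 3), dtype='float32')
    for i, img in enumerate(images):
        out[i] = cv2.resize(img, (size, size), interpolation=cv2.INTER_CUBIC)
    return out / 255.0

(x_train, y_train), (x_test, y_test) = cifar100.load_data(label_mode='fine')
x_train64 = upscale(x_train, 64)   # (50000, 64, 64, 3), roughly 2.4 GB as float32
x_test64 = upscale(x_test, 64)
```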
Training with (160,160,3)
The first 30 data points of the log were lost due to a memory shortage, but the trend is the same as above.
This run gave the best accuracy of this post, 0.7898.
Training takes about 350 sec/epoch here, so 130 epochs took more than 12 hours.
(160,160,3)
epoch  loss      acc       val_loss  val_acc
89     0.037111  0.988673  1.498466  0.788700
90     0.037658  0.988593  1.526851  0.779500
91     0.034657  0.989493  1.473778  0.785800
92     0.033044  0.990354  1.510877  0.787400
93     0.034193  0.989253  1.482770  0.789800
94     0.032355  0.989934  1.482628  0.784900
95     0.031635  0.990474  1.485976  0.786200
96     0.031583  0.990614  1.516047  0.783900
97     0.033982  0.990054  1.497305  0.787200
98     0.032040  0.990614  1.519278  0.785000
99     0.030907  0.990974  1.505199  0.787200
1562/1562 [==============================] - 349s 223ms/step - loss: 0.0342 - acc: 0.9893 - val_loss: 1.4828 - val_acc: 0.7898
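As noted above, this run ran into memory limits. One way to keep memory down, shown here only as a sketch and not necessarily how this run was actually done, is to upscale each batch inside a generator instead of holding the full (160,160,3) dataset in memory. The "Using real-time data augmentation." message and the 1562 steps per epoch in the logs suggest ImageDataGenerator with a batch size of 32; the augmentation settings and the x_val160 / y_val names below are assumptions.

```python
import numpy as np
import cv2
from keras.preprocessing.image import ImageDataGenerator

# Real-time augmentation on the original 32x32 images (these settings are assumptions)
datagen = ImageDataGenerator(width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)

def resized_flow(x, y, size=160, batch_size=32):
    """Yield augmented batches, upscaled to (size, size, 3) only when they are needed."""
    for x_batch, y_batch in datagen.flow(x, y, batch_size=batch_size):
        big = np.zeros((len(x_batch), size, size, 3), dtype='float32')
        for i, img in enumerate(x_batch):
            big[i] = cv2.resize(img, (size, size), interpolation=cv2.INTER_CUBIC)
        yield big / 255.0, y_batch

# Hypothetical usage; x_val160 / y_val stand for a pre-resized validation set.
# model.fit_generator(resized_flow(x_train, y_train, size=160),
#                     steps_per_epoch=len(x_train) // 32,
#                     epochs=100,
#                     validation_data=(x_val160, y_val))
```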
VGG16: dependence on input size
Input size | (32,32,3) | (64,64,3) | (160,160,3) |
---|---|---|---|
Val_acc | 0.6436 | 0.7428 | 0.7898 |
Error (%) | 35.64 | 25.72 | 21.02 |
utils
・A tool for consolidating the training logs
⇒ It is easier to append to the log during training (see the sketch after this list)
・A tool for plotting loss and acc
⇒ Which quantity belongs on which vertical axis (or whether to use two axes) depends on what you want to look at
These tools still show the traces of my struggles, but I have gradually become able to analyse the logs with them and get exactly the output I want.
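Here is a minimal sketch of both ideas, assuming Keras's CSVLogger callback for appending the per-epoch log during training and a matplotlib twinx plot to show loss and accuracy on separate axes (the file name is a placeholder, not the actual tool used here):

```python
import pandas as pd
import matplotlib.pyplot as plt
from keras.callbacks import CSVLogger

# 1) Append the per-epoch log while training instead of collecting it afterwards
csv_logger = CSVLogger('cifar100_vgg16.log', append=True)  # placeholder file name
# model.fit(..., callbacks=[csv_logger])

# 2) Plot loss and accuracy from the accumulated log, with two y-axes
log = pd.read_csv('cifar100_vgg16.log')
fig, ax1 = plt.subplots()
ax1.plot(log['loss'], label='loss')
ax1.plot(log['val_loss'], label='val_loss')
ax1.set_xlabel('epoch')
ax1.set_ylabel('loss')
ax2 = ax1.twinx()
ax2.plot(log['acc'], color='green', label='acc')
ax2.plot(log['val_acc'], color='purple', label='val_acc')
ax2.set_ylabel('accuracy')
ax1.legend(loc='upper left')
ax2.legend(loc='upper right')
plt.show()
```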
Summary
・For Cifar100, simply enlarging the VGG16 input size to (160,160,3) gave 78.98% accuracy
・There appears to be some overfitting, so care is needed when applying this
・The references show WideResnet reaching 80% accuracy on Cifar100 as well, so I would like to try it with a larger input size too
・For both Cifar10 and Cifar100, I want to find out which images are being misclassified and improve accuracy by retraining on them
・Since Cifar100 has 20 superclasses, I want to try improving accuracy with a stepwise classification that exploits them
Bonus: model summaries
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 32, 32, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 32, 32, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 32, 32, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 16, 16, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 16, 16, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 16, 16, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 8, 8, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 8, 8, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 8, 8, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 8, 8, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 4, 4, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 4, 4, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 4, 4, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 4, 4, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 2, 2, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 2, 2, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 2, 2, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 2, 2, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 1, 1, 512) 0
_________________________________________________________________
sequential_1 (Sequential) (None, 100) 157028
=================================================================
Total params: 14,871,716
Trainable params: 14,871,716
Non-trainable params: 0
_________________________________________________________________
Using real-time data augmentation.
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 64, 64, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 64, 64, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 64, 64, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 32, 32, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 32, 32, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 32, 32, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 16, 16, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 16, 16, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 16, 16, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 16, 16, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 8, 8, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 8, 8, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 8, 8, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 8, 8, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 4, 4, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 4, 4, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 4, 4, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 4, 4, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 2, 2, 512) 0
_________________________________________________________________
sequential_1 (Sequential) (None, 100) 550244
=================================================================
Total params: 15,264,932
Trainable params: 15,264,932
Non-trainable params: 0
_________________________________________________________________
Using real-time data augmentation.