[CIFAR-100] Object Recognition in Images: 78.98% with a VGG16 model by upconverting the input size ♬

Following on from the previous article on Cifar10, I tried object recognition on Cifar100.
Accuracy on Cifar100 in the Kaggle competition two years ago was poor: as reference ① shows, the top score was only about 0.72360.
On the other hand, recent results are listed in reference ②, where the top score is 91.3%, a big leap forward.
In this article I show that accuracy changes substantially just by changing the input image size.
With an image size of (160,160,3) I obtained an accuracy of over 78%, which I report here.

[References]
・① CIFAR-100: Object Recognition in Images @Kaggle
・② CIFAR-100: Classify 32x32 colour images into 100 categories @Benchmarks.AI

About Cifar100

The categories are as follows.
※ The following text is quoted from the source linked above.
The images and labels are all taken from the CIFAR-100 dataset which was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. The 100 object class labels are

beaver, dolphin, otter, seal, whale, 
aquarium fish, flatfish, ray, shark, trout, 
orchids, poppies, roses, sunflowers, tulips, 
bottles, bowls, cans, cups, plates, 
apples, mushrooms, oranges, pears, sweet peppers, 
clock, computer keyboard, lamp, telephone, television, 
bed, chair, couch, table, wardrobe, 
bee, beetle, butterfly, caterpillar, cockroach, 
bear, leopard, lion, tiger, wolf, 
bridge, castle, house, road, skyscraper, 
cloud, forest, mountain, plain, sea, 
camel, cattle, chimpanzee, elephant, kangaroo, 
fox, porcupine, possum, raccoon, skunk, 
crab, lobster, snail, spider, worm, 
baby, boy, girl, man, woman, 
crocodile, dinosaur, lizard, snake, turtle, 
hamster, mouse, rabbit, shrew, squirrel, 
maple, oak, palm, pine, willow, 
bicycle, bus, motorcycle, pickup truck, train, 
lawn-mower, rocket, streetcar, tank, tractor

Furthermore, the classes are grouped five at a time into 20 superclasses. Each class contains 600 images.
※ The following text is quoted from the source linked above.
This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).
Here is the list of classes in the CIFAR-100:

Superclass                        Classes
aquatic mammals                   beaver, dolphin, otter, seal, whale
fish                              aquarium fish, flatfish, ray, shark, trout
flowers                           orchids, poppies, roses, sunflowers, tulips
food containers                   bottles, bowls, cans, cups, plates
fruit and vegetables              apples, mushrooms, oranges, pears, sweet peppers
household electrical devices      clock, computer keyboard, lamp, telephone, television
household furniture               bed, chair, couch, table, wardrobe
insects                           bee, beetle, butterfly, caterpillar, cockroach
large carnivores                  bear, leopard, lion, tiger, wolf
large man-made outdoor things     bridge, castle, house, road, skyscraper
large natural outdoor scenes      cloud, forest, mountain, plain, sea
large omnivores and herbivores    camel, cattle, chimpanzee, elephant, kangaroo
medium-sized mammals              fox, porcupine, possum, raccoon, skunk
non-insect invertebrates          crab, lobster, snail, spider, worm
people                            baby, boy, girl, man, woman
reptiles                          crocodile, dinosaur, lizard, snake, turtle
small mammals                     hamster, mouse, rabbit, shrew, squirrel
trees                             maple, oak, palm, pine, willow
vehicles 1                        bicycle, bus, motorcycle, pickup truck, train
vehicles 2                        lawn-mower, rocket, streetcar, tank, tractor
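
For reference, CIFAR-100 can be loaded directly from Keras with either the fine (100-class) labels or the coarse (20-superclass) labels. The following is only a minimal sketch of how the data used in this article can be prepared; the variable names are my own, not the author's script.

import numpy as np
from keras.datasets import cifar100
from keras.utils import to_categorical

num_classes = 100

# label_mode='fine' gives the 100 class labels;
# label_mode='coarse' would give the 20 superclasses instead
(x_train, y_train), (x_test, y_test) = cifar100.load_data(label_mode='fine')

# 500 training / 100 test images per class, 32x32x3 each
print(x_train.shape, x_test.shape)   # (50000, 32, 32, 3) (10000, 32, 32, 3)

x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)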

Training with (32,32,3) input

This is the accuracy for the original input size: 0.6436.
Tuning the hyperparameters a little further would probably help somewhat, but a large improvement looks difficult.
Also, the loss graph shows val_loss turning upward from around epoch 40; val_acc still rises slightly after that, but overfitting may be setting in.
[Figure: loss curves for (32,32,3) input - Figure_1_vgg16_32_x1_cifar100_loss.png]
[Figure: accuracy curves for (32,32,3) input - Figure_2_vgg16_32_cifar100_x1_acc.png]

(32,32,3)
epoch	loss	acc	val_loss	val_acc
95	0.213846	0.940702	2.231077	0.637400
96	0.213215	0.941643	2.191432	0.636500
97	0.201016	0.943844	2.167689	0.643600
98	0.196744	0.945065	2.363087	0.629900
99	0.194216	0.946366	2.288370	0.625900
Epoch 100/100
1562/1562 [==============================] - 91s 58ms/step - loss: 0.1943 - acc: 0.9464 - val_loss: 2.2884 - val_acc: 0.6259
i, ir=  0 1e-05
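
The article does not show the model-building code, but the head parameter counts in the appendix at the end (157,028 at 32x32 input and 550,244 at 64x64) are consistent with a VGG16 convolutional base followed by Flatten, Dense(256) and a Dense(100) softmax. The sketch below is my reconstruction under that assumption, trained from scratch (weights=None); the dropout rate and optimizer settings are illustrative, not the author's.

from keras.applications.vgg16 import VGG16
from keras.models import Model, Sequential
from keras.layers import Input, Flatten, Dense, Dropout
from keras.optimizers import Adam

def build_model(input_shape=(32, 32, 3), num_classes=100):
    # VGG16 convolutional base, trained from scratch (no ImageNet weights)
    inputs = Input(shape=input_shape)
    base = VGG16(include_top=False, weights=None, input_tensor=inputs)

    # Classifier head; Flatten -> Dense(256) -> Dense(100) reproduces the
    # "sequential_1" parameter counts shown in the appendix.
    # The dropout rate is an assumption (it adds no parameters).
    top = Sequential([
        Flatten(input_shape=base.output_shape[1:]),
        Dense(256, activation='relu'),
        Dropout(0.5),
        Dense(num_classes, activation='softmax'),
    ])

    model = Model(inputs=inputs, outputs=top(base.output))
    model.compile(optimizer=Adam(lr=1e-4),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = build_model((32, 32, 3))
model.summary()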

Training with (64,64,3) input

This case shows much the same behaviour. The best accuracy was 0.742800.
[Figure: loss curves for (64,64,3) input - Figure_1_vgg16_64_x1_cifar100_loss.png]
[Figure: accuracy curves for (64,64,3) input - Figure_2_vgg16_64_x1_cifar100_acc.png]

(64,64,3)
epoch	loss	acc	val_loss	val_acc
95	0.063527	0.983509	1.792958	0.736600
96	0.063124	0.983129	1.746459	0.739500
97	0.060453	0.983650	1.787388	0.732500
98	0.066721	0.982554	1.741670	0.742800
99	0.058414	0.984625	1.781311	0.736700
Epoch 100/100
1562/1562 [==============================] - 144s 92ms/step - loss: 0.0584 - acc: 0.9846 - val_loss: 1.7813 - val_acc: 0.7367
i, ir=  0 1e-05

Training with (160,160,3) input

The first 30 epochs of the log were lost due to a memory shortage, but the trend is the same as above.
Here I obtained this article's best accuracy, 0.7898.
Training at this size took about 350 sec/epoch, so 130 epochs took more than 12 hours.
[Figure: loss curves for (160,160,3) input - Figure_1_vgg16_160_x1_cifar100_loss.png]
[Figure: accuracy curves for (160,160,3) input - Figure_2_vgg16_160_cifar100_acc.png]

(160,160,3)
epoch	loss	acc	val_loss	val_acc
89	0.037111	0.988673	1.498466	0.788700
90	0.037658	0.988593	1.526851	0.779500
91	0.034657	0.989493	1.473778	0.785800
92	0.033044	0.990354	1.510877	0.787400
93	0.034193	0.989253	1.482770	0.789800
94	0.032355	0.989934	1.482628	0.784900
95	0.031635	0.990474	1.485976	0.786200
96	0.031583	0.990614	1.516047	0.783900
97	0.033982	0.990054	1.497305	0.787200
98	0.032040	0.990614	1.519278	0.785000
99	0.030907	0.990974	1.505199	0.787200
1562/1562 [==============================] - 349s 223ms/step - loss: 0.0342 - acc: 0.9893 - val_loss: 1.4828 - val_acc: 0.7898
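
Resizing all 50,000 training images to 160x160x3 up front takes a lot of memory (roughly 15 GB as float32), which is probably related to the memory trouble mentioned above. One way to do the upconversion batch by batch inside a generator is sketched below; this is my own illustration using cv2.resize, not the script actually used for this article. Note that the 1562 steps per epoch in the logs correspond to a batch size of 32 over 50,000 training images.

import cv2
import numpy as np

def resize_generator(x, y, batch_size=32, target_size=(160, 160)):
    """Yield batches upscaled to target_size on the fly, so the full
    160x160 float32 dataset never has to sit in memory.
    x is assumed to be the normalized float32 array loaded earlier."""
    num_samples = len(x)
    while True:
        idx = np.random.permutation(num_samples)
        for start in range(0, num_samples, batch_size):
            batch_idx = idx[start:start + batch_size]
            batch_x = np.zeros((len(batch_idx), target_size[0], target_size[1], 3),
                               dtype=np.float32)
            for i, j in enumerate(batch_idx):
                # bilinear upscaling from 32x32 to 160x160
                batch_x[i] = cv2.resize(x[j], target_size,
                                        interpolation=cv2.INTER_LINEAR)
            yield batch_x, y[batch_idx]

# Usage sketch with a model built for input_shape=(160, 160, 3):
# model.fit_generator(resize_generator(x_train, y_train, batch_size=32),
#                     steps_per_epoch=len(x_train) // 32, epochs=100)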

VGG16 input-size dependence

Input size   (32,32,3)   (64,64,3)   (160,160,3)
val_acc      0.6436      0.7428      0.7898
error %      35.64       25.72       21.02

utils

A tool to consolidate the logs
  ⇒ simply appending to the log during training is easier
A tool to plot loss and acc
  ⇒ which quantity goes on which vertical axis (I use two axes) depends on what you want to look at
   The traces of my trial and error remain; with this tool I have gradually become able to parse the logs and produce exactly the plots I want (a sketch follows below)
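
As an illustration of the kind of plotting tool described above, here is a minimal sketch that draws loss and accuracy on two vertical axes with matplotlib. The column layout (epoch, loss, acc, val_loss, val_acc) matches the tables above, but the log file name and parsing details are my assumptions, not the author's tool.

import matplotlib.pyplot as plt
import numpy as np

# log assumed to be tab-separated columns: epoch, loss, acc, val_loss, val_acc
log = np.loadtxt('vgg16_160_x1_cifar100.log', delimiter='\t')
epoch, loss, acc, val_loss, val_acc = log.T

fig, ax1 = plt.subplots()
ax1.plot(epoch, loss, label='loss')
ax1.plot(epoch, val_loss, label='val_loss')
ax1.set_xlabel('epoch')
ax1.set_ylabel('loss')

# second vertical axis for accuracy, since loss and acc have different scales
ax2 = ax1.twinx()
ax2.plot(epoch, acc, linestyle='--', label='acc')
ax2.plot(epoch, val_acc, linestyle='--', label='val_acc')
ax2.set_ylabel('accuracy')

ax1.legend(loc='upper left')
ax2.legend(loc='lower right')
plt.savefig('Figure_vgg16_160_x1_cifar100_loss_acc.png')
plt.show()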

Summary

・Simply enlarging the VGG16 input size to (160,160,3) gave 78.98% on Cifar100
・Overfitting appears likely, so care is needed when applying this result

・According to the references, WideResnet also reaches 80% accuracy on Cifar100, so I would like to try it with an enlarged input size as well
・For both Cifar10 and Cifar100, I would like to find out which images are misclassified and improve accuracy by retraining on them
・Since Cifar100 has 20 superclasses, I would like to try improving accuracy with stepwise classification from that point of view

Appendix (model summaries)

Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 32, 32, 3)         0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 32, 32, 64)        1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 32, 32, 64)        36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 16, 16, 64)        0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 16, 16, 128)       73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 16, 16, 128)       147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 8, 8, 128)         0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 8, 8, 256)         295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 8, 8, 256)         590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 8, 8, 256)         590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 4, 4, 256)         0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 4, 4, 512)         1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 4, 4, 512)         2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 4, 4, 512)         2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 2, 2, 512)         0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 2, 2, 512)         2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 2, 2, 512)         2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 2, 2, 512)         2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 1, 1, 512)         0
_________________________________________________________________
sequential_1 (Sequential)    (None, 100)               157028
=================================================================
Total params: 14,871,716
Trainable params: 14,871,716
Non-trainable params: 0
_________________________________________________________________
Using real-time data augmentation.
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 64, 64, 3)         0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 64, 64, 64)        1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 64, 64, 64)        36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 32, 32, 64)        0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 32, 32, 128)       73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 32, 32, 128)       147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 16, 16, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 16, 16, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 16, 16, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 16, 16, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 8, 8, 256)         0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 8, 8, 512)         1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 8, 8, 512)         2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 8, 8, 512)         2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 4, 4, 512)         0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 4, 4, 512)         2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 4, 4, 512)         2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 4, 4, 512)         2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 2, 2, 512)         0
_________________________________________________________________
sequential_1 (Sequential)    (None, 100)               550244
=================================================================
Total params: 15,264,932
Trainable params: 15,264,932
Non-trainable params: 0
_________________________________________________________________
Using real-time data augmentation.