
I tried running ViT (Vision Transformer), four implementations. Whoa, it broke 80% on Cifar10; maybe it's finally catching on. All the work is by others. cifar10 and imagenet_resized/32x32; the results still come with a "?".

Posted at 2020-10-18

Overview

The GitHub repositories below provide implementations of ViT (Vision Transformer), which is getting a lot of attention right now, so I tried running them.

(1)
https://github.com/kamalkraj/Vision-Transformer

The target was cifar10. Usage was shown in the form of a Jupyter notebook.

(2)
https://github.com/emla2805/vision-transformer

This one targets imagenet_resized/32x32.

(3)
https://github.com/tuvovan/Vision_Transformer_Keras
I tried this one as well. It targets cifar10.

(4)
https://github.com/kentaroy47/vision-transformers-cifar10
Also cifar10. It broke 80%!
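All four repositories implement the same core idea: the image is split into fixed-size patches, each patch is flattened and linearly projected into a token, and a standard Transformer encoder processes the token sequence. A minimal NumPy sketch of just the patch-embedding step (my own illustration, not code from any of these repos; a real ViT uses a learned projection, not a random one):

```python
import numpy as np

def image_to_patch_tokens(img, patch=4, dim=64, seed=0):
    """Split an HxWxC image into (patch x patch) patches and project each
    flattened patch to a `dim`-dimensional token. The projection here is a
    fixed random matrix; in a real ViT it is a learned linear layer."""
    rng = np.random.default_rng(seed)
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (num_patches, p*p*C)
    patches = (img.reshape(H // patch, patch, W // patch, patch, C)
                  .transpose(0, 2, 1, 3, 4)
                  .reshape(-1, patch * patch * C))
    W_proj = rng.standard_normal((patch * patch * C, dim))
    return patches @ W_proj  # (num_patches, dim)

tokens = image_to_patch_tokens(np.zeros((32, 32, 3)), patch=4, dim=64)
print(tokens.shape)  # a 32x32 image with 4x4 patches gives 8x8 = 64 tokens
```

For a 32x32 CIFAR-10 image with 4x4 patches this yields a sequence of 64 tokens, which is what the Transformer encoder then attends over.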

Results (summary)

| # | URL | Dataset | Accuracy (%) | Notes |
|---|-----|---------|--------------|-------|
| 1 | https://github.com/kamalkraj/Vision-Transformer | cifar10 | 48.9 | |
| 2 | https://github.com/emla2805/vision-transformer | imagenet32x32 | 9.2 (partial) | Stopped early; judged to have no prospects |
| 3 | https://github.com/tuvovan/Vision_Transformer_Keras | cifar10 | 63 | |
| 4 | https://github.com/kentaroy47/vision-transformers-cifar10 | cifar10 | ★80.9 | |

80.9% on cifar10! Impressive.
(I have not investigated what makes this implementation better.)

Results: https://github.com/kamalkraj/Vision-Transformer

cifar10. The epoch count was set to 10, so I extended it to 300.

The start looked like this. Not bad, and training was fast.

epoch 1: train loss 1.79971. train accuracy 0.34166
epoch 1: test loss 1.56127. test accuracy 0.43050
epoch 2: train loss 1.51326. train accuracy 0.44944
epoch 2: test loss 1.45918. test accuracy 0.45840
epoch 3: train loss 1.41494. train accuracy 0.48768
epoch 3: test loss 1.39249. test accuracy 0.49690

At 300 epochs it looked like this.
(It finished in about half a day on a modest GPU.)

epoch 295: train loss 0.11632. train accuracy 0.96030
epoch 295: test loss 5.43269. test accuracy 0.49070
epoch 296: train loss 0.10554. train accuracy 0.96354
epoch 296: test loss 5.39661. test accuracy 0.49290
epoch 297: train loss 0.10795. train accuracy 0.96264
epoch 297: test loss 5.40721. test accuracy 0.49250
epoch 298: train loss 0.11270. train accuracy 0.96154
epoch 298: test loss 5.41567. test accuracy 0.48540
epoch 299: train loss 0.10613. train accuracy 0.96350
epoch 299: test loss 5.40492. test accuracy 0.48590
epoch 300: train loss 0.10614. train accuracy 0.96536
epoch 300: test loss 5.57218. test accuracy 0.48910

test accuracy 0.48910 is terrible!
train accuracy 0.96536 is about normal?

⇒ The implementation did not include augmentation or the like.
Even so, a test accuracy of 0.48910 should not happen...
(Published runs without augmentation still reach around 70%, though it naturally depends on the model.)
What is going on????

Results: https://github.com/emla2805/vision-transformer

The default batch size of 4096 ran out of memory, so I changed it to 1024.
imagenet_resized/32x32.
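If matching the default batch size matters (for example, because the learning rate was tuned for it), gradient accumulation is a common workaround for the out-of-memory error: run several small batches and step the optimizer once. Judging by the log format the repo itself is TensorFlow/Keras; this is a generic PyTorch-style sketch of the idea, not the repo's code:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 2)           # stand-in for the real model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
w0 = model.weight.detach().clone() # snapshot to confirm an update happened

accum_steps = 4  # e.g. 4 micro-batches of 1024 ~ one effective batch of 4096
data = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    loss = criterion(model(x), y) / accum_steps  # scale so accumulated grads average out
    loss.backward()                              # gradients add up in param.grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()                         # one update per accum_steps micro-batches
        optimizer.zero_grad()
```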

Epoch 1/300
1252/1252 [==============================] - 3325s 3s/step - loss: 6.3762 - accuracy: 0.0150 - val_loss: 6.0516 - val_accuracy: 0.0278
Epoch 2/300
1252/1252 [==============================] - 3686s 3s/step - loss: 5.9006 - accuracy: 0.0350 - val_loss: 5.7768 - val_accuracy: 0.0434
Epoch 3/300
1252/1252 [==============================] - 3687s 3s/step - loss: 5.7059 - accuracy: 0.0458 - val_loss: 5.6090 - val_accuracy: 0.0531
Epoch 4/300
1252/1252 [==============================] - 3688s 3s/step - loss: 5.5572 - accuracy: 0.0555 - val_loss: 5.4895 - val_accuracy: 0.0627
Epoch 5/300
1252/1252 [==============================] - 3689s 3s/step - loss: 5.4528 - accuracy: 0.0631 - val_loss: 5.3938 - val_accuracy: 0.0689
Epoch 6/300
1252/1252 [==============================] - 3685s 3s/step - loss: 5.3737 - accuracy: 0.0691 - val_loss: 5.3331 - val_accuracy: 0.0732
Epoch 7/300
1252/1252 [==============================] - 3686s 3s/step - loss: 5.3049 - accuracy: 0.0746 - val_loss: 5.2774 - val_accuracy: 0.0774
Epoch 8/300
1252/1252 [==============================] - 3685s 3s/step - loss: 5.2486 - accuracy: 0.0789 - val_loss: 5.2161 - val_accuracy: 0.0831
Epoch 9/300
1252/1252 [==============================] - 3683s 3s/step - loss: 5.1994 - accuracy: 0.0830 - val_loss: 5.1670 - val_accuracy: 0.0867
Epoch 10/300
1252/1252 [==============================] - 3684s 3s/step - loss: 5.1580 - accuracy: 0.0866 - val_loss: 5.1153 - val_accuracy: 0.0912
Epoch 11/300
1252/1252 [==============================] - 3086s 2s/step - loss: 5.1240 - accuracy: 0.0893 - val_loss: 5.0699 - val_accuracy: 0.0937
Epoch 12/300
1252/1252 [==============================] - 2549s 2s/step - loss: 5.0951 - accuracy: 0.0917 - val_loss: 5.0980 - val_accuracy: 0.0932

I did not run the full epoch count, but the momentum had already stalled, so I judged it would not reach a decent accuracy.
Put simply, I don't think a high accuracy would come out of this.
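Pulling the val_accuracy column out of the log above makes the stall quantitative: the per-epoch gain shrinks from about 1.6 points to roughly zero (even slightly negative) by epoch 12.

```python
# val_accuracy values copied from the log above (epochs 1-12)
val_acc = [0.0278, 0.0434, 0.0531, 0.0627, 0.0689, 0.0732,
           0.0774, 0.0831, 0.0867, 0.0912, 0.0937, 0.0932]
gains = [round(b - a, 4) for a, b in zip(val_acc, val_acc[1:])]
print(gains)  # per-epoch improvement in validation accuracy
```

At each epoch taking over an hour, extrapolating this flattening curve out to 300 epochs did not look worth the compute.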

Results: https://github.com/tuvovan/Vision_Transformer_Keras

cifar10. 63%.

Epoch 1/50
782/782 [==============================] - 71s 91ms/step - loss: 1.9103 - accuracy: 0.2675 - val_loss: 1.6849 - val_accuracy: 0.3607
Epoch 2/50
782/782 [==============================] - 74s 95ms/step - loss: 1.6341 - accuracy: 0.3881 - val_loss: 1.5447 - val_accuracy: 0.4328
Epoch 3/50
782/782 [==============================] - 181s 231ms/step - loss: 1.4721 - accuracy: 0.4582 - val_loss: 1.3912 - val_accuracy: 0.4886
Epoch 4/50
782/782 [==============================] - 336s 430ms/step - loss: 1.3570 - accuracy: 0.5040 - val_loss: 1.3044 - val_accuracy: 0.5215
Epoch 5/50
782/782 [==============================] - 337s 430ms/step - loss: 1.2924 - accuracy: 0.5304 - val_loss: 1.2297 - val_accuracy: 0.5506
Epoch 6/50
782/782 [==============================] - 336s 430ms/step - loss: 1.2307 - accuracy: 0.5537 - val_loss: 1.2331 - val_accuracy: 0.5489
Epoch 7/50
782/782 [==============================] - 337s 431ms/step - loss: 1.1847 - accuracy: 0.5724 - val_loss: 1.1611 - val_accuracy: 0.5777
Epoch 8/50
782/782 [==============================] - 337s 431ms/step - loss: 1.1462 - accuracy: 0.5853 - val_loss: 1.1819 - val_accuracy: 0.5754
Epoch 9/50
782/782 [==============================] - 338s 432ms/step - loss: 1.1151 - accuracy: 0.5981 - val_loss: 1.1267 - val_accuracy: 0.5923
Epoch 10/50
782/782 [==============================] - 338s 432ms/step - loss: 1.0888 - accuracy: 0.6074 - val_loss: 1.0718 - val_accuracy: 0.6131
Epoch 11/50
782/782 [==============================] - 338s 432ms/step - loss: 1.0692 - accuracy: 0.6155 - val_loss: 1.0739 - val_accuracy: 0.6144
Epoch 12/50
782/782 [==============================] - 337s 431ms/step - loss: 1.0522 - accuracy: 0.6235 - val_loss: 1.1166 - val_accuracy: 0.5950
Epoch 13/50
782/782 [==============================] - 344s 440ms/step - loss: 1.0327 - accuracy: 0.6312 - val_loss: 1.1115 - val_accuracy: 0.6009
Epoch 14/50
782/782 [==============================] - 337s 431ms/step - loss: 0.9722 - accuracy: 0.6583 - val_loss: 1.0198 - val_accuracy: ★0.6392
Epoch 15/50
782/782 [==============================] - 337s 432ms/step - loss: 1.0219 - accuracy: 0.6455 - val_loss: 1.0559 - val_accuracy: 0.6287
Epoch 16/50
782/782 [==============================] - 337s 431ms/step - loss: 1.0714 - accuracy: 0.6281 - val_loss: 1.1004 - val_accuracy: 0.6127
Epoch 17/50
782/782 [==============================] - 337s 431ms/step - loss: 1.1214 - accuracy: 0.6117 - val_loss: 1.1289 - val_accuracy: 0.6028
Epoch 18/50
782/782 [==============================] - 337s 431ms/step - loss: 1.2367 - accuracy: 0.5923 - val_loss: 1.3054 - val_accuracy: 0.5793
Epoch 19/50
782/782 [==============================] - 337s 431ms/step - loss: 1.4621 - accuracy: 0.5401 - val_loss: 1.5011 - val_accuracy: 0.5335
Epoch 20/50
782/782 [==============================] - 337s 430ms/step - loss: 1.6503 - accuracy: 0.4775 - val_loss: 1.6644 - val_accuracy: 0.4793
Epoch 21/50
782/782 [==============================] - 337s 430ms/step - loss: 1.8397 - accuracy: 0.4129 - val_loss: 1.9011 - val_accuracy: 0.3987
Epoch 22/50
782/782 [==============================] - 337s 431ms/step - loss: 2.0167 - accuracy: 0.3414 - val_loss: 2.0578 - val_accuracy: 0.3361
Epoch 23/50
782/782 [==============================] - 337s 430ms/step - loss: 2.1249 - accuracy: 0.2904 - val_loss: 2.1507 - val_accuracy: 0.2885
Epoch 24/50
782/782 [==============================] - 337s 431ms/step - loss: 2.1957 - accuracy: 0.2472 - val_loss: 2.2183 - val_accuracy: 0.2379
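Note how this run peaks at epoch 14 and then collapses all the way down to 24% by epoch 24; the loss rising on both train and validation suggests training instability rather than overfitting. ViTs are typically trained with learning-rate warmup followed by decay (the original ViT paper uses warmup plus cosine or linear decay); whether this repo does so I have not checked. A sketch of such a schedule, with hypothetical hyperparameters:

```python
import math

def lr_at(epoch, base_lr=1e-3, warmup=5, total=50):
    """Linear warmup for `warmup` epochs, then cosine decay towards 0."""
    if epoch < warmup:
        return base_lr * (epoch + 1) / warmup       # ramp up from base_lr/warmup
    progress = (epoch - warmup) / (total - warmup)  # 0 -> 1 over the decay phase
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))

print([round(lr_at(e), 6) for e in (0, 4, 5, 27, 49)])
```

If the repo holds the learning rate constant, a schedule like this (or simply a lower base learning rate) would be the first thing I'd try to avoid the blow-up.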

Results: https://github.com/kentaroy47/vision-transformers-cifar10

cifar10. 80.9%.

Epoch: 44
 [===============================>]  Step: 791ms | Tot: 782/782 | Loss: 0.273 | Acc: 90.354% (45177/50000)
 [===============================>]  Step: 1s536ms | Tot: 100/100 | Loss: 0.646 | Acc: 80.570% (8057/10000)
Epoch: 45
 [===============================>]  Step: 768ms | Tot: 782/782 | Loss: 0.268 | Acc: 90.502% (45251/50000)
 [===============================>]  Step: 1s653ms | Tot: 100/100 | Loss: 0.646 | Acc: ★80.920% (8092/10000)
Epoch: 46
 [===============================>]  Step: 767ms | Tot: 782/782 | Loss: 0.255 | Acc: 90.958% (45479/50000)
 [===============================>]  Step: 1s583ms | Tot: 100/100 | Loss: 0.645 | Acc: 80.810% (8081/10000)
Epoch: 47
 [===============================>]  Step: 838ms | Tot: 782/782 | Loss: 0.256 | Acc: 90.874% (45437/50000)
 [===============================>]  Step: 1s552ms | Tot: 100/100 | Loss: 0.646 | Acc: 80.730% (8073/10000)
Epoch: 48
 [===============================>]  Step: 756ms | Tot: 782/782 | Loss: 0.257 | Acc: 90.864% (45432/50000)
 [===============================>]  Step: 1s689ms | Tot: 100/100 | Loss: 0.647 | Acc: 80.700% (8070/10000)
Epoch: 49
 [===============================>]  Step: 813ms | Tot: 782/782 | Loss: 0.253 | Acc: 91.026% (45513/50000)
 [===============================>]  Step: 1s529ms | Tot: 100/100 | Loss: 0.648 | Acc: 80.750% (8075/10000)



・・・・
・・・・


Epoch: 80
 [===============================>]  Step: 967ms | Tot: 782/782 | Loss: 0.254 | Acc: 90.834% (45417/50000)
 [===============================>]  Step: 1s663ms | Tot: 100/100 | Loss: 0.648 | Acc: 80.720% (8072/10000)
Epoch: 81
 [===============================>]  Step: 815ms | Tot: 782/782 | Loss: 0.249 | Acc: 91.138% (45569/50000)
 [===============================>]  Step: 1s426ms | Tot: 100/100 | Loss: 0.648 | Acc: 80.720% (8072/10000)
Epoch: 82
 [===============================>]  Step: 916ms | Tot: 782/782 | Loss: 0.251 | Acc: 91.200% (45600/50000)
 [===============================>]  Step: 1s659ms | Tot: 100/100 | Loss: 0.648 | Acc: 80.730% (8073/10000)

Trial and error

Expecting something from the Transformer(?), I enabled vertical-flip augmentation... but nothing good came of it!!

Summary

As noted above: what is going on???? That's all.
Comments are welcome.

Related

I tried running ViT (Vision Transformer), four of them. Whoa, over 80% on Cifar10; maybe it's finally catching on. [Extra edition]
