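Today: the TensorFlow edition.
I'll follow the article "[Darknet] Trying the real-time object recognition system YOLO with TensorFlow" (【Darknet】リアルタイムオブジェクト認識 YOLOをTensorflowで試す).
That said, it turned out to be surprisingly easy.
The first half of that article is about Darknet, so I'll skip it.
So, on to "YOLO with TensorFlow":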
git clone https://github.com/gliese581gg/YOLO_tensorflow.git
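As I recall, this did not work for me.
(Note: if I remember correctly, the git file could not be found.)
So I decided to follow the gliese581gg/YOLO_tensorflow README instead.
It looks like it runs in three steps.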
2.Install
(1) Download code
(2) Download YOLO weight file from
YOLO_small : https://drive.google.com/file/d/0B2JbaJSrWLpza08yS2FSUnV2dlE/view?usp=sharing
YOLO_tiny : https://drive.google.com/file/d/0B2JbaJSrWLpza0FtQlc3ejhMTTA/view?usp=sharing
YOLO_face : https://drive.google.com/file/d/0B2JbaJSrWLpzMzR5eURGN2dMTk0/view?usp=sharing
(3) Put the 'YOLO_(version).ckpt' in the 'weight' folder of downloaded code
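For the first step, Download code, recreate this directory layout and download YOLO_tensorflow / YOLO_small_tf.py and the other files into the matching locations.
I don't need face, so small and tiny are enough, and the license matters too. It also helps to create the directories up front.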
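Since `git clone` did not work here, one alternative is to fetch the whole repository as a ZIP archive. The sketch below assumes GitHub's usual `archive/<branch>.zip` URL pattern and a `master` branch (the `YOLO_tensorflow-master` folder name in the commands further down suggests this); `archive_url` and `download_and_extract` are hypothetical helper names.

```python
import io
import urllib.request
import zipfile
from pathlib import Path


def archive_url(owner: str, repo: str, branch: str = "master") -> str:
    """Build the ZIP download URL (assumed GitHub archive URL pattern)."""
    return f"https://github.com/{owner}/{repo}/archive/{branch}.zip"


def download_and_extract(url: str, dest: Path) -> None:
    """Download a ZIP archive into memory and unpack it under dest."""
    with urllib.request.urlopen(url) as response:
        data = response.read()
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(dest)


if __name__ == "__main__":
    url = archive_url("gliese581gg", "YOLO_tensorflow")
    print(url)
    # Uncomment to actually download (needs network access):
    # download_and_extract(url, Path("."))
```

The weight files are hosted separately on Google Drive, so they still have to be downloaded by hand.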
The license for this code is shown below. Commercial use is prohibited!
YOLO_tensorflow LICENSE
Version 0.1, FEB 15 2016
ACCORDING TO ORIGINAL CODE'S LICENSE,
DO NOT USE THIS ON COMMERCIAL!
I OR ORIGINAL AUTHOR DO NOT HOLD LIABILITY FOR ANY DAMAGES!
BELOW IS THE ORIGINAL CODE'S LICENSE
{
THIS SOFTWARE LICENSE IS PROVIDED "ALL CAPS" SO THAT YOU KNOW IT IS SUPER
SERIOUS AND YOU DON'T MESS AROUND WITH COPYRIGHT LAW BECAUSE YOU WILL GET IN
TROUBLE HERE ARE SOME OTHER BUZZWORDS COMMONLY IN THESE THINGS WARRANTIES
LIABILITY CONTRACT TORT LIABLE CLAIMS RESTRICTION MERCHANTABILITY SUBJECT TO
THE FOLLOWING CONDITIONS:
1. #yolo
2. #swag
3. #blazeit
}
That said, it is ideal for learning and research, because the network structure is visible in the code.
Next, carry out steps (2) and (3): download the YOLO weight files and put each 'YOLO_(version).ckpt' in the weights directory of the downloaded code.
YOLO_small.ckpt is 367 MB and YOLO_tiny.ckpt is 176 MB.
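Before running anything, it is worth confirming that the checkpoints landed where the scripts will look for them. A minimal sketch, assuming the folder is called `weights` (the README text says 'weight', so this is a parameter) and the files are named `YOLO_<version>.ckpt`; both helper names are hypothetical.

```python
from pathlib import Path


def find_missing_weights(root, versions=("small", "tiny"), folder="weights"):
    """Return the expected checkpoint paths that do not exist under root."""
    expected = [Path(root) / folder / f"YOLO_{v}.ckpt" for v in versions]
    return [p for p in expected if not p.is_file()]


def size_mb(path):
    """File size in megabytes (10**6 bytes), rounded to the nearest integer."""
    return round(Path(path).stat().st_size / 1e6)


if __name__ == "__main__":
    # Run from the top of the downloaded code; prints nothing if all is well.
    for path in find_missing_weights("."):
        print("missing:", path)
```

Comparing `size_mb` against the sizes above is a quick way to catch a truncated download.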
Then, run it.
YOLO_tiny_tf.py is run as follows.
C:\Users\user\YOLO_tensorflow-master>python YOLO_tiny_tf.py -fromfile test/person.jpg -tofile_img predictions.png
The detection result:
class : person , [x,y,w,h]=[232,224,142,193], Confidence = 0.2230725884437561
Elapsed time : 1.0469865798950195 secs
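To run the scripts on several images in a row, the command line can be assembled programmatically. A minimal sketch: the `-fromfile` and `-tofile_img` flags are the ones used in the command above, while `build_command` and `run_detection` are hypothetical helper names, and the script is assumed to be started from the repository folder.

```python
import subprocess
import sys


def build_command(script, image, out_image=None):
    """Assemble the argument list for one detection run."""
    cmd = [sys.executable, script, "-fromfile", image]
    if out_image is not None:
        cmd += ["-tofile_img", out_image]
    return cmd


def run_detection(script, image, out_image=None):
    """Run the script and return its standard output as text."""
    result = subprocess.run(
        build_command(script, image, out_image),
        capture_output=True,  # requires Python 3.7 or later
        text=True,
        check=True,           # raise if the script exits with an error
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.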
And YOLO_small_tf.py is run as follows.
C:\Users\user\YOLO_tensorflow-master>python YOLO_small_tf.py -fromfile test/person.jpg -tofile_img predictions.png
The detection result:
class : person , [x,y,w,h]=[231,234,145,263], Confidence = 0.602942705154419
class : dog , [x,y,w,h]=[146,308,124,117], Confidence = 0.4560053050518036
Elapsed time : 1.2792901992797852 secs
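The console lines above are easy to turn into structured data for comparing the two models. A minimal sketch that parses the `class : ...` lines with a regular expression; the `Detection` type and function names are hypothetical, and `to_corners` assumes that (x, y) is the box centre rather than the top-left corner, which the output itself does not state.

```python
import re
from typing import NamedTuple


class Detection(NamedTuple):
    label: str
    x: int
    y: int
    w: int
    h: int
    confidence: float


LINE_RE = re.compile(
    r"class\s*:\s*(\w+)\s*,\s*\[x,y,w,h\]=\[(-?\d+),(-?\d+),(-?\d+),(-?\d+)\],"
    r"\s*Confidence\s*=\s*([0-9.eE+-]+)"
)


def parse_detections(log_text):
    """Extract every 'class : ...' line from the script's console output."""
    return [
        Detection(m[1], int(m[2]), int(m[3]), int(m[4]), int(m[5]), float(m[6]))
        for m in LINE_RE.finditer(log_text)
    ]


def to_corners(det):
    """Convert (x, y, w, h) to (left, top, right, bottom).

    Assumption: (x, y) is the box centre, not the top-left corner.
    """
    half_w, half_h = det.w // 2, det.h // 2
    return det.x - half_w, det.y - half_h, det.x + half_w, det.y + half_h
```

With the parsed values side by side, the person confidence is about 0.22 for tiny and about 0.60 for small, and only small reports the dog.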
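###Summary
- Although this is the TensorFlow version, I was able to run YOLO as a Python implementation and detect objects
- As expected, accuracy is lower than with the YOLOv3 and YOLOv2 I tried last time
- I'm curious whether detection accuracy would improve simply by making the network deeper
- How to train the model is still unclear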
####Network structure and execution output of YOLO_tiny_tf.py
C:\Users\user\YOLO_tensorflow-master>python YOLO_tiny_tf.py -fromfile test/person.jpg
Building YOLO_tiny graph...
Layer 1 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 16, Input channels = 3
Layer 2 : Type = Pool, Size = 2 * 2, Stride = 2
Layer 3 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 32, Input channels = 16
Layer 4 : Type = Pool, Size = 2 * 2, Stride = 2
Layer 5 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 64, Input channels = 32
Layer 6 : Type = Pool, Size = 2 * 2, Stride = 2
Layer 7 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 128, Input channels = 64
Layer 8 : Type = Pool, Size = 2 * 2, Stride = 2
Layer 9 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 256, Input channels = 128
Layer 10 : Type = Pool, Size = 2 * 2, Stride = 2
Layer 11 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 512, Input channels = 256
Layer 12 : Type = Pool, Size = 2 * 2, Stride = 2
Layer 13 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 1024, Input channels = 512
Layer 14 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 1024, Input channels = 1024
Layer 15 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 1024, Input channels = 1024
Layer 16 : Type = Full, Hidden = 256, Input dimension = 50176, Flat = 1, Activation = 1
Layer 17 : Type = Full, Hidden = 4096, Input dimension = 256, Flat = 0, Activation = 1
Layer 19 : Type = Full, Hidden = 1470, Input dimension = 4096, Flat = 0, Activation = 0
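The `Input dimension = 50176` of Layer 16 can be reproduced from the layers before it. The sketch below assumes a 448 x 448 input image and 'SAME'-style padding (neither appears in the log); under those assumptions the six stride-2 pools reduce 448 to 7, and 7 * 7 * 1024 = 50176. The final 1470 outputs equal 7 * 7 * (20 + 2 * 5) if one assumes the original YOLO layout of a 7 x 7 grid, 2 boxes per cell and 20 classes. The function names are hypothetical.

```python
import re

LAYER_RE = re.compile(
    r"Type = (Conv|Pool), Size = \d+ \* \d+, Stride = (\d+)(?:, Filters = (\d+))?"
)


def trace_feature_map(log_text, input_size=448, channels=3):
    """Follow spatial size and channel count through the Conv/Pool log lines.

    Assumes 'SAME'-style padding, so every layer divides the spatial size
    by its stride (rounding up). Returns (size, channels, flattened length).
    """
    size = input_size
    for kind, stride, filters in LAYER_RE.findall(log_text):
        size = -(-size // int(stride))  # ceiling division
        if kind == "Conv":
            channels = int(filters)
    return size, channels, size * size * channels


def yolo_output_size(grid=7, boxes=2, classes=20):
    """Length of the output vector: per cell, class scores plus 5 values per box."""
    return grid * grid * (classes + boxes * 5)


if __name__ == "__main__":
    sample = (
        "Layer 1 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 16, Input channels = 3\n"
        "Layer 2 : Type = Pool, Size = 2 * 2, Stride = 2\n"
    )
    print(trace_feature_map(sample))
    print(yolo_output_size())
```

Pasting the full Conv/Pool part of the log into `trace_feature_map` gives the flattened length that the first Full layer expects.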
####Network structure and execution output of YOLO_small_tf.py
C:\Users\user\YOLO_tensorflow-master>python YOLO_small_tf.py -fromfile test/person.jpg
Building YOLO_small graph...
Layer 1 : Type = Conv, Size = 7 * 7, Stride = 2, Filters = 64, Input channels = 3
Layer 2 : Type = Pool, Size = 2 * 2, Stride = 2
Layer 3 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 192, Input channels = 64
Layer 4 : Type = Pool, Size = 2 * 2, Stride = 2
Layer 5 : Type = Conv, Size = 1 * 1, Stride = 1, Filters = 128, Input channels = 192
Layer 6 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 256, Input channels = 128
Layer 7 : Type = Conv, Size = 1 * 1, Stride = 1, Filters = 256, Input channels = 256
Layer 8 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 512, Input channels = 256
Layer 9 : Type = Pool, Size = 2 * 2, Stride = 2
Layer 10 : Type = Conv, Size = 1 * 1, Stride = 1, Filters = 256, Input channels = 512
Layer 11 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 512, Input channels = 256
Layer 12 : Type = Conv, Size = 1 * 1, Stride = 1, Filters = 256, Input channels = 512
Layer 13 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 512, Input channels = 256
Layer 14 : Type = Conv, Size = 1 * 1, Stride = 1, Filters = 256, Input channels = 512
Layer 15 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 512, Input channels = 256
Layer 16 : Type = Conv, Size = 1 * 1, Stride = 1, Filters = 256, Input channels = 512
Layer 17 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 512, Input channels = 256
Layer 18 : Type = Conv, Size = 1 * 1, Stride = 1, Filters = 512, Input channels = 512
Layer 19 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 1024, Input channels = 512
Layer 20 : Type = Pool, Size = 2 * 2, Stride = 2
Layer 21 : Type = Conv, Size = 1 * 1, Stride = 1, Filters = 512, Input channels = 1024
Layer 22 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 1024, Input channels = 512
Layer 23 : Type = Conv, Size = 1 * 1, Stride = 1, Filters = 512, Input channels = 1024
Layer 24 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 1024, Input channels = 512
Layer 25 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 1024, Input channels = 1024
Layer 26 : Type = Conv, Size = 3 * 3, Stride = 2, Filters = 1024, Input channels = 1024
Layer 27 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 1024, Input channels = 1024
Layer 28 : Type = Conv, Size = 3 * 3, Stride = 1, Filters = 1024, Input channels = 1024
Layer 29 : Type = Full, Hidden = 512, Input dimension = 50176, Flat = 1, Activation = 1
Layer 30 : Type = Full, Hidden = 4096, Input dimension = 512, Flat = 0, Activation = 1
Layer 32 : Type = Full, Hidden = 1470, Input dimension = 4096, Flat = 0, Activation = 0
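As a rough cross-check of the checkpoint sizes (367 MB and 176 MB), the parameter count can be estimated from the layer logs. A minimal sketch, assuming every Conv and Full layer has one bias per output and that weights are stored as 4-byte floats; checkpoint overhead is ignored and the function names are hypothetical.

```python
import re

CONV_RE = re.compile(
    r"Type = Conv, Size = (\d+) \* (\d+), Stride = \d+, "
    r"Filters = (\d+), Input channels = (\d+)"
)
FULL_RE = re.compile(r"Type = Full, Hidden = (\d+), Input dimension = (\d+)")


def count_parameters(log_text):
    """Sum weights and biases of all Conv and Full layers found in the log."""
    total = 0
    for kh, kw, filters, in_ch in CONV_RE.findall(log_text):
        # kernel height * kernel width * input channels * filters, plus biases
        total += int(kh) * int(kw) * int(in_ch) * int(filters) + int(filters)
    for hidden, in_dim in FULL_RE.findall(log_text):
        # fully connected weight matrix, plus biases
        total += int(in_dim) * int(hidden) + int(hidden)
    return total


def approx_size_mb(n_params, bytes_per_param=4):
    """Approximate storage in megabytes (10**6 bytes), assuming float32."""
    return n_params * bytes_per_param / 1e6
```

By my calculation this comes to roughly 94 million parameters for YOLO_small and roughly 45 million for YOLO_tiny, which at 4 bytes each is in the same range as the file sizes. Most of the gap between the two models comes from the convolutional layers, since the Full layers are similar in size.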