I tried automatic tracking annotation by VATIC and implemented conversion to TFRecord format.
1.Introduction
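Annotation work is really tedious, isn't it? In this article I introduce VATIC, an annotation tool with automatic object tracking that was committed eight years ago, and summarize how to use it. Don't underestimate it just because it is eight years old: it is extremely handy. You annotate only two or three positions, and it keeps tracking the object and saves the annotation information automatically. The license is MIT.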
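VoTT, a tool provided by Microsoft, also seems to have had an automatic tracking feature, but in the newer versions I could not get it to work, possibly because I was using it incorrectly. If you can get it working, VoTT may be the better choice.
VATIC can export annotations in the following formats: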
--xml XML
--json JSON
--matlab MATLAB
--pickle Python's Pickle
--labelme LabelMe video's XML format
--pascal PASCAL VOC format, treating each frame as an image
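For the conversion to TFRecord format, heavily modifying the official Object Detection API would have been a hassle, so I used the official scripts almost as they are. Yes, I cut corners.
I pasted plenty of screenshots below, so the procedure should not be too hard to follow from the commands and images.
To give you a feel for it first, the result looks like the video linked below.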
Youtube: https://youtu.be/y03-kdMrBiE
2.Environment
- Ubuntu 16.04 x86_64
- Python 3.5.2
- [Docker] Ubuntu 14.04
- [Docker] Python 3.4.3
- Core i7 (8th generation)
- Google Chrome
- Docker
  - Client:
    - Version: 18.09.5
    - API version: 1.39
    - Go version: go1.10.8
    - Git commit: e8ff056
    - Built: Thu Apr 11 04:44:24 2019
    - OS/Arch: linux/amd64
    - Experimental: false
  - Server: Docker Engine - Community
    - Engine:
      - Version: 18.09.5
      - API version: 1.39 (minimum version 1.12)
      - Go version: go1.10.8
      - Git commit: e8ff056
      - Built: Thu Apr 11 04:10:53 2019
      - OS/Arch: linux/amd64
      - Experimental: false
3.Procedure
3−1.Environment preparation
3−1−1.Clone the VATIC-Docker repository
$ cd ~
$ git clone https://github.com/NPSVisionLab/vatic-docker.git
$ cd vatic-docker
$ mkdir -p data/videos_in
$ sudo rm -rf data/db.mysql
3−1−2.Place a video for annotation
This example assumes the video file is named testvideo.mp4.
Change the source path in the copy command to match your environment.
$ cp ~/testvideo.mp4 data/videos_in
3−1−3.Edit label information
Open "labels.txt" in a text editor and add label information.
To define multiple labels, write one label per line.
Do not leave an empty line at the end of the file.
If you want to change the label list during annotation work, you need to restart the Docker container.
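As a minimal sketch, the label file described above could also be generated from Python. The helper name write_labels and the example path in the comment are assumptions made for illustration (the article only names the file "labels.txt"); the labels are the ones that appear in the container log in the next step.

```python
import os


def write_labels(path, labels):
    """Write one label per line, with no trailing newline or empty line."""
    # Drop empty entries and surrounding whitespace.
    cleaned = [label.strip() for label in labels if label.strip()]
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "w") as f:
        # join() puts a newline between labels only, never after the last one.
        f.write("\n".join(cleaned))
    return cleaned


# Example call (assumed location; adjust to where your labels.txt lives):
# write_labels(os.path.join("data", "labels.txt"), ["Car", "Person", "Bicycle"])
```

Writing the file this way avoids accidentally leaving an empty last line, which a text editor may add on save.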
3−1−4.Docker Run
$ sudo docker run -it -p 8111:80 -v $PWD/data:/root/vatic/data npsvisionlab/vatic-docker /bin/bash -C /root/vatic/example.sh
Unable to find image 'npsvisionlab/vatic-docker:latest' locally
latest: Pulling from npsvisionlab/vatic-docker
064f9af02539: Pull complete
390957b2f4f0: Pull complete
cee0974db2b8: Pull complete
c8144262002c: Pull complete
5ee1f24af8a6: Pull complete
1d9960422fa1: Pull complete
baa5641dc562: Pull complete
671c438bbff0: Pull complete
deec772cc23b: Pull complete
7acbdd1641ac: Pull complete
b0b6f5f3d865: Pull complete
a45f9ecd8863: Pull complete
625e13411eb9: Pull complete
01f6ee126a43: Pull complete
ad731db2ae7f: Pull complete
d5e8d41b9f20: Pull complete
34816bf17724: Pull complete
Digest: sha256:aa9f113f1db9e6bda51bd87b4101e5cc9c23dcae3f0dddd6f34439884b61c345
Status: Downloaded newer image for npsvisionlab/vatic-docker:latest
Labels = Car Person Bicycle
New Videos to process.
* Starting MySQL database server mysqld [ OK ]
* Checking for tables which need an upgrade, are corrupt or were
not closed cleanly.
ffmpeg version N-80901-gfebc862 Copyright (c) 2000-2016 the FFmpeg developers
built with gcc 4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
configuration: --extra-libs=-ldl --prefix=/opt/ffmpeg --mandir=/usr/share/man --enable-avresample --disable-debug --enable-nonfree --enable-gpl --enable-version3 --enable-libopencore-amrnb --enable-libopencore-amrwb --disable-decoder=amrnb --disable-decoder=amrwb --enable-libpulse --enable-libfreetype --enable-gnutls --enable-libx264 --enable-libx265 --enable-libfdk-aac --enable-libvorbis --enable-libmp3lame --enable-libopus --enable-libvpx --enable-libspeex --enable-libass --enable-avisynth --enable-libsoxr --enable-libxvid --enable-libvidstab
libavutil 55. 28.100 / 55. 28.100
libavcodec 57. 48.101 / 57. 48.101
libavformat 57. 41.100 / 57. 41.100
libavdevice 57. 0.102 / 57. 0.102
libavfilter 6. 47.100 / 6. 47.100
libavresample 3. 0. 0 / 3. 0. 0
libswscale 4. 1.100 / 4. 1.100
libswresample 2. 1.100 / 2. 1.100
libpostproc 54. 0.100 / 54. 0.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/root/vatic/data/videos_in/testvideo.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.12.100
Duration: 00:00:20.09, start: 0.000000, bitrate: 595 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 480x270, 583 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 2 kb/s (default)
Metadata:
handler_name : SoundHandler
Please use -b:a or -b:v, -b is ambiguous
[swscaler @ 0x2273fe0] deprecated pixel format used, make sure you did set range correctly
[image2 @ 0x223ae80] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
Output #0, image2, to '/tmp/pyvision-ffmpeg-408300912/%d.jpg':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf57.41.100
Stream #0:0(und): Video: mjpeg, yuvj420p(pc), 480x270, q=2-31, 10000 kb/s, 29.97 fps, 29.97 tbn, 29.97 tbc (default)
Metadata:
handler_name : VideoHandler
encoder : Lavc57.48.101 mjpeg
Side data:
cpb: bitrate max/min/avg: 0/0/10000000 buffer size: 0 vbv_delay: -1
Stream mapping:
Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native))
Press [q] to stop, [?] for help
frame= 602 fps=0.0 q=1.6 Lsize=N/A time=00:00:20.08 bitrate=N/A speed=24.9x
video:20290kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Decoding frames 0 to 100
Decoding frames 100 to 200
Decoding frames 200 to 300
Decoding frames 300 to 400
Decoding frames 400 to 500
Decoding frames 500 to 600
Decoding frames 600 to 700
Checking integrity...
Searching for last frame...
Found 602 frames.
Binding labels and attributes...
Creating symbolic link...
Creating segments...
Video loaded and ready for publication.
http://localhost/?id=1&hitId=offline
http://localhost/?id=2&hitId=offline
http://localhost/?id=3&hitId=offline
root@acd3861df5f7:~/vatic#
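The URLs printed at the end of the log refer to port 80 inside the container, while the docker run command above publishes that port as 8111 on the host. The following is a small sketch of rewriting such a URL for the host browser; the function name to_host_url and its default port argument are illustrative assumptions, not part of VATIC.

```python
from urllib.parse import urlsplit, urlunsplit


def to_host_url(container_url, host_port=8111):
    """Return the same URL with the host-side published port added."""
    parts = urlsplit(container_url)
    # Keep the host name, replace (or add) the port.
    netloc = "{}:{}".format(parts.hostname, host_port)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Example with one of the segment URLs from the log above:
# to_host_url("http://localhost/?id=1&hitId=offline")
```

If you publish the container on a different host port, pass that value as host_port.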
3−2.Implementation of annotation work
Launch your browser and open http://localhost:8111/directory/ from the address bar.
It should be displayed as shown below.
Click "Video Segment".
The editing screen opens as shown below.
Click the "+ New Object" button.
Draw a bounding box around the target object and select the "Car" option.
Move the slide bar to advance the video a little.
Drag the bounding box with the mouse to the correct position.
Again, move the slide bar to advance the video a little.
Again, drag the bounding box to the correct position.
Move the slide bar until the object disappears from the screen.
Check "Outside of view frame" so that no bounding box is recorded for the object from the current frame on.
That was only about three manual annotations, but they are enough to annotate the sequence with fairly high accuracy. Play the video to see how well the bounding box follows the object.
Click the "Rewind" button to return the video to its initial position.
Now play the annotated video.
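To see why a handful of keyframes can be enough, here is a minimal sketch of linear interpolation between manually placed boxes. It is an illustrative assumption about how the frames between keyframes can be filled in, not VATIC's own code, and the names Box and interpolate_boxes are hypothetical.

```python
from collections import namedtuple

Box = namedtuple("Box", ["xmin", "ymin", "xmax", "ymax"])


def interpolate_boxes(keyframes):
    """Fill in a box for every frame between consecutive keyframes.

    keyframes: dict mapping frame index -> Box (the manual annotations).
    Returns a dict covering every frame from the first to the last keyframe.
    """
    frames = sorted(keyframes)
    result = {}
    for start, end in zip(frames, frames[1:]):
        a, b = keyframes[start], keyframes[end]
        span = end - start
        for frame in range(start, end):
            t = (frame - start) / span  # 0.0 at start, approaching 1.0 at end
            result[frame] = Box(*(va + (vb - va) * t for va, vb in zip(a, b)))
    if frames:
        # The last keyframe is copied as-is.
        result[frames[-1]] = keyframes[frames[-1]]
    return result
```

With keyframes at frames 0 and 10, the box at frame 5 lies halfway between the two manual boxes. An object moving at roughly constant speed therefore needs very few corrections.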
3−3.Save work content
To save your progress, click the "Save Work" button.
3−4.Convert labelme format to Pascal VOC format
Execute the following commands inside the Docker container, at the root prompt opened in step 3−1−4.
# mkdir -p data/VOCdevkit/VOC2007
# turkic dump currentvideo --pascal --output /root/vatic/data/VOCdevkit/VOC2007 2>&1; mysqldump --user root --all-databases > data/db.mysql
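After the dump, it can be worth spot-checking the generated annotations before converting them. The sketch below parses one Pascal VOC annotation with the standard library. It assumes the usual VOC layout (object/name and object/bndbox elements); the helper name read_voc_boxes and the sample XML are illustrative and are not output copied from VATIC.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the usual VOC layout (not real VATIC output).
SAMPLE = """<annotation>
  <folder>VOC2007</folder>
  <filename>1.jpg</filename>
  <object>
    <name>Car</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>80</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_boxes(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bnd = obj.find("bndbox")
        coords = [int(float(bnd.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(tuple([obj.findtext("name")] + coords))
    return boxes
```

On the host PC, the same function could be applied to the text of each file under ~/vatic-docker/data/VOCdevkit/VOC2007/Annotations to confirm that labels and coordinates look plausible.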
3−5.【Host PC】 Convert Pascal VOC format to TFRecord format
Execute the following command on the host PC side.
$ sudo apt-get update && sudo apt-get upgrade -y
$ sudo apt-get install -y protobuf-compiler python-pil python-lxml python-tk \
autoconf automake libtool curl make g++ unzip wget git nano \
libgflags-dev libgoogle-glog-dev liblmdb-dev libleveldb-dev \
libhdf5* python3-dev python3-numpy python3-skimage gfortran libturbojpeg \
python-dev python-numpy python-skimage python3-pip python-pip \
libboost-all-dev libopenblas-dev libsnappy-dev software-properties-common \
libfreetype6-dev pkg-config libpng12*
$ sudo -H pip3 install pip==18.0.0 --upgrade
$ sudo -H pip3 install Cython opencv-python lxml
$ sudo -H pip3 install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
$ sudo -H pip3 install six numpy --ignore-installed --upgrade
$ sudo -H pip3 install tensorflow==1.12.0 --upgrade
$ wget https://github.com/protocolbuffers/protobuf/archive/v3.7.0.zip
$ unzip v3.7.0.zip;rm v3.7.0.zip;cd protobuf-3.7.0
$ ./autogen.sh
$ ./configure
$ make -j$(($(nproc) + 1))
$ sudo make install
$ cd python
$ export LD_LIBRARY_PATH=../src/.libs
$ python3 setup.py build --cpp_implementation
$ python3 setup.py test --cpp_implementation
$ python3 setup.py install --cpp_implementation
$ sudo ldconfig
$ cd ../..
$ git clone https://github.com/tensorflow/models.git
$ cd models/research
$ sed -i "s%category_name = unicode(category_name, 'utf-8')%category_name = str(category_name, 'utf-8')%g" "object_detection/utils/object_detection_evaluation.py"
$ sed -i "s%<folder>/root/vatic/data/VOCdevkit/VOC2007</folder>%<folder>${HOME}/vatic-docker/data/VOCdevkit/VOC2007</folder>%g" ${HOME}/vatic-docker/data/VOCdevkit/VOC2007/Annotations/*
### Create label_map.pbtxt
$ nano ~/vatic-docker/data/label_map.pbtxt
item {
id: 1
name: 'Car'
}
item {
id: 2
name: 'Person'
}
item {
id: 3
name: 'Bicycle'
}
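Typing the label map by hand becomes error-prone when there are many classes. As a sketch, the same text can be generated from the label list. build_label_map is a hypothetical helper, and it assumes ids start at 1 and follow the order of the list, as in the file above.

```python
def build_label_map(labels):
    """Return label_map.pbtxt text with ids assigned from 1 in list order."""
    items = []
    for index, name in enumerate(labels, start=1):
        # Doubled braces are literal braces in str.format().
        items.append("item {{\n  id: {}\n  name: '{}'\n}}\n".format(index, name))
    return "".join(items)


# Example: write the three labels used in this article.
# with open("label_map.pbtxt", "w") as f:
#     f.write(build_label_map(["Car", "Person", "Bicycle"]))
```

The label names must match the names written in labels.txt exactly, including capitalization, because the converter looks each annotation's class name up in this map.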
$ protoc object_detection/protos/*.proto --python_out=.
$ sed -i "s%'aeroplane_' + FLAGS.set + '.txt'%'Car_' + FLAGS.set + '.txt'%g" object_detection/dataset_tools/create_pascal_tf_record.py
$ sudo chmod 777 -R ${HOME}/vatic-docker/data/*
$ python3 object_detection/dataset_tools/create_pascal_tf_record.py \
--label_map_path="${HOME}/vatic-docker/data/label_map.pbtxt" \
--data_dir="${HOME}/vatic-docker/data/VOCdevkit" \
--year=VOC2007 \
--set=train \
--output_path="${HOME}/vatic-docker/data/pascal_train.record" \
--ignore_difficult_instances=True
$ python3 object_detection/dataset_tools/create_pascal_tf_record.py \
--label_map_path="${HOME}/vatic-docker/data/label_map.pbtxt" \
--data_dir="${HOME}/vatic-docker/data/VOCdevkit" \
--year=VOC2007 \
--set=trainval \
--output_path="${HOME}/vatic-docker/data/pascal_val.record" \
--ignore_difficult_instances=True
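A quick sanity check after conversion is to count the records that were written. The sketch below does this without importing TensorFlow by walking the TFRecord framing: an 8-byte little-endian length, a 4-byte length checksum, the payload, and a 4-byte payload checksum. It skips checksum verification, and count_tfrecords is a hypothetical helper name.

```python
import struct


def count_tfrecords(path):
    """Count records in a TFRecord file by following the length headers."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError("truncated length header")
            (length,) = struct.unpack("<Q", header)
            # Skip the length CRC (4 bytes), the payload, and the payload CRC (4 bytes).
            skipped = f.read(4 + length + 4)
            if len(skipped) != length + 8:
                raise ValueError("truncated record")
            count += 1
    return count


# Example (path taken from the commands above):
# count_tfrecords(os.path.expanduser("~/vatic-docker/data/pascal_train.record"))
```

A result of zero, or a ValueError, suggests the conversion step did not complete as expected.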
4.Finally
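I cut a lot of corners in the TFRecord generation steps at the end. It would be better to properly modify create_pascal_tf_record.py to fit your requirements.
Also, the VATIC Dockerfile that I reused builds an Ubuntu 14.04 (Trusty) image, which caused me all sorts of trouble. I had no choice but to run the final steps on the host PC, but it would be cleaner to modify the Dockerfile and work in a newer image based on Ubuntu 16.04 (Xenial) or later.
I wrote this article while going back and forth between the steps, so it may contain mistakes. I would appreciate it if you could point out any that you notice.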