Better late than never: automatic tracking annotation of video with VATIC, through to conversion to TFRecord format [Docker edition]

Last updated at 2019-05-17 / Posted at 2019-05-12

I tried automatic tracking annotation by VATIC and implemented conversion to TFRecord format.

１．Introduction

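Annotation work is really tedious, isn't it? In this article I introduce VATIC, an annotation tool with automatic object tracking that was committed eight years ago, and summarize how to use it. Don't underestimate it just because it is eight years old: it is extremely convenient. Annotate only two or three positions and it keeps tracking the object and saves the annotation information automatically. The license is MIT.
I believe VoTT, software provided by Microsoft, also had an automatic tracking feature, but in the newer generation of versions I could not get automatic tracking to work, perhaps because I was using it incorrectly. If you can get it to work, it may be better to use VoTT.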

Output formats for annotation data

--xml       XML
--json      JSON
--matlab    MATLAB
--pickle    Python's Pickle
--labelme   LabelMe video's XML format
--pascal    PASCAL VOC format, treating each frame as an image

Making large changes to the official Object Detection API for the TFRecord conversion would have been tedious, so I used the official scripts almost as they are. Yes, I cut corners.

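Note that the rest of this article is written in English, but I pasted in plenty of screenshots, so it should not be hard to follow if you look at the commands and images.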

To give you a feel for it first, the result looks like the video linked below.
YouTube: https://youtu.be/y03-kdMrBiE

２．Environment

Ubuntu 16.04 x86_64
Python 3.5.2
[Docker] Ubuntu 14.04
[Docker] Python 3.4.3
Core i7 (8th generation)
Google Chrome
Docker
- Client:
  - Version: 18.09.5
  - API version: 1.39
  - Go version: go1.10.8
  - Git commit: e8ff056
  - Built: Thu Apr 11 04:44:24 2019
  - OS/Arch: linux/amd64
  - Experimental: false
- Server: Docker Engine - Community
  - Engine:
    - Version: 18.09.5
    - API version: 1.39 (minimum version 1.12)
    - Go version: go1.10.8
    - Git commit: e8ff056
    - Built: Thu Apr 11 04:10:53 2019
    - OS/Arch: linux/amd64
    - Experimental: false

３．Procedure

３−１．Environment preparation

３−１−１．Clone the VATIC-Docker repository

Repository_Clone_and_Folder_Creation

$ cd ~
$ git clone https://github.com/NPSVisionLab/vatic-docker.git
$ cd vatic-docker
$ mkdir -p data/videos_in
$ sudo rm -rf data/db.mysql

３−１−２．Place a video for annotation

The following assumes that the video file is named testvideo.mp4.
Change the source path of the copy command to match your environment.

Copy_video_file_to_"videos_in"_folder

$ cp ~/testvideo.mp4 data/videos_in

３−１−３．Edit label information

Open "labels.txt" in a text editor and add the label information.
If you define multiple labels, put each label on its own line.
Do not leave an empty line at the end of the file.
If you change the label list during annotation work, you need to restart the Docker container for the change to take effect.
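
The rules above can also be applied from a script. The following is a minimal sketch, assuming the one-label-per-line layout described in this article; the function name `write_labels` is hypothetical, and the path you pass in should be wherever labels.txt lives in your checkout.

```python
from pathlib import Path


def write_labels(path, labels):
    """Write one label per line, without an empty line at the end of the file."""
    # Drop surrounding whitespace and skip entries that are empty.
    cleaned = [label.strip() for label in labels if label.strip()]
    # join() places newlines only between labels, so the file
    # never ends with an empty line.
    Path(path).write_text("\n".join(cleaned), encoding="utf-8")
    return cleaned
```

For example, `write_labels("labels.txt", ["Car", "Person", "Bicycle"])` writes the three labels that appear in the startup log of this article.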

３−１−４．Docker Run

Docker_execution_command

$ sudo docker run -it -p 8111:80 -v $PWD/data:/root/vatic/data npsvisionlab/vatic-docker /bin/bash -C /root/vatic/example.sh

Startup_history

Unable to find image 'npsvisionlab/vatic-docker:latest' locally
latest: Pulling from npsvisionlab/vatic-docker
064f9af02539: Pull complete 
390957b2f4f0: Pull complete 
cee0974db2b8: Pull complete 
c8144262002c: Pull complete 
5ee1f24af8a6: Pull complete 
1d9960422fa1: Pull complete 
baa5641dc562: Pull complete 
671c438bbff0: Pull complete 
deec772cc23b: Pull complete 
7acbdd1641ac: Pull complete 
b0b6f5f3d865: Pull complete 
a45f9ecd8863: Pull complete 
625e13411eb9: Pull complete 
01f6ee126a43: Pull complete 
ad731db2ae7f: Pull complete 
d5e8d41b9f20: Pull complete 
34816bf17724: Pull complete 
Digest: sha256:aa9f113f1db9e6bda51bd87b4101e5cc9c23dcae3f0dddd6f34439884b61c345
Status: Downloaded newer image for npsvisionlab/vatic-docker:latest
Labels = Car Person Bicycle
New Videos to process.
 * Starting MySQL database server mysqld                                 [ OK ] 
 * Checking for tables which need an upgrade, are corrupt or were 
not closed cleanly.
ffmpeg version N-80901-gfebc862 Copyright (c) 2000-2016 the FFmpeg developers
  built with gcc 4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
  configuration: --extra-libs=-ldl --prefix=/opt/ffmpeg --mandir=/usr/share/man --enable-avresample --disable-debug --enable-nonfree --enable-gpl --enable-version3 --enable-libopencore-amrnb --enable-libopencore-amrwb --disable-decoder=amrnb --disable-decoder=amrwb --enable-libpulse --enable-libfreetype --enable-gnutls --enable-libx264 --enable-libx265 --enable-libfdk-aac --enable-libvorbis --enable-libmp3lame --enable-libopus --enable-libvpx --enable-libspeex --enable-libass --enable-avisynth --enable-libsoxr --enable-libxvid --enable-libvidstab
  libavutil      55. 28.100 / 55. 28.100
  libavcodec     57. 48.101 / 57. 48.101
  libavformat    57. 41.100 / 57. 41.100
  libavdevice    57.  0.102 / 57.  0.102
  libavfilter     6. 47.100 /  6. 47.100
  libavresample   3.  0.  0 /  3.  0.  0
  libswscale      4.  1.100 /  4.  1.100
  libswresample   2.  1.100 /  2.  1.100
  libpostproc    54.  0.100 / 54.  0.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/root/vatic/data/videos_in/testvideo.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.12.100
  Duration: 00:00:20.09, start: 0.000000, bitrate: 595 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 480x270, 583 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 2 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
Please use -b:a or -b:v, -b is ambiguous
[swscaler @ 0x2273fe0] deprecated pixel format used, make sure you did set range correctly
[image2 @ 0x223ae80] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
Output #0, image2, to '/tmp/pyvision-ffmpeg-408300912/%d.jpg':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf57.41.100
    Stream #0:0(und): Video: mjpeg, yuvj420p(pc), 480x270, q=2-31, 10000 kb/s, 29.97 fps, 29.97 tbn, 29.97 tbc (default)
    Metadata:
      handler_name    : VideoHandler
      encoder         : Lavc57.48.101 mjpeg
    Side data:
      cpb: bitrate max/min/avg: 0/0/10000000 buffer size: 0 vbv_delay: -1
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native))
Press [q] to stop, [?] for help
frame=  602 fps=0.0 q=1.6 Lsize=N/A time=00:00:20.08 bitrate=N/A speed=24.9x    
video:20290kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Decoding frames 0 to 100
Decoding frames 100 to 200
Decoding frames 200 to 300
Decoding frames 300 to 400
Decoding frames 400 to 500
Decoding frames 500 to 600
Decoding frames 600 to 700
Checking integrity...
Searching for last frame...
Found 602 frames.
Binding labels and attributes...
Creating symbolic link...
Creating segments...
Video loaded and ready for publication.
http://localhost/?id=1&hitId=offline
http://localhost/?id=2&hitId=offline
http://localhost/?id=3&hitId=offline
root@acd3861df5f7:~/vatic#
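
The URLs at the end of the log use port 80 as seen from inside the container, while the docker run command above publishes that port on host port 8111. The following is a small sketch of rewriting such a URL for use from the host browser; the helper name `to_host_url` is hypothetical, and that the segment pages respond at the rewritten address is an assumption based on the port mapping.

```python
from urllib.parse import urlsplit, urlunsplit


def to_host_url(container_url, host_port=8111):
    """Return the URL with the host-side port from `docker run -p 8111:80`."""
    parts = urlsplit(container_url)
    # Keep the host name and replace (or add) the port.
    netloc = "{}:{}".format(parts.hostname, host_port)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(to_host_url("http://localhost/?id=1&hitId=offline"))
# -> http://localhost:8111/?id=1&hitId=offline
```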

３−２．Implementation of annotation work

Launch your browser and open http://localhost:8111/directory/.

It should be displayed as shown below.

Click "Video Segment".

It will transition to the editing screen as shown below.

Click the + New Object button.

Draw a bounding box around the target object and select the "Car" option.

Move the slide bar to advance the video a little.

Drag the bounding box with the mouse to move it to the correct position.

Again, move the slide bar to advance the video a little.

Again, drag the bounding box with the mouse to move it to the correct position.

Move the slide bar until the object disappears from the screen.

Check "Outside of view frame" so that the bounding box is not recorded in the current frame.

That was only about three annotations, but even so the object should be annotated with quite high accuracy throughout. Play the video to see how well it has been annotated.

Click the Rewind button to return the video to its initial position.

Now play the annotated video!
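
One simple way to picture how a few keyframes can produce a box on every frame is linear interpolation between neighbouring keyframes. The sketch below illustrates only that idea; it is not taken from the VATIC source, the function name `interpolate_box` and the `(xmin, ymin, xmax, ymax)` box layout are assumptions, and the tool's actual tracking may work differently.

```python
def interpolate_box(key_a, key_b, frame):
    """Linearly interpolate a box between two keyframes.

    Each keyframe is (frame_number, (xmin, ymin, xmax, ymax)).
    """
    frame_a, box_a = key_a
    frame_b, box_b = key_b
    if frame_a >= frame_b:
        raise ValueError("key_a must come before key_b")
    if not frame_a <= frame <= frame_b:
        raise ValueError("frame must lie between the two keyframes")
    # Fraction of the way from key_a to key_b (0.0 to 1.0).
    t = (frame - frame_a) / (frame_b - frame_a)
    return tuple(round(a + (b - a) * t) for a, b in zip(box_a, box_b))


# Two manually placed boxes, 100 frames apart.
first = (0, (100, 50, 180, 110))
second = (100, (300, 60, 380, 120))
print(interpolate_box(first, second, 50))
# -> (200, 55, 280, 115)
```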

３−３．Save work content

If you want to save the progress of the work, click the Save Work button.

３−４．Convert labelme format to Pascal VOC format

command_sample

# mkdir -p data/VOCdevkit/VOC2007
# turkic dump currentvideo --pascal --output /root/vatic/data/VOCdevkit/VOC2007 2>&1; mysqldump --user root --all-databases > data/db.mysql
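
Before converting, it can be useful to check what the dump produced. The following is a minimal sketch that counts bounding boxes per label; it assumes the usual Pascal VOC annotation layout (`object` elements with a `name` and a `bndbox` child), and the function name `count_boxes` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_boxes(annotations_dir):
    """Count bounding boxes per label across all VOC annotation XML files."""
    counts = Counter()
    for xml_path in sorted(Path(annotations_dir).glob("*.xml")):
        root = ET.parse(str(xml_path)).getroot()
        for obj in root.iter("object"):
            name = obj.findtext("name")
            # Count only objects that have both a label and a box.
            if name is not None and obj.find("bndbox") is not None:
                counts[name.strip()] += 1
    return counts
```

On the host, the directory to pass would be the `Annotations` folder under `~/vatic-docker/data/VOCdevkit/VOC2007`.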

３−５．【Host PC】 Convert Pascal VOC format to TFRecord format

Run the following commands on the host PC.

$ sudo apt-get update && sudo apt-get upgrade -y
$ sudo apt-get install -y protobuf-compiler python-pil python-lxml python-tk \
autoconf automake libtool curl make g++ unzip wget git nano \
libgflags-dev libgoogle-glog-dev liblmdb-dev libleveldb-dev \
libhdf5* python3-dev python3-numpy python3-skimage gfortran libturbojpeg \
python-dev python-numpy python-skimage python3-pip python-pip \
libboost-all-dev libopenblas-dev libsnappy-dev software-properties-common \
libfreetype6-dev pkg-config libpng12*

$ sudo -H pip3 install pip==18.0.0 --upgrade
$ sudo -H pip3 install Cython opencv-python lxml
$ sudo -H pip3 install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
$ sudo -H pip3 install six numpy --ignore-installed --upgrade
$ sudo -H pip3 install tensorflow==1.12.0 --upgrade
$ wget https://github.com/protocolbuffers/protobuf/archive/v3.7.0.zip
$ unzip v3.7.0.zip;rm v3.7.0.zip;cd protobuf-3.7.0
$ ./autogen.sh
$ ./configure
$ make -j$(($(nproc) + 1))
$ make install
$ cd python
$ export LD_LIBRARY_PATH=../src/.libs
$ python3 setup.py build --cpp_implementation
$ python3 setup.py test --cpp_implementation
$ python3 setup.py install --cpp_implementation
$ ldconfig

$ cd ../..
$ git clone https://github.com/tensorflow/models.git
$ cd models/research
$ sed -i "s%category_name = unicode(category_name, 'utf-8')%category_name = str(category_name, 'utf-8')%g" "object_detection/utils/object_detection_evaluation.py"
$ sed -i "s%<folder>/root/vatic/data/VOCdevkit/VOC2007</folder>%<folder>${HOME}/vatic-docker/data/VOCdevkit/VOC2007</folder>%g" ${HOME}/vatic-docker/data/VOCdevkit/VOC2007/Annotations/*

### Create_label_map.pbtxt
$ nano ~/vatic-docker/data/label_map.pbtxt

item {
  id: 1
  name: 'Car'
}

item {
  id: 2
  name: 'Person'
}

item {
  id: 3
  name: 'Bicycle'
}
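
If the label list is long, the same file can be generated instead of typed by hand. This is a minimal sketch that reproduces the layout shown above; the function name `build_label_map` is hypothetical, and ids are assigned from 1 in list order.

```python
def build_label_map(labels):
    """Return label_map.pbtxt text with ids starting at 1, in list order."""
    items = []
    for index, name in enumerate(labels, start=1):
        # Doubled braces produce literal { and } in str.format().
        items.append("item {{\n  id: {}\n  name: '{}'\n}}\n".format(index, name))
    return "\n".join(items)


print(build_label_map(["Car", "Person", "Bicycle"]))
```

The returned text can be written to `~/vatic-docker/data/label_map.pbtxt` in place of editing the file with nano.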

$ protoc object_detection/protos/*.proto --python_out=.
$ sed -i "s%'aeroplane_' + FLAGS.set + '.txt'%'Car_' + FLAGS.set + '.txt'%g" object_detection/dataset_tools/create_pascal_tf_record.py
$ sudo chmod 777 -R ${HOME}/vatic-docker/data/*

$ python3 object_detection/dataset_tools/create_pascal_tf_record.py \
  --label_map_path="${HOME}/vatic-docker/data/label_map.pbtxt" \
  --data_dir="${HOME}/vatic-docker/data/VOCdevkit" \
  --year=VOC2007 \
  --set=train \
  --output_path="${HOME}/vatic-docker/data/pascal_train.record" \
  --ignore_difficult_instances=True

$ python3 object_detection/dataset_tools/create_pascal_tf_record.py \
  --label_map_path="${HOME}/vatic-docker/data/label_map.pbtxt" \
  --data_dir="${HOME}/vatic-docker/data/VOCdevkit" \
  --year=VOC2007 \
  --set=trainval \
  --output_path="${HOME}/vatic-docker/data/pascal_val.record" \
  --ignore_difficult_instances=True
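
To confirm that the conversion wrote something, the number of records in each output file can be counted without TensorFlow. The sketch below relies on the TFRecord framing (8-byte little-endian length, 4-byte length CRC, payload, 4-byte payload CRC); the function name `count_tfrecords` is hypothetical, and the CRCs are skipped rather than verified.

```python
import os
import struct


def count_tfrecords(path):
    """Count the records in a TFRecord file by walking its length headers."""
    size = os.path.getsize(path)
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError("truncated length header")
            (length,) = struct.unpack("<Q", header)
            # Skip the length CRC, the payload, and the payload CRC.
            f.seek(4 + length + 4, os.SEEK_CUR)
            if f.tell() > size:
                raise ValueError("truncated record")
            count += 1
    return count
```

For example, `count_tfrecords(os.path.expanduser("~/vatic-docker/data/pascal_train.record"))` should return the number of examples written for the train set.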

４．Finally

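I cut a lot of corners in the TFRecord generation steps at the end. I think it would be better to modify create_pascal_tf_record.py properly so that it fits your requirements.
Also, the VATIC Dockerfile that I reused builds an Ubuntu 14.04 (Trusty) image, which caused me various problems. I had no choice but to run the last steps on the host PC, but it would be more satisfying to modify the Dockerfile and work in a newer image, Ubuntu 16.04 (Xenial) or later.
I wrote this article while going back and forth through the steps, so it may contain mistakes. I would appreciate it if you pointed out any that you notice.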

５．Reference articles

cvondrick/vatic - GitHub - contrib_branch
