TensorflowLite-UNet
Tensorflow-ENet
Tensorflow-ENet2
ICNet-tensorflow
OpenVINO-ADAS
OpenVINO-DeeplabV3
I wrote an English translation at the end of the article, here
#◆ はじめに
まず、衝撃的な結論から先に言おう。
OpenVINOでIntelのCPU向けにモデルを最適化した場合、Neural Compute Stick 2を使用するよりも、CPU実行のほうが推論が速い。
公式フォーラムで各国のエンジニアと議論した内容は コチラ
あふれ出るワクワクの衝動を抑えきれない、とにかくセッカチで超絶困ったさんなあなたは コチラ から評価結果へショートカット可能。
最終的に シングルボードコンピュータのCPUのみ
で下図のようなスピードでのセグメンテーションを実現する。
自動運転で使えるスピードには程遠いが、CPU Onlyということを鑑みるとめちゃくちゃ速い。
分かる人にしか分からない結果。
MobileNet-SSD + RaspberryPi で Neural Compute Stick は使い倒した。
↓ クリックでYoutube再生される
Qiita記事 にしたり、 Githubにコミット したり、この縁あって 本家Tensorflowリポジトリ にPullRequestしてみたり、CaffeとTensorflowのDeepLearning基礎の基礎を学べたことで、自己成長という点では完全にモトがとれたと思っている。
2018/11/14 Neural Compute Stick 2 (初代の8倍の性能)
(←本家URLのリンク) の発売が決定したが、初代発売後1年経過した今でもなおSDKの品質が とてつもなく低過ぎ
かつ NCS2 は RaspberryPi(ARM)には対応していない
ので、購入するかどうかは少しだけ迷ったが、やっぱり我慢できずに購入してしまった。
v1 と全く同じサイズ。 そして、Intel買収先の Movidius
の文字が消えた。
できればもう少し横幅を縮めて欲しかったなぁ。。。
Ver2は、わずか3本で Jetson TX2 の性能を凌駕するということで、ワクワクさんの心がうずく。
Neural Compute Stick v1 ... 100GFLOPS
Neural Compute Stick v2 ... 800GFLOPS (単純に8倍してみた)
TX2 ... 2TFLOPS
実は NCSDK の品質が著しく低いため、SDKの品質に左右されそうにない OpenVINO
を試したくなったと言っても過言ではない。
今回は、11月初旬に予約販売で手に入れた LattePanda Alpha 864 (OS無し)
に Ubuntu16.04
を導入し、更に OpenVINO
を導入して Neural Compute Stick
および Neural Compute Stick 2
のカスタムセグメンテーションモデルの動作検証を行う。
LattePanda Alpha を調達した目的は、 シングルボードコンピュータ上での Neural Compute Stick + OpenVINO
の有用性検証のため。
コスト面において趣味で取り組むレベルを完全に超えているため、みなさんは決してマネしないように。
OpenVINO
は Caffe
, TensorFlow
, MXNet
, Kaldi
, ONNX
でそれぞれ生成したモデルを、共通フォーマットの中間バイナリ(IR [Intermediate representation of the model])へ変換し、推論エンジンAPI(Inference Engine)を経由して共通的に実行できる、というもの。
なお、実行基盤は ARMアーキテクチャ には対応しておらず、Intel の x86/64系CPU にしか対応していない。
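ちなみに、変換後の IR を Python の推論エンジンAPIから呼び出す流れは概ね下記のとおり。本記事の後半の検証スクリプトでも使用している IENetwork / IEPlugin を使った最小スケッチで、モデルのパスやダミー入力はあくまで仮置き。
import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin

# IR (.xml/.bin) を読み込んで推論するだけの最小スケッチ (パスは仮置き)
net = IENetwork.from_ir(model="model.xml", weights="model.bin")
plugin = IEPlugin(device="CPU")            # "MYRIAD" を指定すれば Neural Compute Stick で実行
exec_net = plugin.load(network=net)
input_blob = next(iter(net.inputs))        # 入力ノード名
out_blob = next(iter(net.outputs))         # 出力ノード名
n, c, h, w = net.inputs[input_blob].shape
dummy = np.zeros((n, c, h, w), dtype=np.float32)   # 動作確認用のダミー入力
exec_net.start_async(request_id=0, inputs={input_blob: dummy})
if exec_net.requests[0].wait(-1) == 0:
    print(exec_net.requests[0].outputs[out_blob].shape)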
1. Develop Multiplatform Computer Vision Solutions - Intel Developer Zone
2. Install the Intel® Distribution of OpenVINO™ toolkit for Linux - Intel Developer Zone
3. How to Integrate the Inference Engine in Your Application - Intel Inference Engine Developer Guide
4. Accelerate Deep Learning Inference with Integrated Intel® Processor Graphics Rev 2.0 - Intel Developer Zone
#◆ LattePanda Alpha の外観
1.外箱1
2.外箱2 (渋い)
3.中箱 (パンダの顔があしらってある)
4.同梱品一式 (ケースは付属しない)
5.タバコの箱と比較したサイズ感 (縦横はRaspberryPiより若干大きいが、反面で薄く、タバコの箱の半分ぐらいの厚み)
#◆ LattePanda Alpha のスペック
無駄にハイスペック。
- 価格:
    - OS無し版:$358 (¥40,000)
    - Win10 バンドル版:$398 (¥45,000)
- CPU:
    - Intel 7th Gen Core m3-7y30
- Core:
    - 1.6-2.6GHz Dual-Core, Four-Thread
- Benchmark (PassMark):
    - Up to 3500, double computing power compared with same price range products in the market
- Graphics:
    - Intel HD Graphics 615, 300-900MHz
- RAM:
    - 8G LPDDR3 1866MHz Dual-Channel
- Memory:
    - 64GB eMMC V5.0l
- External Memory:
    - 1x M.2 M Key, PCIe 4x, Supports NVMe SSD and SATA SSD
    - 1x M.2 E Key, PCIe 2x, Supports USB2.0, UART, PCM
- Connectivity:
    - Wi-Fi 802.11 AC, 2.4G & 5G
    - Dual Band Bluetooth 4.2
    - Gigabit Ethernet
- USB Ports:
    - 3x USB 3.0 Type A
    - 1x USB Type C, supports PD, DP, USB 3.0
- Display:
    - HDMI Output
    - Type-C DP Support
    - Extendable eDP touch displays
- Co-processor:
    - Arduino Leonardo
- GPIO & Other Features:
    - 2x 50p GPIOs including I2C
    - I2S, USB
    - RS232
    - UART
    - RT
    - Power Management
    - Extendable power button
- OS Support:
    - Windows 10 Pro
    - Linux Ubuntu
#◆ キッティングに使用した部材
- Windows 10 PC (Ubuntu1604のUSB起動メディアが作成できる環境なら何でもOK)
- LattePanda Alpha
- Intel Movidius Neural Compute Stick v1 / v2
- USBメモリ 16GB
- HDMIケーブル
- HDMIディスプレイ
- USBキーボード
- USBマウス
#◆ 導入・使用ソフト
- Ubuntu 16.04 x86_64
- OpenVINO toolkit 2018 R4 (2018.4.420)
- Python 3.5
- OpenCV 3.4.3 (pip3インストール)
- Rufus v3.3
- Tensorflow v1.11.0 (pip3インストール)
#◆ Ubuntu16.04のインストール
##● Windows 10 PC での作業 (Ubuntu1604のUSBフラッシュドライブ作成)
1.Ubuntu16.04.5 Desktop のイメージダウンロード (1.5GB)
http://releases.ubuntu.com/releases/16.04/ubuntu-16.04.5-desktop-amd64.iso
2.USBフラッシュドライブ作成用ツール Rufus のダウンロード
公式ページ - Rufus - Japanese
ダウンロード用リンク https://github.com/pbatard/rufus/releases/download/v3.3/rufus-3.3.exe
3.USBメモリをWindows10PCへ挿入する
4.Rufus (rufus-3.3.exe) を起動し、DD モードでUbuntu16.04イメージを書き込む
Rufusのメイン画面 (スタートボタン押下後にDDモード指定のダイアログが表示される)
DDモードの指定
書き込み中の様子
5.USBメモリをWindows10PCから外す
##● LattePanda Alpha 864 での作業
6.Wi-Fiアンテナ、キーボード、マウス、HDMIケーブル/ディスプレイ、USBメモリを LattePanda Alphaへ接続し、最後に電源を接続する
例1) Wi-Fiアンテナの接続 (Alphaの場合、アンテナは2本ある)
例2) HDMIケーブルの接続
例3) 全部材接続済みの様子 (電源OFF、青色LEDが消灯した瞬間を撮影してしまった)
例4) 電源のTpye-Cケーブルを接続
Type-Cケーブルを接続すると、通電確認用の赤色LEDが常時点灯状態になり、青色のLEDが一瞬点灯する。
青色LEDが明滅状態になるのを待ってから、電源ボタンを3秒間長押しすると電源がONになり、青色LEDが常時点灯状態になる。
7.LattePanda Alpha の電源がONになると同時にキーボードの Esc キーを連打する
8.Boot → Boot Option #1 を選択して Enter
9.USBメモリの名前 + Partition1 を選択して Enter
10.Save & Exit → Save Changes and Exit を選択して Enter
11.Yes を選択して Enter
12.Install Ubuntu を選択して Enter
13.しばらく待ち
14.English を選択して Continue
15.Wi-Fiに接続する場合は、 Connect to this network を選択し、一覧からSSIDを選択して Connect
16.Wi-Fiのパスワードを入力して Connect
17.Install third-party software for graphics and Wi-Fi hardware, Flash, MP3 and other media を選択し、Continue
18.Erase disk and install Ubuntu を選択し、Install Now
19.Continue
20.Tokyo を選択し、Continue
21.左右の欄からそれぞれ Japanese を選択し、Continue
22.ユーザIDや端末名、パスワードを入力し、Continue
23.しばらく待ち
24.Restart Now
※再起動が始まるが、うまくいかない場合は一度電源ケーブルを抜き差しして再度電源をONにする
25.Ubuntu16.04の起動完了、あっけなく正常起動した。
26.ログオンしたあとでターミナルを起動し、アップデートだけ行っておく。
$ sudo apt-get update
$ sudo apt-get upgrade
う〜む、シングルボードコンピュータとは思えない異常な快適さ。
公式インストール手順
http://docs.lattepanda.com/content/alpha_edition/power_on/
#◆ OpenVINOのインストール
インストール対象のOpenVINOバージョン: 2018.4.420
##● OpenVINO本体のインストール
AIを始めよう!OpenVINOのインストールからデモの実行まで - Qiita - ammo0613さん の記事を参考に OpenVINO を導入する。
しっかりと手順を記載いただいているため、ココでは取り立てて記載をしない。
ただし、ツールキットがバージョンアップするごとに少しづつコマンドスクリプトが変更されているため、 公式のチュートリアル を併せて参照しながら作業を進めることを推奨する。
##● Intel Movidius Neural Compute Stick v1/v2 のための追加インストール
下記のコマンドを実行する。
$ cd ~
$ sudo usermod -a -G users "$(whoami)"
$ sudo cat <<EOF > 97-usbboot.rules
SUBSYSTEM=="usb", ATTRS{idProduct}=="2150", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="2485", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="f63b", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
EOF
$ sudo cp 97-usbboot.rules /etc/udev/rules.d/
$ sudo udevadm control --reload-rules
$ sudo udevadm trigger
$ sudo ldconfig
$ sudo rm 97-usbboot.rules
sudo ldconfig を実行したときに下記のようなエラーが発生した。
シンボリックが正しく張れていないようだ。
alpha@LattePandaAlpha:~$ sudo ldconfig
/sbin/ldconfig.real: /opt/intel/common/mdf/lib64/igfxcmrt64.so is not a symbolic link
/sbin/ldconfig.real: /opt/intel/mediasdk/lib64/libmfxhw64.so.1 is not a symbolic link
/sbin/ldconfig.real: /opt/intel/mediasdk/lib64/libmfx.so.1 is not a symbolic link
/sbin/ldconfig.real: /opt/intel/mediasdk/lib64/libva-glx.so.2 is not a symbolic link
/sbin/ldconfig.real: /opt/intel/mediasdk/lib64/libva.so.2 is not a symbolic link
/sbin/ldconfig.real: /opt/intel/mediasdk/lib64/libigdgmm.so.1 is not a symbolic link
/sbin/ldconfig.real: /opt/intel/mediasdk/lib64/libva-drm.so.2 is not a symbolic link
/sbin/ldconfig.real: /opt/intel/mediasdk/lib64/libva-x11.so.2 is not a symbolic link
調べると、各ファイル下記のような状況になっていた。
さすがIntel、 期待を裏切ったためしが無いぜ。
./igfxcmrt64.so
./libigfxcmrt64.so
./libmfxhw64.so
./libmfxhw64.so.1
./libmfxhw64.so.1.28
./libmfx.so
./libmfx.so.1
./libmfx.so.1.28
./libva-glx.so
./libva-glx.so.2
./libva-glx.so.2.300.0
./libva.so
./libva.so.2
./libva.so.2.300.0
./libigdgmm.so
./libigdgmm.so.1
./libigdgmm.so.1.0.0
./libva-drm.so
./libva-drm.so.2
./libva-drm.so.2.300.0
./libva-x11.so
./libva-x11.so.2
./libva-x11.so.2.300.0
下記コマンドにより、シンボリックをマニュアルで作成する。
$ cd /opt/intel/common/mdf/lib64
$ sudo mv igfxcmrt64.so igfxcmrt64.so.org
$ sudo ln -s libigfxcmrt64.so igfxcmrt64.so
$ cd /opt/intel/mediasdk/lib64
$ sudo mv libmfxhw64.so.1 libmfxhw64.so.1.org
$ sudo mv libmfx.so.1 libmfx.so.1.org
$ sudo mv libva-glx.so.2 libva-glx.so.2.org
$ sudo mv libva.so.2 libva.so.2.org
$ sudo mv libigdgmm.so.1 libigdgmm.so.1.org
$ sudo mv libva-drm.so.2 libva-drm.so.2.org
$ sudo mv libva-x11.so.2 libva-x11.so.2.org
$ sudo ln -s libmfxhw64.so.1.28 libmfxhw64.so.1
$ sudo ln -s libmfx.so.1.28 libmfx.so.1
$ sudo ln -s libva-glx.so.2.300.0 libva-glx.so.2
$ sudo ln -s libva.so.2.300.0 libva.so.2
$ sudo ln -s libigdgmm.so.1.0.0 libigdgmm.so.1
$ sudo ln -s libva-drm.so.2.300.0 libva-drm.so.2
$ sudo ln -s libva-x11.so.2.300.0 libva-x11.so.2
気を取り直してもう一度 sudo ldconfig を実行する。
$ cd ~
$ sudo ldconfig
今度は正常に終了した。
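(参考) シンボリックリンクを張り直せたかどうかは、例えば下記のような Python スニペットでも確認できる。パスは今回の環境に合わせた一例。
import os

# 張り直したシンボリックリンクの確認用スニペット (パスは本記事の環境の例)
targets = [
    "/opt/intel/common/mdf/lib64/igfxcmrt64.so",
    "/opt/intel/mediasdk/lib64/libmfxhw64.so.1",
    "/opt/intel/mediasdk/lib64/libva.so.2",
]
for path in targets:
    print(path, "->", os.readlink(path) if os.path.islink(path) else "not a symlink")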
デフォルトで導入される OpenCV4.0.0-pre には Gstreamer のバグが有ってまともに動かなかったので、自力で OpenCV3.4.3 を導入し直す。
下記コマンドを実行する。
$ sudo -H pip3 install opencv-python==3.4.3.18
$ nano ~/.bashrc
export PYTHONPATH=/usr/local/lib/python3.5/dist-packages/cv2:$PYTHONPATH
$ source ~/.bashrc
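念のため、下記のように 3.4.3 側の cv2 が import されることを確認しておくと安心 (確認用の一例)。
import cv2
# pip3 で入れた 3.4.3 が import されていることの確認 (期待値は 3.4.3)
print(cv2.__version__)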
公式インストール手順
Intel®Movidius™Neural Compute StickおよびIntel®Neural Compute Stick 2の追加インストール手順
##● Tensorflow v1.11.0 へのアップグレード
後続のモデルオプティマイザの処理でエラーが発生するため、デフォルトで導入される古いバージョンの Tensorflow v1.9.0 を、 Tensorflow v1.11.0 へアップグレードする。
$ python3 -c 'import tensorflow as tf; print(tf.__version__)'
1.9.0
$ sudo -H pip3 install pip --upgrade
$ sudo -H pip3 install tensorflow==1.11.0 --upgrade
##● カスタムレイヤの動作をTensorflowへオフロードするための設定
Intel公式チュートリアル - Offloading Computations to TensorFlow*
OpenVINOの標準APIでサポートされないカスタムレイヤの操作をTensorflow側にオフロードすることができる。
特定のオペレーションだけを切り出して Tensorflow側 に動作を一任することができる仕組みは面白い。
下記コマンドを実行し、Tensorflowランタイムを使用して推論エンジンレイヤを自力ビルドする。
ただし、Intelが提供するスクリプトにバグがあるため、一部マニュアルで修正する必要がある。
また、このタイミングで導入する Bazel は 0.18.1 である必要がある。
2018年11月17日時点では 0.19.0 以上だと推論エンジンレイヤが正常にビルドできないため注意。
LattePanda AlphaのようにRAMを潤沢に搭載していない端末、例えば RAM 1GB の場合は、
sudo -H $HOME/bin/bazel build --config monolithic //tensorflow/cc/inference_engine_layer:libtensorflow_call_layer.so
を
sudo -H $HOME/bin/bazel --host_jvm_args=-Xmx512m build --config monolithic --local_resources 1024.0,0.5,0.5 //tensorflow/cc/inference_engine_layer:libtensorflow_call_layer.so
のように読み替えると成功する可能性がある。
$ sudo apt-get install -y git pkg-config zip g++ zlib1g-dev unzip
$ cd ~
$ wget https://github.com/bazelbuild/bazel/releases/download/0.18.1/bazel-0.18.1-installer-linux-x86_64.sh
$ sudo chmod +x bazel-0.18.1-installer-linux-x86_64.sh
$ ./bazel-0.18.1-installer-linux-x86_64.sh --user
$ echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc
$ source ~/.bashrc
$ cd /opt
$ sudo git clone -b v1.11.0 https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ sudo git checkout -b v1.11.0
$ echo 'export TF_ROOT_DIR=/opt/tensorflow' >> ~/.bashrc
$ source ~/.bashrc
$ sudo nano /opt/intel/computer_vision_sdk/bin/setupvars.sh
#Before
INSTALLDIR=/opt/intel//computer_vision_sdk_2018.4.420
↓
#After
INSTALLDIR=/opt/intel/computer_vision_sdk_2018.4.420
$ source /opt/intel/computer_vision_sdk/bin/setupvars.sh
$ sudo nano /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer/tf_call_ie_layer/build.sh
#Before
bazel build --config=monolithic //tensorflow/cc/inference_engine_layer:libtensorflow_call_layer.so
↓
#After
sudo -H $HOME/bin/bazel build --config monolithic //tensorflow/cc/inference_engine_layer:libtensorflow_call_layer.so
$ sudo -E /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer/tf_call_ie_layer/build.sh
推論エンジンレイヤは下記のパスに生成される。
/opt/tensorflow/bazel-bin/tensorflow/cc/inference_engine_layer/libtensorflow_call_layer.so
このままでは python 実行時に、一般ユーザーによる /opt 配下へのアクセス権限が無く Permission denied エラーが発生するため、配置先を変更する。
$ su -
$ cp /opt/tensorflow/bazel-bin/tensorflow/cc/inference_engine_layer/libtensorflow_call_layer.so /usr/local/lib
$ exit
$ nano ~/.bashrc
export PYTHONPATH=$PYTHONPATH:/usr/local/lib
$ source ~/.bashrc
$ sudo ldconfig
#◆ デモプログラムの味見
##● 画像分類のサンプル
下記のコマンドを実行する。
$ cd /opt/intel/computer_vision_sdk/deployment_tools/demo
$ ./demo_squeezenet_download_convert_run.sh
下図の画像を読み込んで。。。
80%の確率で スポーツカー と認識したようだ。
表面上は普通過ぎてつまらんね。
ただ、 153 FPS とか異次元の計測値が出ているw
###################################################
Run Inference Engine classification sample
Run ./classification_sample -d CPU -i /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png -m /home/alpha/openvino_models/ir/squeezenet1.1/FP32/squeezenet1.1.xml
[ INFO ] InferenceEngine:
API version ............ 1.4
Build .................. 17328
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ] /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png
[ INFO ] Loading plugin
API version ............ 1.4
Build .................. lnx_20181004
Description ....... MKLDNNPlugin
[ INFO ] Loading network files:
/home/alpha/openvino_models/ir/squeezenet1.1/FP32/squeezenet1.1.xml
/home/alpha/openvino_models/ir/squeezenet1.1/FP32/squeezenet1.1.bin
[ INFO ] Preparing input blobs
[ WARNING ] Image is resized from (787, 259) to (227, 227)
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference (1 iterations)
[ INFO ] Processing output blobs
Top 10 results:
Image /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png
817 0.8363345 label sports car, sport car
511 0.0946488 label convertible
479 0.0419131 label car wheel
751 0.0091071 label racer, race car, racing car
436 0.0068161 label beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon
656 0.0037564 label minivan
586 0.0025741 label half track
717 0.0016069 label pickup, pickup truck
864 0.0012027 label tow truck, tow car, wrecker
581 0.0005882 label grille, radiator grille
total inference time: 6.5318211
Average running time of one iteration: 6.5318211 ms
Throughput: 153.0966609 FPS
[ INFO ] Execution successful
###################################################
Demo completed successfully.
##● 3段階推論のサンプル
3段階の推論を別々の学習モデルで連続実行させるサンプルのようだ。
アイデアとしては、私のような者でも思いつくほどにありきたり。
- 車の検出(黒い車 だとか 白い車だとかの属性含む)
- ナンバープレートの検出
- 識別したナンバープレート内の文字認識
下記のコマンドを実行する。
$ cd /opt/intel/computer_vision_sdk/deployment_tools/demo
$ ./demo_security_barrier_camera.sh
##● 上記以外の各種サンプルプログラム
Intel® Distribution of OpenVINO™ Toolkit - Inference Engine Samples
#◆ 独自モデルの変換と実行サンプルスクリプト
公式チュートリアル - Using the Model Optimizer to Convert TensorFlow* Models
公式チュートリアル - Model Optimizer Developer Guide - TensorFlow* Models with Custom Layers
下記が Tensorflow の .pb (FreezeGraph) を OpenVINO用の IR形式 に変換するサンプルスクリプト。
$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer
$ python3 mo_tf.py --input_model <INPUT_MODEL>.pb
**変換コマンドのオプションと説明**
optional arguments:
-h, --help show this help message and exit
--framework {tf,caffe,mxnet,kaldi,onnx}
Name of the framework used to train the input model.
Framework-agnostic parameters:
--input_model INPUT_MODEL, -w INPUT_MODEL, -m INPUT_MODEL
Tensorflow*: a file with a pre-trained model (binary
or text .pb file after freezing). Caffe*: a model
proto file with model weights
--model_name MODEL_NAME, -n MODEL_NAME
Model_name parameter passed to the final create_ir
transform. This parameter is used to name a network in
a generated IR and output .xml/.bin files.
--output_dir OUTPUT_DIR, -o OUTPUT_DIR
Directory that stores the generated IR. By default, it
is the directory from where the Model Optimizer is
launched.
--input_shape INPUT_SHAPE
Input shape(s) that should be fed to an input node(s)
of the model. Shape is defined as a comma-separated
list of integer numbers enclosed in parentheses or
square brackets, for example [1,3,227,227] or
(1,227,227,3), where the order of dimensions depends
on the framework input layout of the model. For
example, [N,C,H,W] is used for Caffe* models and
[N,H,W,C] for TensorFlow* models. Model Optimizer
performs necessary transformations to convert the
shape to the layout required by Inference Engine
(N,C,H,W). The shape should not contain undefined
dimensions (? or -1) and should fit the dimensions
defined in the input operation of the graph. If there
are multiple inputs in the model, --input_shape should
contain definition of shape for each input separated
by a comma, for example: [1,3,227,227],[2,4] for a
model with two inputs with 4D and 2D shapes.
--scale SCALE, -s SCALE
All input values coming from original network inputs
will be divided by this value. When a list of inputs
is overridden by the --input parameter, this scale is
not applied for any input that does not match with the
original input of the model.
--reverse_input_channels
Switch the input channels order from RGB to BGR (or
vice versa). Applied to original inputs of the model
if and only if a number of channels equals 3. Applied
after application of --mean_values and --scale_values
options, so numbers in --mean_values and
--scale_values go in the order of channels used in the
original model.
--log_level {CRITICAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}
Logger level
--input INPUT The name of the input operation of the given model.
Usually this is a name of the input placeholder of the
model.
--output OUTPUT The name of the output operation of the model. For
TensorFlow*, do not add :0 to this name.
--mean_values MEAN_VALUES, -ms MEAN_VALUES
Mean values to be used for the input image per
channel. Values to be provided in the (R,G,B) or
[R,G,B] format. Can be defined for desired input of
the model, for example: "--mean_values
data[255,255,255],info[255,255,255]". The exact
meaning and order of channels depend on how the
original model was trained.
--scale_values SCALE_VALUES
Scale values to be used for the input image per
channel. Values are provided in the (R,G,B) or [R,G,B]
format. Can be defined for desired input of the model,
for example: "--scale_values
data[255,255,255],info[255,255,255]". The exact
meaning and order of channels depend on how the
original model was trained.
--data_type {FP16,FP32,half,float}
Data type for all intermediate tensors and weights. If
original model is in FP32 and --data_type=FP16 is
specified, all model weights and biases are quantized
to FP16.
--disable_fusing Turn off fusing of linear operations to Convolution
--disable_resnet_optimization
Turn off resnet optimization
--finegrain_fusing FINEGRAIN_FUSING
Regex for layers/operations that won't be fused.
Example: --finegrain_fusing Convolution1,.*Scale.*
--disable_gfusing Turn off fusing of grouped convolutions
--move_to_preprocess Move mean values to IR preprocess section
--extensions EXTENSIONS
Directory or a comma separated list of directories
with extensions. To disable all extensions including
those that are placed at the default location, pass an
empty string.
--batch BATCH, -b BATCH
Input batch size
--version Version of Model Optimizer
--silent Prevent any output messages except those that
correspond to log level equals ERROR, that can be set
with the following option: --log_level. By default,
log level is already ERROR.
--freeze_placeholder_with_value FREEZE_PLACEHOLDER_WITH_VALUE
Replaces input layer with constant node with provided
value, e.g.: "node_name->True"
--generate_deprecated_IR_V2
Force to generate legacy/deprecated IR V2 to work with
previous versions of the Inference Engine. The
resulting IR may or may not be correctly loaded by
Inference Engine API (including the most recent and
old versions of Inference Engine) and provided as a
partially-validated backup option for specific
deployment scenarios. Use it at your own discretion.
By default, without this option, the Model Optimizer
generates IR V3.
**Tensorflow固有の変換コマンドのオプションと説明**
TensorFlow*-specific parameters:
--input_model_is_text
TensorFlow*: treat the input model file as a text
protobuf format. If not specified, the Model Optimizer
treats it as a binary file by default.
--input_checkpoint INPUT_CHECKPOINT
TensorFlow*: variables file to load.
--input_meta_graph INPUT_META_GRAPH
Tensorflow*: a file with a meta-graph of the model
before freezing
--saved_model_dir SAVED_MODEL_DIR
TensorFlow*: directory representing non frozen model
--saved_model_tags SAVED_MODEL_TAGS
Group of tag(s) of the MetaGraphDef to load, in string
format, separated by ','. For tag-set contains
multiple tags, all tags must be passed in.
--offload_unsupported_operations_to_tf
TensorFlow*: automatically offload unsupported
operations to TensorFlow*
--tensorflow_subgraph_patterns TENSORFLOW_SUBGRAPH_PATTERNS
TensorFlow*: a list of comma separated patterns that
will be applied to TensorFlow* node names to infer a
part of the graph using TensorFlow*.
--tensorflow_operation_patterns TENSORFLOW_OPERATION_PATTERNS
TensorFlow*: a list of comma separated patterns that
will be applied to TensorFlow* node type (ops) to
infer these operations using TensorFlow*.
--tensorflow_custom_operations_config_update TENSORFLOW_CUSTOM_OPERATIONS_CONFIG_UPDATE
TensorFlow*: update the configuration file with node
name patterns with input/output nodes information.
--tensorflow_use_custom_operations_config TENSORFLOW_USE_CUSTOM_OPERATIONS_CONFIG
TensorFlow*: use the configuration file with custom
operation description.
--tensorflow_object_detection_api_pipeline_config TENSORFLOW_OBJECT_DETECTION_API_PIPELINE_CONFIG
TensorFlow*: path to the pipeline configuration file
used to generate model created with help of Object
Detection API.
--tensorboard_logdir TENSORBOARD_LOGDIR
TensorFlow*: dump the input graph to a given directory
that should be used with TensorBoard.
--tensorflow_custom_layer_libraries TENSORFLOW_CUSTOM_LAYER_LIBRARIES
TensorFlow*: comma separated list of shared libraries
with TensorFlow* custom operations implementation.
--disable_nhwc_to_nchw
Disables default translation from NHWC to NCHW
#◆ 自力生成モデル・Semantic Segmentation 「UNet」 の変換
さて、ようやくここからが今回の検証の本題。
Intel Movidius Neural Compute Stick の 純正SDK 「NCSDK v2.x」 では実行できなかったモデルが、OpenVINO上では動作するかどうかを検証する。
私としては、このモデルがNCSによる推論に成功するだけで歓喜。
なお、この記事は LattePanda Alpha の性能を検証することが目的ではないことを改めて周知させていただく。
まずは、構造が超シンプルな UNet からトライする。
.pbファイルは TensorflowLite-UNet - PINTO0309 - Github に配置してあるものを使用する。
こちらは、 Personクラス のみに限定して学習させた Semantic Segmentation のモデルだ。
TensorflowLite-UNet/model/semanticsegmentation_frozen_person_32.pb
(31.1MB)
##● データタイプ FP16 への変換
下記のコマンドを実行する。
--input_model は、変換対象とする.pbファイル名 (FreezeGraph名)
--output_dir は、変換後の lrファイル の出力先パス
--input は、入力ノード名 (プレースホルダ名)
--output は、出力ノード名
--data_type は、変換後のデータ精度型名 [FP16/FP32/half/float]
--batch は、入力バッチサイズの強制置換 (.pbの入力形状がバッチサイズ不定 [-1, 256, 256, 3] のような時に -1 の部分を強制的に置換することができる、OpenVINOは バッチサイズ = -1 を許容しないらしい)
--scale は、BGR各値を 255 (UInt8) で割り算し、0~1 の値範囲へ正規化するときに使用する指定 (相当する前処理のイメージはこのリスト直後のスケッチを参照)
--mean_values は、ピクセル単位での BGR値 の平均減算値を指定
--offload_unsupported_operations_to_tf は、OpenVINOで処理できないTensorflowのカスタムレイヤーをTensorflow側にオフロードして処理させるための指定
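なお、今回のUNet変換コマンドでは --scale / --mean_values は指定せず、後述の推論スクリプト側で 0〜1 正規化を行っている。参考までに、これらのオプション相当の前処理を推論側の Python で書くと概ね下記のイメージ (平均値や処理順は仮置きの例で、厳密な仕様は公式ドキュメント参照)。
import numpy as np

# --mean_values と --scale 255 相当の前処理イメージ (平均値はあくまで仮の例)
def preprocess(bgr_image, mean_bgr=(104.0, 117.0, 123.0), scale=255.0):
    blob = np.asarray(bgr_image, dtype=np.float32)
    blob -= np.array(mean_bgr, dtype=np.float32)    # ピクセル毎の平均減算 (--mean_values 相当)
    blob /= scale                                   # 0~1 への正規化 (--scale 相当)
    return blob.transpose((2, 0, 1))[np.newaxis, :] # NHWC → NCHW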
$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer
$ sudo mkdir -p 01_pbmodels/UNet
$ sudo mkdir -p 10_lrmodels/UNet/FP16
$ sudo wget https://github.com/PINTO0309/TensorflowLite-UNet/raw/master/model/semanticsegmentation_frozen_person_32.pb -P 01_pbmodels/UNet
$ sudo python3 mo_tf.py \
--input_model 01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb \
--output_dir 10_lrmodels/UNet/FP16 \
--input input \
--output output/BiasAdd \
--data_type FP16 \
--batch 1
<RGB平均値算出の参考POST>
https://forums.fast.ai/t/images-normalization/4058
https://github.com/DrSleep/tensorflow-deeplab-resnet/issues/106
import cv2, glob
import numpy as np

meanB = meanG = meanR = 0.0
imgcnt = 0
for path in glob.glob("data/input/*.jpg"):  # 学習画像のパスは一例
    jpgimg = cv2.imread(path).astype(np.float32)
    # 画像1枚あたりのBGR平均値算出 (OpenCVはBGR順で読み込む)
    mean = np.mean(jpgimg, axis=(0, 1))
    meanB += mean[0]
    meanG += mean[1]
    meanR += mean[2]
    imgcnt += 1
# 全学習画像のBGR平均値算出
print("meanB =", meanB / imgcnt)
print("meanG =", meanG / imgcnt)
print("meanR =", meanR / imgcnt)
どうにか変換には成功したようだ。
FP32のモデルからFP16へ変換したため、見た目上のファイルサイズが変換前の半分の15.5MBになった。
**lr変換ログ**
Model Optimizer arguments:
Common parameters:
- Path to the Input Model: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb
- Path for generated IR: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP16
- IR output name: semanticsegmentation_frozen_person_32
- Log level: ERROR
- Batch: 1
- Input layers: input
- Output layers: output/BiasAdd
- Input shapes: Not specified, inherited from the model
- Mean values: Not specified
- Scale values: Not specified
- Scale factor: Not specified
- Precision of IR: FP16
- Enable fusing: True
- Enable grouped convolutions fusing: True
- Move mean values to preprocess section: False
- Reverse input channels: False
TensorFlow specific parameters:
- Input model in text protobuf format: False
- Offload unsupported operations: False
- Path to model dump for TensorBoard: None
- List of shared libraries with TensorFlow custom layers implementation: None
- Update the configuration file with input/output node names: None
- Use configuration file used to generate the model with Object Detection API: None
- Operations to offload: None
- Patterns to offload: None
- Use the config file: None
Model Optimizer version: 1.4.292.6ef7232d
[ SUCCESS ] Generated IR model.
[ SUCCESS ] XML file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP16/semanticsegmentation_frozen_person_32.xml
[ SUCCESS ] BIN file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP16/semanticsegmentation_frozen_person_32.bin
[ SUCCESS ] Total execution time: 3.86 seconds.
##● データタイプ FP32 への変換
下記のコマンドを実行する。
$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer
$ sudo mkdir -p 01_pbmodels/UNet
$ sudo mkdir -p 10_lrmodels/UNet/FP32
$ sudo wget https://github.com/PINTO0309/TensorflowLite-UNet/raw/master/model/semanticsegmentation_frozen_person_32.pb -P 01_pbmodels/UNet
$ sudo python3 mo_tf.py \
--input_model 01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb \
--output_dir 10_lrmodels/UNet/FP32 \
--input input \
--output output/BiasAdd \
--data_type FP32 \
--batch 1
こちらも成功したようだ。
変換元と変換先で精度を変更していないため、最終アウトプットのファイルサイズに変化は無い。
**lr変換ログ**
Model Optimizer arguments:
Common parameters:
- Path to the Input Model: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb
- Path for generated IR: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32
- IR output name: semanticsegmentation_frozen_person_32
- Log level: ERROR
- Batch: 1
- Input layers: input
- Output layers: output/BiasAdd
- Input shapes: Not specified, inherited from the model
- Mean values: Not specified
- Scale values: Not specified
- Scale factor: Not specified
- Precision of IR: FP32
- Enable fusing: True
- Enable grouped convolutions fusing: True
- Move mean values to preprocess section: False
- Reverse input channels: False
TensorFlow specific parameters:
- Input model in text protobuf format: False
- Offload unsupported operations: False
- Path to model dump for TensorBoard: None
- List of shared libraries with TensorFlow custom layers implementation: None
- Update the configuration file with input/output node names: None
- Use configuration file used to generate the model with Object Detection API: None
- Operations to offload: None
- Patterns to offload: None
- Use the config file: None
Model Optimizer version: 1.4.292.6ef7232d
[ SUCCESS ] Generated IR model.
[ SUCCESS ] XML file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32/semanticsegmentation_frozen_person_32.xml
[ SUCCESS ] BIN file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32/semanticsegmentation_frozen_person_32.bin
[ SUCCESS ] Total execution time: 3.70 seconds.
#◆ 自力生成モデル・Semantic Segmentation 「ENet」 の変換(その1)
こちらもNCSのSDK 「NCSDK v2.x」 では実行できなかったモデル。
ちなみに、本家の Tensorflow Lite 上でも現時点では素の状態で動作しない。
こちらも動作すれば歓喜。
.pbファイルは TensorFlow-ENet - PINTO0309 - Github に配置してあるものを使用する。
##● 【失敗】 データタイプ FP16 への変換
下記のコマンドを実行する。
$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer
$ sudo mkdir -p 01_pbmodels/ENet
$ sudo mkdir -p 10_lrmodels/ENet/FP16
$ sudo wget https://github.com/PINTO0309/TensorFlow-ENet/raw/pinto0309work/checkpoint/semanticsegmentation_enet.pb -P 01_pbmodels/ENet
$ sudo python3 mo_tf.py \
--input_model 01_pbmodels/ENet/semanticsegmentation_enet.pb \
--output_dir 10_lrmodels/ENet/FP16 \
--input input \
--output ENet/logits_to_softmax \
--data_type FP16 \
--batch 1 \
--offload_unsupported_operations_to_tf \
--tensorflow_operation_patterns Range,ScatterNd
ダメだ、 ScatterNd の EagerExecution で何故か型変換エラーが発生する。
どのような要素が足りていないのかが分からない。。。
[ ERROR ] Cannot infer shapes or values for node "TFSubgraphCall_2743".
[ ERROR ] Error converting shape to a TensorShape: only size-1 arrays can be converted to Python scalars.
[ ERROR ]
[ ERROR ] It can happen due to bug in custom shape infer function <function tf_subgraph_infer at 0x7fc3fc967400>.
[ ERROR ] Or because the node inputs have incorrect values/shapes.
[ ERROR ] Or because input shapes are incorrect (embedded to the model or passed via --input_shape).
[ ERROR ] Run Model Optimizer with --log_level=DEBUG for more information.
[ ERROR ] Stopped shape/value propagation at "TFSubgraphCall_2743" node.
For more information please refer to Model Optimizer FAQ (<INSTALL_DIR>/deployment_tools/documentation/docs/MO_FAQ.html), question #38.
#◆ 自力生成モデル・Semantic Segmentation 「ENet」 の変換(その2)
オジサン、あきらめない! ( ✧Д✧) カッ!!
今度はコチラのリポジトリを拝借して、CPU対応とモデルサイズ縮小の独自カスタマイズを実施。
segmentation - fregu856 - Github
カスタマイズ後
Tensorflow-ENet2 - PINTO0309 - Github
##● 【失敗】 データタイプ FP16 への変換
下記のコマンドを実行する。
$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer
$ sudo mkdir -p 01_pbmodels/ENet
$ sudo mkdir -p 10_lrmodels/ENet/FP16
$ sudo wget https://github.com/PINTO0309/Tensorflow-ENet2/raw/master/training_logs/best_model/semanticsegmentation_frozen_enet.pb -P 01_pbmodels/ENet
$ sudo python3 mo_tf.py \
--input_model 01_pbmodels/ENet/semanticsegmentation_frozen_enet.pb \
--output_dir 10_lrmodels/ENet/FP16 \
--input imgs_ph \
--output fullconv/Relu \
--data_type FP16 \
--batch 1 \
--offload_unsupported_operations_to_tf \
--tensorflow_operation_patterns Range,ScatterNd
ダメだ。 同じく ScatterNd の EagerExecution で何故か型変換エラーが発生する。
ScatterNd を使用せずに Upsampling するにはどうしたら良いのか。。。分からない。。。
[ ERROR ] Cannot infer shapes or values for node "TFSubgraphCall_1695".
[ ERROR ] Error converting shape to a TensorShape: only size-1 arrays can be converted to Python scalars.
[ ERROR ]
[ ERROR ] It can happen due to bug in custom shape infer function <function tf_subgraph_infer at 0x7f425c167400>.
[ ERROR ] Or because the node inputs have incorrect values/shapes.
[ ERROR ] Or because input shapes are incorrect (embedded to the model or passed via --input_shape).
[ ERROR ] Run Model Optimizer with --log_level=DEBUG for more information.
[ ERROR ] Stopped shape/value propagation at "TFSubgraphCall_1695" node.
For more information please refer to Model Optimizer FAQ (<INSTALL_DIR>/deployment_tools/documentation/docs/MO_FAQ.html), question #38.
★Unpoolingの別実装参考
https://assiaben.github.io/posts/2018-06-tf-unpooling/
https://github.com/assiaben/w/blob/master/unpooling/unpool_test.py
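ちなみに、ScatterNd を避ける妥協案としてよく使われるのは、max-unpooling をインデックス情報なしの最近傍アップサンプリングで近似してしまう方法。精度とのトレードオフはあるが、参考までに筆者が試すとしたら下記のような仮スケッチになる (未検証)。
import tensorflow as tf

def unpool_nearest(x, factor=2):
    # ScatterNd を使わずに解像度だけ戻す近似 (max-unpooling のインデックス情報は捨てる)
    shape = tf.shape(x)  # x は NHWC を想定
    new_size = tf.stack([shape[1] * factor, shape[2] * factor])
    return tf.image.resize_nearest_neighbor(x, new_size)

# 使い方のイメージ: upsampled = unpool_nearest(pooled, factor=2)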
#◆ 他力本願モデル・ADAS(先進運転支援システム)向け Semantic Segmentation モデルの変換
くやしいけど、Intelが公式に公開してくれているサンプルモデルを使う。
使うと言っても、OpenVINO導入時に FP16 と FP32 それぞれの精度で変換済みのモデルが勝手にインストールされているようだ。
今回は Neural Compute Stick を使用するため、 FP16 の方を後続の工程で使用することにする。
/opt/intel/computer_vision_sdk/deployment_tools/intel_models/semantic-segmentation-adas-000/FP16
/opt/intel/computer_vision_sdk/deployment_tools/intel_models/semantic-segmentation-adas-000/FP32
semantic-segmentation-adas-0001.bin
semantic-segmentation-adas-0001.xml
#◆ 自力生成モデル・Semantic Segmentation 「ICNet」の変換
.pbファイルは ICNet-tensorflow - PINTO0309 - Github に配置してあるものを使用する。
##● データタイプ FP16 への変換
下記のコマンドを実行する。
$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer
$ sudo mkdir -p 01_pbmodels/ICNet
$ sudo mkdir -p 10_lrmodels/ICNet/FP16
$ sudo wget https://github.com/PINTO0309/ICNet-tensorflow/raw/pinto0309work/snapshots/semanticsegmentation_ICNet.pb -P 01_pbmodels/ICNet
$ sudo python3 mo_tf.py \
--input_model 01_pbmodels/ICNet/semanticsegmentation_ICNet.pb \
--output_dir 10_lrmodels/ICNet/FP16 \
--input input \
--output ResizeBilinear_19 \
--data_type FP16
成功したようだ。
**lr変換ログ**
Model Optimizer arguments:
Common parameters:
- Path to the Input Model: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/01_pbmodels/ICNet/semanticsegmentation_ICNet.pb
- Path for generated IR: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/ICNet/FP16
- IR output name: semanticsegmentation_ICNet
- Log level: ERROR
- Batch: 1
- Input layers: input
- Output layers: ResizeBilinear_19
- Input shapes: Not specified, inherited from the model
- Mean values: Not specified
- Scale values: Not specified
- Scale factor: Not specified
- Precision of IR: FP16
- Enable fusing: True
- Enable grouped convolutions fusing: True
- Move mean values to preprocess section: False
- Reverse input channels: False
TensorFlow specific parameters:
- Input model in text protobuf format: False
- Offload unsupported operations: True
- Path to model dump for TensorBoard: None
- List of shared libraries with TensorFlow custom layers implementation: None
- Update the configuration file with input/output node names: None
- Use configuration file used to generate the model with Object Detection API: None
- Operations to offload: None
- Patterns to offload: None
- Use the config file: None
Model Optimizer version: 1.4.292.6ef7232d
[ SUCCESS ] Generated IR model.
[ SUCCESS ] XML file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/ICNet/FP16/semanticsegmentation_ICNet.xml
[ SUCCESS ] BIN file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/ICNet/FP16/semanticsegmentation_ICNet.bin
[ SUCCESS ] Total execution time: 6.58 seconds.
##● データタイプ FP32 への変換
下記のコマンドを実行する。
$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer
$ sudo mkdir -p 01_pbmodels/ICNet
$ sudo mkdir -p 10_lrmodels/ICNet/FP32
$ sudo wget https://github.com/PINTO0309/ICNet-tensorflow/raw/pinto0309work/snapshots/semanticsegmentation_ICNet.pb -P 01_pbmodels/ICNet
$ sudo python3 mo_tf.py \
--input_model 01_pbmodels/ICNet/semanticsegmentation_ICNet.pb \
--output_dir 10_lrmodels/ICNet/FP32 \
--input input \
--output ResizeBilinear_19 \
--data_type FP32
こちらも成功したようだ。
**lr変換ログ**
Model Optimizer arguments:
Common parameters:
- Path to the Input Model: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/01_pbmodels/ICNet/semanticsegmentation_ICNet.pb
- Path for generated IR: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/ICNet/FP32
- IR output name: semanticsegmentation_ICNet
- Log level: ERROR
- Batch: 1
- Input layers: input
- Output layers: ResizeBilinear_19
- Input shapes: Not specified, inherited from the model
- Mean values: Not specified
- Scale values: Not specified
- Scale factor: Not specified
- Precision of IR: FP32
- Enable fusing: True
- Enable grouped convolutions fusing: True
- Move mean values to preprocess section: False
- Reverse input channels: False
TensorFlow specific parameters:
- Input model in text protobuf format: False
- Offload unsupported operations: True
- Path to model dump for TensorBoard: None
- List of shared libraries with TensorFlow custom layers implementation: None
- Update the configuration file with input/output node names: None
- Use configuration file used to generate the model with Object Detection API: None
- Operations to offload: None
- Patterns to offload: None
- Use the config file: None
Model Optimizer version: 1.4.292.6ef7232d
[ SUCCESS ] Generated IR model.
[ SUCCESS ] XML file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/ICNet/FP32/semanticsegmentation_ICNet.xml
[ SUCCESS ] BIN file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/ICNet/FP32/semanticsegmentation_ICNet.bin
[ SUCCESS ] Total execution time: 8.47 seconds.
#◆ OpenVINOによる UNet 実行環境の構築と実行
本当は ENet を実装したかったが、ScatterNd の変換エラーがどうしてもクリアできなかったため、仕方なく UNet を実装してみる。
import sys
import cv2
import numpy as np
from PIL import Image
import time
from openvino.inference_engine import IENetwork, IEPlugin
model_xml='/opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32/semanticsegmentation_frozen_person_32.xml'
model_bin='/opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32/semanticsegmentation_frozen_person_32.bin'
net = IENetwork.from_ir(model=model_xml, weights=model_bin)
seg_image = Image.open("data/input/009649.png")
palette = seg_image.getpalette() # Get a color palette
index_void = 2 # Define index_void Back Ground
camera_width = 320
camera_height = 240
fps = ""
elapsedTime = 0
plugin = IEPlugin(device="HETERO:MYRIAD,CPU")
plugin.set_config({"TARGET_FALLBACK": "HETERO:MYRIAD,CPU"})
plugin.set_initial_affinity(net)
#plugin = IEPlugin(device="MYRIAD")
#plugin = IEPlugin(device="CPU")
exec_net = plugin.load(network=net)
input_blob = next(iter(net.inputs)) #input_blob = 'input'
out_blob = next(iter(net.outputs)) #out_blob = 'output/BiasAdd'
n, c, h, w = net.inputs[input_blob].shape #n, c, h, w = 1, 3, 256, 256
del net
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FPS, 30)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, camera_width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, camera_height)
time.sleep(1)
while cap.isOpened():
    t1 = time.time()
    ret, frame = cap.read()
    if not ret:
        break
    #frame = cv2.imread('data/input/000003.jpg')
    prepimg = frame[:, :, ::-1].copy()
    #prepimg = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    prepimg = Image.fromarray(prepimg)
    prepimg = prepimg.resize((256, 256), Image.ANTIALIAS)
    prepimg = np.asarray(prepimg) / 255.0
    prepimg = prepimg.transpose((2, 0, 1)).reshape((1, c, h, w))
    t2 = time.perf_counter()
    exec_net.start_async(request_id=0, inputs={input_blob: prepimg})
    if exec_net.requests[0].wait(-1) == 0:
        outputs = exec_net.requests[0].outputs[out_blob] # (1, 3, 256, 256)
        print("SegmentationTime = {:.7f}".format(time.perf_counter() - t2))
        outputs = outputs.transpose((2, 3, 1, 0)).reshape((h, w, c)) # (256, 256, 3)
        outputs = cv2.resize(outputs, (camera_width, camera_height)) # (240, 320, 3)
        # View
        res = np.argmax(outputs, axis=2)
        if index_void is not None:
            res = np.where(res == index_void, 0, res)
        image = Image.fromarray(np.uint8(res), mode="P")
        image.putpalette(palette)
        image = image.convert("RGB")
        image = np.asarray(image)
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        image = cv2.addWeighted(frame, 1, image, 0.9, 0)
        cv2.putText(image, fps, (camera_width-180,15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (38,0,255), 1, cv2.LINE_AA)
        cv2.imshow("Result", image)
        if cv2.waitKey(1)&0xFF == ord('q'):
            break
    elapsedTime = time.time() - t1
    fps = "(Playback) {:.1f} FPS".format(1/elapsedTime)

cv2.destroyAllWindows()
del exec_net
del plugin
#◆ 処理速度の計測結果
おいっ! CPUのほうが速いぞ! Σ(゚ロ゚;)
まさかとは思うが、Intel 7th Gen Core m3-7y30 のほうが、専用AIチップの Myriad X より性能が良い、なんてことはないよね???
MKL-DNN ってそんなに強力なの???
本気で懐疑的過ぎて、即座に公式フォーラムへ issue を挙げてしまった。
とにかく、現状の計測方法では、何故か Neural Compute Stick を使わないほうがパフォーマンスが良い。
ただし、Neural Compute Stick v1 よりも Neural Compute Stick v2 のほうが2倍以上のパフォーマンスが出ていることは確かだ。
また、RaspberryPi3 の ARM CPU 単独で同じモデルを処理させたときは、11秒掛かっていたため、超爆速パフォーマンスになっているのも確かだ。
ちなみに参考に記載したプログラムは、コメント部を調整すればUSBカメラで撮影した動画をリアルタイムにセグメンテーションできるようにしてある。
USBカメラ撮影で、おおむね 4 FPS 〜 5 FPS の性能が出ていた。
しかしながら、精度は極悪なので使えたものではないけれど。。。
【2018/11/29追記】
海外のエンジニアと協力して検証し、 Intel Celeron になら勝てる、との結論に至るw 検証結果は コチラ
#◆ 公式フォーラムへ投稿したissue
https://software.intel.com/en-us/forums/computer-vision/topic/800215
#◆ 【一部成功】OpenVINOによるADAS用セグメンテーション実行環境の構築と実行
ダウンロード可能なモデルの一覧は下記を参照。
Intel公式のチュートリアルは雑過ぎるため、あえて Github上 の OpenCV のリポジトリから拝借しても良い。
内容を突き合わせしたが、OpenCVの内容のほうが新しいようだ。
ちなみに、OpenCVのリポジトリのほうが、ダウンロード可能なモデルの種類が豊富だ。
あくまで参考までに記載するため、あえて実施しなくてもこの先の検証は継続可能。
OpenCV - Github - Public Topologies Downloader
$ sudo -H pip3 install pyyaml requests
$ cd ~
$ git clone https://github.com/opencv/open_model_zoo.git
$ cd open_model_zoo/model_downloader
$ ./downloader.py --name semantic-segmentation-adas-0001
$ ./downloader.py --name semantic-segmentation-adas-0001-fp16
$ ./downloader.py --name チョメチョメ
:
では本題。上記を実施していなくてもココから先の作業を実施すれば良い。
下記コマンドを実行し、サンプルプログラムをビルドする。
$ sudo /opt/intel/computer_vision_sdk/deployment_tools/inference_engine/samples/build_samples.sh
何故か /home/<username>/inference_engine_samples_build/intel64/Release 配下にビルド済みバイナリが生成される。
$ cd ~/inference_engine_samples_build/intel64
$ sudo chmod 777 Release
$ cd Release
$ ./segmentation_demo -h
[ INFO ] InferenceEngine:
API version ............ 1.4
Build .................. 17328
[ INFO ] Parsing input parameters
segmentation_demo [OPTION]
Options:
-h Print a usage message.
-i "<path>" Required. Path to an .bmp image.
-m "<path>" Required. Path to an .xml file with a trained model.
-l "<absolute_path>" Required for MKLDNN (CPU)-targeted custom layers. Absolute path to a shared library with the kernels impl.
Or
-c "<absolute_path>" Required for clDNN (GPU)-targeted custom kernels. Absolute path to the xml file with the kernels desc.
-pp "<path>" Path to a plugin folder.
-d "<device>" Specify the target device to infer on: CPU, GPU, FPGA or MYRIAD is acceptable. The demo will look for a suitable plugin for a specified device (CPU by default).
-ni "<integer>" Number of iterations (default 1)
-pc Enables per-layer performance report
まずは、 Neural Compute Stick モード で実行する。
$ ./segmentation_demo \
-i test.png \
-m /opt/intel/computer_vision_sdk/deployment_tools/intel_models/semantic-segmentation-adas-000/FP16/semantic-segmentation-adas-0001.xml \
-d MYRIAD \
-pc
動かない。。。 ナメてんのか? いんてる。
まぁ、Python API 自体がまだお試し版リリースであることは理解している。
[ INFO ] InferenceEngine:
API version ............ 1.4
Build .................. 17328
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ] ./test.png
[ INFO ] Loading plugin
API version ............ 1.4
Build .................. 17328
Description ....... myriadPlugin
[ INFO ] Loading network files
[ INFO ] Preparing input blobs
[ WARNING ] Image is resized from (512, 256) to (2048, 1024)
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs
[ INFO ] Loading model to the plugin
[ ERROR ] Cannot convert layer "argmax" due to unsupported layer type "ArgMax"
ArgMax ぐらい、モデルの外で自力でプログラムを書いても1〜2行のことなので、変換前モデルを捜索するも、何故か caffemodel を非公開にしていらっしゃる。
呆れる。。。 品質の低さとユーザー軽視は、彼の ○i○r○s○f○ より酷い。
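ちなみに、ArgMax をモデルの外でやる場合の「1〜2行」は概ね下記のイメージ (出力が NCHW のクラススコアという前提の仮スケッチで、クラス数や解像度は仮置き)。
import numpy as np

# 推論出力 (1, num_classes, H, W) からクラスIDマップ (H, W) を作るだけの後処理
scores = np.zeros((1, 20, 1024, 2048), dtype=np.float32)  # ダミーの出力例 (形状は仮定)
class_map = np.argmax(scores[0], axis=0)                  # ArgMax 相当の1行
print(class_map.shape)                                    # (1024, 2048)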
しかし、オジサンはイチイチメゲない。 次に行ってみよう。
次は、 CPU モード で実行してみる。
$ ./segmentation_demo \
-i test.png \
-m /opt/intel/computer_vision_sdk/deployment_tools/intel_models/semantic-segmentation-adas-000/FP16/semantic-segmentation-adas-0001.xml \
-d CPU \
-pc
下図のテスト用画像をインプットすると。。。
なかなか綺麗にセグメンテーションされた。
CPU実行にもかかわらず、推論時間は 909 ms だった。
かなり速い!!
#◆ OpenVINOによる ICNet 実行環境の構築
最後の砦、 ICNet 。 エッジセグメンテーションの未来は君の双肩に掛かっている。
下記プログラムを実行する。
今回は CPUエクステンションライブラリ を有効にする。
import sys
import cv2
import numpy as np
from PIL import Image
import time
from openvino.inference_engine import IENetwork, IEPlugin
model_xml='/opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/ICNet/FP32/semanticsegmentation_ICNet.xml'
model_bin='/opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/ICNet/FP32/semanticsegmentation_ICNet.bin'
net = IENetwork.from_ir(model=model_xml, weights=model_bin)
seg_image = Image.open("data/input/009649.png")
palette = seg_image.getpalette() # Get a color palette
index_void = 2 # Define index_void Back Ground
camera_width = 320
camera_height = 240
fps = ""
elapsedTime = 0
#plugin = IEPlugin(device="HETERO:MYRIAD,CPU")
#plugin.set_config({"TARGET_FALLBACK": "HETERO:MYRIAD,CPU"})
#plugin.set_initial_affinity(net)
#plugin = IEPlugin(device="MYRIAD")
plugin = IEPlugin(device="CPU")
plugin.add_cpu_extension("/home/alpha/inference_engine_samples_build/intel64/Release/lib/libcpu_extension.so")
exec_net = plugin.load(network=net)
input_blob = next(iter(net.inputs)) #input_blob = 'input'
out_blob = next(iter(net.outputs)) #out_blob = 'ResizeBilinear_19'
#print(net.inputs[input_blob].shape)
h, w, c = net.inputs[input_blob].shape #h, w, c = 256, 512, 3
del net
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FPS, 30)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, camera_width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, camera_height)
time.sleep(1)
while cap.isOpened():
    t1 = time.time()
    #ret, frame = cap.read()
    #if not ret:
    #    break
    frame = cv2.imread('data/input/000003.jpg')
    camera_height, camera_width, channels = frame.shape[:3]
    prepimg = frame[:, :, ::-1].copy()
    prepimg = Image.fromarray(prepimg)
    prepimg = prepimg.resize((512, 256), Image.ANTIALIAS)
    if prepimg.mode == "RGBA":
        prepimg = prepimg.convert("RGB")
    t2 = time.perf_counter()
    exec_net.start_async(request_id=0, inputs={input_blob: prepimg})
    if exec_net.requests[0].wait(-1) == 0:
        outputs = exec_net.requests[0].outputs[out_blob] # (1, 19, 256, 256)
        print(outputs[0].shape)
        print("SegmentationTime = {:.7f}".format(time.perf_counter() - t2))
        outputs = outputs[0] # (19, 256, 512)
        outputs = np.argmax(outputs, axis=0) # (256, 512)
        # View
        image = Image.fromarray(np.uint8(outputs), mode="P")
        image.putpalette(palette)
        image = image.convert("RGB")
        image = image.resize((camera_width, camera_height))
        image.save("2.jpg")
        image = np.asarray(image)
        cv2.putText(image, fps, (camera_width-180,15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (38,0,255), 1, cv2.LINE_AA)
        cv2.imshow("Result", image)
        if cv2.waitKey(1)&0xFF == ord('q'):
            break
    elapsedTime = time.time() - t1
    fps = "(Playback) {:.1f} FPS".format(1/elapsedTime)

cv2.destroyAllWindows()
del exec_net
del plugin
60ms という猛烈スピードで推論されたが、結果がハチャメチャになった。
どこがバグっているのだろう。。。
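あくまで未検証の仮説だが、UNet版スクリプトで行っていた 0〜1 正規化と NHWC→NCHW の transpose がこちらのスクリプトには無いため、例えば下記のような補助関数で前処理を揃えてから exec_net に渡してみる余地はあるかもしれない。
import numpy as np

def to_nchw_blob(pil_img):
    # UNet版スクリプトと同じ前処理 (0〜1正規化 + NHWC→NCHW) を行う補助関数のスケッチ
    arr = np.asarray(pil_img, dtype=np.float32) / 255.0   # (H, W, 3)
    return arr.transpose((2, 0, 1))[np.newaxis, :]        # (1, 3, H, W)

# 使い方のイメージ: exec_net.start_async(request_id=0, inputs={input_blob: to_nchw_blob(prepimg)})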
#◆ 参考にさせていただいた記事、謝辞
ammo0613さんに先を越されてしまいました。。。 (´;Д;`)
AIを始めよう!OpenVINOのインストールからデモの実行まで - ammo0613 - Qiita
AIを始めよう!OpenVINOで使うモデルを整備する - ammo0613 - Qiita
AIを始めよう!PythonでOpenVINOの仕組みを理解する - ammo0613 - Qiita
#◆ 本日のまとめ
- 純正SDK の NCSDK v1 / v2 はバグだらけでMovidius社は修正する気も無いみたいだし、ほんとダメね
- NCSDKで実行できなかったモデルが、OpenVINOなら実行できることが分かった
- OpenVINO、かなり速くてSDKとしての完成度が高い
- Intel x86/64系で第7世代以降のCPUを搭載した端末なら、OpenVINOはカナリお勧め
- Neural Compute Stick 2 。。。 現時点では期待したほどパフォーマンスが上がらないので、あまりオススメしないかも。。。
- 個人的には ARMベースの TX2やら何やらを購入するより、導入コストとパフォーマンスのバランスを鑑みて、Intel CPU のLattePanda単体で購入したほうが潰しが効きそう、と感じた (正直、Stickはめちゃくちゃ遅いし、ARM非対応ならゴミ同然)
- 次回、Intelが提供してくれた Semantic Segmentation モデルでリアルタイムセグメンテーションして遊ぼうと思う
★★備忘★★
https://software.intel.com/en-us/articles/OpenVINO-InferEngine#CPU%20Extensions
https://software.intel.com/en-us/articles/OpenVINO-InferEngine#Adding%20your%20own%20kernels
#◆ 次回記事
CPU単体で無理やり RealTime Semantic Segmentaion [1 FPS / CPU only]
#Introducing Ubuntu 16.04 + OpenVINO to Latte Panda Alpha 864 (without OS included) and enjoying Semantic Segmentation with Neural Compute Stick and Neural Compute Stick 2
#◆ Introduction
Finally, segmentation at the speed shown below is achieved with the CPU only.
Last article, [Detection rate approx. 30FPS] RaspberryPi3 Model B(plus none) is slightly later than TX2, acquires object detection rate of MobilenetSSD and corresponds to MultiModel (VOC+WIDER FACE).
↓ Youtube plays on click (Neural Compute Stick + MobilenetSSD + RaspberryPi)
It is no exaggeration to say that I tried OpenVINO precisely because the quality of the NCSDK is remarkably low and OpenVINO seemed unlikely to be affected by SDK quality.
This time, I install Ubuntu 16.04 on the LattePanda Alpha 864 (no OS) that I obtained by pre-order in early November, then install OpenVINO and verify that custom segmentation models run on the Neural Compute Stick and the Neural Compute Stick 2.
The purpose of procuring the LattePanda Alpha is to verify the usefulness of Neural Compute Stick + OpenVINO on a single-board computer.
Since the cost completely exceeds hobby level, please do not imitate this.
OpenVINO converts models generated with Caffe, TensorFlow, MXNet, Kaldi and ONNX into an intermediate binary in a common format (IR [Intermediate Representation of the model]) and runs them uniformly through the inference engine API (Inference Engine).
Note that the runtime does not support the ARM architecture; it only supports Intel x86/64 CPUs.
1. Develop Multiplatform Computer Vision Solutions - Intel Developer Zone
2. Install the Intel® Distribution of OpenVINO™ toolkit for Linux - Intel Developer Zone
3. How to Integrate the Inference Engine in Your Application - Intel Inference Engine Developer Guide
4. Accelerate Deep Learning Inference with Integrated Intel® Processor Graphics Rev 2.0 - Intel Developer Zone
#◆ Appearance of Latte Panda Alpha
1. Outer case1
2. Outer case2
3.Inside box
4.Supplied package (Case not included)
5.Sense of size compared to a cigarette box (slightly larger than a RaspberryPi in length and width, but thinner, about half the thickness of a cigarette box)
#◆ Specification of LattePanda Alpha
- Price:
- OS-less version:$358 (¥40,000)
- Win10 bundle version:$398 (¥45,000)
- CPU:
- Intel 7th Gen Core m3-7y30
- Core:
- 1.6-2.6GHz Dual-Core,Four-Thread
- Benchmark (PassMark):
- Up to 3500, double computing power compared with same price range products in the market
- Graphics:
- Intel HD Graphics 615, 300-900MHz
- RAM:
- 8G LPDDR3 1866MHz Dual-Channel
- Memory:
- 64GB eMMC V5.0l
- External Memory:
- 1x M.2 M Key, PCIe 4x, Supports NVMe SSD and SATA SSD
- 1x M.2 E Key, PCIe 2x,Supports USB2.0, UART, PCM
- Connectivity:
- Wi-Fi 802.11 AC, 2.4G & 5G
- Dual Band Bluetooth 4.2
- Gigabit Ethernet
- USB Ports:
- 3x USB 3.0 Type A
- 1x USB Type C, supports PD, DP, USB 3.0
- Display:
- HDMI Output
- Type-C DP Support
- Extendable eDP touch displays
- Co-processor:
- Arduino Leonardo
- GPIO & Other Features:
- 2x 50p GPIOs including I2C
- I2S, USB
- RS232
- UART
- RT
- Power Management
- Extendable power button
- OS Support:
- Windows 10 Pro
- Linux Ubuntu
#◆ Parts used for kitting
- Windows 10 PC (Anything is OK if you can create USB boot media for Ubuntu 1604)
- LattePanda Alpha
- Intel Movidius Neural Compute Stick v1 / v2
- USB Memory 16GB
- HDMI cable
- HDMI display
- USB keyboard
- USB mouse
#◆ Installation / use software
- Ubuntu 16.04 x86_64
- OpenVINO toolkit 2018 R4 (2018.4.420)
- Python 3.5
- OpenCV 3.4.3 (pip3 install)
- Rufus v3.3
- Tensorflow v1.11.0 (pip3 install)
#◆ Installation of Ubuntu 16.04
##● Working with Windows 10 PC (Create USB flash drive of Ubuntu1604)
1.Ubuntu 16.04.5 Desktop Image Download (1.5GB)
http://releases.ubuntu.com/releases/16.04/ubuntu-16.04.5-desktop-amd64.iso
2.Download USB flash drive creation tool Rufus
Official page - Rufus - Japanese
Download link https://github.com/pbatard/rufus/releases/download/v3.3/rufus-3.3.exe
3.Insert USB memory into Windows 10 PC
4.Start Rufus (rufus-3.3.exe) and write the Ubuntu 16.04 image in DD mode
Rufus main screen (DD mode designation dialog is displayed after pressing the start button)
Specify DD mode
State of writing
5.Remove USB memory from Windows 10 PC
##● Working with LattePanda Alpha 864
6.Connect the Wi-Fi antenna, keyboard, mouse, HDMI cable / display, USB memory to LattePanda Alpha and finally connect the power
Example 1) Wi-Fi antenna connection (the Alpha has two antennas)
Example 2) Connecting the HDMI cable
Example 3) All parts connected
Example 4) Connect the power supply Type-C cable
When the Type-C cable is connected, the red power-indicator LED stays lit and the blue LED lights up momentarily.
Wait until the blue LED starts blinking, then press and hold the power button for 3 seconds; the power turns on and the blue LED stays lit.
7.As soon as the LattePanda Alpha powers on, repeatedly press the Esc key
8.Select Boot → Boot Option #1, and press Enter
9.Select the name of the USB memory + Partition1, and press Enter
10.Select Save & Exit → Save Changes and Exit, and press Enter
11.Select Yes, and press Enter
12.Select Install Ubuntu, and press Enter
13.Wait for a while
14.Select English, and click Continue
15.When connecting to Wi-Fi, select Connect to this network, choose the SSID from the list, and click Connect
16.Enter the Wi-Fi password, and click Connect
17.Select Install third-party software for graphics and Wi-Fi hardware, Flash, MP3 and other media, and click Continue
18.Select Erase disk and install Ubuntu, and click Install Now
19.Continue
20.Select Tokyo, and click Continue
21.Select Japanese in both the left and right columns, and click Continue
22.Enter the user ID, machine name and password, and click Continue
23.Wait for a while
24.Restart Now
※Rebooting starts but if it does not work, disconnect the power cable once and turn it on again
25.Ubuntu 16.04 startup completed
26.After logging on, start the terminal and update
$ sudo apt-get update
$ sudo apt-get upgrade
Official installation procedure
http://docs.lattepanda.com/content/alpha_edition/power_on/
#◆ Installation of OpenVINO
OpenVINO version to be installed: 2018.4.420
##● Installation of OpenVINO main unit
Official tutorial
##● Additional installation for Intel Movidius Neural Compute Stick v1 / v2
Execute the following command.
$ cd ~
$ sudo usermod -a -G users "$(whoami)"
$ sudo cat <<EOF > 97-usbboot.rules
SUBSYSTEM=="usb", ATTRS{idProduct}=="2150", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="2485", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="f63b", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
EOF
$ sudo cp 97-usbboot.rules /etc/udev/rules.d/
$ sudo udevadm control --reload-rules
$ sudo udevadm trigger
$ sudo ldconfig
$ sudo rm 97-usbboot.rules
Use the following commands to create the symbolic links manually.
$ cd /opt/intel/common/mdf/lib64
$ sudo mv igfxcmrt64.so igfxcmrt64.so.org
$ sudo ln -s libigfxcmrt64.so igfxcmrt64.so
$ cd /opt/intel/mediasdk/lib64
$ sudo mv libmfxhw64.so.1 libmfxhw64.so.1.org
$ sudo mv libmfx.so.1 libmfx.so.1.org
$ sudo mv libva-glx.so.2 libva-glx.so.2.org
$ sudo mv libva.so.2 libva.so.2.org
$ sudo mv libigdgmm.so.1 libigdgmm.so.1.org
$ sudo mv libva-drm.so.2 libva-drm.so.2.org
$ sudo mv libva-x11.so.2 libva-x11.so.2.org
$ sudo ln -s libmfxhw64.so.1.28 libmfxhw64.so.1
$ sudo ln -s libmfx.so.1.28 libmfx.so.1
$ sudo ln -s libva-glx.so.2.300.0 libva-glx.so.2
$ sudo ln -s libva.so.2.300.0 libva.so.2
$ sudo ln -s libigdgmm.so.1.0.0 libigdgmm.so.1
$ sudo ln -s libva-drm.so.2.300.0 libva-drm.so.2
$ sudo ln -s libva-x11.so.2.300.0 libva-x11.so.2
Run sudo ldconfig again.
$ cd ~
$ sudo ldconfig
The OpenCV 4.0.0-pre that is installed by default has a Gstreamer bug and did not work properly, so reinstall OpenCV 3.4.3 manually.
Execute the following command.
$ sudo -H pip3 install opencv-python==3.4.3.18
$ nano ~/.bashrc
export PYTHONPATH=/usr/local/lib/python3.5/dist-packages/cv2:$PYTHONPATH
$ source ~/.bashrc
Official installation procedure
Intel®Movidius™Neural Compute Stick and Intel®Neural Compute Stick 2 additional installation procedure
##● Upgrade to Tensorflow v1.11.0
Upgrade the old Tensorflow v1.9.0 that is installed by default to Tensorflow v1.11.0, because the subsequent Model Optimizer processing fails otherwise.
$ python3 -c 'import tensorflow as tf; print(tf.__version__)'
1.9.0
$ sudo -H pip3 install pip --upgrade
$ sudo -H pip3 install tensorflow==1.11.0 --upgrade
##● Settings for offloading custom layer behavior to Tensorflow
Intel official tutorial - Offloading Computations to TensorFlow*
You can offload custom layer operations not supported by the OpenVINO standard API to Tensorflow side.
Execute the following command and self-build the inference engine layer using the Tensorflow runtime.
However, there are bugs in the scripts provided by Intel, so it is necessary to manually correct them.
Also, the Bazel introduced at this point must be version 0.18.1.
As of November 17, 2018, the inference engine layer cannot be built normally with 0.19.0 or later, so be careful.
On a device that, unlike the LattePanda Alpha, does not have abundant RAM (for example, 1 GB of RAM), replacing
sudo -H $HOME/bin/bazel build --config monolithic //tensorflow/cc/inference_engine_layer:libtensorflow_call_layer.so
with
sudo -H $HOME/bin/bazel --host_jvm_args=-Xmx512m build --config monolithic --local_resources 1024.0,0.5,0.5 //tensorflow/cc/inference_engine_layer:libtensorflow_call_layer.so
may allow the build to succeed.
$ sudo apt-get install -y git pkg-config zip g++ zlib1g-dev unzip
$ cd ~
$ wget https://github.com/bazelbuild/bazel/releases/download/0.18.1/bazel-0.18.1-installer-linux-x86_64.sh
$ sudo chmod +x bazel-0.18.1-installer-linux-x86_64.sh
$ ./bazel-0.18.1-installer-linux-x86_64.sh --user
$ echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc
$ source ~/.bashrc
$ cd /opt
$ sudo git clone -b v1.11.0 https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ sudo git checkout -b v1.11.0
$ echo 'export TF_ROOT_DIR=/opt/tensorflow' >> ~/.bashrc
$ source ~/.bashrc
$ sudo nano /opt/intel/computer_vision_sdk/bin/setupvars.sh
#Before
INSTALLDIR=/opt/intel//computer_vision_sdk_2018.4.420
↓
#After
INSTALLDIR=/opt/intel/computer_vision_sdk_2018.4.420
$ source /opt/intel/computer_vision_sdk/bin/setupvars.sh
$ sudo nano /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer/tf_call_ie_layer/build.sh
#Before
bazel build --config=monolithic //tensorflow/cc/inference_engine_layer:libtensorflow_call_layer.so
↓
#After
sudo -H $HOME/bin/bazel build --config monolithic //tensorflow/cc/inference_engine_layer:libtensorflow_call_layer.so
$ sudo -E /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer/tf_call_ie_layer/build.sh
The inference engine layer is generated in the following path.
/opt/tensorflow/bazel-bin/tensorflow/cc/inference_engine_layer/libtensorflow_call_layer.so
If the library is left there, running Python as an ordinary user fails with a Permission denied error because ordinary users have no access rights under /opt, so move it to a different location.
$ su -
$ cp /opt/tensorflow/bazel-bin/tensorflow/cc/inference_engine_layer/libtensorflow_call_layer.so /usr/local/lib
$ exit
$ nano ~/.bashrc
export PYTHONPATH=$PYTHONPATH:/usr/local/lib
$ source ~/.bashrc
$ sudo ldconfig
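As a simple sanity check of my own (not part of the original procedure), the following confirms from an ordinary user's Python session that the copied library is readable and that /usr/local/lib is on PYTHONPATH:
import os

# Both should print True if the copy and the ~/.bashrc edit above took effect
print("/usr/local/lib" in os.environ.get("PYTHONPATH", ""))
print(os.access("/usr/local/lib/libtensorflow_call_layer.so", os.R_OK))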
#◆ Trying out the demo programs
##● Sample image classification
Execute the following command.
$ cd /opt/intel/computer_vision_sdk/deployment_tools/demo
$ ./demo_squeezenet_download_convert_run.sh
The image shown below is loaded...
It seems to have been recognized as a sports car with roughly 80% probability.
###################################################
Run Inference Engine classification sample
Run ./classification_sample -d CPU -i /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png -m /home/alpha/openvino_models/ir/squeezenet1.1/FP32/squeezenet1.1.xml
[ INFO ] InferenceEngine:
API version ............ 1.4
Build .................. 17328
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ] /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png
[ INFO ] Loading plugin
API version ............ 1.4
Build .................. lnx_20181004
Description ....... MKLDNNPlugin
[ INFO ] Loading network files:
/home/alpha/openvino_models/ir/squeezenet1.1/FP32/squeezenet1.1.xml
/home/alpha/openvino_models/ir/squeezenet1.1/FP32/squeezenet1.1.bin
[ INFO ] Preparing input blobs
[ WARNING ] Image is resized from (787, 259) to (227, 227)
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference (1 iterations)
[ INFO ] Processing output blobs
Top 10 results:
Image /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png
817 0.8363345 label sports car, sport car
511 0.0946488 label convertible
479 0.0419131 label car wheel
751 0.0091071 label racer, race car, racing car
436 0.0068161 label beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon
656 0.0037564 label minivan
586 0.0025741 label half track
717 0.0016069 label pickup, pickup truck
864 0.0012027 label tow truck, tow car, wrecker
581 0.0005882 label grille, radiator grille
total inference time: 6.5318211
Average running time of one iteration: 6.5318211 ms
Throughput: 153.0966609 FPS
[ INFO ] Execution successful
###################################################
Demo completed successfully.
##● Three-stage inference sample
This appears to be a sample that runs three stages of inference consecutively, each with its own trained model (a rough sketch of chaining such a pipeline with the Inference Engine Python API follows the list below).
- Vehicle detection
- License plate detection
- Character recognition on the detected license plate
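Below is only a minimal sketch of my own, showing how three IR models could be chained with the same IENetwork / IEPlugin API used later in this article; the model file names are placeholders and the cropping logic between stages is omitted. It is not the code of the Intel demo itself.
# Rough sketch (my own, not the Intel demo code): chaining three IR models.
# Model file names below are placeholders; cropping between stages is omitted.
from openvino.inference_engine import IENetwork, IEPlugin

def load_model(xml_path, bin_path, plugin):
    net = IENetwork.from_ir(model=xml_path, weights=bin_path)
    exec_net = plugin.load(network=net)
    return exec_net, next(iter(net.inputs)), next(iter(net.outputs))

plugin = IEPlugin(device="CPU")
vehicle, v_in, v_out = load_model("vehicle.xml", "vehicle.bin", plugin)
plate,   p_in, p_out = load_model("plate.xml",   "plate.bin",   plugin)
ocr,     o_in, o_out = load_model("ocr.xml",     "ocr.bin",     plugin)

# 1) detect vehicles in the frame, 2) detect the plate in each vehicle crop,
# 3) recognize the characters on the plate crop
# vehicles = vehicle.infer({v_in: frame})[v_out]
# plates   = plate.infer({p_in: vehicle_crop})[p_out]
# chars    = ocr.infer({o_in: plate_crop})[o_out]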
Execute the following command.
$ cd /opt/intel/computer_vision_sdk/deployment_tools/demo
$ ./demo_security_barrier_camera.sh
##● Other sample programs
Intel® Distribution of OpenVINO™ Toolkit - Inference Engine Samples
#◆ Sample commands for converting and running your own model
Official tutorial - Using the Model Optimizer to Convert TensorFlow* Models
Official tutorial - Model Optimizer Developer Guide - TensorFlow* Models with Custom Layers
Below is a sample command for converting a Tensorflow .pb file (frozen graph) to OpenVINO's IR format.
$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer
$ python3 mo_tf.py --input_model <INPUT_MODEL>.pb
**Conversion command options and explanation**
optional arguments:
-h, --help show this help message and exit
--framework {tf,caffe,mxnet,kaldi,onnx}
Name of the framework used to train the input model.
Framework-agnostic parameters:
--input_model INPUT_MODEL, -w INPUT_MODEL, -m INPUT_MODEL
Tensorflow*: a file with a pre-trained model (binary
or text .pb file after freezing). Caffe*: a model
proto file with model weights
--model_name MODEL_NAME, -n MODEL_NAME
Model_name parameter passed to the final create_ir
transform. This parameter is used to name a network in
a generated IR and output .xml/.bin files.
--output_dir OUTPUT_DIR, -o OUTPUT_DIR
Directory that stores the generated IR. By default, it
is the directory from where the Model Optimizer is
launched.
--input_shape INPUT_SHAPE
Input shape(s) that should be fed to an input node(s)
of the model. Shape is defined as a comma-separated
list of integer numbers enclosed in parentheses or
square brackets, for example [1,3,227,227] or
(1,227,227,3), where the order of dimensions depends
on the framework input layout of the model. For
example, [N,C,H,W] is used for Caffe* models and
[N,H,W,C] for TensorFlow* models. Model Optimizer
performs necessary transformations to convert the
shape to the layout required by Inference Engine
(N,C,H,W). The shape should not contain undefined
dimensions (? or -1) and should fit the dimensions
defined in the input operation of the graph. If there
are multiple inputs in the model, --input_shape should
contain definition of shape for each input separated
by a comma, for example: [1,3,227,227],[2,4] for a
model with two inputs with 4D and 2D shapes.
--scale SCALE, -s SCALE
All input values coming from original network inputs
will be divided by this value. When a list of inputs
is overridden by the --input parameter, this scale is
not applied for any input that does not match with the
original input of the model.
--reverse_input_channels
Switch the input channels order from RGB to BGR (or
vice versa). Applied to original inputs of the model
if and only if a number of channels equals 3. Applied
after application of --mean_values and --scale_values
options, so numbers in --mean_values and
--scale_values go in the order of channels used in the
original model.
--log_level {CRITICAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}
Logger level
--input INPUT The name of the input operation of the given model.
Usually this is a name of the input placeholder of the
model.
--output OUTPUT The name of the output operation of the model. For
TensorFlow*, do not add :0 to this name.
--mean_values MEAN_VALUES, -ms MEAN_VALUES
Mean values to be used for the input image per
channel. Values to be provided in the (R,G,B) or
[R,G,B] format. Can be defined for desired input of
the model, for example: "--mean_values
data[255,255,255],info[255,255,255]". The exact
meaning and order of channels depend on how the
original model was trained.
--scale_values SCALE_VALUES
Scale values to be used for the input image per
channel. Values are provided in the (R,G,B) or [R,G,B]
format. Can be defined for desired input of the model,
for example: "--scale_values
data[255,255,255],info[255,255,255]". The exact
meaning and order of channels depend on how the
original model was trained.
--data_type {FP16,FP32,half,float}
Data type for all intermediate tensors and weights. If
original model is in FP32 and --data_type=FP16 is
specified, all model weights and biases are quantized
to FP16.
--disable_fusing Turn off fusing of linear operations to Convolution
--disable_resnet_optimization
Turn off resnet optimization
--finegrain_fusing FINEGRAIN_FUSING
Regex for layers/operations that won't be fused.
Example: --finegrain_fusing Convolution1,.*Scale.*
--disable_gfusing Turn off fusing of grouped convolutions
--move_to_preprocess Move mean values to IR preprocess section
--extensions EXTENSIONS
Directory or a comma separated list of directories
with extensions. To disable all extensions including
those that are placed at the default location, pass an
empty string.
--batch BATCH, -b BATCH
Input batch size
--version Version of Model Optimizer
--silent Prevent any output messages except those that
correspond to log level equals ERROR, that can be set
with the following option: --log_level. By default,
log level is already ERROR.
--freeze_placeholder_with_value FREEZE_PLACEHOLDER_WITH_VALUE
Replaces input layer with constant node with provided
value, e.g.: "node_name->True"
--generate_deprecated_IR_V2
Force to generate legacy/deprecated IR V2 to work with
previous versions of the Inference Engine. The
resulting IR may or may not be correctly loaded by
Inference Engine API (including the most recent and
old versions of Inference Engine) and provided as a
partially-validated backup option for specific
deployment scenarios. Use it at your own discretion.
By default, without this option, the Model Optimizer
generates IR V3.
**Tensorflow-specific conversion command options and explanation**
TensorFlow*-specific parameters:
--input_model_is_text
TensorFlow*: treat the input model file as a text
protobuf format. If not specified, the Model Optimizer
treats it as a binary file by default.
--input_checkpoint INPUT_CHECKPOINT
TensorFlow*: variables file to load.
--input_meta_graph INPUT_META_GRAPH
Tensorflow*: a file with a meta-graph of the model
before freezing
--saved_model_dir SAVED_MODEL_DIR
TensorFlow*: directory representing non frozen model
--saved_model_tags SAVED_MODEL_TAGS
Group of tag(s) of the MetaGraphDef to load, in string
format, separated by ','. For tag-set contains
multiple tags, all tags must be passed in.
--offload_unsupported_operations_to_tf
TensorFlow*: automatically offload unsupported
operations to TensorFlow*
--tensorflow_subgraph_patterns TENSORFLOW_SUBGRAPH_PATTERNS
TensorFlow*: a list of comma separated patterns that
will be applied to TensorFlow* node names to infer a
part of the graph using TensorFlow*.
--tensorflow_operation_patterns TENSORFLOW_OPERATION_PATTERNS
TensorFlow*: a list of comma separated patterns that
will be applied to TensorFlow* node type (ops) to
infer these operations using TensorFlow*.
--tensorflow_custom_operations_config_update TENSORFLOW_CUSTOM_OPERATIONS_CONFIG_UPDATE
TensorFlow*: update the configuration file with node
name patterns with input/output nodes information.
--tensorflow_use_custom_operations_config TENSORFLOW_USE_CUSTOM_OPERATIONS_CONFIG
TensorFlow*: use the configuration file with custom
operation description.
--tensorflow_object_detection_api_pipeline_config TENSORFLOW_OBJECT_DETECTION_API_PIPELINE_CONFIG
TensorFlow*: path to the pipeline configuration file
used to generate model created with help of Object
Detection API.
--tensorboard_logdir TENSORBOARD_LOGDIR
TensorFlow*: dump the input graph to a given directory
that should be used with TensorBoard.
--tensorflow_custom_layer_libraries TENSORFLOW_CUSTOM_LAYER_LIBRARIES
TensorFlow*: comma separated list of shared libraries
with TensorFlow* custom operations implementation.
--disable_nhwc_to_nchw
Disables default translation from NHWC to NCHW
#◆ Converting my own Semantic Segmentation model "UNet"
First, I will start with UNet, whose structure is super simple.
The .pb file is available in TensorflowLite-UNet - PINTO0309 - Github
This is a Semantic Segmentation model that I trained on the Person class only.
TensorflowLite-UNet/model/semanticsegmentation_frozen_person_32.pb (31.1MB)
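Before converting, the input and output node names of the model have to be known. The following is a small sketch of my own (not from the original article) that lists candidate nodes of the frozen graph with Tensorflow; for this model the placeholder is named input and the final layer is output/BiasAdd, which are the names used in the commands below.
# Sketch (my addition): list candidate input/output node names of the frozen graph
import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile("semanticsegmentation_frozen_person_32.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    if node.op in ("Placeholder", "BiasAdd"):
        # prints every Placeholder / BiasAdd node; 'input' and 'output/BiasAdd' are the ones used below
        print(node.op, node.name)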
##● Conversion to data type FP16
Execute the following commands. The options used have the following meanings:
- --input_model : name of the .pb file (frozen graph) to convert
- --output_dir : output directory for the converted IR files
- --input : input node name (placeholder name)
- --output : output node name
- --data_type : data precision after conversion [FP16/FP32/half/float]
- --batch : forcibly overrides the input batch size
- --scale : divides the input values by this value (value-range normalization)
- --mean_values : per-channel mean BGR values to subtract from each pixel
- --offload_unsupported_operations_to_tf : offloads Tensorflow custom layers that OpenVINO cannot process to the Tensorflow side
$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer
$ sudo mkdir -p 01_pbmodels/UNet
$ sudo mkdir -p 10_lrmodels/UNet/FP16
$ sudo wget https://github.com/PINTO0309/TensorflowLite-UNet/raw/master/model/semanticsegmentation_frozen_person_32.pb -P 01_pbmodels/UNet
$ sudo python3 mo_tf.py \
--input_model 01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb \
--output_dir 10_lrmodels/UNet/FP16 \
--input input \
--output output/BiasAdd \
--data_type FP16 \
--batch 1
<Reference posts for calculating the RGB mean values>
https://forums.fast.ai/t/images-normalization/4058
https://github.com/DrSleep/tensorflow-deeplab-resnet/issues/106
The snippet below is the reference calculation completed into a runnable form (the training-image path is an assumption). The resulting per-channel averages are what would be passed to --mean_values; the conversion above leaves them unspecified.
import glob
import cv2
import numpy as np

meanB, meanG, meanR, imgcnt = 0.0, 0.0, 0.0, 0
for path in glob.glob("data/train/*.jpg"):   # training image directory is an assumption
    jpgimg = cv2.imread(path)                # OpenCV loads images in BGR order
    # Calculate BGR average value per image
    mean = np.mean(jpgimg, axis=(0, 1))
    meanB += mean[0]
    meanG += mean[1]
    meanR += mean[2]
    imgcnt += 1
# Calculate BGR average values over all training images
print("meanB =", meanB / imgcnt)
print("meanG =", meanG / imgcnt)
print("meanR =", meanR / imgcnt)
Since the model was converted from FP32 to FP16, the file size is now 15.5MB, roughly half the size before conversion.
**IR conversion log**
Model Optimizer arguments:
Common parameters:
- Path to the Input Model: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb
- Path for generated IR: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP16
- IR output name: semanticsegmentation_frozen_person_32
- Log level: ERROR
- Batch: 1
- Input layers: input
- Output layers: output/BiasAdd
- Input shapes: Not specified, inherited from the model
- Mean values: Not specified
- Scale values: Not specified
- Scale factor: Not specified
- Precision of IR: FP16
- Enable fusing: True
- Enable grouped convolutions fusing: True
- Move mean values to preprocess section: False
- Reverse input channels: False
TensorFlow specific parameters:
- Input model in text protobuf format: False
- Offload unsupported operations: False
- Path to model dump for TensorBoard: None
- List of shared libraries with TensorFlow custom layers implementation: None
- Update the configuration file with input/output node names: None
- Use configuration file used to generate the model with Object Detection API: None
- Operations to offload: None
- Patterns to offload: None
- Use the config file: None
Model Optimizer version: 1.4.292.6ef7232d
[ SUCCESS ] Generated IR model.
[ SUCCESS ] XML file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP16/semanticsegmentation_frozen_person_32.xml
[ SUCCESS ] BIN file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP16/semanticsegmentation_frozen_person_32.bin
[ SUCCESS ] Total execution time: 3.86 seconds.
##● Conversion to data type FP32
Execute the following command.
$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer
$ sudo mkdir -p 01_pbmodels/UNet
$ sudo mkdir -p 10_lrmodels/UNet/FP32
$ sudo wget https://github.com/PINTO0309/TensorflowLite-UNet/raw/master/model/semanticsegmentation_frozen_person_32.pb -P 01_pbmodels/UNet
$ sudo python3 mo_tf.py \
--input_model 01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb \
--output_dir 10_lrmodels/UNet/FP32 \
--input input \
--output output/BiasAdd \
--data_type FP32 \
--batch 1
**IR conversion log**
Model Optimizer arguments:
Common parameters:
- Path to the Input Model: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb
- Path for generated IR: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32
- IR output name: semanticsegmentation_frozen_person_32
- Log level: ERROR
- Batch: 1
- Input layers: input
- Output layers: output/BiasAdd
- Input shapes: Not specified, inherited from the model
- Mean values: Not specified
- Scale values: Not specified
- Scale factor: Not specified
- Precision of IR: FP32
- Enable fusing: True
- Enable grouped convolutions fusing: True
- Move mean values to preprocess section: False
- Reverse input channels: False
TensorFlow specific parameters:
- Input model in text protobuf format: False
- Offload unsupported operations: False
- Path to model dump for TensorBoard: None
- List of shared libraries with TensorFlow custom layers implementation: None
- Update the configuration file with input/output node names: None
- Use configuration file used to generate the model with Object Detection API: None
- Operations to offload: None
- Patterns to offload: None
- Use the config file: None
Model Optimizer version: 1.4.292.6ef7232d
[ SUCCESS ] Generated IR model.
[ SUCCESS ] XML file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32/semanticsegmentation_frozen_person_32.xml
[ SUCCESS ] BIN file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32/semanticsegmentation_frozen_person_32.bin
[ SUCCESS ] Total execution time: 3.70 seconds.
#◆ Building and running a UNet inference program with OpenVINO
import sys
import cv2
import numpy as np
from PIL import Image
import time
from openvino.inference_engine import IENetwork, IEPlugin
model_xml='/opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32/semanticsegmentation_frozen_person_32.xml'
model_bin='/opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32/semanticsegmentation_frozen_person_32.bin'
net = IENetwork.from_ir(model=model_xml, weights=model_bin)
seg_image = Image.open("data/input/009649.png")
palette = seg_image.getpalette() # Get a color palette
index_void = 2 # Define index_void Back Ground
camera_width = 320
camera_height = 240
fps = ""
elapsedTime = 0
plugin = IEPlugin(device="HETERO:MYRIAD,CPU")
plugin.set_config({"TARGET_FALLBACK": "HETERO:MYRIAD,CPU"})
plugin.set_initial_affinity(net)
#plugin = IEPlugin(device="MYRIAD")
#plugin = IEPlugin(device="CPU")
exec_net = plugin.load(network=net)
input_blob = next(iter(net.inputs)) #input_blob = 'input'
out_blob = next(iter(net.outputs)) #out_blob = 'output/BiasAdd'
n, c, h, w = net.inputs[input_blob].shape #n, c, h, w = 1, 3, 256, 256
del net
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FPS, 30)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, camera_width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, camera_height)
time.sleep(1)
while cap.isOpened():
    t1 = time.time()
    ret, frame = cap.read()
    if not ret:
        break
    #frame = cv2.imread('data/input/000003.jpg')

    # Preprocessing: BGR -> RGB, resize to the network input size, scale to 0-1, NHWC -> NCHW
    prepimg = frame[:, :, ::-1].copy()
    #prepimg = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    prepimg = Image.fromarray(prepimg)
    prepimg = prepimg.resize((256, 256), Image.ANTIALIAS)
    prepimg = np.asarray(prepimg) / 255.0
    prepimg = prepimg.transpose((2, 0, 1)).reshape((1, c, h, w))

    t2 = time.perf_counter()
    exec_net.start_async(request_id=0, inputs={input_blob: prepimg})

    if exec_net.requests[0].wait(-1) == 0:
        outputs = exec_net.requests[0].outputs[out_blob]              # (1, 3, 256, 256)
        print("SegmentationTime = {:.7f}".format(time.perf_counter() - t2))
        outputs = outputs.transpose((2, 3, 1, 0)).reshape((h, w, c))  # (256, 256, 3)
        outputs = cv2.resize(outputs, (camera_width, camera_height))  # (240, 320, 3)

        # View
        res = np.argmax(outputs, axis=2)
        if index_void is not None:
            res = np.where(res == index_void, 0, res)
        image = Image.fromarray(np.uint8(res), mode="P")
        image.putpalette(palette)
        image = image.convert("RGB")
        image = np.asarray(image)
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        image = cv2.addWeighted(frame, 1, image, 0.9, 0)
        cv2.putText(image, fps, (camera_width-180,15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (38,0,255), 1, cv2.LINE_AA)
        cv2.imshow("Result", image)

    if cv2.waitKey(1)&0xFF == ord('q'):
        break

    elapsedTime = time.time() - t1
    fps = "(Playback) {:.1f} FPS".format(1/elapsedTime)
cv2.destroyAllWindows()
del exec_net
del plugin
#◆ Measurement result of processing speed
When capturing from a USB camera, the performance was about 4 to 5 FPS.