
Installing Ubuntu 16.04 + OpenVINO on a LattePanda Alpha 864 (no bundled OS) and enjoying blazing-fast Semantic Segmentation with the Neural Compute Stick (NCS1) and Neural Compute Stick 2 (NCS2)

TensorflowLite-UNet

Tensorflow-ENet

Tensorflow-ENet2

ICNet-tensorflow

OpenVINO-ADAS

OpenVINO-DeeplabV3

I wrote an English translation at the end of the article, here


◆ Introduction

First, let me start with the shocking conclusion.

When a model is optimized for an Intel CPU with OpenVINO, inference is faster when executed on the CPU than when using the Neural Compute Stick 2.

The discussion I had with engineers from around the world on the official forum is here.

If you simply cannot contain your overflowing excitement and are the hopelessly impatient type, you can shortcut to the evaluation results from here.

In the end, segmentation runs at the speed shown below using only the CPU of a single-board computer.

It is still far from a speed usable for autonomous driving, but considering that it is CPU only, it is extremely fast.

A result whose value only those in the know will appreciate.

sample.gif

I have already used the Neural Compute Stick extensively with MobileNet-SSD + RaspberryPi.

↓ Clicking plays the video on Youtube

Screenshot 2018-11-26 00:06:00.png

I wrote it up as a Qiita article, committed it to Github, and through that connection even sent a PullRequest to the upstream Tensorflow repository; having learned the very basics of DeepLearning with Caffe and Tensorflow, I feel the effort has already paid for itself in terms of personal growth.

On 2018/11/14 the release of the Neural Compute Stick 2 (8x the performance of the first generation) (← link to the official page) was announced. Even a year after the first generation went on sale the SDK quality is still absurdly low, and the NCS2 does not support the RaspberryPi (ARM), so I hesitated a little over whether to buy one, but in the end I could not resist and bought it anyway.

Exactly the same size as v1. And the Movidius lettering, the company Intel acquired, is gone.

I do wish they had made it a little narrower...

ncs2-angled-down-lid-off-500x334.png aaa.jpg

With Ver2, just three sticks supposedly surpass the performance of a Jetson TX2, which gets my inner tinkerer excited.

Neural Compute Stick v1 ... 100GFLOPS

Neural Compute Stick v2 ... 800GFLOPS (simply multiplied by 8)

TX2 ... 2TFLOPS

In fact, it would not be an exaggeration to say that I wanted to try OpenVINO, which seems unlikely to be affected by SDK quality, precisely because the quality of the NCSDK is so poor.

This time, I install Ubuntu16.04 on a LattePanda Alpha 864 (no OS) obtained by pre-order in early November, then install OpenVINO and verify that custom segmentation models run on the Neural Compute Stick and the Neural Compute Stick 2.

The purpose of procuring the LattePanda Alpha is to verify the usefulness of Neural Compute Stick + OpenVINO on a single-board computer.

In terms of cost this goes well beyond hobby level, so please do not imitate it.

OpenVINO converts models generated with Caffe, TensorFlow, MXNet, Kaldi and ONNX into an intermediate binary in a common format (IR [Intermediate representation of the model]), which can then be executed in a uniform way through the inference engine API (Inference Engine).

Note that the execution platform does not support the ARM architecture; only Intel x86/64 CPUs are supported.

02.jpg
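For reference, here is a minimal sketch (my own summary; the file names are placeholders) of the Inference Engine Python API flow that the scripts later in this article follow: load the IR pair, choose a device plugin, and run inference on an NCHW array.

import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin

# Load the converted IR pair (.xml topology + .bin weights)
net = IENetwork.from_ir(model="model.xml", weights="model.bin")
plugin = IEPlugin(device="CPU")            # or "MYRIAD" for a Neural Compute Stick
exec_net = plugin.load(network=net)

input_blob = next(iter(net.inputs))
out_blob = next(iter(net.outputs))
n, c, h, w = net.inputs[input_blob].shape  # the Inference Engine expects NCHW input

dummy = np.zeros((n, c, h, w), dtype=np.float32)
result = exec_net.infer(inputs={input_blob: dummy})   # dict: {output_name: ndarray}
print(result[out_blob].shape)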

1. Develop Multiplatform Computer Vision Solutions - Intel Developer Zone

2. Install the Intel® Distribution of OpenVINO™ toolkit for Linux - Intel Developer Zone

3. How to Integrate the Inference Engine in Your Application - Intel Inference Engine Developer Guide

4. Accelerate Deep Learning Inference with Integrated Intel® Processor Graphics Rev 2.0 - Intel Developer Zone


◆ Appearance of the LattePanda Alpha

1. Outer box 1

03.jpg

2. Outer box 2 (understated)

04.jpg

3. Inner box (decorated with a panda face)

05.jpg

4. All bundled items (a case is not included)

06.jpg

5. Size compared with a pack of cigarettes (slightly larger than a RaspberryPi in length and width, but thinner, roughly half the thickness of a cigarette pack)

08.jpg


◆ LattePanda Alpha specifications

Needlessly high spec.



  • Price:


    • OS-less version: $358 (¥40,000)

    • Win10 bundled version: $398 (¥45,000)




  • CPU:


    • Intel 7th Gen Core m3-7y30




  • Core:


    • 1.6-2.6GHz Dual-Core,Four-Thread




  • Benchmark (PassMark):


    • Up to 3500, double computing power compared with same price range products in the market




  • Graphics:


    • Intel HD Graphics 615, 300-900MHz




  • RAM:


    • 8G LPDDR3 1866MHz Dual-Channel




  • Memory:


    • 64GB eMMC V5.0




  • External Memory:


    • 1x M.2 M Key, PCIe 4x, Supports NVMe SSD and SATA SSD

    • 1x M.2 E Key, PCIe 2x,Supports USB2.0, UART, PCM




  • Connectivity:


    • Wi-Fi 802.11 AC, 2.4G & 5G (gitekimark2.png Japanese radio-type certification (Giteki) mark present)

    • Dual Band Bluetooth 4.2

    • Gigabit Ethernet




  • USB Ports:


    • 3x USB 3.0 Type A

    • 1x USB Type C, supports PD, DP, USB 3.0




  • Display:


    • HDMI Output

    • Type-C DP Support

    • Extendable eDP touch displays




  • Co-processor:


    • Arduino Leonardo




  • GPIO & Other Features:


    • 2x 50p GPIOs including I2C

    • I2S, USB

    • RS232

    • UART

    • RT

    • Power Management

    • Extendable power button




  • OS Support:


    • Windows 10 Pro

    • Linux Ubuntu




◆ Parts used for kitting


  • Windows 10 PC (anything that can create an Ubuntu 16.04 USB boot drive is fine)

  • LattePanda Alpha

  • Intel Movidius Neural Compute Stick v1 / v2

  • 16GB USB flash drive

  • HDMI cable

  • HDMI display

  • USB keyboard

  • USB mouse


◆ Software installed / used


  • Ubuntu 16.04 x86_64

  • OpenVINO toolkit 2018 R4 (2018.4.420)

  • Python 3.5

  • OpenCV 3.4.3 (installed with pip3)

  • Rufus v3.3

  • Tensorflow v1.11.0 (installed with pip3)


◆ Installing Ubuntu 16.04


● Work on the Windows 10 PC (creating the Ubuntu 16.04 USB flash drive)

1. Download the Ubuntu 16.04.5 Desktop image (1.5GB)

http://releases.ubuntu.com/releases/16.04/ubuntu-16.04.5-desktop-amd64.iso

2. Download Rufus, the tool for creating the USB flash drive

rufus-128.png Official page - Rufus - Japanese

Download link https://github.com/pbatard/rufus/releases/download/v3.3/rufus-3.3.exe

3. Insert the USB flash drive into the Windows 10 PC

4. Launch Rufus (rufus-3.3.exe) and write the Ubuntu 16.04 image in DD mode

Rufus main screen (a dialog asking for DD mode appears after pressing the Start button)

01.png

Selecting DD mode

02.png

Writing in progress

03.png

5. Remove the USB flash drive from the Windows 10 PC


● Work on the LattePanda Alpha 864

6. Connect the Wi-Fi antennas, keyboard, mouse, HDMI cable/display and USB flash drive to the LattePanda Alpha, then connect the power last

Example 1) Connecting the Wi-Fi antennas (the Alpha has two antennas)

ezgif.com-optimize1.gif

Example 2) Connecting the HDMI cable

ezgif.com-optimize2.gif

Example 3) Everything connected (power OFF; I happened to take the photo the instant the blue LED went out)

12.jpg

Example 4) Connecting the Type-C power cable

When the Type-C cable is connected, the red power-indicator LED stays lit and the blue LED lights up for a moment.

Wait until the blue LED starts blinking, then press and hold the power button for 3 seconds; the board powers on and the blue LED stays lit.

ezgif.com-optimize3.gif

7. As soon as the LattePanda Alpha powers on, repeatedly press the Esc key

8. Select Boot → Boot Option #1 and press Enter

DSC_0110.jpg

9. Select the USB drive name + Partition 1 and press Enter

DSC_0111.jpg

10. Select Save & Exit → Save Changes and Exit and press Enter

DSC_0112.jpg

11. Select Yes and press Enter

DSC_0113.jpg

12. Select Install Ubuntu and press Enter

DSC_0114.jpg

13. Wait a while

DSC_0115.jpg

14. Select English and click Continue

DSC_0116.jpg

15. To connect to Wi-Fi, select Connect to this network, choose the SSID from the list and click Connect

DSC_0117.jpg

16. Enter the Wi-Fi password and click Connect

DSC_0118.jpg

17. Select Install third-party software for graphics and Wi-Fi hardware, Flash, MP3 and other media, then click Continue

DSC_0119.jpg

18. Select Erase disk and install Ubuntu, then click Install Now

DSC_0120.jpg

19.Continue

DSC_0121.jpg

20. Select Tokyo and click Continue

DSC_0122.jpg

21. Select Japanese in both the left and right columns, then click Continue

DSC_0123.jpg

22. Enter the user ID, machine name and password, then click Continue

DSC_0124.jpg

23. Wait a while

DSC_0125.jpg

24.Restart Now

* The reboot will start; if it does not go well, unplug and replug the power cable and power on again

DSC_0126.jpg

25. Ubuntu 16.04 boot complete; it started up normally without any fuss.

DSC_0127.jpg

26. After logging on, open a terminal and just run the updates.


Update commands

$ sudo apt-get update

$ sudo apt-get upgrade

Hmm, absurdly smooth, not what you would expect from a single-board computer.

Official installation procedure

http://docs.lattepanda.com/content/alpha_edition/power_on/


◆ Installing OpenVINO

OpenVINO version to install: 2018.4.420


● Installing OpenVINO itself

I install OpenVINO by following ammo0613's article AIを始めよう!OpenVINOのインストールからデモの実行まで - Qiita.

The steps are written up thoroughly there, so I will not repeat them here.

However, the command scripts change slightly with every toolkit release, so I recommend also referring to the official tutorial while working through them.


● Additional installation for the Intel Movidius Neural Compute Stick v1/v2

Run the following commands.


Updating the USB access rules

$ cd ~

$ sudo usermod -a -G users "$(whoami)"
$ sudo cat <<EOF > 97-usbboot.rules
SUBSYSTEM=="usb", ATTRS{idProduct}=="2150", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="2485", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="f63b", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
EOF

$ sudo cp 97-usbboot.rules /etc/udev/rules.d/
$ sudo udevadm control --reload-rules
$ sudo udevadm trigger
$ sudo ldconfig
$ sudo rm 97-usbboot.rules


When I ran sudo ldconfig, errors like the following occurred.

It seems the symbolic links are not set up correctly.


Errors from sudo ldconfig

alpha@LattePandaAlpha:~$ sudo ldconfig

/sbin/ldconfig.real: /opt/intel/common/mdf/lib64/igfxcmrt64.so is not a symbolic link
/sbin/ldconfig.real: /opt/intel/mediasdk/lib64/libmfxhw64.so.1 is not a symbolic link
/sbin/ldconfig.real: /opt/intel/mediasdk/lib64/libmfx.so.1 is not a symbolic link
/sbin/ldconfig.real: /opt/intel/mediasdk/lib64/libva-glx.so.2 is not a symbolic link
/sbin/ldconfig.real: /opt/intel/mediasdk/lib64/libva.so.2 is not a symbolic link
/sbin/ldconfig.real: /opt/intel/mediasdk/lib64/libigdgmm.so.1 is not a symbolic link
/sbin/ldconfig.real: /opt/intel/mediasdk/lib64/libva-drm.so.2 is not a symbolic link
/sbin/ldconfig.real: /opt/intel/mediasdk/lib64/libva-x11.so.2 is not a symbolic link

Looking into it, each file was in the state shown below.

Classic Intel; they never fail to live up to expectations.


Files that appear to be related to the errors

./igfxcmrt64.so

./libigfxcmrt64.so

./libmfxhw64.so
./libmfxhw64.so.1
./libmfxhw64.so.1.28

./libmfx.so
./libmfx.so.1
./libmfx.so.1.28

./libva-glx.so
./libva-glx.so.2
./libva-glx.so.2.300.0

./libva.so
./libva.so.2
./libva.so.2.300.0

./libigdgmm.so
./libigdgmm.so.1
./libigdgmm.so.1.0.0

./libva-drm.so
./libva-drm.so.2
./libva-drm.so.2.300.0

./libva-x11.so
./libva-x11.so.2
./libva-x11.so.2.300.0


Create the symbolic links manually with the following commands.


Commands to create the symbolic links manually

$ cd /opt/intel/common/mdf/lib64

$ sudo mv igfxcmrt64.so igfxcmrt64.so.org
$ sudo ln -s libigfxcmrt64.so igfxcmrt64.so

$ cd /opt/intel/mediasdk/lib64
$ sudo mv libmfxhw64.so.1 libmfxhw64.so.1.org
$ sudo mv libmfx.so.1 libmfx.so.1.org
$ sudo mv libva-glx.so.2 libva-glx.so.2.org
$ sudo mv libva.so.2 libva.so.2.org
$ sudo mv libigdgmm.so.1 libigdgmm.so.1.org
$ sudo mv libva-drm.so.2 libva-drm.so.2.org
$ sudo mv libva-x11.so.2 libva-x11.so.2.org
$ sudo ln -s libmfxhw64.so.1.28 libmfxhw64.so.1
$ sudo ln -s libmfx.so.1.28 libmfx.so.1
$ sudo ln -s libva-glx.so.2.300.0 libva-glx.so.2
$ sudo ln -s libva.so.2.300.0 libva.so.2
$ sudo ln -s libigdgmm.so.1.0.0 libigdgmm.so.1
$ sudo ln -s libva-drm.so.2.300.0 libva-drm.so.2
$ sudo ln -s libva-x11.so.2.300.0 libva-x11.so.2

Pulling myself together, I run sudo ldconfig once more.


Re-running sudo ldconfig

$ cd ~

$ sudo ldconfig

This time it finished normally.

The OpenCV 4.0.0-pre installed by default has a Gstreamer bug and did not work properly, so I install OpenCV 3.4.3 myself instead (a quick version check follows the commands below).

Run the following commands.


Installing OpenCV 3.4.3

$ sudo -H pip3 install opencv-python==3.4.3.18

$ nano ~/.bashrc
export PYTHONPATH=/usr/local/lib/python3.5/dist-packages/cv2:$PYTHONPATH

$ source ~/.bashrc
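As a quick sanity check (my own addition, not part of the official steps), you can confirm that Python now picks up the pip-installed OpenCV 3.4.3 rather than the bundled 4.0.0-pre:

import cv2

print(cv2.__version__)   # expected: 3.4.3
print(cv2.__file__)      # should point under /usr/local/lib/python3.5/dist-packages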

Official installation procedure

Additional installation steps for the Intel® Movidius™ Neural Compute Stick and Intel® Neural Compute Stick 2


● Upgrading to Tensorflow v1.11.0

The model optimizer step later on fails with the old Tensorflow v1.9.0 installed by default, so upgrade it to Tensorflow v1.11.0.


Example command to check the Tensorflow version

$ python3 -c 'import tensorflow as tf; print(tf.__version__)'

1.9.0


Commands to upgrade to Tensorflow v1.11.0 (upgrading pip as well while we are at it)

$ sudo -H pip3 install pip --upgrade

$ sudo -H pip3 install tensorflow==1.11.0 --upgrade


● Configuring the offloading of custom layer operations to Tensorflow

Intel official tutorial - Offloading Computations to TensorFlow*

Custom layer operations that are not supported by the standard OpenVINO API can be offloaded to the Tensorflow side.

The mechanism that lets you carve out specific operations and delegate their execution entirely to Tensorflow is interesting.

Run the commands below to build the inference engine layer yourself using the Tensorflow runtime.

Note, however, that the script provided by Intel contains a bug, so part of it has to be fixed manually.

Also, the Bazel installed at this point must be 0.18.1.

As of November 17, 2018, the inference engine layer does not build correctly with 0.19.0 or later, so be careful.

If your device, unlike the LattePanda Alpha, does not have plenty of RAM (for example only 1GB), then rewriting

sudo -H $HOME/bin/bazel build --config monolithic //tensorflow/cc/inference_engine_layer:libtensorflow_call_layer.so

as

sudo -H $HOME/bin/bazel --host_jvm_args=-Xmx512m build --config monolithic --local_resources 1024.0,0.5,0.5 //tensorflow/cc/inference_engine_layer:libtensorflow_call_layer.so

may allow the build to succeed.


Building the inference engine layer with the TensorFlow runtime (46 minutes on the LattePanda Alpha)

$ sudo apt-get install -y git pkg-config zip g++ zlib1g-dev unzip

$ cd ~
$ wget https://github.com/bazelbuild/bazel/releases/download/0.18.1/bazel-0.18.1-installer-linux-x86_64.sh
$ sudo chmod +x bazel-0.18.1-installer-linux-x86_64.sh
$ ./bazel-0.18.1-installer-linux-x86_64.sh --user
$ echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc
$ source ~/.bashrc
$ cd /opt
$ sudo git clone -b v1.11.0 https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ sudo git checkout -b v1.11.0
$ echo 'export TF_ROOT_DIR=/opt/tensorflow' >> ~/.bashrc
$ source ~/.bashrc
$ sudo nano /opt/intel/computer_vision_sdk/bin/setupvars.sh

#Before
INSTALLDIR=/opt/intel//computer_vision_sdk_2018.4.420

#After
INSTALLDIR=/opt/intel/computer_vision_sdk_2018.4.420

$ source /opt/intel/computer_vision_sdk/bin/setupvars.sh
$ sudo nano /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer/tf_call_ie_layer/build.sh

#Before
bazel build --config=monolithic //tensorflow/cc/inference_engine_layer:libtensorflow_call_layer.so

#After
sudo -H $HOME/bin/bazel build --config monolithic //tensorflow/cc/inference_engine_layer:libtensorflow_call_layer.so

$ sudo -E /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer/tf_call_ie_layer/build.sh


The inference engine layer is generated at the following path.


Path of libtensorflow_call_layer.so

/opt/tensorflow/bazel-bin/tensorflow/cc/inference_engine_layer/libtensorflow_call_layer.so


As it stands, running python as a regular user fails with a Permission denied error because regular users have no access rights under /opt, so relocate the library.


Relocating the .so

$ su -

$ cp /opt/tensorflow/bazel-bin/tensorflow/cc/inference_engine_layer/libtensorflow_call_layer.so /usr/local/lib
$ exit
$ nano ~/.bashrc
export PYTHONPATH=$PYTHONPATH:/usr/local/lib
$ source ~/.bashrc
$ sudo ldconfig


◆ Tasting the demo programs


● Image classification sample

Run the following command.


Image_Classification_Sample_SqueezeNet

$ cd /opt/intel/computer_vision_sdk/deployment_tools/demo

$ ./demo_squeezenet_download_convert_run.sh

It reads the image shown below...

car.jpg

...and recognized it as a sports car with roughly 80% confidence.

On the surface it is too ordinary to be exciting.

That said, it reports an otherworldly figure of 153 FPS (lol).


Result

###################################################


Run Inference Engine classification sample

Run ./classification_sample -d CPU -i /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png -m /home/alpha/openvino_models/ir/squeezenet1.1/FP32/squeezenet1.1.xml

[ INFO ] InferenceEngine:
API version ............ 1.4
Build .................. 17328
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ] /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png
[ INFO ] Loading plugin

API version ............ 1.4
Build .................. lnx_20181004
Description ....... MKLDNNPlugin
[ INFO ] Loading network files:
/home/alpha/openvino_models/ir/squeezenet1.1/FP32/squeezenet1.1.xml
/home/alpha/openvino_models/ir/squeezenet1.1/FP32/squeezenet1.1.bin
[ INFO ] Preparing input blobs
[ WARNING ] Image is resized from (787, 259) to (227, 227)
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference (1 iterations)
[ INFO ] Processing output blobs

Top 10 results:

Image /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png

817 0.8363345 label sports car, sport car
511 0.0946488 label convertible
479 0.0419131 label car wheel
751 0.0091071 label racer, race car, racing car
436 0.0068161 label beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon
656 0.0037564 label minivan
586 0.0025741 label half track
717 0.0016069 label pickup, pickup truck
864 0.0012027 label tow truck, tow car, wrecker
581 0.0005882 label grille, radiator grille

total inference time: 6.5318211
Average running time of one iteration: 6.5318211 ms

Throughput: 153.0966609 FPS

[ INFO ] Execution successful

###################################################

Demo completed successfully.



● Three-stage inference sample

This appears to be a sample that runs three stages of inference in sequence, each with a separate trained model.

As an idea it is commonplace enough that even someone like me could come up with it.


  • Vehicle detection (including attributes such as black car or white car)

  • License plate detection

  • Character recognition within the detected license plate

Run the following command.


Three-stage inference sample

$ cd /opt/intel/computer_vision_sdk/deployment_tools/demo

$ ./demo_security_barrier_camera.sh

It is a sample, so of course it works. Not very exciting.

license-plate.jpeg


● Other sample programs

Intel® Distribution of OpenVINO™ Toolkit - Inference Engine Samples


◆ Converting your own models and sample execution scripts

公式チュートリアル - Using the Model Optimizer to Convert TensorFlow* Models

公式チュートリアル - Model Optimizer Developer Guide - TensorFlow* Models with Custom Layers



The following is a sample script for converting a Tensorflow .pb (FreezeGraph) into the IR format used by OpenVINO.


Conversion command

$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer

$ python3 mo_tf.py --input_model <INPUT_MODEL>.pb

Conversion command options and descriptions


Conversion command options

optional arguments:

-h, --help show this help message and exit
--framework {tf,caffe,mxnet,kaldi,onnx}
Name of the framework used to train the input model.

Framework-agnostic parameters:
--input_model INPUT_MODEL, -w INPUT_MODEL, -m INPUT_MODEL
Tensorflow*: a file with a pre-trained model (binary
or text .pb file after freezing). Caffe*: a model
proto file with model weights
--model_name MODEL_NAME, -n MODEL_NAME
Model_name parameter passed to the final create_ir
transform. This parameter is used to name a network in
a generated IR and output .xml/.bin files.
--output_dir OUTPUT_DIR, -o OUTPUT_DIR
Directory that stores the generated IR. By default, it
is the directory from where the Model Optimizer is
launched.
--input_shape INPUT_SHAPE
Input shape(s) that should be fed to an input node(s)
of the model. Shape is defined as a comma-separated
list of integer numbers enclosed in parentheses or
square brackets, for example [1,3,227,227] or
(1,227,227,3), where the order of dimensions depends
on the framework input layout of the model. For
example, [N,C,H,W] is used for Caffe* models and
[N,H,W,C] for TensorFlow* models. Model Optimizer
performs necessary transformations to convert the
shape to the layout required by Inference Engine
(N,C,H,W). The shape should not contain undefined
dimensions (? or -1) and should fit the dimensions
defined in the input operation of the graph. If there
are multiple inputs in the model, --input_shape should
contain definition of shape for each input separated
by a comma, for example: [1,3,227,227],[2,4] for a
model with two inputs with 4D and 2D shapes.
--scale SCALE, -s SCALE
All input values coming from original network inputs
will be divided by this value. When a list of inputs
is overridden by the --input parameter, this scale is
not applied for any input that does not match with the
original input of the model.
--reverse_input_channels
Switch the input channels order from RGB to BGR (or
vice versa). Applied to original inputs of the model
if and only if a number of channels equals 3. Applied
after application of --mean_values and --scale_values
options, so numbers in --mean_values and
--scale_values go in the order of channels used in the
original model.
--log_level {CRITICAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}
Logger level
--input INPUT The name of the input operation of the given model.
Usually this is a name of the input placeholder of the
model.
--output OUTPUT The name of the output operation of the model. For
TensorFlow*, do not add :0 to this name.
--mean_values MEAN_VALUES, -ms MEAN_VALUES
Mean values to be used for the input image per
channel. Values to be provided in the (R,G,B) or
[R,G,B] format. Can be defined for desired input of
the model, for example: "--mean_values
data[255,255,255],info[255,255,255]". The exact
meaning and order of channels depend on how the
original model was trained.
--scale_values SCALE_VALUES
Scale values to be used for the input image per
channel. Values are provided in the (R,G,B) or [R,G,B]
format. Can be defined for desired input of the model,
for example: "--scale_values
data[255,255,255],info[255,255,255]". The exact
meaning and order of channels depend on how the
original model was trained.
--data_type {FP16,FP32,half,float}
Data type for all intermediate tensors and weights. If
original model is in FP32 and --data_type=FP16 is
specified, all model weights and biases are quantized
to FP16.
--disable_fusing Turn off fusing of linear operations to Convolution
--disable_resnet_optimization
Turn off resnet optimization
--finegrain_fusing FINEGRAIN_FUSING
Regex for layers/operations that won't be fused.
Example: --finegrain_fusing Convolution1,.*Scale.*
--disable_gfusing Turn off fusing of grouped convolutions
--move_to_preprocess Move mean values to IR preprocess section
--extensions EXTENSIONS
Directory or a comma separated list of directories
with extensions. To disable all extensions including
those that are placed at the default location, pass an
empty string.
--batch BATCH, -b BATCH
Input batch size
--version Version of Model Optimizer
--silent Prevent any output messages except those that
correspond to log level equals ERROR, that can be set
with the following option: --log_level. By default,
log level is already ERROR.
--freeze_placeholder_with_value FREEZE_PLACEHOLDER_WITH_VALUE
Replaces input layer with constant node with provided
value, e.g.: "node_name->True"
--generate_deprecated_IR_V2
Force to generate legacy/deprecated IR V2 to work with
previous versions of the Inference Engine. The
resulting IR may or may not be correctly loaded by
Inference Engine API (including the most recent and
old versions of Inference Engine) and provided as a
partially-validated backup option for specific
deployment scenarios. Use it at your own discretion.
By default, without this option, the Model Optimizer
generates IR V3.



Tensorflow-specific conversion command options and descriptions


Tensorflow-specific conversion command options

TensorFlow*-specific parameters:

--input_model_is_text
TensorFlow*: treat the input model file as a text
protobuf format. If not specified, the Model Optimizer
treats it as a binary file by default.
--input_checkpoint INPUT_CHECKPOINT
TensorFlow*: variables file to load.
--input_meta_graph INPUT_META_GRAPH
Tensorflow*: a file with a meta-graph of the model
before freezing
--saved_model_dir SAVED_MODEL_DIR
TensorFlow*: directory representing non frozen model
--saved_model_tags SAVED_MODEL_TAGS
Group of tag(s) of the MetaGraphDef to load, in string
format, separated by ','. For tag-set contains
multiple tags, all tags must be passed in.
--offload_unsupported_operations_to_tf
TensorFlow*: automatically offload unsupported
operations to TensorFlow*
--tensorflow_subgraph_patterns TENSORFLOW_SUBGRAPH_PATTERNS
TensorFlow*: a list of comma separated patterns that
will be applied to TensorFlow* node names to infer a
part of the graph using TensorFlow*.
--tensorflow_operation_patterns TENSORFLOW_OPERATION_PATTERNS
TensorFlow*: a list of comma separated patterns that
will be applied to TensorFlow* node type (ops) to
infer these operations using TensorFlow*.
--tensorflow_custom_operations_config_update TENSORFLOW_CUSTOM_OPERATIONS_CONFIG_UPDATE
TensorFlow*: update the configuration file with node
name patterns with input/output nodes information.
--tensorflow_use_custom_operations_config TENSORFLOW_USE_CUSTOM_OPERATIONS_CONFIG
TensorFlow*: use the configuration file with custom
operation description.
--tensorflow_object_detection_api_pipeline_config TENSORFLOW_OBJECT_DETECTION_API_PIPELINE_CONFIG
TensorFlow*: path to the pipeline configuration file
used to generate model created with help of Object
Detection API.
--tensorboard_logdir TENSORBOARD_LOGDIR
TensorFlow*: dump the input graph to a given directory
that should be used with TensorBoard.
--tensorflow_custom_layer_libraries TENSORFLOW_CUSTOM_LAYER_LIBRARIES
TensorFlow*: comma separated list of shared libraries
with TensorFlow* custom operations implementation.
--disable_nhwc_to_nchw
Disables default translation from NHWC to NCHW



◆ Converting my own Semantic Segmentation model "UNet"

Now we finally reach the main subject of this verification.

I check whether a model that could not be executed with NCSDK v2.x, the Intel Movidius Neural Compute Stick's own SDK, will run on OpenVINO.

For me, just having this model infer successfully on the NCS is cause for celebration.

Let me stress once more that the purpose of this article is not to evaluate the performance of the LattePanda Alpha itself.

First, I try UNet, whose structure is extremely simple.

The .pb file used is the one placed at TensorflowLite-UNet - PINTO0309 - Github.

This is a Semantic Segmentation model trained only on the Person class.

TensorflowLite-UNet/model/semanticsegmentation_frozen_person_32.pb (31.1MB)
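The conversion commands below need the frozen graph's input and output node names (input and output/BiasAdd for this model). If you are unsure of the node names in your own .pb, a short snippet along these lines (my own sketch, using the Tensorflow 1.11 installed earlier) can list the candidates:

import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile("semanticsegmentation_frozen_person_32.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    if node.op == "Placeholder":                 # typical input nodes
        print("input candidate :", node.name)
print("last node (often the output):", graph_def.node[-1].name)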


● Conversion to data type FP16

Run the following commands.

--input_model is the name of the .pb file (FreezeGraph) to convert

--output_dir is the output path for the converted IR files

--input is the name of the input node (placeholder)

--output is the name of the output node

--data_type is the precision of the converted data [FP16/FP32/half/float]

--batch forcibly replaces the input batch size (when the input shape in the .pb has an undefined batch size such as [-1, 256, 256, 3], the -1 can be forcibly replaced; OpenVINO apparently does not accept batch size = -1)

--scale divides each input BGR value by 255 (UInt8) to normalize it into the 0-1 range (see the sketch right after this list of options)

--mean_values specifies the per-channel BGR mean values to subtract per pixel

--offload_unsupported_operations_to_tf offloads Tensorflow custom layers that OpenVINO cannot process to the Tensorflow side
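As a rough illustration (my own sketch, not something done in this article; the mean values below are placeholders): baking --mean_values and --scale into the IR corresponds to the following per-pixel arithmetic, so if you pass them to mo_tf.py you should not normalize again in your application code.

import numpy as np

mean = np.array([104.0, 117.0, 123.0], dtype=np.float32)  # --mean_values (placeholder values, BGR)
scale = 255.0                                             # --scale 255

img = np.random.randint(0, 256, (256, 256, 3)).astype(np.float32)  # BGR image, HWC layout
normalized = (img - mean) / scale   # roughly what the converted IR applies to its input internally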


Script for converting my "UNet" model to IR FP16

$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer

$ sudo mkdir -p 01_pbmodels/UNet
$ sudo mkdir -p 10_lrmodels/UNet/FP16
$ sudo wget https://github.com/PINTO0309/TensorflowLite-UNet/raw/master/model/semanticsegmentation_frozen_person_32.pb -P 01_pbmodels/UNet
$ sudo python3 mo_tf.py \
--input_model 01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb \
--output_dir 10_lrmodels/UNet/FP16 \
--input input \
--output output/BiasAdd \
--data_type FP16 \
--batch 1

<Reference posts on computing RGB mean values>

https://forums.fast.ai/t/images-normalization/4058

https://github.com/DrSleep/tensorflow-deeplab-resnet/issues/106


Sample Python logic for computing RGB means

import glob
import numpy as np
import cv2

meanB = meanG = meanR = 0.0
imgcnt = 0

for path in glob.glob("images/*.jpg"):   # training image directory (placeholder path)
    jpgimg = cv2.imread(path)            # loaded in BGR order
    # Per-image channel means (B, G, R)
    mean = np.mean(jpgimg, axis=(0, 1))
    meanB += mean[0]
    meanG += mean[1]
    meanR += mean[2]
    imgcnt += 1

# Channel means over all training images
print("meanB =", meanB / imgcnt)
print("meanG =", meanG / imgcnt)
print("meanR =", meanR / imgcnt)

The conversion seems to have succeeded, somehow.

Because the FP32 model was converted to FP16, the apparent file size dropped to 15.5MB, half of the original.

IR conversion log


Conversion log

Model Optimizer arguments:

Common parameters:
- Path to the Input Model: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb
- Path for generated IR: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP16
- IR output name: semanticsegmentation_frozen_person_32
- Log level: ERROR
- Batch: 1
- Input layers: input
- Output layers: output/BiasAdd
- Input shapes: Not specified, inherited from the model
- Mean values: Not specified
- Scale values: Not specified
- Scale factor: Not specified
- Precision of IR: FP16
- Enable fusing: True
- Enable grouped convolutions fusing: True
- Move mean values to preprocess section: False
- Reverse input channels: False
TensorFlow specific parameters:
- Input model in text protobuf format: False
- Offload unsupported operations: False
- Path to model dump for TensorBoard: None
- List of shared libraries with TensorFlow custom layers implementation: None
- Update the configuration file with input/output node names: None
- Use configuration file used to generate the model with Object Detection API: None
- Operations to offload: None
- Patterns to offload: None
- Use the config file: None
Model Optimizer version: 1.4.292.6ef7232d

[ SUCCESS ] Generated IR model.
[ SUCCESS ] XML file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP16/semanticsegmentation_frozen_person_32.xml
[ SUCCESS ] BIN file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP16/semanticsegmentation_frozen_person_32.bin
[ SUCCESS ] Total execution time: 3.86 seconds.





SITWXV~4.jpg


● Conversion to data type FP32

Run the following commands.


Script for converting my "UNet" model to IR FP32

$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer

$ sudo mkdir -p 01_pbmodels/UNet
$ sudo mkdir -p 10_lrmodels/UNet/FP32
$ sudo wget https://github.com/PINTO0309/TensorflowLite-UNet/raw/master/model/semanticsegmentation_frozen_person_32.pb -P 01_pbmodels/UNet
$ sudo python3 mo_tf.py \
--input_model 01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb \
--output_dir 10_lrmodels/UNet/FP32 \
--input input \
--output output/BiasAdd \
--data_type FP32 \
--batch 1

This one also seems to have succeeded.

Since the precision was not changed between source and output, the size of the final output file does not change.

IR conversion log


Conversion log

Model Optimizer arguments:

Common parameters:
- Path to the Input Model: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb
- Path for generated IR: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32
- IR output name: semanticsegmentation_frozen_person_32
- Log level: ERROR
- Batch: 1
- Input layers: input
- Output layers: output/BiasAdd
- Input shapes: Not specified, inherited from the model
- Mean values: Not specified
- Scale values: Not specified
- Scale factor: Not specified
- Precision of IR: FP32
- Enable fusing: True
- Enable grouped convolutions fusing: True
- Move mean values to preprocess section: False
- Reverse input channels: False
TensorFlow specific parameters:
- Input model in text protobuf format: False
- Offload unsupported operations: False
- Path to model dump for TensorBoard: None
- List of shared libraries with TensorFlow custom layers implementation: None
- Update the configuration file with input/output node names: None
- Use configuration file used to generate the model with Object Detection API: None
- Operations to offload: None
- Patterns to offload: None
- Use the config file: None
Model Optimizer version: 1.4.292.6ef7232d

[ SUCCESS ] Generated IR model.
[ SUCCESS ] XML file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32/semanticsegmentation_frozen_person_32.xml
[ SUCCESS ] BIN file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32/semanticsegmentation_frozen_person_32.bin
[ SUCCESS ] Total execution time: 3.70 seconds.





SKTIWX~T.jpg


◆ Converting my own Semantic Segmentation model "ENet" (part 1)

This is another model that could not be executed with the NCS SDK "NCSDK v2.x".

Incidentally, at the moment it does not run as-is on upstream Tensorflow Lite either.

If this one runs too, I will be delighted.

The .pb file used is the one placed at TensorFlow-ENet - PINTO0309 - Github.


● [Failed] Conversion to data type FP16

Run the following commands.


Script for converting my "ENet" model to IR FP16

$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer

$ sudo mkdir -p 01_pbmodels/ENet
$ sudo mkdir -p 10_lrmodels/ENet/FP16
$ sudo wget https://github.com/PINTO0309/TensorFlow-ENet/raw/pinto0309work/checkpoint/semanticsegmentation_enet.pb -P 01_pbmodels/ENet
$ sudo python3 mo_tf.py \
--input_model 01_pbmodels/ENet/semanticsegmentation_enet.pb \
--output_dir 10_lrmodels/ENet/FP16 \
--input input \
--output ENet/logits_to_softmax \
--data_type FP16 \
--batch 1 \
--offload_unsupported_operations_to_tf \
--tensorflow_operation_patterns Range,ScatterNd

No good; for some reason a type conversion error occurs in the EagerExecution of ScatterNd.

I cannot tell what is missing...


Error output

[ ERROR ]  Cannot infer shapes or values for node "TFSubgraphCall_2743".

[ ERROR ] Error converting shape to a TensorShape: only size-1 arrays can be converted to Python scalars.
[ ERROR ]
[ ERROR ] It can happen due to bug in custom shape infer function <function tf_subgraph_infer at 0x7fc3fc967400>.
[ ERROR ] Or because the node inputs have incorrect values/shapes.
[ ERROR ] Or because input shapes are incorrect (embedded to the model or passed via --input_shape).
[ ERROR ] Run Model Optimizer with --log_level=DEBUG for more information.
[ ERROR ] Stopped shape/value propagation at "TFSubgraphCall_2743" node.
For more information please refer to Model Optimizer FAQ (<INSTALL_DIR>/deployment_tools/documentation/docs/MO_FAQ.html), question #38.


◆ Converting my own Semantic Segmentation model "ENet" (part 2)

This old man is not giving up! ( ✧Д✧)

This time I borrow the repository below and apply my own customizations for CPU support and a smaller model size.

segmentation - fregu856 - Github

After customization

Tensorflow-ENet2 - PINTO0309 - Github


● [Failed] Conversion to data type FP16

Run the following commands.


Script for converting the "ENet" model to IR FP16

$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer

$ sudo mkdir -p 01_pbmodels/ENet
$ sudo mkdir -p 10_lrmodels/ENet/FP16
$ sudo wget https://github.com/PINTO0309/Tensorflow-ENet2/raw/master/training_logs/best_model/semanticsegmentation_frozen_enet.pb -P 01_pbmodels/ENet
$ sudo python3 mo_tf.py \
--input_model 01_pbmodels/ENet/semanticsegmentation_frozen_enet.pb \
--output_dir 10_lrmodels/ENet/FP16 \
--input imgs_ph \
--output fullconv/Relu \
--data_type FP16 \
--batch 1 \
--offload_unsupported_operations_to_tf \
--tensorflow_operation_patterns Range,ScatterNd

No good. The same type conversion error occurs in the EagerExecution of ScatterNd.

How can I do Upsampling without using ScatterNd...? I have no idea...


Error output

[ ERROR ]  Cannot infer shapes or values for node "TFSubgraphCall_1695".

[ ERROR ] Error converting shape to a TensorShape: only size-1 arrays can be converted to Python scalars.
[ ERROR ]
[ ERROR ] It can happen due to bug in custom shape infer function <function tf_subgraph_infer at 0x7f425c167400>.
[ ERROR ] Or because the node inputs have incorrect values/shapes.
[ ERROR ] Or because input shapes are incorrect (embedded to the model or passed via --input_shape).
[ ERROR ] Run Model Optimizer with --log_level=DEBUG for more information.
[ ERROR ] Stopped shape/value propagation at "TFSubgraphCall_1695" node.
For more information please refer to Model Optimizer FAQ (<INSTALL_DIR>/deployment_tools/documentation/docs/MO_FAQ.html), question #38.

★ References for alternative Unpooling implementations (a rough sketch follows the links below)

https://assiaben.github.io/posts/2018-06-tf-unpooling/

https://github.com/assiaben/w/blob/master/unpooling/unpool_test.py
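For reference, one commonly used workaround (my own sketch, not something verified in this article, and it changes the network because the pooling indices are discarded) is to replace ScatterNd-based max unpooling with a plain nearest-neighbour resize, which the Model Optimizer handles without offloading:

import tensorflow as tf

def unpool_resize(x, factor=2):
    # Approximate max unpooling of an NHWC tensor by nearest-neighbour upsampling
    shape = tf.shape(x)
    new_size = tf.stack([shape[1] * factor, shape[2] * factor])
    return tf.image.resize_nearest_neighbor(x, new_size)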


◆ Converting someone else's model: the Semantic Segmentation model for ADAS (advanced driver-assistance systems)

It is a little galling, but I use the sample model that Intel publishes officially.

"Use" is a big word: models already converted to FP16 and FP32 precision appear to be installed automatically when OpenVINO is set up.

Since the Neural Compute Stick will be used this time, the FP16 version will be used in the subsequent steps.

/opt/intel/computer_vision_sdk/deployment_tools/intel_models/semantic-segmentation-adas-000/FP16

/opt/intel/computer_vision_sdk/deployment_tools/intel_models/semantic-segmentation-adas-000/FP32

semantic-segmentation-adas-0001.bin

semantic-segmentation-adas-0001.xml


◆ Converting my own Semantic Segmentation model "ICNet"

The .pb file used is the one placed at ICNet-tensorflow - PINTO0309 - Github.


● Conversion to data type FP16

Run the following commands.


Script for converting the "ICNet" model to IR FP16

$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer

$ sudo mkdir -p 01_pbmodels/ICNet
$ sudo mkdir -p 10_lrmodels/ICNet/FP16
$ sudo wget https://github.com/PINTO0309/ICNet-tensorflow/raw/pinto0309work/snapshots/semanticsegmentation_ICNet.pb -P 01_pbmodels/ICNet
$ sudo python3 mo_tf.py \
--input_model 01_pbmodels/ICNet/semanticsegmentation_ICNet.pb \
--output_dir 10_lrmodels/ICNet/FP16 \
--input input \
--output ResizeBilinear_19 \
--data_type FP16

It seems to have succeeded.

IR conversion log


Conversion log

Model Optimizer arguments:

Common parameters:
- Path to the Input Model: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/01_pbmodels/ICNet/semanticsegmentation_ICNet.pb
- Path for generated IR: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/ICNet/FP16
- IR output name: semanticsegmentation_ICNet
- Log level: ERROR
- Batch: 1
- Input layers: input
- Output layers: ResizeBilinear_19
- Input shapes: Not specified, inherited from the model
- Mean values: Not specified
- Scale values: Not specified
- Scale factor: Not specified
- Precision of IR: FP16
- Enable fusing: True
- Enable grouped convolutions fusing: True
- Move mean values to preprocess section: False
- Reverse input channels: False
TensorFlow specific parameters:
- Input model in text protobuf format: False
- Offload unsupported operations: True
- Path to model dump for TensorBoard: None
- List of shared libraries with TensorFlow custom layers implementation: None
- Update the configuration file with input/output node names: None
- Use configuration file used to generate the model with Object Detection API: None
- Operations to offload: None
- Patterns to offload: None
- Use the config file: None
Model Optimizer version: 1.4.292.6ef7232d

[ SUCCESS ] Generated IR model.
[ SUCCESS ] XML file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/ICNet/FP16/semanticsegmentation_ICNet.xml
[ SUCCESS ] BIN file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/ICNet/FP16/semanticsegmentation_ICNet.bin
[ SUCCESS ] Total execution time: 6.58 seconds.





S3SVMM~H.PNG


● Conversion to data type FP32

Run the following commands.


Script for converting the "ICNet" model to IR FP32

$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer

$ sudo mkdir -p 01_pbmodels/ICNet
$ sudo mkdir -p 10_lrmodels/ICNet/FP32
$ sudo wget https://github.com/PINTO0309/ICNet-tensorflow/raw/pinto0309work/snapshots/semanticsegmentation_ICNet.pb -P 01_pbmodels/ICNet
$ sudo python3 mo_tf.py \
--input_model 01_pbmodels/ICNet/semanticsegmentation_ICNet.pb \
--output_dir 10_lrmodels/ICNet/FP32 \
--input input \
--output ResizeBilinear_19 \
--data_type FP32

This one also seems to have succeeded.

IR conversion log


Conversion log

Model Optimizer arguments:

Common parameters:
- Path to the Input Model: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/01_pbmodels/ICNet/semanticsegmentation_ICNet.pb
- Path for generated IR: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/ICNet/FP32
- IR output name: semanticsegmentation_ICNet
- Log level: ERROR
- Batch: 1
- Input layers: input
- Output layers: ResizeBilinear_19
- Input shapes: Not specified, inherited from the model
- Mean values: Not specified
- Scale values: Not specified
- Scale factor: Not specified
- Precision of IR: FP32
- Enable fusing: True
- Enable grouped convolutions fusing: True
- Move mean values to preprocess section: False
- Reverse input channels: False
TensorFlow specific parameters:
- Input model in text protobuf format: False
- Offload unsupported operations: True
- Path to model dump for TensorBoard: None
- List of shared libraries with TensorFlow custom layers implementation: None
- Update the configuration file with input/output node names: None
- Use configuration file used to generate the model with Object Detection API: None
- Operations to offload: None
- Patterns to offload: None
- Use the config file: None
Model Optimizer version: 1.4.292.6ef7232d

[ SUCCESS ] Generated IR model.
[ SUCCESS ] XML file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/ICNet/FP32/semanticsegmentation_ICNet.xml
[ SUCCESS ] BIN file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/ICNet/FP32/semanticsegmentation_ICNet.bin
[ SUCCESS ] Total execution time: 8.47 seconds.





STROU9~D.jpg


◆ Building and running a UNet execution environment with OpenVINO

I really wanted to implement ENet, but since I could not get past the ScatterNd conversion error, I reluctantly implement UNet instead.


Sample UNet program for real-time segmentation

import sys

import cv2
import numpy as np
from PIL import Image
import time
from openvino.inference_engine import IENetwork, IEPlugin

model_xml='/opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32/semanticsegmentation_frozen_person_32.xml'
model_bin='/opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32/semanticsegmentation_frozen_person_32.bin'
net = IENetwork.from_ir(model=model_xml, weights=model_bin)
seg_image = Image.open("data/input/009649.png")
palette = seg_image.getpalette()    # Get a color palette
index_void = 2                      # Define index_void Back Ground
camera_width = 320
camera_height = 240
fps = ""
elapsedTime = 0

plugin = IEPlugin(device="HETERO:MYRIAD,CPU")
plugin.set_config({"TARGET_FALLBACK": "HETERO:MYRIAD,CPU"})
plugin.set_initial_affinity(net)

#plugin = IEPlugin(device="MYRIAD")
#plugin = IEPlugin(device="CPU")

exec_net = plugin.load(network=net)

input_blob = next(iter(net.inputs))          #input_blob = 'input'
out_blob = next(iter(net.outputs))           #out_blob = 'output/BiasAdd'
n, c, h, w = net.inputs[input_blob].shape    #n, c, h, w = 1, 3, 256, 256

del net

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FPS, 30)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, camera_width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, camera_height)
time.sleep(1)

while cap.isOpened():
    t1 = time.time()
    ret, frame = cap.read()
    if not ret:
        break
    #frame = cv2.imread('data/input/000003.jpg')

    # Preprocess: BGR -> RGB, resize to the network input size, normalize, HWC -> NCHW
    prepimg = frame[:, :, ::-1].copy()
    #prepimg = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    prepimg = Image.fromarray(prepimg)
    prepimg = prepimg.resize((256, 256), Image.ANTIALIAS)
    prepimg = np.asarray(prepimg) / 255.0
    prepimg = prepimg.transpose((2, 0, 1)).reshape((1, c, h, w))

    t2 = time.perf_counter()
    exec_net.start_async(request_id=0, inputs={input_blob: prepimg})

    if exec_net.requests[0].wait(-1) == 0:
        outputs = exec_net.requests[0].outputs[out_blob]                # (1, 3, 256, 256)
        print("SegmentationTime = {:.7f}".format(time.perf_counter() - t2))
        outputs = outputs.transpose((2, 3, 1, 0)).reshape((h, w, c))    # (256, 256, 3)
        outputs = cv2.resize(outputs, (camera_width, camera_height))   # (240, 320, 3)

        # View
        res = np.argmax(outputs, axis=2)
        if index_void is not None:
            res = np.where(res == index_void, 0, res)
        image = Image.fromarray(np.uint8(res), mode="P")
        image.putpalette(palette)
        image = image.convert("RGB")

        image = np.asarray(image)
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        image = cv2.addWeighted(frame, 1, image, 0.9, 0)

        cv2.putText(image, fps, (camera_width-180,15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (38,0,255), 1, cv2.LINE_AA)
        cv2.imshow("Result", image)

    if cv2.waitKey(1)&0xFF == ord('q'):
        break
    elapsedTime = time.time() - t1
    fps = "(Playback) {:.1f} FPS".format(1/elapsedTime)

cv2.destroyAllWindows()
del exec_net
del plugin


000003.jpg

4.jpg

◆ Measured processing speed

Inference_Time2.jpg

Hey! The CPU is faster! Σ(゚ロ゚;)

Surely the Intel 7th Gen Core m3-7y30 cannot actually be outperforming the dedicated AI chip Myriad X... can it???

Is MKL-DNN really that powerful???

I was so genuinely skeptical that I immediately raised an issue on the official forum.

In any case, with the current measurement method, performance is somehow better without the Neural Compute Stick.

That said, it is clear that the Neural Compute Stick v2 delivers more than twice the performance of the Neural Compute Stick v1.

It is also clear that this is a huge speed-up: the same model took 11 seconds when processed on the RaspberryPi3's ARM CPU alone.

Incidentally, the program listed above for reference can segment video captured by a USB camera in real time if the commented-out parts are adjusted.

With USB camera capture it achieved roughly 4 FPS to 5 FPS.

However, the accuracy is so poor that it is not really usable...

[Added 2018/11/29]

After verifying together with engineers overseas, we reached the conclusion that it can at least beat an Intel Celeron (lol). The verification results are here.

◆ Issue posted to the official forum

https://software.intel.com/en-us/forums/computer-vision/topic/800215


◆ [Partially successful] Building and running an ADAS segmentation environment with OpenVINO

See below for the list of downloadable models.

Intel's official tutorial is too rough, so you may as well borrow from the OpenCV repository on Github instead.

I compared the contents, and the OpenCV version appears to be newer.

Incidentally, the OpenCV repository also offers a wider variety of downloadable models.

This is given only for reference; the verification that follows can be continued without doing it.

OpenCV - Github - Public Topologies Downloader


Example of downloading various models from OpenCV's model download repository

$ sudo -H pip3 install pyyaml requests

$ cd ~
$ git clone https://github.com/opencv/open_model_zoo.git
$ cd open_model_zoo/model_downloader
$ ./downloader.py --name semantic-segmentation-adas-0001
$ ./downloader.py --name semantic-segmentation-adas-0001-fp16
$ ./downloader.py --name <any-model-name>
 :

Now for the main part. Even if you skipped the above, just carry out the steps from here on.

Run the command below to build the sample programs.


Running the shell script that builds the sample programs

$ sudo /opt/intel/computer_vision_sdk/deployment_tools/inference_engine/samples/build_samples.sh


For some reason the built binaries are generated under home/<username>/inference_engine_samples_build/intel64/Release.


Moving to the sample program folder and displaying its usage

$ cd ~/inference_engine_samples_build/intel64

$ sudo chmod 777 Release
$ cd Release
$ ./segmentation_demo -h

[ INFO ] InferenceEngine:
API version ............ 1.4
Build .................. 17328
[ INFO ] Parsing input parameters

segmentation_demo [OPTION]
Options:

-h Print a usage message.
-i "<path>" Required. Path to an .bmp image.
-m "<path>" Required. Path to an .xml file with a trained model.
-l "<absolute_path>" Required for MKLDNN (CPU)-targeted custom layers. Absolute path to a shared library with the kernels impl.
Or
-c "<absolute_path>" Required for clDNN (GPU)-targeted custom kernels. Absolute path to the xml file with the kernels desc.
-pp "<path>" Path to a plugin folder.
-d "<device>" Specify the target device to infer on: CPU, GPU, FPGA or MYRIAD is acceptable. The demo will look for a suitable plugin for a specified device (CPU by default).
-ni "<integer>" Number of iterations (default 1)
-pc Enables per-layer performance report


First, run it in Neural Compute Stick mode.


Running the SemanticSegmentation sample program

$ ./segmentation_demo \

-i test.png \
-m /opt/intel/computer_vision_sdk/deployment_tools/intel_models/semantic-segmentation-adas-000/FP16/semantic-segmentation-adas-0001.xml \
-d MYRIAD \
-pc

It does not run... Are you kidding me, Intel?

Well, I do understand that the Python API itself is still only a preview release.


Execution error log

[ INFO ] InferenceEngine: 

API version ............ 1.4
Build .................. 17328
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ] ./test.png
[ INFO ] Loading plugin

API version ............ 1.4
Build .................. 17328
Description ....... myriadPlugin
[ INFO ] Loading network files
[ INFO ] Preparing input blobs
[ WARNING ] Image is resized from (512, 256) to (2048, 1024)
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs
[ INFO ] Loading model to the plugin
[ ERROR ] Cannot convert layer "argmax" due to unsupported layer type "ArgMax"


ArgMax is something you could write yourself outside the model in one or two lines, so I searched for the pre-conversion model, but for some reason the caffemodel has not been published.
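For illustration, those one or two lines would look roughly like this (a sketch of my own; the class count and output shape are assumptions, not values taken from the model):

import numpy as np

scores = np.random.rand(1, 20, 1024, 2048).astype(np.float32)  # NCHW scores from the network (placeholder)
class_map = np.argmax(scores[0], axis=0)                       # (1024, 2048) per-pixel class indices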

Unbelievable... The low quality and the disregard for users are worse than a certain ○i○r○s○f○.

But this old man does not get discouraged by every little thing. Let's move on.

Next, I try running it in CPU mode.


Running the SemanticSegmentation sample program

$ ./segmentation_demo \

-i test.png \
-m /opt/intel/computer_vision_sdk/deployment_tools/intel_models/semantic-segmentation-adas-000/FP16/semantic-segmentation-adas-0001.xml \
-d CPU \
-pc

Feeding in the test image shown below...

test.jpg

...produces a rather clean segmentation.

Despite running on the CPU, the inference time was 909 ms.

Quite fast!!

out_0.jpg


◆ Building an ICNet execution environment with OpenVINO

The last stronghold: ICNet. The future of edge segmentation rests on your shoulders.

Run the following program.

This time the CPU extension library is enabled.


Sample ICNet program for real-time segmentation

import sys

import cv2
import numpy as np
from PIL import Image
import time
from openvino.inference_engine import IENetwork, IEPlugin

model_xml='/opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/ICNet/FP32/semanticsegmentation_ICNet.xml'
model_bin='/opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/ICNet/FP32/semanticsegmentation_ICNet.bin'
net = IENetwork.from_ir(model=model_xml, weights=model_bin)
seg_image = Image.open("data/input/009649.png")
palette = seg_image.getpalette()    # Get a color palette
index_void = 2                      # Define index_void Back Ground
camera_width = 320
camera_height = 240
fps = ""
elapsedTime = 0

#plugin = IEPlugin(device="HETERO:MYRIAD,CPU")
#plugin.set_config({"TARGET_FALLBACK": "HETERO:MYRIAD,CPU"})
#plugin.set_initial_affinity(net)

#plugin = IEPlugin(device="MYRIAD")
plugin = IEPlugin(device="CPU")

plugin.add_cpu_extension("/home/alpha/inference_engine_samples_build/intel64/Release/lib/libcpu_extension.so")
exec_net = plugin.load(network=net)

input_blob = next(iter(net.inputs))     #input_blob = 'input'
out_blob = next(iter(net.outputs))      #out_blob = 'ResizeBilinear_19'
#print(net.inputs[input_blob].shape)
h, w, c = net.inputs[input_blob].shape  #h, w, c = 256, 512, 3

del net

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FPS, 30)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, camera_width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, camera_height)
time.sleep(1)

while cap.isOpened():
    t1 = time.time()
    #ret, frame = cap.read()
    #if not ret:
    #    break
    frame = cv2.imread('data/input/000003.jpg')
    camera_height, camera_width, channels = frame.shape[:3]

    # Preprocess: BGR -> RGB and resize to the network input size (512 x 256)
    prepimg = frame[:, :, ::-1].copy()
    prepimg = Image.fromarray(prepimg)
    prepimg = prepimg.resize((512, 256), Image.ANTIALIAS)
    if prepimg.mode == "RGBA":
        prepimg = prepimg.convert("RGB")
    t2 = time.perf_counter()

    exec_net.start_async(request_id=0, inputs={input_blob: prepimg})

    if exec_net.requests[0].wait(-1) == 0:
        outputs = exec_net.requests[0].outputs[out_blob]    # (1, 19, 256, 512)

        print(outputs[0].shape)
        print("SegmentationTime = {:.7f}".format(time.perf_counter() - t2))
        outputs = outputs[0]                    # (19, 256, 512)
        outputs = np.argmax(outputs, axis=0)    # (256, 512)

        # View
        image = Image.fromarray(np.uint8(outputs), mode="P")
        image.putpalette(palette)
        image = image.convert("RGB")
        image = image.resize((camera_width, camera_height))

        image.save("2.jpg")
        image = np.asarray(image)

        cv2.putText(image, fps, (camera_width-180,15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (38,0,255), 1, cv2.LINE_AA)
        cv2.imshow("Result", image)

    if cv2.waitKey(1)&0xFF == ord('q'):
        break
    elapsedTime = time.time() - t1
    fps = "(Playback) {:.1f} FPS".format(1/elapsedTime)

cv2.destroyAllWindows()
del exec_net
del plugin


Inference ran at a blistering 60 ms, but the result came out completely garbled.

I wonder where the bug is...
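One thing I would check (my own guess, not verified here): unlike the UNet sample above, this script hands the PIL image straight to the plugin without converting it to an NCHW array. A preprocessing step along these lines, mirroring the UNet sample, might be worth trying:

import numpy as np

# Hypothetical fix to try: PIL RGB image -> float array -> batch of 1 in NCHW order
prepimg = np.asarray(prepimg, dtype=np.float32)           # (256, 512, 3)
prepimg = prepimg.transpose((2, 0, 1))[np.newaxis, ...]   # (1, 3, 256, 512)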


◆ Articles I referred to, and acknowledgments

ammo0613 beat me to it... (´;Д;`)

AIを始めよう!OpenVINOのインストールからデモの実行まで - ammo0613 - Qiita

AIを始めよう!OpenVINOで使うモデルを整備する - ammo0613 - Qiita

AIを始めよう!PythonでOpenVINOの仕組みを理解する - ammo0613 - Qiita


◆ Today's summary


  • The official SDK, NCSDK v1 / v2, is riddled with bugs and Movidius does not seem inclined to fix them; it really is hopeless

  • Models that could not be executed with NCSDK turned out to run with OpenVINO

  • OpenVINO is quite fast and highly polished as an SDK

  • For machines with a 7th generation or later Intel x86/64 CPU, OpenVINO is highly recommended

  • Neural Compute Stick 2... at this point it does not deliver the performance I hoped for, so I might not really recommend it...

  • Personally, considering the balance of introduction cost and performance, I felt that buying just a LattePanda with an Intel CPU is more versatile than buying an ARM-based TX2 or the like (honestly, the Stick is very slow, and without ARM support it is close to useless)

  • Next time, I plan to play with real-time segmentation using the Semantic Segmentation model Intel provides

★★ Notes to self ★★

https://software.intel.com/en-us/articles/OpenVINO-InferEngine#CPU%20Extensions

https://software.intel.com/en-us/articles/OpenVINO-InferEngine#Adding%20your%20own%20kernels


◆ Next article

Forcing RealTime Semantic Segmentation with the CPU alone [1 FPS / CPU only]


Introducing Ubuntu 16.04 + OpenVINO to Latte Panda Alpha 864 (without OS included) and enjoying Semantic Segmentation with Neural Compute Stick and Neural Compute Stick 2


◆ Introduction

In the end, the CPU alone achieves segmentation at the speed shown below.

sample.gif

Last article, [Detection rate approx. 30FPS] RaspberryPi3 Model B(plus none) is slightly later than TX2, acquires object detection rate of MobilenetSSD and corresponds to MultiModel (VOC+WIDER FACE).

↓ Youtube plays on click (Neural Compute Stick + MobilenetSSD + RaspberryPi)

Screenshot 2018-11-26 00:06:00.png

It is no exaggeration to say that I tried OpenVINO, which should not depend on the quality of the SDK, precisely because the quality of the NCSDK is remarkably low.

I installed Ubuntu 16.04 on a LattePanda Alpha 864 (without OS) obtained by pre-order in early November, then installed OpenVINO to verify that custom segmentation models run on the Neural Compute Stick and Neural Compute Stick 2.

The purpose of procuring the LattePanda Alpha is to verify the usefulness of Neural Compute Stick + OpenVINO on a single-board computer.

Because the cost goes well beyond hobby level, please do not try to imitate this.

OpenVINO converts models generated with Caffe, TensorFlow, MXNet, Kaldi and ONNX into an intermediate binary in a common format (IR [Intermediate Representation of the model]), which can then be executed uniformly through the inference engine API (Inference Engine).

The execution platform does not support the ARM architecture; only Intel x86/64 CPUs are supported.

02.jpg

1. Develop Multiplatform Computer Vision Solutions - Intel Developer Zone

2. Install the Intel® Distribution of OpenVINO™ toolkit for Linux - Intel Developer Zone

3. How to Integrate the Inference Engine in Your Application - Intel Inference Engine Developer Guide

4. Accelerate Deep Learning Inference with Integrated Intel® Processor Graphics Rev 2.0 - Intel Developer Zone


◆ Appearance of Latte Panda Alpha

1. Outer case1

03.jpg

2. Outer case2

04.jpg

3.Inside box

05.jpg

4.Supplied package (Case not included)

06.jpg

5. Sense of size compared to a cigarette pack (slightly larger than a RaspberryPi in length and width, but thinner, about half the thickness of a cigarette pack)

08.jpg


◆ Specification of LattePanda Alpha



  • Price:


    • OS-less version:$358 (¥40,000)

    • Win10 bundle version:$398 (¥45,000)




  • CPU:


    • Intel 7th Gen Core m3-7y30




  • Core:


    • 1.6-2.6GHz Dual-Core,Four-Thread




  • Benchmark (PassMark):


    • Up to 3500, double computing power compared with same price range products in the market




  • Graphics:


    • Intel HD Graphics 615, 300-900MHz




  • RAM:


    • 8G LPDDR3 1866MHz Dual-Channel




  • Memory:


    • 64GB eMMC V5.0




  • External Memory:


    • 1x M.2 M Key, PCIe 4x, Supports NVMe SSD and SATA SSD

    • 1x M.2 E Key, PCIe 2x,Supports USB2.0, UART, PCM




  • Connectivity:


    • Wi-Fi 802.11 AC, 2.4G & 5G

    • Dual Band Bluetooth 4.2

    • Gigabit Ethernet




  • USB Ports:


    • 3x USB 3.0 Type A

    • 1x USB Type C, supports PD, DP, USB 3.0




  • Display:


    • HDMI Output

    • Type-C DP Support

    • Extendable eDP touch displays




  • Co-processor:


    • Arduino Leonardo




  • GPIO & Other Features:


    • 2x 50p GPIOs including I2C

    • I2S, USB

    • RS232

    • UART

    • RT

    • Power Management

    • Extendable power button




  • OS Support:


    • Windows 10 Pro

    • Linux Ubuntu




◆ Parts used for kitting


  • Windows 10 PC (Anything is OK if you can create USB boot media for Ubuntu 1604)

  • LattePanda Alpha

  • Intel Movidius Neural Compute Stick v1 / v2

  • USB Memory 16GB

  • HDMI cable

  • HDMI display

  • USB keyboard

  • USB mouse


◆ Installation / use software


  • Ubuntu 16.04 x86_64

  • OpenVINO toolkit 2018 R4 (2018.4.420)

  • Python 3.5

  • OpenCV 3.4.3 (pip3 install)

  • Rufus v3.3

  • Tensorflow v1.11.0 (pip3 install)


◆ Installation of Ubuntu 16.04


● Working with Windows 10 PC (Create USB flash drive of Ubuntu1604)

1.Ubuntu 16.04.5 Desktop Image Download (1.5GB)

http://releases.ubuntu.com/releases/16.04/ubuntu-16.04.5-desktop-amd64.iso

2.Download USB flash drive creation tool Rufus

rufus-128.png Official page - Rufus - Japanese

Download link https://github.com/pbatard/rufus/releases/download/v3.3/rufus-3.3.exe

3.Insert USB memory into Windows 10 PC

4. Start Rufus (rufus-3.3.exe) and write the Ubuntu 16.04 image in DD mode

Rufus main screen (DD mode designation dialog is displayed after pressing the start button)

01.png

Specify DD mode

02.png

State of writing

03.png

5.Remove USB memory from Windows 10 PC


● Working with LattePanda Alpha 864

6.Connect the Wi-Fi antenna, keyboard, mouse, HDMI cable / display, USB memory to LattePanda Alpha and finally connect the power

Example 1) Wi-Fi antenna connection (there are two antennas)

ezgif.com-optimize1.gif

Example 2) Connecting the HDMI cable

ezgif.com-optimize2.gif

Example 3) Everything connected

12.jpg

Example 4) Connecting the Type-C power cable

When the Type-C cable is connected, the red LED for energization confirmation is always on and the blue LED lights momentarily.

Wait for the blue LED to blink, press and hold the power button for 3 seconds to turn on the power, and the blue LED is always on.

ezgif.com-optimize3.gif

7. As soon as the LattePanda Alpha powers on, repeatedly press the Esc key

8. Select Boot → Boot Option #1, then press Enter

DSC_0110.jpg

9. Select the USB drive name + Partition 1, then press Enter

DSC_0111.jpg

10. Select Save & Exit → Save Changes and Exit, then press Enter

DSC_0112.jpg

11. Select Yes, then press Enter

DSC_0113.jpg

12. Select Install Ubuntu, then press Enter

DSC_0114.jpg

13.Wait for a while

DSC_0115.jpg

14. Select English, then click Continue

DSC_0116.jpg

15. To connect to Wi-Fi, select Connect to this network, choose the SSID from the list and click Connect

DSC_0117.jpg

16.Enter the Wi-Fi password, and Connect

DSC_0118.jpg

17. Select Install third-party software for graphics and Wi-Fi hardware, Flash, MP3 and other media, then click Continue

DSC_0119.jpg

18. Select Erase disk and install Ubuntu, then click Install Now

DSC_0120.jpg

19.Continue

DSC_0121.jpg

20. Select Tokyo, then click Continue

DSC_0122.jpg

21. Select Japanese in both the left and right columns, then click Continue

DSC_0123.jpg

22. Enter the user ID, machine name and password, then click Continue

DSC_0124.jpg

23.Wait for a while

DSC_0125.jpg

24.Restart Now

* The reboot will start; if it does not go well, unplug and replug the power cable and power on again

DSC_0126.jpg

25.Ubuntu 16.04 startup completed

DSC_0127.jpg

26.After logging on, start the terminal and update


Update_command

$ sudo apt-get update

$ sudo apt-get upgrade

Official installation procedure

http://docs.lattepanda.com/content/alpha_edition/power_on/


◆ Installation of OpenVINO

OpenVINO version to be installed: 2018.4.420


● Installation of OpenVINO main unit

Official tutorial


● Additional installation for Intel Movidius Neural Compute Stick v1 / v2

Execute the following command.


Update_USB_access_rule

$ cd ~

$ sudo usermod -a -G users "$(whoami)"
$ sudo cat <<EOF > 97-usbboot.rules
SUBSYSTEM=="usb", ATTRS{idProduct}=="2150", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="2485", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="f63b", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
EOF

$ sudo cp 97-usbboot.rules /etc/udev/rules.d/
$ sudo udevadm control --reload-rules
$ sudo udevadm trigger
$ sudo ldconfig
$ sudo rm 97-usbboot.rules


Create the symbolic links manually with the following commands.


Symbolic_manual_generation_command

$ cd /opt/intel/common/mdf/lib64

$ sudo mv igfxcmrt64.so igfxcmrt64.so.org
$ sudo ln -s libigfxcmrt64.so igfxcmrt64.so

$ cd /opt/intel/mediasdk/lib64
$ sudo mv libmfxhw64.so.1 libmfxhw64.so.1.org
$ sudo mv libmfx.so.1 libmfx.so.1.org
$ sudo mv libva-glx.so.2 libva-glx.so.2.org
$ sudo mv libva.so.2 libva.so.2.org
$ sudo mv libigdgmm.so.1 libigdgmm.so.1.org
$ sudo mv libva-drm.so.2 libva-drm.so.2.org
$ sudo mv libva-x11.so.2 libva-x11.so.2.org
$ sudo ln -s libmfxhw64.so.1.28 libmfxhw64.so.1
$ sudo ln -s libmfx.so.1.28 libmfx.so.1
$ sudo ln -s libva-glx.so.2.300.0 libva-glx.so.2
$ sudo ln -s libva.so.2.300.0 libva.so.2
$ sudo ln -s libigdgmm.so.1.0.0 libigdgmm.so.1
$ sudo ln -s libva-drm.so.2.300.0 libva-drm.so.2
$ sudo ln -s libva-x11.so.2.300.0 libva-x11.so.2

Run sudo ldconfig again.


Rerun_sudo_ldconfig

$ cd ~

$ sudo ldconfig

The OpenCV 4.0.0-pre installed by default has a GStreamer bug and did not work properly, so reinstall OpenCV 3.4.3 separately.

Execute the following command.


Introduction_of_OpenCV3.4.3

$ sudo -H pip3 install opencv-python==3.4.3.18

$ nano ~/.bashrc
# Add the following line to the end of ~/.bashrc
export PYTHONPATH=/usr/local/lib/python3.5/dist-packages/cv2:$PYTHONPATH

$ source ~/.bashrc
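
To confirm that the pip-installed OpenCV 3.4.3 is the one actually picked up (rather than the bundled 4.0.0-pre), the version can be checked in the same style as the Tensorflow check later in this article.


OpenCV_version_check_command_example

$ python3 -c 'import cv2; print(cv2.__version__)'

3.4.3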

Official installation procedure

Intel®Movidius™Neural Compute Stick and Intel®Neural Compute Stick 2 additional installation procedure


● Upgrade to Tensorflow v1.11.0

Upgrade the old Tensorflow v1.9.0 installed by default to Tensorflow v1.11.0, since the subsequent Model Optimizer processing will otherwise fail.


Tensorflow_version_check_command_example

$ python3 -c 'import tensorflow as tf; print(tf.__version__)'

1.9.0


Upgrade_command_to_Tensorflow_v1.11.0

$ sudo -H pip3 install pip --upgrade

$ sudo -H pip3 install tensorflow==1.11.0 --upgrade


● Settings for offloading custom layer behavior to Tensorflow

Intel official tutorial - Offloading Computations to TensorFlow*

Custom layer operations not supported by the standard OpenVINO API can be offloaded to the Tensorflow side.

Execute the following commands to build the inference engine layer yourself using the Tensorflow runtime.

However, there are bugs in the scripts provided by Intel, so they need to be corrected manually.

Also, Bazel 0.18.1 must be installed at this point.

Note that, as of November 17, 2018, the inference engine layer does not build correctly with Bazel 0.19.0 or later, so be careful.

On a device that does not have as much RAM as the LattePanda Alpha (for example, a board with only 1GB of RAM), rewriting the command

sudo -H $HOME/bin/bazel build --config monolithic //tensorflow/cc/inference_engine_layer:libtensorflow_call_layer.so

as

sudo -H $HOME/bin/bazel --host_jvm_args=-Xmx512m build --config monolithic --local_resources 1024.0,0.5,0.5 //tensorflow/cc/inference_engine_layer:libtensorflow_call_layer.so

may allow the build to succeed.


Build_inference_engine_layer_at_TensorFlow_runtime

$ sudo apt-get install -y git pkg-config zip g++ zlib1g-dev unzip

$ cd ~
$ wget https://github.com/bazelbuild/bazel/releases/download/0.18.1/bazel-0.18.1-installer-linux-x86_64.sh
$ sudo chmod +x bazel-0.18.1-installer-linux-x86_64.sh
$ ./bazel-0.18.1-installer-linux-x86_64.sh --user
$ echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc
$ source ~/.bashrc
$ cd /opt
$ sudo git clone -b v1.11.0 https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ sudo git checkout -b v1.11.0
$ echo 'export TF_ROOT_DIR=/opt/tensorflow' >> ~/.bashrc
$ source ~/.bashrc
$ sudo nano /opt/intel/computer_vision_sdk/bin/setupvars.sh

#Before
INSTALLDIR=/opt/intel//computer_vision_sdk_2018.4.420

#After
INSTALLDIR=/opt/intel/computer_vision_sdk_2018.4.420

$ source /opt/intel/computer_vision_sdk/bin/setupvars.sh
$ sudo nano /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer/tf_call_ie_layer/build.sh

#Before
bazel build --config=monolithic //tensorflow/cc/inference_engine_layer:libtensorflow_call_layer.so

#After
sudo -H $HOME/bin/bazel build --config monolithic //tensorflow/cc/inference_engine_layer:libtensorflow_call_layer.so

$ sudo -E /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer/tf_call_ie_layer/build.sh


The inference engine layer is generated in the following path.


libtensorflow_call_layer.so's_PATH

/opt/tensorflow/bazel-bin/tensorflow/cc/inference_engine_layer/libtensorflow_call_layer.so


As it is, running Python as an ordinary user has no access permission under /opt and a Permission denied error occurs, so move the .so file to a different location.


Changing_the_location_of_.so

$ su -

$ cp /opt/tensorflow/bazel-bin/tensorflow/cc/inference_engine_layer/libtensorflow_call_layer.so /usr/local/lib
$ exit
$ nano ~/.bashrc
# Add the following line to the end of ~/.bashrc
export PYTHONPATH=$PYTHONPATH:/usr/local/lib
$ source ~/.bashrc
$ sudo ldconfig
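
As a quick sanity check, the following minimal Python snippet confirms that the relocated library can actually be loaded. The path assumes the copy destination above; source /opt/intel/computer_vision_sdk/bin/setupvars.sh first so that the dependent OpenVINO libraries can be resolved.


Check_that_the_relocated_library_loads

import ctypes
import os

# Path assumes the cp destination used above
lib_path = "/usr/local/lib/libtensorflow_call_layer.so"
print("exists:", os.path.exists(lib_path))
try:
    ctypes.CDLL(lib_path)
    print("libtensorflow_call_layer.so loaded OK")
except OSError as e:
    # Usually caused by OpenVINO/Tensorflow runtime libraries not being on the loader path
    print("failed to load:", e)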


◆ Trying out the demonstration programs


● Sample image classification

Execute the following command.


Image_Classification_Sample_SqueezeNet

$ cd /opt/intel/computer_vision_sdk/deployment_tools/demo

$ ./demo_squeezenet_download_convert_run.sh

The sample loads the image shown below...

car.jpg

It was recognized as a sports car with about 84% probability.


Result

###################################################


Run Inference Engine classification sample

Run ./classification_sample -d CPU -i /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png -m /home/alpha/openvino_models/ir/squeezenet1.1/FP32/squeezenet1.1.xml

[ INFO ] InferenceEngine:
API version ............ 1.4
Build .................. 17328
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ] /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png
[ INFO ] Loading plugin

API version ............ 1.4
Build .................. lnx_20181004
Description ....... MKLDNNPlugin
[ INFO ] Loading network files:
/home/alpha/openvino_models/ir/squeezenet1.1/FP32/squeezenet1.1.xml
/home/alpha/openvino_models/ir/squeezenet1.1/FP32/squeezenet1.1.bin
[ INFO ] Preparing input blobs
[ WARNING ] Image is resized from (787, 259) to (227, 227)
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference (1 iterations)
[ INFO ] Processing output blobs

Top 10 results:

Image /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png

817 0.8363345 label sports car, sport car
511 0.0946488 label convertible
479 0.0419131 label car wheel
751 0.0091071 label racer, race car, racing car
436 0.0068161 label beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon
656 0.0037564 label minivan
586 0.0025741 label half track
717 0.0016069 label pickup, pickup truck
864 0.0012027 label tow truck, tow car, wrecker
581 0.0005882 label grille, radiator grille

total inference time: 6.5318211
Average running time of one iteration: 6.5318211 ms

Throughput: 153.0966609 FPS

[ INFO ] Execution successful

###################################################

Demo completed successfully.
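
For reference, here is a minimal Python sketch that does roughly the same thing as the classification sample above through the Inference Engine Python API. The IR paths follow the demo log, and mean subtraction is omitted, so treat it as a rough check rather than a faithful re-implementation.


Minimal_Python_classification_sketch

import cv2
import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin

# IR files generated by demo_squeezenet_download_convert_run.sh (adjust to your environment)
model_xml = "/home/alpha/openvino_models/ir/squeezenet1.1/FP32/squeezenet1.1.xml"
model_bin = "/home/alpha/openvino_models/ir/squeezenet1.1/FP32/squeezenet1.1.bin"

net = IENetwork.from_ir(model=model_xml, weights=model_bin)
plugin = IEPlugin(device="CPU")            # the FP32 IR generated by the demo targets the CPU
exec_net = plugin.load(network=net)

input_blob = next(iter(net.inputs))
out_blob = next(iter(net.outputs))
n, c, h, w = net.inputs[input_blob].shape  # 1, 3, 227, 227

# Read the demo image, resize to the network input size, and convert HWC/BGR -> NCHW
image = cv2.imread("/opt/intel/computer_vision_sdk/deployment_tools/demo/car.png")
image = cv2.resize(image, (w, h)).transpose((2, 0, 1)).reshape((n, c, h, w))

res = exec_net.infer(inputs={input_blob: image})[out_blob].flatten()
for class_id in np.argsort(res)[::-1][:5]:  # top 5 class IDs and their probabilities
    print(class_id, res[class_id])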



● Sample of three step inference

This sample runs three stages of inference in succession, each with a separate trained model:


  • Car detection

  • Detection of license plate

  • Character recognition in the identified license plate

Execute the following command.


Sample_of_three_step_inference

$ cd /opt/intel/computer_vision_sdk/deployment_tools/demo

$ ./demo_security_barrier_camera.sh

license-plate.jpeg
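
Conceptually, the demo chains the three models as a cascade: the detections from the first model are cropped out of the frame and fed into the next one. The sketch below illustrates that flow with the Inference Engine Python API; the IR file names are placeholders, not the actual files the demo script downloads.


Conceptual_sketch_of_the_three_stage_cascade

import cv2
from openvino.inference_engine import IENetwork, IEPlugin

plugin = IEPlugin(device="CPU")

def load(xml_path, bin_path):
    # Load one IR model and return both the network and its executable form
    net = IENetwork.from_ir(model=xml_path, weights=bin_path)
    return net, plugin.load(network=net)

# Placeholder file names; the real demo uses the models it downloads by itself
det_net, det_exec = load("vehicle_plate_detection.xml", "vehicle_plate_detection.bin")
# ...the attribute and plate-recognition networks would be loaded the same way

frame = cv2.imread("car_with_plate.jpg")  # placeholder input image
ih, iw = frame.shape[:2]
in_blob = next(iter(det_net.inputs))
n, c, h, w = det_net.inputs[in_blob].shape
blob = cv2.resize(frame, (w, h)).transpose((2, 0, 1)).reshape((n, c, h, w))

# Stage 1: SSD-style detection output with shape [1, 1, N, 7], each row being
# [image_id, label, confidence, xmin, ymin, xmax, ymax] in normalized coordinates
out = det_exec.infer(inputs={in_blob: blob})
detections = list(out.values())[0][0][0]
for _, label, conf, xmin, ymin, xmax, ymax in detections:
    if conf < 0.5:
        continue
    crop = frame[int(ymin * ih):int(ymax * ih), int(xmin * iw):int(xmax * iw)]
    # Stages 2 and 3: resize `crop` and run it through the attribute-recognition and
    # plate-recognition networks in exactly the same resize -> transpose -> infer way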


● Various sample programs other than the above

Intel® Distribution of OpenVINO™ Toolkit - Inference Engine Samples


◆ Sample scripts for converting and running your own model

Official tutorial - Using the Model Optimizer to Convert TensorFlow* Models

Official tutorial - Model Optimizer Developer Guide - TensorFlow* Models with Custom Layers



Below is a sample script for converting a Tensorflow .pb (frozen graph) into OpenVINO's IR format.


Convert_command

$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer

$ python3 mo_tf.py --input_model <INPUT_MODEL>.pb

Conversion command options and explanation


Conversion_command_options

optional arguments:

-h, --help show this help message and exit
--framework {tf,caffe,mxnet,kaldi,onnx}
Name of the framework used to train the input model.

Framework-agnostic parameters:
--input_model INPUT_MODEL, -w INPUT_MODEL, -m INPUT_MODEL
Tensorflow*: a file with a pre-trained model (binary
or text .pb file after freezing). Caffe*: a model
proto file with model weights
--model_name MODEL_NAME, -n MODEL_NAME
Model_name parameter passed to the final create_ir
transform. This parameter is used to name a network in
a generated IR and output .xml/.bin files.
--output_dir OUTPUT_DIR, -o OUTPUT_DIR
Directory that stores the generated IR. By default, it
is the directory from where the Model Optimizer is
launched.
--input_shape INPUT_SHAPE
Input shape(s) that should be fed to an input node(s)
of the model. Shape is defined as a comma-separated
list of integer numbers enclosed in parentheses or
square brackets, for example [1,3,227,227] or
(1,227,227,3), where the order of dimensions depends
on the framework input layout of the model. For
example, [N,C,H,W] is used for Caffe* models and
[N,H,W,C] for TensorFlow* models. Model Optimizer
performs necessary transformations to convert the
shape to the layout required by Inference Engine
(N,C,H,W). The shape should not contain undefined
dimensions (? or -1) and should fit the dimensions
defined in the input operation of the graph. If there
are multiple inputs in the model, --input_shape should
contain definition of shape for each input separated
by a comma, for example: [1,3,227,227],[2,4] for a
model with two inputs with 4D and 2D shapes.
--scale SCALE, -s SCALE
All input values coming from original network inputs
will be divided by this value. When a list of inputs
is overridden by the --input parameter, this scale is
not applied for any input that does not match with the
original input of the model.
--reverse_input_channels
Switch the input channels order from RGB to BGR (or
vice versa). Applied to original inputs of the model
if and only if a number of channels equals 3. Applied
after application of --mean_values and --scale_values
options, so numbers in --mean_values and
--scale_values go in the order of channels used in the
original model.
--log_level {CRITICAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}
Logger level
--input INPUT The name of the input operation of the given model.
Usually this is a name of the input placeholder of the
model.
--output OUTPUT The name of the output operation of the model. For
TensorFlow*, do not add :0 to this name.
--mean_values MEAN_VALUES, -ms MEAN_VALUES
Mean values to be used for the input image per
channel. Values to be provided in the (R,G,B) or
[R,G,B] format. Can be defined for desired input of
the model, for example: "--mean_values
data[255,255,255],info[255,255,255]". The exact
meaning and order of channels depend on how the
original model was trained.
--scale_values SCALE_VALUES
Scale values to be used for the input image per
channel. Values are provided in the (R,G,B) or [R,G,B]
format. Can be defined for desired input of the model,
for example: "--scale_values
data[255,255,255],info[255,255,255]". The exact
meaning and order of channels depend on how the
original model was trained.
--data_type {FP16,FP32,half,float}
Data type for all intermediate tensors and weights. If
original model is in FP32 and --data_type=FP16 is
specified, all model weights and biases are quantized
to FP16.
--disable_fusing Turn off fusing of linear operations to Convolution
--disable_resnet_optimization
Turn off resnet optimization
--finegrain_fusing FINEGRAIN_FUSING
Regex for layers/operations that won't be fused.
Example: --finegrain_fusing Convolution1,.*Scale.*
--disable_gfusing Turn off fusing of grouped convolutions
--move_to_preprocess Move mean values to IR preprocess section
--extensions EXTENSIONS
Directory or a comma separated list of directories
with extensions. To disable all extensions including
those that are placed at the default location, pass an
empty string.
--batch BATCH, -b BATCH
Input batch size
--version Version of Model Optimizer
--silent Prevent any output messages except those that
correspond to log level equals ERROR, that can be set
with the following option: --log_level. By default,
log level is already ERROR.
--freeze_placeholder_with_value FREEZE_PLACEHOLDER_WITH_VALUE
Replaces input layer with constant node with provided
value, e.g.: "node_name->True"
--generate_deprecated_IR_V2
Force to generate legacy/deprecated IR V2 to work with
previous versions of the Inference Engine. The
resulting IR may or may not be correctly loaded by
Inference Engine API (including the most recent and
old versions of Inference Engine) and provided as a
partially-validated backup option for specific
deployment scenarios. Use it at your own discretion.
By default, without this option, the Model Optimizer
generates IR V3.



Tensorflow-specific conversion command options and explanation


Tensorflow-specific_conversion_command_options

TensorFlow*-specific parameters:

--input_model_is_text
TensorFlow*: treat the input model file as a text
protobuf format. If not specified, the Model Optimizer
treats it as a binary file by default.
--input_checkpoint INPUT_CHECKPOINT
TensorFlow*: variables file to load.
--input_meta_graph INPUT_META_GRAPH
Tensorflow*: a file with a meta-graph of the model
before freezing
--saved_model_dir SAVED_MODEL_DIR
TensorFlow*: directory representing non frozen model
--saved_model_tags SAVED_MODEL_TAGS
Group of tag(s) of the MetaGraphDef to load, in string
format, separated by ','. For tag-set contains
multiple tags, all tags must be passed in.
--offload_unsupported_operations_to_tf
TensorFlow*: automatically offload unsupported
operations to TensorFlow*
--tensorflow_subgraph_patterns TENSORFLOW_SUBGRAPH_PATTERNS
TensorFlow*: a list of comma separated patterns that
will be applied to TensorFlow* node names to infer a
part of the graph using TensorFlow*.
--tensorflow_operation_patterns TENSORFLOW_OPERATION_PATTERNS
TensorFlow*: a list of comma separated patterns that
will be applied to TensorFlow* node type (ops) to
infer these operations using TensorFlow*.
--tensorflow_custom_operations_config_update TENSORFLOW_CUSTOM_OPERATIONS_CONFIG_UPDATE
TensorFlow*: update the configuration file with node
name patterns with input/output nodes information.
--tensorflow_use_custom_operations_config TENSORFLOW_USE_CUSTOM_OPERATIONS_CONFIG
TensorFlow*: use the configuration file with custom
operation description.
--tensorflow_object_detection_api_pipeline_config TENSORFLOW_OBJECT_DETECTION_API_PIPELINE_CONFIG
TensorFlow*: path to the pipeline configuration file
used to generate model created with help of Object
Detection API.
--tensorboard_logdir TENSORBOARD_LOGDIR
TensorFlow*: dump the input graph to a given directory
that should be used with TensorBoard.
--tensorflow_custom_layer_libraries TENSORFLOW_CUSTOM_LAYER_LIBRARIES
TensorFlow*: comma separated list of shared libraries
with TensorFlow* custom operations implementation.
--disable_nhwc_to_nchw
Disables default translation from NHWC to NCHW



◆ Converting a self-trained Semantic Segmentation model "UNet"

First, I will start with UNet, whose structure is very simple.

The .pb file is placed in TensorflowLite-UNet - PINTO0309 - Github

This is a Semantic Segmentation model that I trained on the Person class only.

TensorflowLite-UNet/model/semanticsegmentation_frozen_person_32.pb (31.1MB)


● Conversion to data type FP16

Execute the following command.

--input_model is the name of the .pb file (frozen graph) to be converted

--output_dir is the output path for the converted IR files

--input is the input node name (placeholder name)

--output is the output node name

--data_type is the data precision type after conversion [FP16/FP32/half/float]

--batch forcibly overrides the input batch size

--scale normalizes the input value range

--mean_values specifies the per-channel (BGR) mean values to subtract from each pixel

--offload_unsupported_operations_to_tf offloads Tensorflow custom layers that OpenVINO cannot process to the Tensorflow side
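
Note that --scale and --mean_values are not used in the conversion command below, so any normalization has to be applied in the inference code instead (the UNet sample later in this article simply divides by 255). For reference, here is a minimal sketch of the manual preprocessing that plays the same role; the numeric values are purely illustrative.


Preprocessing_equivalent_to_--scale_and_--mean_values

import numpy as np

# Illustrative values only; the channel order of the means must match the
# channel order of the model input (BGR for a model fed with OpenCV frames)
MEAN_VALUES = np.array([103.0, 116.0, 123.0], dtype=np.float32)  # role of --mean_values
SCALE = 255.0                                                    # role of --scale

def preprocess(frame):
    # Subtract the per-channel mean, divide by the scale, then HWC -> NCHW
    img = (frame.astype(np.float32) - MEAN_VALUES) / SCALE
    return img.transpose((2, 0, 1))[np.newaxis, ...]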


Conversion_script_of_my_own_"UNet"_model_to_IR_FP16

$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer

$ sudo mkdir -p 01_pbmodels/UNet
$ sudo mkdir -p 10_lrmodels/UNet/FP16
$ sudo wget https://github.com/PINTO0309/TensorflowLite-UNet/raw/master/model/semanticsegmentation_frozen_person_32.pb -P 01_pbmodels/UNet
$ sudo python3 mo_tf.py \
--input_model 01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb \
--output_dir 10_lrmodels/UNet/FP16 \
--input input \
--output output/BiasAdd \
--data_type FP16 \
--batch 1

<Reference POST for calculating RGB average value>

https://forums.fast.ai/t/images-normalization/4058

https://github.com/DrSleep/tensorflow-deeplab-resnet/issues/106


Sample_logic_for_average_calculation_of_RGB_by_PYTHON

# Calculate the per-image RGB average and accumulate it over all training images
import glob
import cv2
import numpy as np

meanB = meanG = meanR = 0.0
image_paths = glob.glob("data/train/*.jpg")  # illustrative path to the training images
imgcnt = len(image_paths)
for path in image_paths:
    jpgimg = cv2.imread(path)            # OpenCV reads images in BGR order
    mean = np.mean(jpgimg, axis=(0, 1))  # per-channel mean of one image
    meanB += mean[0]
    meanG += mean[1]
    meanR += mean[2]
# Calculate the RGB average values of all training images
print("meanB =", meanB / imgcnt)
print("meanG =", meanG / imgcnt)
print("meanR =", meanR / imgcnt)

Since the model was converted from FP32 to FP16, the file size became 15.5MB, half of the size before conversion.

IR conversion log


conversion_log

Model Optimizer arguments:

Common parameters:
- Path to the Input Model: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb
- Path for generated IR: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP16
- IR output name: semanticsegmentation_frozen_person_32
- Log level: ERROR
- Batch: 1
- Input layers: input
- Output layers: output/BiasAdd
- Input shapes: Not specified, inherited from the model
- Mean values: Not specified
- Scale values: Not specified
- Scale factor: Not specified
- Precision of IR: FP16
- Enable fusing: True
- Enable grouped convolutions fusing: True
- Move mean values to preprocess section: False
- Reverse input channels: False
TensorFlow specific parameters:
- Input model in text protobuf format: False
- Offload unsupported operations: False
- Path to model dump for TensorBoard: None
- List of shared libraries with TensorFlow custom layers implementation: None
- Update the configuration file with input/output node names: None
- Use configuration file used to generate the model with Object Detection API: None
- Operations to offload: None
- Patterns to offload: None
- Use the config file: None
Model Optimizer version: 1.4.292.6ef7232d

[ SUCCESS ] Generated IR model.
[ SUCCESS ] XML file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP16/semanticsegmentation_frozen_person_32.xml
[ SUCCESS ] BIN file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP16/semanticsegmentation_frozen_person_32.bin
[ SUCCESS ] Total execution time: 3.86 seconds.





SITWXV~4.jpg
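
As a quick check that the generated IR loads and has the expected input shape, the files can be read back with the Inference Engine Python API. This is a minimal sketch; the FP16 IR is intended for the Neural Compute Stick (MYRIAD), while the CPU plugin expects the FP32 IR generated in the next section.


Quick_check_that_the_generated_IR_loads

from openvino.inference_engine import IENetwork, IEPlugin

model_xml = "/opt/intel/computer_vision_sdk/deployment_tools/model_optimizer/10_lrmodels/UNet/FP16/semanticsegmentation_frozen_person_32.xml"
model_bin = "/opt/intel/computer_vision_sdk/deployment_tools/model_optimizer/10_lrmodels/UNet/FP16/semanticsegmentation_frozen_person_32.bin"

net = IENetwork.from_ir(model=model_xml, weights=model_bin)
input_blob = next(iter(net.inputs))
print(net.inputs[input_blob].shape)   # expected: [1, 3, 256, 256]

# Load onto the Neural Compute Stick to confirm the FP16 IR is usable on MYRIAD
plugin = IEPlugin(device="MYRIAD")
exec_net = plugin.load(network=net)
print("IR loaded onto MYRIAD successfully")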


● Conversion to data type FP32

Execute the following command.


Conversion_script_of_my_own_"UNet"_model_to_IR_FP32

$ cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer

$ sudo mkdir -p 01_pbmodels/UNet
$ sudo mkdir -p 10_lrmodels/UNet/FP32
$ sudo wget https://github.com/PINTO0309/TensorflowLite-UNet/raw/master/model/semanticsegmentation_frozen_person_32.pb -P 01_pbmodels/UNet
$ sudo python3 mo_tf.py \
--input_model 01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb \
--output_dir 10_lrmodels/UNet/FP32 \
--input input \
--output output/BiasAdd \
--data_type FP32 \
--batch 1

IR conversion log


conversion_log

Model Optimizer arguments:

Common parameters:
- Path to the Input Model: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb
- Path for generated IR: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32
- IR output name: semanticsegmentation_frozen_person_32
- Log level: ERROR
- Batch: 1
- Input layers: input
- Output layers: output/BiasAdd
- Input shapes: Not specified, inherited from the model
- Mean values: Not specified
- Scale values: Not specified
- Scale factor: Not specified
- Precision of IR: FP32
- Enable fusing: True
- Enable grouped convolutions fusing: True
- Move mean values to preprocess section: False
- Reverse input channels: False
TensorFlow specific parameters:
- Input model in text protobuf format: False
- Offload unsupported operations: False
- Path to model dump for TensorBoard: None
- List of shared libraries with TensorFlow custom layers implementation: None
- Update the configuration file with input/output node names: None
- Use configuration file used to generate the model with Object Detection API: None
- Operations to offload: None
- Patterns to offload: None
- Use the config file: None
Model Optimizer version: 1.4.292.6ef7232d

[ SUCCESS ] Generated IR model.
[ SUCCESS ] XML file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32/semanticsegmentation_frozen_person_32.xml
[ SUCCESS ] BIN file: /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32/semanticsegmentation_frozen_person_32.bin
[ SUCCESS ] Total execution time: 3.70 seconds.





SKTIWX~T.jpg


◆ Building and running the UNet execution environment with OpenVINO


UNet_executable_program_sample_for_real-time_segmentation

import sys

import cv2
import numpy as np
from PIL import Image
import time
from openvino.inference_engine import IENetwork, IEPlugin

model_xml='/opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32/semanticsegmentation_frozen_person_32.xml'
model_bin='/opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/10_lrmodels/UNet/FP32/semanticsegmentation_frozen_person_32.bin'
net = IENetwork.from_ir(model=model_xml, weights=model_bin)
seg_image = Image.open("data/input/009649.png")
palette = seg_image.getpalette() # Get a color palette
index_void = 2 # Define index_void Back Ground
camera_width = 320
camera_height = 240
fps = ""
elapsedTime = 0

plugin = IEPlugin(device="HETERO:MYRIAD,CPU")
plugin.set_config({"TARGET_FALLBACK": "HETERO:MYRIAD,CPU"})
plugin.set_initial_affinity(net)

#plugin = IEPlugin(device="MYRIAD")
#plugin = IEPlugin(device="CPU")

exec_net = plugin.load(network=net)

input_blob = next(iter(net.inputs)) #input_blob = 'input'
out_blob = next(iter(net.outputs)) #out_blob = 'output/BiasAdd'
n, c, h, w = net.inputs[input_blob].shape #n, c, h, w = 1, 3, 256, 256

del net

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FPS, 30)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, camera_width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, camera_height)
time.sleep(1)

while cap.isOpened():
    t1 = time.time()
    ret, frame = cap.read()
    if not ret:
        break
    #frame = cv2.imread('data/input/000003.jpg')
    prepimg = frame[:, :, ::-1].copy()
    #prepimg = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    prepimg = Image.fromarray(prepimg)
    prepimg = prepimg.resize((256, 256), Image.ANTIALIAS)
    prepimg = np.asarray(prepimg) / 255.0
    prepimg = prepimg.transpose((2, 0, 1)).reshape((1, c, h, w))

    t2 = time.perf_counter()
    exec_net.start_async(request_id=0, inputs={input_blob: prepimg})

    if exec_net.requests[0].wait(-1) == 0:
        outputs = exec_net.requests[0].outputs[out_blob] # (1, 3, 256, 256)
        print("SegmentationTime = {:.7f}".format(time.perf_counter() - t2))
        outputs = outputs.transpose((2, 3, 1, 0)).reshape((h, w, c)) # (256, 256, 3)
        outputs = cv2.resize(outputs, (camera_width, camera_height)) # (240, 320, 3)

        # View
        res = np.argmax(outputs, axis=2)
        if index_void is not None:
            res = np.where(res == index_void, 0, res)
        image = Image.fromarray(np.uint8(res), mode="P")
        image.putpalette(palette)
        image = image.convert("RGB")

        image = np.asarray(image)
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        image = cv2.addWeighted(frame, 1, image, 0.9, 0)

        cv2.putText(image, fps, (camera_width-180,15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (38,0,255), 1, cv2.LINE_AA)
        cv2.imshow("Result", image)

    if cv2.waitKey(1)&0xFF == ord('q'):
        break
    elapsedTime = time.time() - t1
    fps = "(Playback) {:.1f} FPS".format(1/elapsedTime)

cv2.destroyAllWindows()
del exec_net
del plugin


000003.jpg

4.jpg

◆ Measurement result of processing speed

Inference_Time2.jpg

With USB camera capture, performance of 4 to 5 FPS was achieved.