Introduction
Surprisingly, there are few articles about handling INT8 with OpenVINO, so I decided to write one.
The device is the CPU.
This article shows the speed improvement from INT8.
The improvement was measured with the benchmark tool (benchmark_app) that ships with OpenVINO.
Test conditions
CPU: Intel(R) Core(TM) i3-7100U CPU @ 2.40GHz
OpenVINO version: openvino_2019.3.379
Model: face-detection-retail-0004
(For the model's spec, see below.)
Where to get the model
Note: INT8 versions are provided for only some of the models.
Results
FP32
C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool>python benchmark_app.py -m face-detection-retail-0004_FP32.xml -i car.png -l cpu_extension_avx2.dll -d CPU
[Step 1/11] Parsing and validating input arguments
benchmark_app.py:21: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(" -nstreams default value is determined automatically for a device. "
[ WARNING ] -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 2/11] Loading Inference Engine
[ INFO ] CPU extensions is loaded cpu_extension_avx2.dll
[ INFO ] InferenceEngine:
API version............. 2.1.32974
[ INFO ] Device info
CPU
MKLDNNPlugin............ version 2.1
Build................... 32974
[Step 3/11] Reading the Intermediate Representation network
[Step 4/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1, precision: MIXED
[Step 5/11] Configuring input of the model
[Step 6/11] Setting device configuration
[Step 7/11] Loading the model to the device
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'data' precision U8, dimensions (NCHW): 1 3 300 300
[ WARNING ] Some binary input files will be ignored: only 0 files are required from 1
[ WARNING ] Some image input files will be duplicated: 4 files are required, but only 1 were provided
[ INFO ] Infer Request 0 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[ INFO ] Infer Request 1 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[ INFO ] Infer Request 2 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[ INFO ] Infer Request 3 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[Step 10/11] Measuring performance (Start inference asyncronously, 4 inference requests using 4 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count: 6664 iterations
Duration: 60087.78 ms
Latency: 33.34 ms
Throughput: 110.90 FPS
↓ Second run (result figures such as FPS only):
Count: 6172 iterations
Duration: 60087.01 ms
Latency: 34.10 ms
Throughput: 102.72 FPS
INT8
C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool>python benchmark_app.py -m face-detection-retail-0004_I8.xml -i car.png -l cpu_extension_avx2.dll -d CPU
[Step 1/11] Parsing and validating input arguments
benchmark_app.py:21: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(" -nstreams default value is determined automatically for a device. "
[ WARNING ] -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 2/11] Loading Inference Engine
[ INFO ] CPU extensions is loaded cpu_extension_avx2.dll
[ INFO ] InferenceEngine:
API version............. 2.1.32974
[ INFO ] Device info
CPU
MKLDNNPlugin............ version 2.1
Build................... 32974
[Step 3/11] Reading the Intermediate Representation network
[Step 4/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1, precision: MIXED
[Step 5/11] Configuring input of the model
[Step 6/11] Setting device configuration
[Step 7/11] Loading the model to the device
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'data' precision U8, dimensions (NCHW): 1 3 300 300
[ WARNING ] Some binary input files will be ignored: only 0 files are required from 1
[ WARNING ] Some image input files will be duplicated: 4 files are required, but only 1 were provided
[ INFO ] Infer Request 0 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[ INFO ] Infer Request 1 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[ INFO ] Infer Request 2 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[ INFO ] Infer Request 3 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[Step 10/11] Measuring performance (Start inference asyncronously, 4 inference requests using 4 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count: 9404 iterations
Duration: 60045.11 ms
Latency: 23.62 ms
Throughput: 156.62 FPS
↓ Second run (result figures such as FPS only):
Count: 9384 iterations
Duration: 60043.70 ms
Latency: 23.63 ms
Throughput: 156.29 FPS
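As a sanity check on the logs above: the Throughput that benchmark_app reports is simply the iteration count divided by the wall-clock duration (latency differs from 1/throughput because four infer requests run in parallel). Recomputing from the reported counts and durations:

```python
# Verify that benchmark_app's reported Throughput equals Count / Duration.
# Numbers are taken directly from the FP32 and INT8 first-run logs above.
def throughput_fps(count, duration_ms):
    return count / duration_ms * 1000.0

fp32 = throughput_fps(6664, 60087.78)  # FP32 first run
int8 = throughput_fps(9404, 60045.11)  # INT8 first run

print(f"{fp32:.2f}")  # prints 110.90, matching the log
print(f"{int8:.2f}")  # prints 156.62, matching the log
```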
Conclusion
Compared with FP32, INT8 is about 1.5x faster.
If I were greedy, 4x might be the expectation (INT8 is a quarter the bit width of FP32)... I wonder where the rest goes.
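For reference, the "about 1.5x" figure comes straight from the measured throughputs of the two runs:

```python
# Speedup of INT8 over FP32, from the measured throughputs (FPS) above.
fp32_runs = [110.90, 102.72]  # FP32 run 1, run 2
int8_runs = [156.62, 156.29]  # INT8 run 1, run 2

speedups = [i / f for i, f in zip(int8_runs, fp32_runs)]
print([f"{s:.2f}x" for s in speedups])  # roughly 1.41x and 1.52x, i.e. about 1.5x
```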
Supplementary notes
CPU option?
python benchmark_app.py -m face-detection-retail-0004_I8.xml -i car.png -l cpu_extension_avx2.dll -d CPU
I don't fully understand this, but depending on the CPU(??), a specification such as
-l cpu_extension_avx2.dll
seems to be required.
Creating your own INT8 model
If you want to create an INT8 model yourself, the following appears to be the way(?).
It looks fairly difficult!!
Source: https://docs.openvinotoolkit.org/2019_R3/_docs_Workbench_DG_Int_8_Quantization.html
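As background to what that calibration workflow produces: INT8 quantization maps FP32 values onto 8-bit integers via a scale factor derived from calibration data. The sketch below is plain NumPy illustrating the basic symmetric per-tensor scheme, not the Workbench workflow itself:

```python
import numpy as np

# Illustrative sketch of symmetric per-tensor INT8 quantization
# (a simplified stand-in for what a calibration tool computes).
def quantize_int8(x):
    scale = np.abs(x).max() / 127.0  # scale chosen from the data's dynamic range
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.0, 0.25, 1.27], dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)  # round-trip error is bounded by scale / 2
```

The speedup on CPU comes from the INT8 representation being a quarter the size of FP32 and from vectorized integer instructions, at the cost of this small rounding error.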
Summary
Nothing in particular.
If you have comments, please let me know.