
Deep Learning: Verifying the Effect of INT8 (on CPU) with OpenVINO

Posted 2020-05-05

Introduction

Surprisingly few articles cover INT8 with OpenVINO, so I am writing one.
The target device is the CPU.

This article shows the speed improvement from INT8.
The improvement was measured with the benchmark tool (benchmark_app.py) bundled with OpenVINO.

Test Conditions

CPU: Intel(R) Core(TM) i3-7100U CPU @ 2.40GHz
OpenVINO version: openvino_2019.3.379
Model: face-detection-retail-0004 (spec below)

(Source: https://docs.openvinotoolkit.org/2019_R1/_face_detection_retail_0004_description_face_detection_retail_0004.html)

(image: spec_face.png — model spec table from the OpenVINO docs)

Where to Get the Model

Note: INT8 versions are provided for only a subset of the models.

Results

FP32

C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool>python benchmark_app.py -m face-detection-retail-0004_FP32.xml -i car.png -l cpu_extension_avx2.dll -d CPU
[Step 1/11] Parsing and validating input arguments
benchmark_app.py:21: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(" -nstreams default value is determined automatically for a device. "
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading Inference Engine
[ INFO ] CPU extensions is loaded cpu_extension_avx2.dll
[ INFO ] InferenceEngine:
         API version............. 2.1.32974
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 32974

[Step 3/11] Reading the Intermediate Representation network
[Step 4/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1, precision: MIXED
[Step 5/11] Configuring input of the model
[Step 6/11] Setting device configuration
[Step 7/11] Loading the model to the device
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'data' precision U8, dimensions (NCHW): 1 3 300 300
[ WARNING ] Some binary input files will be ignored: only 0 files are required from 1
[ WARNING ] Some image input files will be duplicated: 4 files are required, but only 1 were provided
[ INFO ] Infer Request 0 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[ INFO ] Infer Request 1 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[ INFO ] Infer Request 2 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[ INFO ] Infer Request 3 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[Step 10/11] Measuring performance (Start inference asyncronously, 4 inference requests using 4 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count:      6664 iterations
Duration:   60087.78 ms
Latency:    33.34 ms
Throughput: 110.90 FPS

Second run (result values only):

Count:      6172 iterations
Duration:   60087.01 ms
Latency:    34.10 ms
Throughput: 102.72 FPS
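As a sanity check, the reported throughput matches Count / Duration from the same run. A minimal sketch, using the FP32 numbers above:

```python
# Cross-check benchmark_app's reported throughput against Count / Duration.
# The values are copied from the two FP32 runs above.
runs = [
    {"count": 6664, "duration_ms": 60087.78, "reported_fps": 110.90},
    {"count": 6172, "duration_ms": 60087.01, "reported_fps": 102.72},
]

for r in runs:
    fps = r["count"] / (r["duration_ms"] / 1000.0)  # iterations per second
    print(f"computed {fps:.2f} FPS vs reported {r['reported_fps']:.2f} FPS")
    assert abs(fps - r["reported_fps"]) < 0.1
```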

INT8

C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool>python benchmark_app.py -m face-detection-retail-0004_I8.xml -i car.png -l cpu_extension_avx2.dll -d CPU
[Step 1/11] Parsing and validating input arguments
benchmark_app.py:21: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(" -nstreams default value is determined automatically for a device. "
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading Inference Engine
[ INFO ] CPU extensions is loaded cpu_extension_avx2.dll
[ INFO ] InferenceEngine:
         API version............. 2.1.32974
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 32974

[Step 3/11] Reading the Intermediate Representation network
[Step 4/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1, precision: MIXED
[Step 5/11] Configuring input of the model
[Step 6/11] Setting device configuration
[Step 7/11] Loading the model to the device
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'data' precision U8, dimensions (NCHW): 1 3 300 300
[ WARNING ] Some binary input files will be ignored: only 0 files are required from 1
[ WARNING ] Some image input files will be duplicated: 4 files are required, but only 1 were provided
[ INFO ] Infer Request 0 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[ INFO ] Infer Request 1 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[ INFO ] Infer Request 2 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[ INFO ] Infer Request 3 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[Step 10/11] Measuring performance (Start inference asyncronously, 4 inference requests using 4 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count:      9404 iterations
Duration:   60045.11 ms
Latency:    23.62 ms
Throughput: 156.62 FPS

Second run (result values only):

Count:      9384 iterations
Duration:   60043.70 ms
Latency:    23.63 ms
Throughput: 156.29 FPS

Conclusion

Compared with FP32, INT8 is roughly 1.5x faster (about 157 FPS vs. about 107 FPS, averaging the two runs for each precision).
Greedily, one might hope for something closer to 4x from quarter-width arithmetic... so where does the rest go? Presumably overheads such as memory bandwidth, quantize/dequantize steps, and layers that stay in higher precision eat into the gain.
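The "roughly 1.5x" figure comes from averaging the two runs for each precision. A quick sketch of the arithmetic, using the throughput values measured above:

```python
# Average throughput over the two runs for each precision (values from the logs above).
fp32_fps = [110.90, 102.72]
int8_fps = [156.62, 156.29]

fp32_avg = sum(fp32_fps) / len(fp32_fps)  # average FP32 throughput
int8_avg = sum(int8_fps) / len(int8_fps)  # average INT8 throughput
speedup = int8_avg / fp32_avg             # INT8 vs FP32 speedup

print(f"FP32: {fp32_avg:.1f} FPS, INT8: {int8_avg:.1f} FPS, speedup: {speedup:.2f}x")
```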

Supplement

CPU extension option

python benchmark_app.py -m face-detection-retail-0004_I8.xml -i car.png -l cpu_extension_avx2.dll -d CPU

I do not fully understand this, but depending on the CPU, an extension library such as

-l cpu_extension_avx2.dll

must be specified. (In this generation of OpenVINO, some layer implementations live in a separate CPU extension DLL built per instruction set; AVX2 here.)

Creating an INT8 Model Yourself

If you want to create an INT8 model yourself, the procedure below appears to be the one to follow.
It looks fairly involved!

Source: https://docs.openvinotoolkit.org/2019_R3/_docs_Workbench_DG_Int_8_Quantization.html

(image: int8.png — INT8 quantization workflow from the OpenVINO docs)

Summary

Nothing in particular to add.
Comments are welcome.
