Introduction
Surprisingly, there are few articles about handling INT8 with OpenVINO, so I decided to write one.
The device is the CPU.
This article shows the speed improvement from INT8.
The improvement was measured with the benchmark tool (benchmark_app) that ships with OpenVINO.
Test conditions
CPU: Intel(R) Core(TM) i3-7100U CPU @ 2.40GHz
OpenVINO version: openvino_2019.3.379
Model: face-detection-retail-0004
(For the model's spec, see below.)
Where to get the model
Note: INT8 versions are provided for only some of the models.
Results
FP32
C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool>python benchmark_app.py -m face-detection-retail-0004_FP32.xml -i car.png -l cpu_extension_avx2.dll -d CPU
[Step 1/11] Parsing and validating input arguments
benchmark_app.py:21: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(" -nstreams default value is determined automatically for a device. "
[ WARNING ] -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 2/11] Loading Inference Engine
[ INFO ] CPU extensions is loaded cpu_extension_avx2.dll
[ INFO ] InferenceEngine:
API version............. 2.1.32974
[ INFO ] Device info
CPU
MKLDNNPlugin............ version 2.1
Build................... 32974
[Step 3/11] Reading the Intermediate Representation network
[Step 4/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1, precision: MIXED
[Step 5/11] Configuring input of the model
[Step 6/11] Setting device configuration
[Step 7/11] Loading the model to the device
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'data' precision U8, dimensions (NCHW): 1 3 300 300
[ WARNING ] Some binary input files will be ignored: only 0 files are required from 1
[ WARNING ] Some image input files will be duplicated: 4 files are required, but only 1 were provided
[ INFO ] Infer Request 0 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[ INFO ] Infer Request 1 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[ INFO ] Infer Request 2 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[ INFO ] Infer Request 3 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[Step 10/11] Measuring performance (Start inference asyncronously, 4 inference requests using 4 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count: 6664 iterations
Duration: 60087.78 ms
Latency: 33.34 ms
Throughput: 110.90 FPS
↓ Second run (result figures such as FPS only):
Count: 6172 iterations
Duration: 60087.01 ms
Latency: 34.10 ms
Throughput: 102.72 FPS
INT8
C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool>python benchmark_app.py -m face-detection-retail-0004_I8.xml -i car.png -l cpu_extension_avx2.dll -d CPU
[Step 1/11] Parsing and validating input arguments
benchmark_app.py:21: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(" -nstreams default value is determined automatically for a device. "
[ WARNING ] -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 2/11] Loading Inference Engine
[ INFO ] CPU extensions is loaded cpu_extension_avx2.dll
[ INFO ] InferenceEngine:
API version............. 2.1.32974
[ INFO ] Device info
CPU
MKLDNNPlugin............ version 2.1
Build................... 32974
[Step 3/11] Reading the Intermediate Representation network
[Step 4/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1, precision: MIXED
[Step 5/11] Configuring input of the model
[Step 6/11] Setting device configuration
[Step 7/11] Loading the model to the device
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'data' precision U8, dimensions (NCHW): 1 3 300 300
[ WARNING ] Some binary input files will be ignored: only 0 files are required from 1
[ WARNING ] Some image input files will be duplicated: 4 files are required, but only 1 were provided
[ INFO ] Infer Request 0 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[ INFO ] Infer Request 1 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[ INFO ] Infer Request 2 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[ INFO ] Infer Request 3 filling
[ INFO ] Prepare image C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\car.png
[ WARNING ] Image is resized from ((259, 787)) to ((300, 300))
[Step 10/11] Measuring performance (Start inference asyncronously, 4 inference requests using 4 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count: 9404 iterations
Duration: 60045.11 ms
Latency: 23.62 ms
Throughput: 156.62 FPS
↓ Second run (result figures such as FPS only):
Count: 9384 iterations
Duration: 60043.70 ms
Latency: 23.63 ms
Throughput: 156.29 FPS
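As a sanity check on the logs above: the Throughput that benchmark_app reports is simply the iteration count divided by the wall-clock duration (latency differs from 1/throughput because four infer requests run in parallel). Recomputing from the reported counts and durations:

```python
# Verify that benchmark_app's reported Throughput equals Count / Duration.
# Numbers are taken directly from the FP32 and INT8 first-run logs above.
def throughput_fps(count, duration_ms):
    return count / duration_ms * 1000.0

fp32 = throughput_fps(6664, 60087.78)  # FP32 first run
int8 = throughput_fps(9404, 60045.11)  # INT8 first run

print(f"{fp32:.2f}")  # prints 110.90, matching the log
print(f"{int8:.2f}")  # prints 156.62, matching the log
```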
Conclusion
Compared with FP32, INT8 is about 1.5x faster.
If I were greedy, 4x might be the expectation (INT8 is a quarter the bit width of FP32)... I wonder where the rest goes.
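For reference, the "about 1.5x" figure comes straight from the measured throughputs of the two runs:

```python
# Speedup of INT8 over FP32, from the measured throughputs (FPS) above.
fp32_runs = [110.90, 102.72]  # FP32 run 1, run 2
int8_runs = [156.62, 156.29]  # INT8 run 1, run 2

speedups = [i / f for i, f in zip(int8_runs, fp32_runs)]
print([f"{s:.2f}x" for s in speedups])  # roughly 1.41x and 1.52x, i.e. about 1.5x
```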
Supplementary notes
CPU option?
python benchmark_app.py -m face-detection-retail-0004_I8.xml -i car.png -l cpu_extension_avx2.dll -d CPU
I don't fully understand this, but depending on the CPU(??), a specification such as
-l cpu_extension_avx2.dll
seems to be required.
Creating your own INT8 model
If you want to create an INT8 model yourself, the following appears to be the way(?).
It looks fairly difficult!!
Source: https://docs.openvinotoolkit.org/2019_R3/_docs_Workbench_DG_Int_8_Quantization.html
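As background to what that calibration workflow produces: INT8 quantization maps FP32 values onto 8-bit integers via a scale factor derived from calibration data. The sketch below is plain NumPy illustrating the basic symmetric per-tensor scheme, not the Workbench workflow itself:

```python
import numpy as np

# Illustrative sketch of symmetric per-tensor INT8 quantization
# (a simplified stand-in for what a calibration tool computes).
def quantize_int8(x):
    scale = np.abs(x).max() / 127.0  # scale chosen from the data's dynamic range
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.0, 0.25, 1.27], dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)  # round-trip error is bounded by scale / 2
```

The speedup on CPU comes from the INT8 representation being a quarter the size of FP32 and from vectorized integer instructions, at the cost of this small rounding error.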
Summary
Nothing in particular.
If you have comments, please let me know.