
Verifying whether ROCm 2.3 improves the performance of Tensorflow-rocm 1.13.1 + MIOpen 1.8.0

Posted at 2019-04-25

https://qiita.com/_JG1WWK/items/1db6504c77894c0f08aa#rx470-16gb-4
Running TensorFlow benchmarks on AMD GPUs to roughly gauge deep-learning performance (draft)
https://qiita.com/_JG1WWK/items/6bae45d55d9421e24e4a
Tensorflow-BenchMarks (Tensorflow-ROCm) results for the RX570 16GB version

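This is the ROCm 2.3 version of the articles above.
Since there were reports that machine-learning performance improved in ROCm 2.3, I will verify this, focusing mainly on Resnet50.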

The software environment is as follows.

ROCm version

$  apt show rocm-libs -a
Package: rocm-libs
Version: 2.3.14
Priority: optional
Section: devel
Maintainer: Advanced Micro Devices Inc.
Installed-Size: 13.3 kB
Depends: rocfft, rocrand, hipblas, rocblas
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 766 B
APT-Sources: http://repo.radeon.com/rocm/apt/debian xenial/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack

MIOpen version

$ apt show miopen-hip -a
Package: miopen-hip
Version: 1.8.0-492700c
Priority: optional
Section: devel
Maintainer: Paul Fultz II <paul.fultz@amd.com>
Installed-Size: 95.3 MB
Depends: rocm-opencl-dev, rocm-utils, hip_hcc, miopengemm
Download-Size: 5,312 kB
APT-Manual-Installed: yes
APT-Sources: http://repo.radeon.com/rocm/apt/debian xenial/main amd64 Packages
Description: AMD's DNN Library
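Since the point of this article is comparing ROCm versions, it helps to record these package versions next to every benchmark run. Below is a minimal sketch, assuming the simple `Key: value` layout shown in the `apt show` output above (continuation lines start with whitespace, and only the first package stanza is read). `parse_apt_show` is a hypothetical helper, not part of apt or ROCm.

```python
# Minimal sketch: parse the "Key: value" fields printed by `apt show`
# so package versions (rocm-libs, miopen-hip, ...) can be logged
# alongside benchmark results.
# Assumption: lines starting with whitespace continue the previous field,
# and a blank line ends the first package stanza.


def parse_apt_show(text):
    """Return a dict of field name -> value for the first stanza of `apt show` output."""
    fields = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line after some fields ends the first stanza
            continue
        if line[0].isspace() and last_key is not None:
            # Continuation of a multi-line field such as Description
            fields[last_key] += "\n" + line.strip()
            continue
        # Split on the first colon only, so URLs in values stay intact
        key, sep, value = line.partition(":")
        if sep:
            last_key = key.strip()
            fields[last_key] = value.strip()
    return fields


SAMPLE = """Package: miopen-hip
Version: 1.8.0-492700c
Priority: optional
Section: devel
Depends: rocm-opencl-dev, rocm-utils, hip_hcc, miopengemm
Description: AMD's DNN Library
"""

if __name__ == "__main__":
    info = parse_apt_show(SAMPLE)
    print(info["Package"], info["Version"])
```

In practice the text would come from running `apt show miopen-hip` and capturing its standard output; the sample here is copied from the listing above.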

OS configuration

$ uname -a
Linux rocm2 4.15.0-47-generic #50~16.04.1-Ubuntu SMP Fri Mar 15 16:06:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Downloading Tensorflow-benchmarks (the cnn_tf_v1.13_compatible branch)

$ git clone https://github.com/tensorflow/benchmarks.git -b cnn_tf_v1.13_compatible

Setting up the Tensorflow-rocm environment

The conda version is as follows (to reproduce this setup, install Miniconda: https://qiita.com/_JG1WWK/items/1817b6488526778aa8f2).

$ conda -V
conda 4.5.12

For now, create a Tensorflow-rocm 1.13 environment like this:

$ conda create -n tensorflowtest python=3.5
$ conda activate tensorflowtest
$ pip install tensorflow-rocm==1.13.1

pip list

Package              Version  
-------------------- ---------
absl-py              0.7.1    
astor                0.7.1    
certifi              2018.8.24
gast                 0.2.2    
grpcio               1.20.0   
h5py                 2.9.0    
Keras-Applications   1.0.7    
Keras-Preprocessing  1.0.9    
Markdown             3.1      
mock                 2.0.0    
numpy                1.16.2   
pbr                  5.1.3    
pip                  10.0.1   
protobuf             3.7.1    
setuptools           40.2.0   
six                  1.12.0   
tensorboard          1.13.1   
tensorflow-estimator 1.13.0   
tensorflow-rocm      1.13.2   
termcolor            1.1.0    
Werkzeug             0.15.2   
wheel                0.31.1   

The GPUs tested are the RadeonⅦ, Vega FE, and RX570.

Hardware environment

CPU Xeon E5-2603 v4
MB msi-x99 Gaming7
RAM DDR4-2400 32GB
GPU0 NVIDIA GTX1080Ti (for display output and CUDA)
GPU1 AMD Vega Frontier Edition
OS Ubuntu16.04.6 LTS kernel version 4.15

Commands used

InceptionV3

 python ./tf_cnn_benchmarks.py --num_gpus=1  --model inception3 --batch_size 32

InceptionV3 with FP16

python ./tf_cnn_benchmarks.py --num_gpus=1  --model inception3 --batch_size 32 --use_fp16

Resnet50

python  ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32
Resnet50 with FP16
TF_ROCM_FUSION_ENABLE=1 python ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32 --use_fp16

Resnet152

 python ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32
Resnet152 with FP16
 TF_ROCM_FUSION_ENABLE=1 python ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32 --use_fp16

AlexNet

 python ./tf_cnn_benchmarks.py --num_gpus=1 --model  alexnet --batch_size 32
AlexNet with FP16
TF_ROCM_FUSION_ENABLE=1  python ./tf_cnn_benchmarks.py --num_gpus=1 --model alexnet  --batch_size 32  --use_fp16

VGG16

 python ./tf_cnn_benchmarks.py --num_gpus=1 --model vgg16  --batch_size 32

VGG16 with FP16

 python ./tf_cnn_benchmarks.py --num_gpus=1 --model vgg16  --batch_size 32 --use_fp16
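Running the ten commands above by hand is error-prone, so here is a minimal sketch of a driver that rebuilds the same command matrix. It mirrors the article's choice of setting `TF_ROCM_FUSION_ENABLE=1` only for the Resnet50, Resnet152 and AlexNet FP16 runs. `build_runs` and `run_all` are hypothetical helpers; the script assumes it is launched from `benchmarks/scripts/tf_cnn_benchmarks` and defaults to a dry run that only prints the commands. It avoids f-strings so it also runs under the Python 3.5 conda environment created above.

```python
import os
import subprocess

# Model names and flags taken from the commands above.
MODELS = ["inception3", "resnet50", "resnet152", "alexnet", "vgg16"]
# The article sets TF_ROCM_FUSION_ENABLE=1 only for these FP16 runs.
FUSION_FP16_MODELS = {"resnet50", "resnet152", "alexnet"}


def build_runs(batch_size=32):
    """Return (extra_env, argv) pairs for every model, FP32 first, then FP16."""
    runs = []
    for model in MODELS:
        for fp16 in (False, True):
            argv = ["python", "./tf_cnn_benchmarks.py", "--num_gpus=1",
                    "--model", model, "--batch_size", str(batch_size)]
            extra_env = {}
            if fp16:
                argv.append("--use_fp16")
                if model in FUSION_FP16_MODELS:
                    extra_env["TF_ROCM_FUSION_ENABLE"] = "1"
            runs.append((extra_env, argv))
    return runs


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for extra_env, argv in build_runs():
        prefix = " ".join("%s=%s" % kv for kv in sorted(extra_env.items()))
        print((prefix + " " if prefix else "") + " ".join(argv))
        if not dry_run:
            env = dict(os.environ)
            env.update(extra_env)
            # check=False so one crashing model does not stop the rest
            subprocess.run(argv, env=env, check=False)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Using `check=False` is deliberate: as the results below show, some FP16 runs on the RX570 crash, and the remaining models should still be measured.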

Benchmarks

Summary of results

First, here is a quick graph summarizing the results.
[Figure: ROCm 2.3 Tensorflow-rocm benchmark results]

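I didn't have WPS Office at hand, so I apologize that the chart is not very easy to read.
The FP16 runs on the RX570 16GB were very unstable and, frankly, not usable, so please treat that data as reference only.
Also, for reasons I don't understand, the RX570 failed to complete VGG16 in FP32 only; overall, behavior on gfx803 felt quite unstable.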

Not all the data is available, but for comparison I also include the ROCm 2.1 results.
[Figure: ROCm 2.1 benchmark results for comparison]

The ROCm 2.1 FP16 results on Vega 10 and Vega 20 are shown below (sorry, this one is also hard to read).

[Figure: ROCm 2.1 FP16 results on Vega 10 and Vega 20]

For reference, the GTX1080Ti scores 196.62 images/sec on Resnet50, so ROCm 2.3 on the RadeonⅦ beats it comfortably, which makes me very happy.
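To put "comfortably" into numbers, the sketch below divides the RadeonⅦ Resnet50 totals from the detailed data by the GTX1080Ti reference value. The precision and settings of the GTX1080Ti run are not stated here, so treat the ratios as rough; `relative_throughput` is a hypothetical helper.

```python
# Reference GTX1080Ti Resnet50 figure quoted above (precision not stated).
GTX1080TI_RESNET50 = 196.62

# RadeonVII Resnet50 "total images/sec" from the detailed data
# (ROCm 2.3, batch size 32).
RADEON_VII_RESNET50 = {"fp32": 226.50, "fp16": 307.29}


def relative_throughput(value, baseline):
    """Return value / baseline; 1.15 means about 15% more images/sec."""
    return value / baseline


if __name__ == "__main__":
    for precision in ("fp32", "fp16"):
        ratio = relative_throughput(RADEON_VII_RESNET50[precision], GTX1080TI_RESNET50)
        print("RadeonVII %s: %.2fx the GTX1080Ti" % (precision, ratio))
```

This prints roughly 1.15x for FP32 and 1.56x for FP16.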

Detailed data follows.
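Each table below is raw console output from tf_cnn_benchmarks. To tabulate or plot these numbers, a minimal sketch like the following can extract the per-step rows and the final `total images/sec`. The regular expressions are my own assumption based on the log lines shown in this article, not an official parser, and `parse_benchmark_log` is a hypothetical name.

```python
import re

# Matches per-step lines such as
# "10  images/sec: 122.4 +/- 0.5 (jitter = 0.6)    7.308"
STEP_RE = re.compile(
    r"^\s*(?P<step>\d+)\s+images/sec:\s*(?P<ips>[\d.]+)\s*\+/-\s*(?P<err>[\d.]+)"
    r"\s*\(jitter = (?P<jitter>[\d.]+)\)\s+(?P<loss>\S+)\s*$"
)
# Matches the summary line "total images/sec: 122.81"
TOTAL_RE = re.compile(r"^\s*total images/sec:\s*(?P<total>[\d.]+)")


def parse_benchmark_log(text):
    """Return (steps, total); steps is a list of dicts, total is None if missing."""
    steps, total = [], None
    for line in text.splitlines():
        m = STEP_RE.match(line)
        if m:
            steps.append({
                "step": int(m.group("step")),
                "images_per_sec": float(m.group("ips")),
                "uncertainty": float(m.group("err")),  # value after "+/-"
                "jitter": float(m.group("jitter")),
                # AlexNet logs "nan" here; float("nan") accepts it.
                "total_loss": float(m.group("loss")),
            })
            continue
        m = TOTAL_RE.match(line)
        if m:
            total = float(m.group("total"))
    return steps, total
```

A log that crashed before the summary (like the RX570 FP16 runs) comes back with `total` set to `None`, which makes incomplete runs easy to spot.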

RadeonⅦ

InceptionV3

Step    Img/sec total_loss
1   images/sec: 122.7 +/- 0.0 (jitter = 0.0)    7.321
10  images/sec: 122.4 +/- 0.5 (jitter = 0.6)    7.308
20  images/sec: 122.4 +/- 0.3 (jitter = 0.5)    7.364
30  images/sec: 122.7 +/- 0.2 (jitter = 0.3)    7.307
40  images/sec: 122.8 +/- 0.2 (jitter = 0.2)    7.277
50  images/sec: 122.8 +/- 0.1 (jitter = 0.2)    7.235
60  images/sec: 122.9 +/- 0.1 (jitter = 0.3)    7.360
70  images/sec: 122.8 +/- 0.1 (jitter = 0.3)    7.308
80  images/sec: 122.9 +/- 0.1 (jitter = 0.3)    7.317
90  images/sec: 122.9 +/- 0.1 (jitter = 0.3)    7.340
100 images/sec: 122.9 +/- 0.1 (jitter = 0.3)    7.406
----------------------------------------------------------------
total images/sec: 122.81
----------------------------------------------------------------
InceptionV3 with FP16
Step    Img/sec total_loss
1   images/sec: 152.1 +/- 0.0 (jitter = 0.0)    7.414
10  images/sec: 152.3 +/- 0.3 (jitter = 0.5)    7.220
20  images/sec: 152.1 +/- 0.2 (jitter = 0.6)    7.276
30  images/sec: 152.2 +/- 0.1 (jitter = 0.7)    7.347
40  images/sec: 151.9 +/- 0.2 (jitter = 0.7)    7.446
50  images/sec: 151.9 +/- 0.2 (jitter = 0.6)    7.227
60  images/sec: 152.0 +/- 0.2 (jitter = 0.7)    7.293
70  images/sec: 152.0 +/- 0.1 (jitter = 0.6)    7.286
80  images/sec: 152.0 +/- 0.1 (jitter = 0.6)    7.216
90  images/sec: 152.0 +/- 0.1 (jitter = 0.6)    7.404
100 images/sec: 152.1 +/- 0.1 (jitter = 0.6)    7.360
----------------------------------------------------------------
total images/sec: 151.98
----------------------------------------------------------------

Resnet50

python  ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32
Step    Img/sec total_loss
1   images/sec: 224.5 +/- 0.0 (jitter = 0.0)    8.169
10  images/sec: 225.9 +/- 1.3 (jitter = 3.8)    7.593
20  images/sec: 226.1 +/- 0.9 (jitter = 3.0)    7.696
30  images/sec: 226.9 +/- 0.7 (jitter = 1.5)    7.753
40  images/sec: 227.4 +/- 0.6 (jitter = 1.5)    8.007
50  images/sec: 227.9 +/- 0.5 (jitter = 1.4)    7.520
60  images/sec: 228.1 +/- 0.5 (jitter = 1.4)    7.990
70  images/sec: 227.9 +/- 0.4 (jitter = 1.3)    8.027
80  images/sec: 227.1 +/- 0.5 (jitter = 1.6)    7.931
90  images/sec: 227.1 +/- 0.4 (jitter = 1.4)    7.851
100 images/sec: 226.7 +/- 0.5 (jitter = 1.6)    7.795
----------------------------------------------------------------
total images/sec: 226.50
----------------------------------------------------------------

Resnet50 with FP16

Step    Img/sec total_loss
1   images/sec: 309.3 +/- 0.0 (jitter = 0.0)    7.813
10  images/sec: 308.4 +/- 0.4 (jitter = 0.9)    8.172
20  images/sec: 308.3 +/- 0.4 (jitter = 1.2)    7.805
30  images/sec: 308.5 +/- 0.3 (jitter = 1.2)    7.897
40  images/sec: 308.4 +/- 0.3 (jitter = 1.3)    8.042
50  images/sec: 307.5 +/- 0.6 (jitter = 1.6)    7.960
60  images/sec: 307.6 +/- 0.5 (jitter = 1.5)    7.726
70  images/sec: 307.9 +/- 0.5 (jitter = 1.4)    7.875
80  images/sec: 307.5 +/- 0.5 (jitter = 1.5)    7.825
90  images/sec: 307.5 +/- 0.5 (jitter = 1.6)    7.724
100 images/sec: 307.6 +/- 0.5 (jitter = 1.7)    8.145
----------------------------------------------------------------
total images/sec: 307.29
----------------------------------------------------------------

Resnet152

 python ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32
Step    Img/sec total_loss
1   images/sec: 91.6 +/- 0.0 (jitter = 0.0) 8.999
10  images/sec: 91.2 +/- 0.1 (jitter = 0.2) 8.605
20  images/sec: 91.2 +/- 0.1 (jitter = 0.3) 8.592
30  images/sec: 90.9 +/- 0.1 (jitter = 0.4) 8.752
40  images/sec: 90.7 +/- 0.1 (jitter = 0.4) 8.607
50  images/sec: 90.7 +/- 0.1 (jitter = 0.4) 8.798
60  images/sec: 90.7 +/- 0.1 (jitter = 0.4) 8.670
70  images/sec: 90.6 +/- 0.1 (jitter = 0.4) 9.088
80  images/sec: 90.6 +/- 0.1 (jitter = 0.3) 8.885
90  images/sec: 90.7 +/- 0.1 (jitter = 0.3) 9.057
100 images/sec: 90.7 +/- 0.1 (jitter = 0.3) 8.767
----------------------------------------------------------------
total images/sec: 90.68
----------------------------------------------------------------

Resnet152 with FP16

Step    Img/sec total_loss
1   images/sec: 123.5 +/- 0.0 (jitter = 0.0)    9.183
10  images/sec: 122.1 +/- 0.6 (jitter = 0.8)    8.962
20  images/sec: 122.1 +/- 0.4 (jitter = 0.6)    8.808
30  images/sec: 121.7 +/- 0.4 (jitter = 0.8)    8.853
40  images/sec: 121.9 +/- 0.3 (jitter = 0.6)    9.003
50  images/sec: 122.1 +/- 0.2 (jitter = 0.5)    8.704
60  images/sec: 122.1 +/- 0.2 (jitter = 0.6)    8.862
70  images/sec: 122.1 +/- 0.2 (jitter = 0.5)    8.981
80  images/sec: 122.1 +/- 0.2 (jitter = 0.5)    8.838
90  images/sec: 122.2 +/- 0.2 (jitter = 0.5)    8.815
100 images/sec: 122.2 +/- 0.2 (jitter = 0.5)    8.645
----------------------------------------------------------------
total images/sec: 122.18
----------------------------------------------------------------

AlexNet

Step    Img/sec total_loss
1   images/sec: 530.6 +/- 0.0 (jitter = 0.0)    nan
10  images/sec: 530.0 +/- 1.9 (jitter = 4.0)    nan
20  images/sec: 530.4 +/- 1.9 (jitter = 4.5)    nan
30  images/sec: 529.1 +/- 1.3 (jitter = 2.2)    nan
40  images/sec: 529.6 +/- 1.2 (jitter = 2.4)    nan
50  images/sec: 528.8 +/- 1.1 (jitter = 2.3)    nan
60  images/sec: 529.0 +/- 1.0 (jitter = 2.0)    nan
70  images/sec: 529.3 +/- 0.9 (jitter = 2.6)    nan
80  images/sec: 529.8 +/- 0.8 (jitter = 3.3)    nan
90  images/sec: 530.1 +/- 0.7 (jitter = 4.3)    nan
100 images/sec: 530.3 +/- 0.7 (jitter = 4.7)    nan
----------------------------------------------------------------
total images/sec: 529.51
----------------------------------------------------------------

AlexNet with FP16

Step    Img/sec total_loss
1   images/sec: 1215.9 +/- 0.0 (jitter = 0.0)   nan
10  images/sec: 1256.6 +/- 9.2 (jitter = 35.3)  nan
20  images/sec: 1264.8 +/- 7.3 (jitter = 39.2)  nan
30  images/sec: 1272.3 +/- 5.5 (jitter = 25.9)  nan
40  images/sec: 1276.2 +/- 4.4 (jitter = 24.6)  nan
50  images/sec: 1274.5 +/- 4.2 (jitter = 26.0)  nan
60  images/sec: 1275.0 +/- 3.6 (jitter = 25.2)  nan
70  images/sec: 1275.8 +/- 3.2 (jitter = 23.0)  nan
80  images/sec: 1275.0 +/- 3.0 (jitter = 23.2)  nan
90  images/sec: 1276.1 +/- 2.7 (jitter = 22.8)  nan
100 images/sec: 1275.9 +/- 2.6 (jitter = 22.7)  nan
----------------------------------------------------------------
total images/sec: 1270.20
----------------------------------------------------------------

For some reason, this is the one result that seems to have become dramatically faster compared with ROCm 2.1.
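The ROCm 2.1 numbers exist only in the charts above, so the ROCm 2.1 vs 2.3 comparison cannot be recomputed from the text. What the text does allow is a different check: the FP16/FP32 ratio within ROCm 2.3, using the RadeonⅦ totals from this section. The sketch below (with a hypothetical `fp16_speedups` helper) shows AlexNet standing well apart from the other models. Note that the Resnet and AlexNet FP16 runs used `TF_ROCM_FUSION_ENABLE=1` while InceptionV3 and VGG16 did not, so the ratios are not strictly like-for-like.

```python
# RadeonVII (FP32, FP16) "total images/sec", ROCm 2.3, batch size 32,
# copied from the RadeonVII tables in this section.
RADEON_VII = {
    "InceptionV3": (122.81, 151.98),
    "Resnet50": (226.50, 307.29),
    "Resnet152": (90.68, 122.18),
    "AlexNet": (529.51, 1270.20),
    "VGG16": (131.72, 193.55),
}


def fp16_speedups(results):
    """Map model name -> FP16 throughput divided by FP32 throughput."""
    return {name: fp16 / fp32 for name, (fp32, fp16) in results.items()}


if __name__ == "__main__":
    speedups = fp16_speedups(RADEON_VII)
    # Largest FP16 gain first
    for name in sorted(speedups, key=speedups.get, reverse=True):
        print("%-12s %.2fx" % (name, speedups[name]))
```

AlexNet comes out at about 2.40x, while the other models fall between roughly 1.24x and 1.47x.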

VGG16

Step    Img/sec total_loss
1   images/sec: 132.4 +/- 0.0 (jitter = 0.0)    7.296
10  images/sec: 132.6 +/- 0.1 (jitter = 0.4)    7.294
20  images/sec: 132.5 +/- 0.1 (jitter = 0.5)    7.294
30  images/sec: 132.4 +/- 0.1 (jitter = 0.7)    7.306
40  images/sec: 132.3 +/- 0.1 (jitter = 0.5)    7.231
50  images/sec: 132.2 +/- 0.1 (jitter = 0.5)    7.307
60  images/sec: 132.1 +/- 0.1 (jitter = 0.5)    7.281
70  images/sec: 132.0 +/- 0.1 (jitter = 0.6)    7.261
80  images/sec: 131.9 +/- 0.1 (jitter = 0.5)    7.291
90  images/sec: 131.9 +/- 0.1 (jitter = 0.5)    7.259
100 images/sec: 131.8 +/- 0.1 (jitter = 0.6)    7.273
----------------------------------------------------------------
total images/sec: 131.72
----------------------------------------------------------------
VGG16 with FP16
Step    Img/sec total_loss
1   images/sec: 192.9 +/- 0.0 (jitter = 0.0)    7.268
10  images/sec: 194.1 +/- 0.3 (jitter = 1.2)    7.284
20  images/sec: 194.3 +/- 0.2 (jitter = 1.0)    7.267
30  images/sec: 194.3 +/- 0.2 (jitter = 0.9)    7.282
40  images/sec: 194.1 +/- 0.1 (jitter = 0.9)    7.263
50  images/sec: 194.0 +/- 0.1 (jitter = 1.0)    7.291
60  images/sec: 193.9 +/- 0.1 (jitter = 1.0)    7.218
70  images/sec: 193.9 +/- 0.1 (jitter = 1.0)    7.246
80  images/sec: 193.9 +/- 0.1 (jitter = 1.0)    7.277
90  images/sec: 193.8 +/- 0.1 (jitter = 1.0)    7.245
100 images/sec: 193.7 +/- 0.1 (jitter = 1.1)    7.287
----------------------------------------------------------------
total images/sec: 193.55
----------------------------------------------------------------

VegaFE

InceptionV3

Step    Img/sec total_loss
1   images/sec: 66.8 +/- 0.0 (jitter = 0.0) 7.310
10  images/sec: 66.5 +/- 0.2 (jitter = 0.4) 7.339
20  images/sec: 66.4 +/- 0.1 (jitter = 0.3) 7.340
30  images/sec: 66.5 +/- 0.1 (jitter = 0.2) 7.291
40  images/sec: 66.6 +/- 0.1 (jitter = 0.5) 7.274
50  images/sec: 66.7 +/- 0.1 (jitter = 0.7) 7.246
60  images/sec: 66.8 +/- 0.1 (jitter = 0.6) 7.311
70  images/sec: 66.9 +/- 0.1 (jitter = 0.5) 7.275
80  images/sec: 66.9 +/- 0.1 (jitter = 0.4) 7.308
90  images/sec: 66.8 +/- 0.1 (jitter = 0.4) 7.316
100 images/sec: 66.7 +/- 0.1 (jitter = 0.5) 7.372
----------------------------------------------------------------
total images/sec: 66.72
----------------------------------------------------------------

InceptionV3 with FP16

Step    Img/sec total_loss
1   images/sec: 61.1 +/- 0.0 (jitter = 0.0) 7.406
10  images/sec: 61.0 +/- 0.1 (jitter = 0.3) 7.201
20  images/sec: 60.9 +/- 0.0 (jitter = 0.2) 7.289
30  images/sec: 60.9 +/- 0.0 (jitter = 0.2) 7.382
40  images/sec: 60.8 +/- 0.0 (jitter = 0.2) 7.443
50  images/sec: 60.8 +/- 0.0 (jitter = 0.2) 7.241
60  images/sec: 60.8 +/- 0.0 (jitter = 0.2) 7.320
70  images/sec: 60.7 +/- 0.0 (jitter = 0.2) 7.310
80  images/sec: 60.7 +/- 0.0 (jitter = 0.3) 7.262
90  images/sec: 60.7 +/- 0.0 (jitter = 0.3) 7.386
100 images/sec: 60.6 +/- 0.0 (jitter = 0.3) 7.374
----------------------------------------------------------------
total images/sec: 60.62
----------------------------------------------------------------

Resnet50

Step    Img/sec total_loss
1   images/sec: 126.7 +/- 0.0 (jitter = 0.0)    8.169
10  images/sec: 127.4 +/- 0.1 (jitter = 0.5)    7.593
20  images/sec: 127.4 +/- 0.2 (jitter = 0.5)    7.696
30  images/sec: 127.5 +/- 0.1 (jitter = 0.4)    7.753
40  images/sec: 127.5 +/- 0.1 (jitter = 0.5)    8.006
50  images/sec: 127.4 +/- 0.1 (jitter = 0.6)    7.520
60  images/sec: 127.5 +/- 0.1 (jitter = 0.6)    7.989
70  images/sec: 127.5 +/- 0.1 (jitter = 0.6)    8.028
80  images/sec: 127.5 +/- 0.1 (jitter = 0.6)    7.931
90  images/sec: 127.4 +/- 0.1 (jitter = 0.6)    7.850
100 images/sec: 127.4 +/- 0.1 (jitter = 0.6)    7.796
----------------------------------------------------------------
total images/sec: 127.32
----------------------------------------------------------------

Resnet50 with FP16
Step    Img/sec total_loss
1   images/sec: 160.0 +/- 0.0 (jitter = 0.0)    7.823
10  images/sec: 161.0 +/- 0.2 (jitter = 1.0)    8.187
20  images/sec: 161.2 +/- 0.2 (jitter = 0.7)    7.801
30  images/sec: 161.0 +/- 0.1 (jitter = 0.8)    7.880
40  images/sec: 160.7 +/- 0.2 (jitter = 1.0)    8.048
50  images/sec: 160.5 +/- 0.2 (jitter = 1.1)    7.949
60  images/sec: 160.4 +/- 0.1 (jitter = 1.1)    7.727
70  images/sec: 160.3 +/- 0.1 (jitter = 1.1)    7.871
80  images/sec: 160.3 +/- 0.1 (jitter = 1.0)    7.826
90  images/sec: 160.2 +/- 0.1 (jitter = 1.0)    7.739
100 images/sec: 160.1 +/- 0.1 (jitter = 1.1)    8.152
----------------------------------------------------------------
total images/sec: 159.98
----------------------------------------------------------------

Resnet152

Step    Img/sec total_loss
1   images/sec: 60.4 +/- 0.0 (jitter = 0.0) 9.008
10  images/sec: 60.7 +/- 0.1 (jitter = 0.4) 8.577
20  images/sec: 60.8 +/- 0.1 (jitter = 0.3) 8.620
30  images/sec: 60.7 +/- 0.1 (jitter = 0.4) 8.702
40  images/sec: 60.1 +/- 0.2 (jitter = 0.6) 8.624
50  images/sec: 59.7 +/- 0.2 (jitter = 1.3) 8.804
60  images/sec: 59.3 +/- 0.2 (jitter = 2.1) 8.658
70  images/sec: 59.1 +/- 0.2 (jitter = 1.4) 9.081
80  images/sec: 58.9 +/- 0.2 (jitter = 1.1) 8.851
90  images/sec: 58.5 +/- 0.2 (jitter = 1.5) 9.021
100 images/sec: 58.2 +/- 0.2 (jitter = 2.2) 8.841
----------------------------------------------------------------
total images/sec: 58.21
----------------------------------------------------------------
Resnet152 with FP16
Step    Img/sec total_loss
1   images/sec: 59.9 +/- 0.0 (jitter = 0.0) 9.191
10  images/sec: 60.2 +/- 0.2 (jitter = 0.4) 8.961
20  images/sec: 60.2 +/- 0.1 (jitter = 0.4) 8.863
30  images/sec: 60.2 +/- 0.1 (jitter = 0.4) 8.854
40  images/sec: 60.2 +/- 0.1 (jitter = 0.4) 8.994
50  images/sec: 60.2 +/- 0.1 (jitter = 0.4) 8.675
60  images/sec: 60.0 +/- 0.1 (jitter = 0.5) 8.864
70  images/sec: 59.7 +/- 0.1 (jitter = 0.6) 8.970
80  images/sec: 59.5 +/- 0.1 (jitter = 0.9) 8.899
90  images/sec: 59.2 +/- 0.1 (jitter = 1.2) 8.829
100 images/sec: 59.0 +/- 0.1 (jitter = 1.5) 8.709
----------------------------------------------------------------
total images/sec: 59.01
----------------------------------------------------------------

AlexNet

Step    Img/sec total_loss
1   images/sec: 487.2 +/- 0.0 (jitter = 0.0)    nan
10  images/sec: 483.0 +/- 1.1 (jitter = 2.5)    nan
20  images/sec: 481.7 +/- 0.8 (jitter = 4.5)    nan
30  images/sec: 480.6 +/- 0.6 (jitter = 2.8)    nan
40  images/sec: 481.5 +/- 0.6 (jitter = 5.2)    nan
50  images/sec: 481.8 +/- 0.5 (jitter = 4.3)    nan
60  images/sec: 482.2 +/- 0.5 (jitter = 2.5)    nan
70  images/sec: 482.3 +/- 0.4 (jitter = 2.3)    nan
80  images/sec: 482.5 +/- 0.4 (jitter = 2.3)    nan
90  images/sec: 482.6 +/- 0.4 (jitter = 2.1)    nan
100 images/sec: 482.6 +/- 0.3 (jitter = 2.2)    nan
----------------------------------------------------------------
total images/sec: 481.87
----------------------------------------------------------------
AlexNet with FP16
Step    Img/sec total_loss
1   images/sec: 650.5 +/- 0.0 (jitter = 0.0)    nan
10  images/sec: 665.2 +/- 2.7 (jitter = 8.1)    nan
20  images/sec: 668.7 +/- 1.6 (jitter = 3.8)    nan
30  images/sec: 669.6 +/- 1.2 (jitter = 4.6)    nan
40  images/sec: 670.1 +/- 1.0 (jitter = 4.1)    nan
50  images/sec: 669.1 +/- 0.9 (jitter = 4.7)    nan
60  images/sec: 668.7 +/- 0.9 (jitter = 4.6)    nan
70  images/sec: 668.3 +/- 0.8 (jitter = 5.4)    nan
80  images/sec: 668.4 +/- 0.7 (jitter = 5.7)    nan
90  images/sec: 668.4 +/- 0.7 (jitter = 5.5)    nan
100 images/sec: 668.5 +/- 0.6 (jitter = 5.4)    nan
----------------------------------------------------------------
total images/sec: 667.06
----------------------------------------------------------------

VGG16

Step    Img/sec total_loss
1   images/sec: 88.4 +/- 0.0 (jitter = 0.0) 7.342
10  images/sec: 88.3 +/- 0.2 (jitter = 0.4) 7.287
20  images/sec: 88.2 +/- 0.1 (jitter = 0.4) 7.265
30  images/sec: 88.2 +/- 0.1 (jitter = 0.5) 7.304
40  images/sec: 87.7 +/- 0.2 (jitter = 0.8) 7.266
50  images/sec: 87.0 +/- 0.2 (jitter = 1.2) 7.317
60  images/sec: 86.5 +/- 0.2 (jitter = 2.3) 7.262
70  images/sec: 86.3 +/- 0.2 (jitter = 2.6) 7.261
80  images/sec: 86.0 +/- 0.2 (jitter = 2.8) 7.273
90  images/sec: 85.4 +/- 0.3 (jitter = 2.9) 7.265
100 images/sec: 84.9 +/- 0.3 (jitter = 3.3) 7.287
----------------------------------------------------------------
total images/sec: 84.87
----------------------------------------------------------------
VGG16 with FP16
Step    Img/sec total_loss
1   images/sec: 59.3 +/- 0.0 (jitter = 0.0) 7.283
10  images/sec: 59.3 +/- 0.1 (jitter = 0.1) 7.300
20  images/sec: 59.3 +/- 0.0 (jitter = 0.1) 7.249
30  images/sec: 59.3 +/- 0.0 (jitter = 0.2) 7.272
40  images/sec: 59.1 +/- 0.1 (jitter = 0.2) 7.280
50  images/sec: 58.9 +/- 0.1 (jitter = 0.3) 7.295
60  images/sec: 58.6 +/- 0.1 (jitter = 0.6) 7.206
70  images/sec: 58.4 +/- 0.1 (jitter = 0.8) 7.255
80  images/sec: 58.1 +/- 0.1 (jitter = 1.3) 7.291
90  images/sec: 57.9 +/- 0.1 (jitter = 1.8) 7.257
100 images/sec: 57.6 +/- 0.1 (jitter = 2.1) 7.296
----------------------------------------------------------------
total images/sec: 57.62
----------------------------------------------------------------

RX570 16GB

On the RX570, the FP16 runs were a mess: they were either unstable or actually slower than FP32, so please treat these numbers as reference only.
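The RX570 results below mix completed runs, slower FP16 runs, and crashes, so a small sketch like the following can summarize them in one place. The totals are copied from the RX570 tables below; `None` marks a run the text reports as crashing or core-dumping. `classify_fp16` is a hypothetical helper.

```python
# RX570 16GB (FP32, FP16) "total images/sec" from the tables below.
# None means the run did not complete (crash or core dump).
RX570 = {
    "InceptionV3": (47.90, 27.24),
    "Resnet50": (71.10, None),
    "Resnet152": (27.26, None),
    "AlexNet": (353.77, 276.70),
    "VGG16": (None, 20.60),
}


def classify_fp16(fp32, fp16):
    """Describe how the FP16 run compares with the FP32 run."""
    if fp32 is None or fp16 is None:
        return "incomplete"
    ratio = fp16 / fp32
    if ratio > 1.0:
        return "faster (%.2fx)" % ratio
    return "slower (%.2fx)" % ratio


if __name__ == "__main__":
    for model in sorted(RX570):
        fp32, fp16 = RX570[model]
        print("%-12s %s" % (model, classify_fp16(fp32, fp16)))
```

On this data no model gets faster with FP16: InceptionV3 drops to about 0.57x, AlexNet to about 0.78x, and the other three models have at least one incomplete run.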

InceptionV3

Step    Img/sec total_loss
1   images/sec: 48.0 +/- 0.0 (jitter = 0.0) 7.278
10  images/sec: 48.0 +/- 0.0 (jitter = 0.0) 7.330
20  images/sec: 48.0 +/- 0.0 (jitter = 0.0) 7.356
30  images/sec: 48.0 +/- 0.0 (jitter = 0.0) 7.296
40  images/sec: 47.9 +/- 0.0 (jitter = 0.0) 7.294
50  images/sec: 47.9 +/- 0.0 (jitter = 0.0) 7.322
60  images/sec: 47.9 +/- 0.0 (jitter = 0.0) 7.343
70  images/sec: 47.9 +/- 0.0 (jitter = 0.0) 7.265
80  images/sec: 47.9 +/- 0.0 (jitter = 0.0) 7.291
90  images/sec: 47.9 +/- 0.0 (jitter = 0.0) 7.343
100 images/sec: 47.9 +/- 0.0 (jitter = 0.0) 7.339
----------------------------------------------------------------
total images/sec: 47.90
----------------------------------------------------------------
InceptionV3 with FP16
Step    Img/sec total_loss
1   images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.380
10  images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.210
20  images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.247
30  images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.356
40  images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.423
50  images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.271
60  images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.324
70  images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.309
80  images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.277
90  images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.399
100 images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.382
----------------------------------------------------------------
total images/sec: 27.24
----------------------------------------------------------------

Resnet50

Step    Img/sec total_loss
1   images/sec: 71.1 +/- 0.0 (jitter = 0.0) 8.169
10  images/sec: 71.1 +/- 0.0 (jitter = 0.1) 7.593
20  images/sec: 71.1 +/- 0.0 (jitter = 0.1) 7.696
30  images/sec: 71.1 +/- 0.0 (jitter = 0.1) 7.753
40  images/sec: 71.1 +/- 0.0 (jitter = 0.1) 8.007
50  images/sec: 71.1 +/- 0.0 (jitter = 0.1) 7.520
60  images/sec: 71.1 +/- 0.0 (jitter = 0.1) 7.989
70  images/sec: 71.1 +/- 0.0 (jitter = 0.1) 8.028
80  images/sec: 71.1 +/- 0.0 (jitter = 0.1) 7.931
90  images/sec: 71.1 +/- 0.0 (jitter = 0.1) 7.851
100 images/sec: 71.1 +/- 0.0 (jitter = 0.1) 7.798
----------------------------------------------------------------
total images/sec: 71.10
----------------------------------------------------------------
Resnet50 FP16

With FP16 the run crashed partway through, so I could not measure it.

InternalError (see above for traceback): cuDNN launch failure : input shape ([32,128,28,28])
     [[node tower_0/v/cg/resnet_v13/conv12/batchnorm12/FusedBatchNormV2 (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:473) ]]
     [[node average_loss/Mean (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2907) ]]

Resnet152

Step    Img/sec total_loss
1   images/sec: 27.2 +/- 0.0 (jitter = 0.0) 9.031
10  images/sec: 27.3 +/- 0.0 (jitter = 0.0) 8.569
20  images/sec: 27.2 +/- 0.0 (jitter = 0.0) 8.583
30  images/sec: 27.3 +/- 0.0 (jitter = 0.0) 8.728
40  images/sec: 27.3 +/- 0.0 (jitter = 0.0) 8.636
50  images/sec: 27.3 +/- 0.0 (jitter = 0.0) 8.795
60  images/sec: 27.3 +/- 0.0 (jitter = 0.0) 8.688
70  images/sec: 27.3 +/- 0.0 (jitter = 0.0) 9.031
80  images/sec: 27.3 +/- 0.0 (jitter = 0.0) 8.847
90  images/sec: 27.3 +/- 0.0 (jitter = 0.0) 9.035
100 images/sec: 27.3 +/- 0.0 (jitter = 0.0) 8.852
----------------------------------------------------------------
total images/sec: 27.26
----------------------------------------------------------------
Resnet152 with FP16

This benchmark could not complete either.

InternalError (see above for traceback): cuDNN launch failure : input shape ([32,512,28,28])
     [[node tower_0/v/cg/resnet_v13/conv11/batchnorm11/FusedBatchNormV2 (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:473) ]]
     [[node average_loss/Mean (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2907) ]]

AlexNet

Step    Img/sec total_loss
1   images/sec: 354.2 +/- 0.0 (jitter = 0.0)    nan
10  images/sec: 354.4 +/- 0.3 (jitter = 0.8)    nan
20  images/sec: 354.3 +/- 0.2 (jitter = 0.6)    nan
30  images/sec: 354.2 +/- 0.2 (jitter = 0.6)    nan
40  images/sec: 354.2 +/- 0.1 (jitter = 0.4)    nan
50  images/sec: 354.1 +/- 0.1 (jitter = 0.5)    nan
60  images/sec: 354.1 +/- 0.1 (jitter = 0.5)    nan
70  images/sec: 354.1 +/- 0.1 (jitter = 0.5)    nan
80  images/sec: 354.1 +/- 0.1 (jitter = 0.5)    nan
90  images/sec: 354.1 +/- 0.1 (jitter = 0.5)    nan
100 images/sec: 354.1 +/- 0.1 (jitter = 0.5)    nan
----------------------------------------------------------------
total images/sec: 353.77
----------------------------------------------------------------
AlexNet with FP16
Step    Img/sec total_loss
1   images/sec: 276.6 +/- 0.0 (jitter = 0.0)    nan
10  images/sec: 276.8 +/- 0.1 (jitter = 0.5)    nan
20  images/sec: 276.9 +/- 0.1 (jitter = 0.3)    nan
30  images/sec: 276.9 +/- 0.1 (jitter = 0.4)    nan
40  images/sec: 277.0 +/- 0.1 (jitter = 0.4)    nan
50  images/sec: 277.0 +/- 0.1 (jitter = 0.3)    nan
60  images/sec: 276.9 +/- 0.0 (jitter = 0.3)    nan
70  images/sec: 276.9 +/- 0.0 (jitter = 0.3)    nan
80  images/sec: 276.9 +/- 0.0 (jitter = 0.3)    nan
90  images/sec: 276.9 +/- 0.0 (jitter = 0.3)    nan
100 images/sec: 276.9 +/- 0.0 (jitter = 0.3)    nan
----------------------------------------------------------------
total images/sec: 276.70
----------------------------------------------------------------

VGG16

Running in FP32 ended in a core dump, so the benchmark could not complete.

VGG16 with FP16
Step    Img/sec total_loss
1   images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.262
10  images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.263
20  images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.281
30  images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.284
40  images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.266
50  images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.289
60  images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.219
70  images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.258
80  images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.305
90  images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.248
100 images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.296
----------------------------------------------------------------
total images/sec: 20.60
----------------------------------------------------------------

That's everything for now.
