https://qiita.com/_JG1WWK/items/1db6504c77894c0f08aa#rx470-16gb-4
Running TensorFlow benchmarks on AMD GPUs to get a rough idea of deep learning performance (tentative)
https://qiita.com/_JG1WWK/items/6bae45d55d9421e24e4a
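Tensorflow-BenchMarks (Tensorflow-ROCm) results for the RX570 16GB
This is the ROCm 2.3 version of the articles linked above.
ROCm 2.3 was reported to improve machine learning performance, so I will check this with a focus on resnet50.
The software environment is as follows.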
ROCm-version
$ apt show rocm-libs -a
Package: rocm-libs
Version: 2.3.14
Priority: optional
Section: devel
Maintainer: Advanced Micro Devices Inc.
Installed-Size: 13.3 kB
Depends: rocfft, rocrand, hipblas, rocblas
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 766 B
APT-Sources: http://repo.radeon.com/rocm/apt/debian xenial/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack
MIOpen-version
$ apt show miopen-hip -a
Package: miopen-hip
Version: 1.8.0-492700c
Priority: optional
Section: devel
Maintainer: Paul Fultz II <paul.fultz@amd.com>
Installed-Size: 95.3 MB
Depends: rocm-opencl-dev, rocm-utils, hip_hcc, miopengemm
Download-Size: 5,312 kB
APT-Manual-Installed: yes
APT-Sources: http://repo.radeon.com/rocm/apt/debian xenial/main amd64 Packages
Description: AMD's DNN Library
OS configuration
$ uname -a
Linux rocm2 4.15.0-47-generic #50~16.04.1-Ubuntu SMP Fri Mar 15 16:06:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Downloading Tensorflow-benchmarks
$ git clone https://github.com/tensorflow/benchmarks.git -b cnn_tf_v1.13_compatible
Setting up the Tensorflow-rocm environment
The conda version is as follows (to reproduce this, install miniconda: https://qiita.com/_JG1WWK/items/1817b6488526778aa8f2)
$ conda -V
conda 4.5.12
For now, I set up a Tensorflow-rocm 1.13 environment like this:
$ conda create -n tensorflowtest python=3.5
$ conda activate tensorflowtest
$ pip install tensorflow-rocm==1.13.1
pip list
Package Version
-------------------- ---------
absl-py 0.7.1
astor 0.7.1
certifi 2018.8.24
gast 0.2.2
grpcio 1.20.0
h5py 2.9.0
Keras-Applications 1.0.7
Keras-Preprocessing 1.0.9
Markdown 3.1
mock 2.0.0
numpy 1.16.2
pbr 5.1.3
pip 10.0.1
protobuf 3.7.1
setuptools 40.2.0
six 1.12.0
tensorboard 1.13.1
tensorflow-estimator 1.13.0
tensorflow-rocm 1.13.2
termcolor 1.1.0
Werkzeug 0.15.2
wheel 0.31.1
The GPUs tested are the RadeonⅦ, VegaFE, and RX570.
#Hardware environment
CPU Xeon E5-2603 v4
MB msi-x99 Gaming7
RAM DDR4-2400 32GB
GPU0 NVIDIA GTX1080Ti (for display output and CUDA)
GPU1 AMD Vega Frontier Edition
OS Ubuntu16.04.6 LTS kernel version 4.15
#Commands used
###InceptionV3
python ./tf_cnn_benchmarks.py --num_gpus=1 --model inception3 --batch_size 32
####InceptionV3 with FP16
python ./tf_cnn_benchmarks.py --num_gpus=1 --model inception3 --batch_size 32 --use_fp16
###Resnet50
python ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32
#####Resnet50 with FP16
TF_ROCM_FUSION_ENABLE=1 python ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32 --use_fp16
###Resnet152
python ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32
#####Resnet152 with FP16
TF_ROCM_FUSION_ENABLE=1 python ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32 --use_fp16
###AlexNet
python ./tf_cnn_benchmarks.py --num_gpus=1 --model alexnet --batch_size 32
#####AlexNet with FP16
TF_ROCM_FUSION_ENABLE=1 python ./tf_cnn_benchmarks.py --num_gpus=1 --model alexnet --batch_size 32 --use_fp16
###VGG16
python ./tf_cnn_benchmarks.py --num_gpus=1 --model vgg16 --batch_size 32
####VGG16 with FP16
python ./tf_cnn_benchmarks.py --num_gpus=1 --model vgg16 --batch_size 32 --use_fp16
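The commands above differ only in the model name, the `--use_fp16` flag, and whether `TF_ROCM_FUSION_ENABLE=1` is set. As a minimal sketch, the same list could be generated from Python as follows. The helper names `build_command` and `run_benchmark` are hypothetical, and the set of models that receive the fusion variable simply mirrors the commands listed above; it is not a recommendation.

```python
import os
import subprocess

# Models whose FP16 commands above set TF_ROCM_FUSION_ENABLE=1.
FUSION_MODELS = {"resnet50", "resnet152", "alexnet"}
MODELS = ["inception3", "resnet50", "resnet152", "alexnet", "vgg16"]


def build_command(model, fp16=False, batch_size=32):
    """Return (extra_env, argv) matching the command lines listed above."""
    argv = ["python", "./tf_cnn_benchmarks.py", "--num_gpus=1",
            "--model", model, "--batch_size", str(batch_size)]
    extra_env = {}
    if fp16:
        argv.append("--use_fp16")
        if model in FUSION_MODELS:
            extra_env["TF_ROCM_FUSION_ENABLE"] = "1"
    return extra_env, argv


def run_benchmark(model, fp16=False):
    """Run one benchmark and return (returncode, combined output)."""
    extra_env, argv = build_command(model, fp16)
    env = dict(os.environ, **extra_env)
    proc = subprocess.run(argv, env=env, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Only print the command lines; call run_benchmark() to execute them.
    for model in MODELS:
        for fp16 in (False, True):
            extra_env, argv = build_command(model, fp16)
            prefix = " ".join("{}={}".format(k, v) for k, v in extra_env.items())
            print((prefix + " " + " ".join(argv)).strip())
```

Keeping the return code from `run_benchmark` makes it easy to record which runs did not complete, which matters for the RX570 results later in this article.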
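#Benchmarks
#Summary of results
Here is a summary first, in the form of some simple graphs.
I don't have WPS Office at hand, so I apologize that the data is not very easy to read.
FP16 runs on the RX570 16GB were very unstable and frankly not usable, so please treat them as reference data only.
Also, with VGG16 the RX570 failed to complete the run in FP32 only, for reasons I don't understand; overall, behavior on gfx803 felt frankly unstable.
Not all of the data is available, but I also show the ROCm2.1 data for comparison.
The FP16 results on Vega10 & 20 under ROCm2.1 are as follows (apologies that this is also hard to read).
For reference, the GTX1080Ti gets images/sec: 196.62 on Resnet50, so ROCm2.3 & RadeonⅦ beats it comfortably, which I am very happy about.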
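To put a number on that margin, the following small sketch divides the RadeonⅦ Resnet50 totals reported in the detailed data further down (226.50 images/sec in FP32, 307.29 with FP16) by the GTX1080Ti reference value of 196.62. The constant and function names are only illustrative, and the article does not state which precision the GTX1080Ti figure was measured at.

```python
# Totals (images/sec) taken from this article's Resnet50 results.
GTX1080TI_RESNET50 = 196.62
RADEON7_RESNET50_FP32 = 226.50
RADEON7_RESNET50_FP16 = 307.29


def speedup(value, baseline):
    """Ratio of a throughput to a baseline throughput."""
    return value / baseline


fp32_ratio = speedup(RADEON7_RESNET50_FP32, GTX1080TI_RESNET50)
fp16_ratio = speedup(RADEON7_RESNET50_FP16, GTX1080TI_RESNET50)
print("FP32: {:.2f}x, FP16: {:.2f}x".format(fp32_ratio, fp16_ratio))
```

With these inputs the ratios come out at roughly 1.15x for FP32 and 1.56x for FP16.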
#Detailed data
##RadeonⅦ
###InceptionV3
Step Img/sec total_loss
1 images/sec: 122.7 +/- 0.0 (jitter = 0.0) 7.321
10 images/sec: 122.4 +/- 0.5 (jitter = 0.6) 7.308
20 images/sec: 122.4 +/- 0.3 (jitter = 0.5) 7.364
30 images/sec: 122.7 +/- 0.2 (jitter = 0.3) 7.307
40 images/sec: 122.8 +/- 0.2 (jitter = 0.2) 7.277
50 images/sec: 122.8 +/- 0.1 (jitter = 0.2) 7.235
60 images/sec: 122.9 +/- 0.1 (jitter = 0.3) 7.360
70 images/sec: 122.8 +/- 0.1 (jitter = 0.3) 7.308
80 images/sec: 122.9 +/- 0.1 (jitter = 0.3) 7.317
90 images/sec: 122.9 +/- 0.1 (jitter = 0.3) 7.340
100 images/sec: 122.9 +/- 0.1 (jitter = 0.3) 7.406
----------------------------------------------------------------
total images/sec: 122.81
----------------------------------------------------------------
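Each step line in these logs has the shape `step images/sec: mean +/- uncertainty (jitter = j) loss`, and each run ends with a `total images/sec` line. Below is a minimal sketch for pulling those numbers out of a captured log, assuming the output has been saved as text. The regular expressions and the names `parse_log` and `StepRecord` are illustrative and are not part of tf_cnn_benchmarks.

```python
import math
import re
from collections import namedtuple

StepRecord = namedtuple("StepRecord", "step rate uncertainty jitter loss")

STEP_RE = re.compile(
    r"^\s*(\d+)\s+images/sec:\s+([\d.]+)\s+\+/-\s+([\d.]+)\s+"
    r"\(jitter = ([\d.]+)\)\s+(\S+)\s*$")
TOTAL_RE = re.compile(r"^\s*total images/sec:\s+([\d.]+)\s*$")


def parse_log(text):
    """Return (list of StepRecord, total images/sec or None)."""
    steps, total = [], None
    for line in text.splitlines():
        m = STEP_RE.match(line)
        if m:
            step, rate, unc, jitter, loss = m.groups()
            # float("nan") is valid, so a nan loss is kept as a NaN value.
            steps.append(StepRecord(int(step), float(rate), float(unc),
                                    float(jitter), float(loss)))
            continue
        m = TOTAL_RE.match(line)
        if m:
            total = float(m.group(1))
    return steps, total


sample = """\
1 images/sec: 122.7 +/- 0.0 (jitter = 0.0) 7.321
100 images/sec: 122.9 +/- 0.1 (jitter = 0.3) 7.406
----------------------------------------------------------------
total images/sec: 122.81
"""
steps, total = parse_log(sample)
print(len(steps), steps[-1].rate, total)
print("any NaN loss:", any(math.isnan(s.loss) for s in steps))
```

The NaN check is included because the AlexNet runs in this article report `nan` in the total_loss column for every step.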
#####InceptionV3 with FP16
Step Img/sec total_loss
1 images/sec: 152.1 +/- 0.0 (jitter = 0.0) 7.414
10 images/sec: 152.3 +/- 0.3 (jitter = 0.5) 7.220
20 images/sec: 152.1 +/- 0.2 (jitter = 0.6) 7.276
30 images/sec: 152.2 +/- 0.1 (jitter = 0.7) 7.347
40 images/sec: 151.9 +/- 0.2 (jitter = 0.7) 7.446
50 images/sec: 151.9 +/- 0.2 (jitter = 0.6) 7.227
60 images/sec: 152.0 +/- 0.2 (jitter = 0.7) 7.293
70 images/sec: 152.0 +/- 0.1 (jitter = 0.6) 7.286
80 images/sec: 152.0 +/- 0.1 (jitter = 0.6) 7.216
90 images/sec: 152.0 +/- 0.1 (jitter = 0.6) 7.404
100 images/sec: 152.1 +/- 0.1 (jitter = 0.6) 7.360
----------------------------------------------------------------
total images/sec: 151.98
----------------------------------------------------------------
###Resnet50
python ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32
Step Img/sec total_loss
1 images/sec: 224.5 +/- 0.0 (jitter = 0.0) 8.169
10 images/sec: 225.9 +/- 1.3 (jitter = 3.8) 7.593
20 images/sec: 226.1 +/- 0.9 (jitter = 3.0) 7.696
30 images/sec: 226.9 +/- 0.7 (jitter = 1.5) 7.753
40 images/sec: 227.4 +/- 0.6 (jitter = 1.5) 8.007
50 images/sec: 227.9 +/- 0.5 (jitter = 1.4) 7.520
60 images/sec: 228.1 +/- 0.5 (jitter = 1.4) 7.990
70 images/sec: 227.9 +/- 0.4 (jitter = 1.3) 8.027
80 images/sec: 227.1 +/- 0.5 (jitter = 1.6) 7.931
90 images/sec: 227.1 +/- 0.4 (jitter = 1.4) 7.851
100 images/sec: 226.7 +/- 0.5 (jitter = 1.6) 7.795
----------------------------------------------------------------
total images/sec: 226.50
----------------------------------------------------------------
####Resnet50 with FP16
1 images/sec: 309.3 +/- 0.0 (jitter = 0.0) 7.813
10 images/sec: 308.4 +/- 0.4 (jitter = 0.9) 8.172
20 images/sec: 308.3 +/- 0.4 (jitter = 1.2) 7.805
30 images/sec: 308.5 +/- 0.3 (jitter = 1.2) 7.897
40 images/sec: 308.4 +/- 0.3 (jitter = 1.3) 8.042
50 images/sec: 307.5 +/- 0.6 (jitter = 1.6) 7.960
60 images/sec: 307.6 +/- 0.5 (jitter = 1.5) 7.726
70 images/sec: 307.9 +/- 0.5 (jitter = 1.4) 7.875
80 images/sec: 307.5 +/- 0.5 (jitter = 1.5) 7.825
90 images/sec: 307.5 +/- 0.5 (jitter = 1.6) 7.724
100 images/sec: 307.6 +/- 0.5 (jitter = 1.7) 8.145
----------------------------------------------------------------
total images/sec: 307.29
----------------------------------------------------------------
###Resnet152
python ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32
Step Img/sec total_loss
1 images/sec: 91.6 +/- 0.0 (jitter = 0.0) 8.999
10 images/sec: 91.2 +/- 0.1 (jitter = 0.2) 8.605
20 images/sec: 91.2 +/- 0.1 (jitter = 0.3) 8.592
30 images/sec: 90.9 +/- 0.1 (jitter = 0.4) 8.752
40 images/sec: 90.7 +/- 0.1 (jitter = 0.4) 8.607
50 images/sec: 90.7 +/- 0.1 (jitter = 0.4) 8.798
60 images/sec: 90.7 +/- 0.1 (jitter = 0.4) 8.670
70 images/sec: 90.6 +/- 0.1 (jitter = 0.4) 9.088
80 images/sec: 90.6 +/- 0.1 (jitter = 0.3) 8.885
90 images/sec: 90.7 +/- 0.1 (jitter = 0.3) 9.057
100 images/sec: 90.7 +/- 0.1 (jitter = 0.3) 8.767
----------------------------------------------------------------
total images/sec: 90.68
----------------------------------------------------------------
####Resnet152 with FP16
1 images/sec: 123.5 +/- 0.0 (jitter = 0.0) 9.183
10 images/sec: 122.1 +/- 0.6 (jitter = 0.8) 8.962
20 images/sec: 122.1 +/- 0.4 (jitter = 0.6) 8.808
30 images/sec: 121.7 +/- 0.4 (jitter = 0.8) 8.853
40 images/sec: 121.9 +/- 0.3 (jitter = 0.6) 9.003
50 images/sec: 122.1 +/- 0.2 (jitter = 0.5) 8.704
60 images/sec: 122.1 +/- 0.2 (jitter = 0.6) 8.862
70 images/sec: 122.1 +/- 0.2 (jitter = 0.5) 8.981
80 images/sec: 122.1 +/- 0.2 (jitter = 0.5) 8.838
90 images/sec: 122.2 +/- 0.2 (jitter = 0.5) 8.815
100 images/sec: 122.2 +/- 0.2 (jitter = 0.5) 8.645
----------------------------------------------------------------
total images/sec: 122.18
----------------------------------------------------------------
###AlexNet
Step Img/sec total_loss
1 images/sec: 530.6 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 530.0 +/- 1.9 (jitter = 4.0) nan
20 images/sec: 530.4 +/- 1.9 (jitter = 4.5) nan
30 images/sec: 529.1 +/- 1.3 (jitter = 2.2) nan
40 images/sec: 529.6 +/- 1.2 (jitter = 2.4) nan
50 images/sec: 528.8 +/- 1.1 (jitter = 2.3) nan
60 images/sec: 529.0 +/- 1.0 (jitter = 2.0) nan
70 images/sec: 529.3 +/- 0.9 (jitter = 2.6) nan
80 images/sec: 529.8 +/- 0.8 (jitter = 3.3) nan
90 images/sec: 530.1 +/- 0.7 (jitter = 4.3) nan
100 images/sec: 530.3 +/- 0.7 (jitter = 4.7) nan
----------------------------------------------------------------
total images/sec: 529.51
----------------------------------------------------------------
####AlexNet with FP16
Step Img/sec total_loss
1 images/sec: 1215.9 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 1256.6 +/- 9.2 (jitter = 35.3) nan
20 images/sec: 1264.8 +/- 7.3 (jitter = 39.2) nan
30 images/sec: 1272.3 +/- 5.5 (jitter = 25.9) nan
40 images/sec: 1276.2 +/- 4.4 (jitter = 24.6) nan
50 images/sec: 1274.5 +/- 4.2 (jitter = 26.0) nan
60 images/sec: 1275.0 +/- 3.6 (jitter = 25.2) nan
70 images/sec: 1275.8 +/- 3.2 (jitter = 23.0) nan
80 images/sec: 1275.0 +/- 3.0 (jitter = 23.2) nan
90 images/sec: 1276.1 +/- 2.7 (jitter = 22.8) nan
100 images/sec: 1275.9 +/- 2.6 (jitter = 22.7) nan
----------------------------------------------------------------
total images/sec: 1270.20
----------------------------------------------------------------
For some reason, this is the only result that seems to be notably faster than under ROCm2.1.
###VGG16
Step Img/sec total_loss
1 images/sec: 132.4 +/- 0.0 (jitter = 0.0) 7.296
10 images/sec: 132.6 +/- 0.1 (jitter = 0.4) 7.294
20 images/sec: 132.5 +/- 0.1 (jitter = 0.5) 7.294
30 images/sec: 132.4 +/- 0.1 (jitter = 0.7) 7.306
40 images/sec: 132.3 +/- 0.1 (jitter = 0.5) 7.231
50 images/sec: 132.2 +/- 0.1 (jitter = 0.5) 7.307
60 images/sec: 132.1 +/- 0.1 (jitter = 0.5) 7.281
70 images/sec: 132.0 +/- 0.1 (jitter = 0.6) 7.261
80 images/sec: 131.9 +/- 0.1 (jitter = 0.5) 7.291
90 images/sec: 131.9 +/- 0.1 (jitter = 0.5) 7.259
100 images/sec: 131.8 +/- 0.1 (jitter = 0.6) 7.273
----------------------------------------------------------------
total images/sec: 131.72
----------------------------------------------------------------
#####VGG16 with FP16
Step Img/sec total_loss
1 images/sec: 192.9 +/- 0.0 (jitter = 0.0) 7.268
10 images/sec: 194.1 +/- 0.3 (jitter = 1.2) 7.284
20 images/sec: 194.3 +/- 0.2 (jitter = 1.0) 7.267
30 images/sec: 194.3 +/- 0.2 (jitter = 0.9) 7.282
40 images/sec: 194.1 +/- 0.1 (jitter = 0.9) 7.263
50 images/sec: 194.0 +/- 0.1 (jitter = 1.0) 7.291
60 images/sec: 193.9 +/- 0.1 (jitter = 1.0) 7.218
70 images/sec: 193.9 +/- 0.1 (jitter = 1.0) 7.246
80 images/sec: 193.9 +/- 0.1 (jitter = 1.0) 7.277
90 images/sec: 193.8 +/- 0.1 (jitter = 1.0) 7.245
100 images/sec: 193.7 +/- 0.1 (jitter = 1.1) 7.287
----------------------------------------------------------------
total images/sec: 193.55
----------------------------------------------------------------
##VegaFE
###InceptionV3
1 images/sec: 66.8 +/- 0.0 (jitter = 0.0) 7.310
10 images/sec: 66.5 +/- 0.2 (jitter = 0.4) 7.339
20 images/sec: 66.4 +/- 0.1 (jitter = 0.3) 7.340
30 images/sec: 66.5 +/- 0.1 (jitter = 0.2) 7.291
40 images/sec: 66.6 +/- 0.1 (jitter = 0.5) 7.274
50 images/sec: 66.7 +/- 0.1 (jitter = 0.7) 7.246
60 images/sec: 66.8 +/- 0.1 (jitter = 0.6) 7.311
70 images/sec: 66.9 +/- 0.1 (jitter = 0.5) 7.275
80 images/sec: 66.9 +/- 0.1 (jitter = 0.4) 7.308
90 images/sec: 66.8 +/- 0.1 (jitter = 0.4) 7.316
100 images/sec: 66.7 +/- 0.1 (jitter = 0.5) 7.372
----------------------------------------------------------------
total images/sec: 66.72
----------------------------------------------------------------
####InceptionV3 FP16
1 images/sec: 61.1 +/- 0.0 (jitter = 0.0) 7.406
10 images/sec: 61.0 +/- 0.1 (jitter = 0.3) 7.201
20 images/sec: 60.9 +/- 0.0 (jitter = 0.2) 7.289
30 images/sec: 60.9 +/- 0.0 (jitter = 0.2) 7.382
40 images/sec: 60.8 +/- 0.0 (jitter = 0.2) 7.443
50 images/sec: 60.8 +/- 0.0 (jitter = 0.2) 7.241
60 images/sec: 60.8 +/- 0.0 (jitter = 0.2) 7.320
70 images/sec: 60.7 +/- 0.0 (jitter = 0.2) 7.310
80 images/sec: 60.7 +/- 0.0 (jitter = 0.3) 7.262
90 images/sec: 60.7 +/- 0.0 (jitter = 0.3) 7.386
100 images/sec: 60.6 +/- 0.0 (jitter = 0.3) 7.374
----------------------------------------------------------------
total images/sec: 60.62
----------------------------------------------------------------
###Resnet50
Step Img/sec total_loss
1 images/sec: 126.7 +/- 0.0 (jitter = 0.0) 8.169
10 images/sec: 127.4 +/- 0.1 (jitter = 0.5) 7.593
20 images/sec: 127.4 +/- 0.2 (jitter = 0.5) 7.696
30 images/sec: 127.5 +/- 0.1 (jitter = 0.4) 7.753
40 images/sec: 127.5 +/- 0.1 (jitter = 0.5) 8.006
50 images/sec: 127.4 +/- 0.1 (jitter = 0.6) 7.520
60 images/sec: 127.5 +/- 0.1 (jitter = 0.6) 7.989
70 images/sec: 127.5 +/- 0.1 (jitter = 0.6) 8.028
80 images/sec: 127.5 +/- 0.1 (jitter = 0.6) 7.931
90 images/sec: 127.4 +/- 0.1 (jitter = 0.6) 7.850
100 images/sec: 127.4 +/- 0.1 (jitter = 0.6) 7.796
----------------------------------------------------------------
total images/sec: 127.32
----------------------------------------------------------------
#####Resnet50 with FP16
Step Img/sec total_loss
1 images/sec: 160.0 +/- 0.0 (jitter = 0.0) 7.823
10 images/sec: 161.0 +/- 0.2 (jitter = 1.0) 8.187
20 images/sec: 161.2 +/- 0.2 (jitter = 0.7) 7.801
30 images/sec: 161.0 +/- 0.1 (jitter = 0.8) 7.880
40 images/sec: 160.7 +/- 0.2 (jitter = 1.0) 8.048
50 images/sec: 160.5 +/- 0.2 (jitter = 1.1) 7.949
60 images/sec: 160.4 +/- 0.1 (jitter = 1.1) 7.727
70 images/sec: 160.3 +/- 0.1 (jitter = 1.1) 7.871
80 images/sec: 160.3 +/- 0.1 (jitter = 1.0) 7.826
90 images/sec: 160.2 +/- 0.1 (jitter = 1.0) 7.739
100 images/sec: 160.1 +/- 0.1 (jitter = 1.1) 8.152
----------------------------------------------------------------
total images/sec: 159.98
----------------------------------------------------------------
###Resnet152
Step Img/sec total_loss
1 images/sec: 60.4 +/- 0.0 (jitter = 0.0) 9.008
10 images/sec: 60.7 +/- 0.1 (jitter = 0.4) 8.577
20 images/sec: 60.8 +/- 0.1 (jitter = 0.3) 8.620
30 images/sec: 60.7 +/- 0.1 (jitter = 0.4) 8.702
40 images/sec: 60.1 +/- 0.2 (jitter = 0.6) 8.624
50 images/sec: 59.7 +/- 0.2 (jitter = 1.3) 8.804
60 images/sec: 59.3 +/- 0.2 (jitter = 2.1) 8.658
70 images/sec: 59.1 +/- 0.2 (jitter = 1.4) 9.081
80 images/sec: 58.9 +/- 0.2 (jitter = 1.1) 8.851
90 images/sec: 58.5 +/- 0.2 (jitter = 1.5) 9.021
100 images/sec: 58.2 +/- 0.2 (jitter = 2.2) 8.841
----------------------------------------------------------------
total images/sec: 58.21
----------------------------------------------------------------
#####Resnet152 with FP16
Step Img/sec total_loss
1 images/sec: 59.9 +/- 0.0 (jitter = 0.0) 9.191
10 images/sec: 60.2 +/- 0.2 (jitter = 0.4) 8.961
20 images/sec: 60.2 +/- 0.1 (jitter = 0.4) 8.863
30 images/sec: 60.2 +/- 0.1 (jitter = 0.4) 8.854
40 images/sec: 60.2 +/- 0.1 (jitter = 0.4) 8.994
50 images/sec: 60.2 +/- 0.1 (jitter = 0.4) 8.675
60 images/sec: 60.0 +/- 0.1 (jitter = 0.5) 8.864
70 images/sec: 59.7 +/- 0.1 (jitter = 0.6) 8.970
80 images/sec: 59.5 +/- 0.1 (jitter = 0.9) 8.899
90 images/sec: 59.2 +/- 0.1 (jitter = 1.2) 8.829
100 images/sec: 59.0 +/- 0.1 (jitter = 1.5) 8.709
----------------------------------------------------------------
total images/sec: 59.01
----------------------------------------------------------------
###AlexNet
Step Img/sec total_loss
1 images/sec: 487.2 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 483.0 +/- 1.1 (jitter = 2.5) nan
20 images/sec: 481.7 +/- 0.8 (jitter = 4.5) nan
30 images/sec: 480.6 +/- 0.6 (jitter = 2.8) nan
40 images/sec: 481.5 +/- 0.6 (jitter = 5.2) nan
50 images/sec: 481.8 +/- 0.5 (jitter = 4.3) nan
60 images/sec: 482.2 +/- 0.5 (jitter = 2.5) nan
70 images/sec: 482.3 +/- 0.4 (jitter = 2.3) nan
80 images/sec: 482.5 +/- 0.4 (jitter = 2.3) nan
90 images/sec: 482.6 +/- 0.4 (jitter = 2.1) nan
100 images/sec: 482.6 +/- 0.3 (jitter = 2.2) nan
----------------------------------------------------------------
total images/sec: 481.87
----------------------------------------------------------------
######AlexNet with FP16
Step Img/sec total_loss
1 images/sec: 650.5 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 665.2 +/- 2.7 (jitter = 8.1) nan
20 images/sec: 668.7 +/- 1.6 (jitter = 3.8) nan
30 images/sec: 669.6 +/- 1.2 (jitter = 4.6) nan
40 images/sec: 670.1 +/- 1.0 (jitter = 4.1) nan
50 images/sec: 669.1 +/- 0.9 (jitter = 4.7) nan
60 images/sec: 668.7 +/- 0.9 (jitter = 4.6) nan
70 images/sec: 668.3 +/- 0.8 (jitter = 5.4) nan
80 images/sec: 668.4 +/- 0.7 (jitter = 5.7) nan
90 images/sec: 668.4 +/- 0.7 (jitter = 5.5) nan
100 images/sec: 668.5 +/- 0.6 (jitter = 5.4) nan
----------------------------------------------------------------
total images/sec: 667.06
----------------------------------------------------------------
###VGG16
1 images/sec: 88.4 +/- 0.0 (jitter = 0.0) 7.342
10 images/sec: 88.3 +/- 0.2 (jitter = 0.4) 7.287
20 images/sec: 88.2 +/- 0.1 (jitter = 0.4) 7.265
30 images/sec: 88.2 +/- 0.1 (jitter = 0.5) 7.304
40 images/sec: 87.7 +/- 0.2 (jitter = 0.8) 7.266
50 images/sec: 87.0 +/- 0.2 (jitter = 1.2) 7.317
60 images/sec: 86.5 +/- 0.2 (jitter = 2.3) 7.262
70 images/sec: 86.3 +/- 0.2 (jitter = 2.6) 7.261
80 images/sec: 86.0 +/- 0.2 (jitter = 2.8) 7.273
90 images/sec: 85.4 +/- 0.3 (jitter = 2.9) 7.265
100 images/sec: 84.9 +/- 0.3 (jitter = 3.3) 7.287
----------------------------------------------------------------
total images/sec: 84.87
----------------------------------------------------------------
#####VGG16 with FP16
1 images/sec: 59.3 +/- 0.0 (jitter = 0.0) 7.283
10 images/sec: 59.3 +/- 0.1 (jitter = 0.1) 7.300
20 images/sec: 59.3 +/- 0.0 (jitter = 0.1) 7.249
30 images/sec: 59.3 +/- 0.0 (jitter = 0.2) 7.272
40 images/sec: 59.1 +/- 0.1 (jitter = 0.2) 7.280
50 images/sec: 58.9 +/- 0.1 (jitter = 0.3) 7.295
60 images/sec: 58.6 +/- 0.1 (jitter = 0.6) 7.206
70 images/sec: 58.4 +/- 0.1 (jitter = 0.8) 7.255
80 images/sec: 58.1 +/- 0.1 (jitter = 1.3) 7.291
90 images/sec: 57.9 +/- 0.1 (jitter = 1.8) 7.257
100 images/sec: 57.6 +/- 0.1 (jitter = 2.1) 7.296
----------------------------------------------------------------
total images/sec: 57.62
----------------------------------------------------------------
##RX570 16GB
On the RX570, FP16 runs were unstable or even slower than FP32, so please treat these measurements as a reference only.
###InceptionV3
1 images/sec: 48.0 +/- 0.0 (jitter = 0.0) 7.278
10 images/sec: 48.0 +/- 0.0 (jitter = 0.0) 7.330
20 images/sec: 48.0 +/- 0.0 (jitter = 0.0) 7.356
30 images/sec: 48.0 +/- 0.0 (jitter = 0.0) 7.296
40 images/sec: 47.9 +/- 0.0 (jitter = 0.0) 7.294
50 images/sec: 47.9 +/- 0.0 (jitter = 0.0) 7.322
60 images/sec: 47.9 +/- 0.0 (jitter = 0.0) 7.343
70 images/sec: 47.9 +/- 0.0 (jitter = 0.0) 7.265
80 images/sec: 47.9 +/- 0.0 (jitter = 0.0) 7.291
90 images/sec: 47.9 +/- 0.0 (jitter = 0.0) 7.343
100 images/sec: 47.9 +/- 0.0 (jitter = 0.0) 7.339
----------------------------------------------------------------
total images/sec: 47.90
----------------------------------------------------------------
#####InceptionV3 with FP16
1 images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.380
10 images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.210
20 images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.247
30 images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.356
40 images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.423
50 images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.271
60 images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.324
70 images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.309
80 images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.277
90 images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.399
100 images/sec: 27.2 +/- 0.0 (jitter = 0.0) 7.382
----------------------------------------------------------------
total images/sec: 27.24
----------------------------------------------------------------
###Resnet50
1 images/sec: 71.1 +/- 0.0 (jitter = 0.0) 8.169
10 images/sec: 71.1 +/- 0.0 (jitter = 0.1) 7.593
20 images/sec: 71.1 +/- 0.0 (jitter = 0.1) 7.696
30 images/sec: 71.1 +/- 0.0 (jitter = 0.1) 7.753
40 images/sec: 71.1 +/- 0.0 (jitter = 0.1) 8.007
50 images/sec: 71.1 +/- 0.0 (jitter = 0.1) 7.520
60 images/sec: 71.1 +/- 0.0 (jitter = 0.1) 7.989
70 images/sec: 71.1 +/- 0.0 (jitter = 0.1) 8.028
80 images/sec: 71.1 +/- 0.0 (jitter = 0.1) 7.931
90 images/sec: 71.1 +/- 0.0 (jitter = 0.1) 7.851
100 images/sec: 71.1 +/- 0.0 (jitter = 0.1) 7.798
----------------------------------------------------------------
total images/sec: 71.10
----------------------------------------------------------------
#####Resnet50 FP16
With FP16 the run crashed partway through, so it could not be measured.
InternalError (see above for traceback): cuDNN launch failure : input shape ([32,128,28,28])
[[node tower_0/v/cg/resnet_v13/conv12/batchnorm12/FusedBatchNormV2 (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:473) ]]
[[node average_loss/Mean (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2907) ]]
###Resnet152
1 images/sec: 27.2 +/- 0.0 (jitter = 0.0) 9.031
10 images/sec: 27.3 +/- 0.0 (jitter = 0.0) 8.569
20 images/sec: 27.2 +/- 0.0 (jitter = 0.0) 8.583
30 images/sec: 27.3 +/- 0.0 (jitter = 0.0) 8.728
40 images/sec: 27.3 +/- 0.0 (jitter = 0.0) 8.636
50 images/sec: 27.3 +/- 0.0 (jitter = 0.0) 8.795
60 images/sec: 27.3 +/- 0.0 (jitter = 0.0) 8.688
70 images/sec: 27.3 +/- 0.0 (jitter = 0.0) 9.031
80 images/sec: 27.3 +/- 0.0 (jitter = 0.0) 8.847
90 images/sec: 27.3 +/- 0.0 (jitter = 0.0) 9.035
100 images/sec: 27.3 +/- 0.0 (jitter = 0.0) 8.852
----------------------------------------------------------------
total images/sec: 27.26
----------------------------------------------------------------
#####Resnet152 with FP16
This benchmark also failed to run to completion.
InternalError (see above for traceback): cuDNN launch failure : input shape ([32,512,28,28])
[[node tower_0/v/cg/resnet_v13/conv11/batchnorm11/FusedBatchNormV2 (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:473) ]]
[[node average_loss/Mean (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2907) ]]
###AlexNet
1 images/sec: 354.2 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 354.4 +/- 0.3 (jitter = 0.8) nan
20 images/sec: 354.3 +/- 0.2 (jitter = 0.6) nan
30 images/sec: 354.2 +/- 0.2 (jitter = 0.6) nan
40 images/sec: 354.2 +/- 0.1 (jitter = 0.4) nan
50 images/sec: 354.1 +/- 0.1 (jitter = 0.5) nan
60 images/sec: 354.1 +/- 0.1 (jitter = 0.5) nan
70 images/sec: 354.1 +/- 0.1 (jitter = 0.5) nan
80 images/sec: 354.1 +/- 0.1 (jitter = 0.5) nan
90 images/sec: 354.1 +/- 0.1 (jitter = 0.5) nan
100 images/sec: 354.1 +/- 0.1 (jitter = 0.5) nan
----------------------------------------------------------------
total images/sec: 353.77
----------------------------------------------------------------
#####AlexNet with FP16
Step Img/sec total_loss
1 images/sec: 276.6 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 276.8 +/- 0.1 (jitter = 0.5) nan
20 images/sec: 276.9 +/- 0.1 (jitter = 0.3) nan
30 images/sec: 276.9 +/- 0.1 (jitter = 0.4) nan
40 images/sec: 277.0 +/- 0.1 (jitter = 0.4) nan
50 images/sec: 277.0 +/- 0.1 (jitter = 0.3) nan
60 images/sec: 276.9 +/- 0.0 (jitter = 0.3) nan
70 images/sec: 276.9 +/- 0.0 (jitter = 0.3) nan
80 images/sec: 276.9 +/- 0.0 (jitter = 0.3) nan
90 images/sec: 276.9 +/- 0.0 (jitter = 0.3) nan
100 images/sec: 276.9 +/- 0.0 (jitter = 0.3) nan
----------------------------------------------------------------
total images/sec: 276.70
----------------------------------------------------------------
###VGG16
When run in FP32, it core-dumped and the benchmark could not complete.
#####VGG16 with FP16
Step Img/sec total_loss
1 images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.262
10 images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.263
20 images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.281
30 images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.284
40 images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.266
50 images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.289
60 images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.219
70 images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.258
80 images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.305
90 images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.248
100 images/sec: 20.6 +/- 0.0 (jitter = 0.0) 7.296
----------------------------------------------------------------
total images/sec: 20.60
----------------------------------------------------------------
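Since the totals are scattered across the logs above, here is a small sketch that gathers the `total images/sec` values reported in this article into one table and computes the FP16/FP32 ratio per GPU and model. Runs that did not complete are recorded as `None`. The dictionary layout and the names `RESULTS`, `fp16_ratio` and `fmt` are illustrative; the numbers are copied from the logs above.

```python
# (fp32_total, fp16_total) in images/sec, copied from the logs in this article.
# For each model the second value is the run listed right after the FP32 run.
# None marks a run that did not complete.
RESULTS = {
    "Radeon VII": {
        "inception3": (122.81, 151.98),
        "resnet50": (226.50, 307.29),
        "resnet152": (90.68, 122.18),
        "alexnet": (529.51, 1270.20),
        "vgg16": (131.72, 193.55),
    },
    "VegaFE": {
        "inception3": (66.72, 60.62),
        "resnet50": (127.32, 159.98),
        "resnet152": (58.21, 59.01),
        "alexnet": (481.87, 667.06),
        "vgg16": (84.87, 57.62),
    },
    "RX570 16GB": {
        "inception3": (47.90, 27.24),
        "resnet50": (71.10, None),
        "resnet152": (27.26, None),
        "alexnet": (353.77, 276.70),
        "vgg16": (None, 20.60),
    },
}


def fp16_ratio(fp32, fp16):
    """FP16 throughput relative to FP32, or None if either run is missing."""
    if fp32 is None or fp16 is None:
        return None
    return fp16 / fp32


def fmt(value):
    """Format a number with two decimals, or '-' for a missing value."""
    return "-" if value is None else "{:.2f}".format(value)


for gpu, models in RESULTS.items():
    print(gpu)
    print("  {:<11}{:>9}{:>9}{:>7}".format("model", "fp32", "fp16", "ratio"))
    for model, (fp32, fp16) in models.items():
        ratio = fp16_ratio(fp32, fp16)
        print("  {:<11}{:>9}{:>9}{:>7}".format(
            model, fmt(fp32), fmt(fp16), fmt(ratio)))
```

With these numbers the ratio is above 1 for every RadeonⅦ model, while on the VegaFE and RX570 several models come out slower with FP16 (ratio below 1).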
That's everything for now.