Verifying whether ROCm2.3 improves the performance of Tensorflow-rocm1.13.1 + MIOpen1.8.0

Posted at 2019-04-25

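https://qiita.com/_JG1WWK/items/1db6504c77894c0f08aa#rx470-16gb-4
Running TensorFlow benchmarks on AMD GPUs to gauge approximate deep learning performance (tentative)
https://qiita.com/_JG1WWK/items/6bae45d55d9421e24e4a
Tensorflow-BenchMarks (Tensorflow-ROCm) results for the RX570 16GB version

This article is the ROCm2.3 version of the articles above.
There were reports that machine learning performance improved in 2.3, so I would like to verify this, focusing on resnet50.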

The software environment is as follows.

ROCm-version

$  apt show rocm-libs -a
Package: rocm-libs
Version: 2.3.14
Priority: optional
Section: devel
Maintainer: Advanced Micro Devices Inc.
Installed-Size: 13.3 kB
Depends: rocfft, rocrand, hipblas, rocblas
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 766 B
APT-Sources: http://repo.radeon.com/rocm/apt/debian xenial/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack

MIOpen-version

$ apt show miopen-hip -a
Package: miopen-hip
Version: 1.8.0-492700c
Priority: optional
Section: devel
Maintainer: Paul Fultz II <paul.fultz@amd.com>
Installed-Size: 95.3 MB
Depends: rocm-opencl-dev, rocm-utils, hip_hcc, miopengemm
Download-Size: 5,312 kB
APT-Manual-Installed: yes
APT-Sources: http://repo.radeon.com/rocm/apt/debian xenial/main amd64 Packages
Description: AMD's DNN Library

OS configuration

$ uname -a
Linux rocm2 4.15.0-47-generic #50~16.04.1-Ubuntu SMP Fri Mar 15 16:06:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Downloading Tensorflow-benchmarks

$ git clone https://github.com/tensorflow/benchmarks.git -b cnn_tf_v1.13_compatible

Setting up the Tensorflow-rocm environment

The conda version is as follows (to reproduce this, install miniconda: https://qiita.com/_JG1WWK/items/1817b6488526778aa8f2)

$ conda -V
conda 4.5.12

For now, I set up a Tensorflow-rocm1.13 environment like this.

$ conda create -n tensorflowtest python=3.5
$ conda activate tensorflowtest
$ pip install tensorflow-rocm==1.13.1

pip list

Package              Version  
-------------------- ---------
absl-py              0.7.1    
astor                0.7.1    
certifi              2018.8.24
gast                 0.2.2    
grpcio               1.20.0   
h5py                 2.9.0    
Keras-Applications   1.0.7    
Keras-Preprocessing  1.0.9    
Markdown             3.1      
mock                 2.0.0    
numpy                1.16.2   
pbr                  5.1.3    
pip                  10.0.1   
protobuf             3.7.1    
setuptools           40.2.0   
six                  1.12.0   
tensorboard          1.13.1   
tensorflow-estimator 1.13.0   
tensorflow-rocm      1.13.2   
termcolor            1.1.0    
Werkzeug             0.15.2   
wheel                0.31.1   
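
The listing above shows tensorflow-rocm 1.13.2, while the install command pins 1.13.1; the article does not say why they differ. As a minimal sketch of how such a difference could be spotted automatically, the following parses `pip list` output and compares one package against a pin. `PIP_LIST_SAMPLE` is a fragment copied from the listing above, and `parse_pip_list` and `check_pin` are hypothetical helper names, not part of pip.

```python
# Sketch: compare a pinned version against what `pip list` reports.
# PIP_LIST_SAMPLE is a fragment copied from the listing above; the helper
# names are hypothetical.

PIP_LIST_SAMPLE = """\
Package              Version
-------------------- ---------
tensorboard          1.13.1
tensorflow-estimator 1.13.0
tensorflow-rocm      1.13.2
"""


def parse_pip_list(text):
    """Return {package: version} from the two-column `pip list` output."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the header row, the dashed separator, and blank lines.
        if len(parts) != 2 or parts[0] == "Package" or parts[0].startswith("-"):
            continue
        versions[parts[0]] = parts[1]
    return versions


def check_pin(versions, package, pinned):
    """Return a short message saying whether the installed version matches."""
    installed = versions.get(package)
    if installed is None:
        return "{}: not installed".format(package)
    if installed == pinned:
        return "{}: {} (matches pin)".format(package, installed)
    return "{}: installed {} but pinned {}".format(package, installed, pinned)


if __name__ == "__main__":
    print(check_pin(parse_pip_list(PIP_LIST_SAMPLE), "tensorflow-rocm", "1.13.1"))
```

`str.format` is used instead of f-strings so that the sketch also runs on the Python 3.5 environment created above.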

The GPUs tested are the RadeonⅦ, VegaFE, and RX570.

#Hardware environment
CPU Xeon E5-2603 v4
MB msi-x99 Gaming7
RAM DDR4-2400 32GB
GPU0 NVIDIA GTX1080Ti (for display output and CUDA)
GPU1 AMD Vega Frontier Edition
OS Ubuntu16.04.6 LTS kernel version 4.15

#Commands executed

###InceptionV3

 python ./tf_cnn_benchmarks.py --num_gpus=1  --model inception3 --batch_size 32

####InceptionV3 with FP16

python ./tf_cnn_benchmarks.py --num_gpus=1  --model inception3 --batch_size 32 --use_fp16

###Resnet50

python  ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32

#####Resnet50 with FP16

TF_ROCM_FUSION_ENABLE=1 python ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32 --use_fp16

###Resnet152

 python ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32

#####Resnet152 with FP16

 TF_ROCM_FUSION_ENABLE=1 python ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32 --use_fp16

###AlexNet

 python ./tf_cnn_benchmarks.py --num_gpus=1 --model  alexnet --batch_size 32

#####AlexNet with FP16

TF_ROCM_FUSION_ENABLE=1  python ./tf_cnn_benchmarks.py --num_gpus=1 --model alexnet  --batch_size 32  --use_fp16

###VGG16

 python ./tf_cnn_benchmarks.py --num_gpus=1 --model vgg16  --batch_size 32

####VGG16 with FP16

 python ./tf_cnn_benchmarks.py --num_gpus=1 --model vgg16  --batch_size 32 --use_fp16
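
The ten commands above differ only in the model name, the `--use_fp16` flag, and whether `TF_ROCM_FUSION_ENABLE=1` is set. As a rough sketch, they could be generated from a small table rather than typed by hand. `build_command` is a hypothetical helper, and `FUSION_FP16_MODELS` simply mirrors which FP16 commands in this article set the environment variable; it is not a recommendation about when the variable is needed.

```python
# Sketch: rebuild the benchmark command lines listed above.
# FUSION_FP16_MODELS mirrors which FP16 commands in this article set
# TF_ROCM_FUSION_ENABLE=1; build_command is a hypothetical helper.

MODELS = ["inception3", "resnet50", "resnet152", "alexnet", "vgg16"]
FUSION_FP16_MODELS = {"resnet50", "resnet152", "alexnet"}


def build_command(model, use_fp16=False, batch_size=32, num_gpus=1):
    """Return (extra_env, argv) for one tf_cnn_benchmarks run."""
    argv = [
        "python", "./tf_cnn_benchmarks.py",
        "--num_gpus={}".format(num_gpus),
        "--model", model,
        "--batch_size", str(batch_size),
    ]
    env = {}
    if use_fp16:
        argv.append("--use_fp16")
        if model in FUSION_FP16_MODELS:
            env["TF_ROCM_FUSION_ENABLE"] = "1"
    return env, argv


if __name__ == "__main__":
    for model in MODELS:
        for use_fp16 in (False, True):
            env, argv = build_command(model, use_fp16)
            prefix = " ".join("{}={}".format(k, v) for k, v in env.items())
            print((prefix + " " if prefix else "") + " ".join(argv))
```

Returning the environment separately from the argument list keeps the result usable both for printing a shell line and for passing to `subprocess`.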

#Benchmarks

#Summary of results
Here is a simple graph summarizing the results first.
(Figure: ROCm2.3 Tensorflow-rocm benchmark)

I do not have WPS Office at hand, so I apologize that the data is not very easy to read.
FP16 runs on the RX570 16GB were very unstable and honestly not usable, so please treat them as reference data only.
Also, with VGG16 the RX570 failed to complete only in FP32, for reasons I do not know; behavior on gfx803 honestly felt unstable.

Not all of the data is available, but for comparison here is the ROCm2.1 data as well.
(Figure: test.png, ROCm2.1 results)

The results of FP16 runs with ROCm2.1 on Vega10 & 20 are as follows (apologies that this is also hard to read).

(Figure: ROCm2.1 FP16 results on Vega10 & 20)

For reference, the GTX1080Ti gets images/sec: 196.62 on Resnet50, so ROCm2.3 & RadeonⅦ comfortably beats it,
which I am very happy about.
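
To put a number on that comparison, the sketch below divides the RadeonⅦ Resnet50 totals reported in the detailed data of this article (226.50 images/sec in FP32, 307.29 with FP16) by the GTX1080Ti reference value. The article does not state the settings of the GTX1080Ti run, so the ratios, roughly 1.15x and 1.56x, are only indicative. `relative_throughput` is a hypothetical helper name.

```python
# Sketch: Resnet50 throughput of the RadeonⅦ relative to the GTX1080Ti.
# All figures are images/sec totals quoted in this article.

GTX1080TI_RESNET50 = 196.62
RADEON7_RESNET50 = {"FP32": 226.50, "FP16": 307.29}


def relative_throughput(value, baseline):
    """Return value / baseline; above 1.0 means faster than the baseline."""
    return value / baseline


if __name__ == "__main__":
    for precision, images_per_sec in sorted(RADEON7_RESNET50.items()):
        ratio = relative_throughput(images_per_sec, GTX1080TI_RESNET50)
        print("{}: {:.2f}x".format(precision, ratio))
```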

#Detailed data
##RadeonⅦ

###InceptionV3

Step	Img/sec	total_loss
1	images/sec: 122.7 +/- 0.0 (jitter = 0.0)	7.321
10	images/sec: 122.4 +/- 0.5 (jitter = 0.6)	7.308
20	images/sec: 122.4 +/- 0.3 (jitter = 0.5)	7.364
30	images/sec: 122.7 +/- 0.2 (jitter = 0.3)	7.307
40	images/sec: 122.8 +/- 0.2 (jitter = 0.2)	7.277
50	images/sec: 122.8 +/- 0.1 (jitter = 0.2)	7.235
60	images/sec: 122.9 +/- 0.1 (jitter = 0.3)	7.360
70	images/sec: 122.8 +/- 0.1 (jitter = 0.3)	7.308
80	images/sec: 122.9 +/- 0.1 (jitter = 0.3)	7.317
90	images/sec: 122.9 +/- 0.1 (jitter = 0.3)	7.340
100	images/sec: 122.9 +/- 0.1 (jitter = 0.3)	7.406
----------------------------------------------------------------
total images/sec: 122.81
----------------------------------------------------------------
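
Every result block in this article has the same shape as the one above: step rows followed by a `total images/sec` line. As a minimal sketch, assuming that shape holds, the numbers could be extracted with two regular expressions. The patterns are written against the lines shown here, not against any documented output format, and `parse_log` and the dictionary keys are hypothetical names.

```python
import re

# Sketch: pull the numbers out of tf_cnn_benchmarks output such as the
# block above. The patterns follow the lines shown in this article only;
# parse_log and the dictionary keys are hypothetical names.

STEP_RE = re.compile(
    r"^(\d+)\s+images/sec: ([\d.]+) \+/- ([\d.]+) \(jitter = ([\d.]+)\)\s+(\S+)$"
)
TOTAL_RE = re.compile(r"^total images/sec: ([\d.]+)$")


def parse_log(text):
    """Return (steps, total): steps is a list of dicts, total a float or None."""
    steps, total = [], None
    for line in text.splitlines():
        line = line.strip()
        m = STEP_RE.match(line)
        if m:
            steps.append({
                "step": int(m.group(1)),
                "images_per_sec": float(m.group(2)),
                "plus_minus": float(m.group(3)),
                "jitter": float(m.group(4)),
                # float() also accepts "nan", as printed in the AlexNet rows.
                "total_loss": float(m.group(5)),
            })
            continue
        m = TOTAL_RE.match(line)
        if m:
            total = float(m.group(1))
    return steps, total


if __name__ == "__main__":
    sample = (
        "1\timages/sec: 122.7 +/- 0.0 (jitter = 0.0)\t7.321\n"
        "100\timages/sec: 122.9 +/- 0.1 (jitter = 0.3)\t7.406\n"
        "total images/sec: 122.81\n"
    )
    steps, total = parse_log(sample)
    print(len(steps), total)
```

Header rows and the dashed separator lines match neither pattern and are skipped.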

#####InceptionV3 with FP16

Step	Img/sec	total_loss
1	images/sec: 152.1 +/- 0.0 (jitter = 0.0)	7.414
10	images/sec: 152.3 +/- 0.3 (jitter = 0.5)	7.220
20	images/sec: 152.1 +/- 0.2 (jitter = 0.6)	7.276
30	images/sec: 152.2 +/- 0.1 (jitter = 0.7)	7.347
40	images/sec: 151.9 +/- 0.2 (jitter = 0.7)	7.446
50	images/sec: 151.9 +/- 0.2 (jitter = 0.6)	7.227
60	images/sec: 152.0 +/- 0.2 (jitter = 0.7)	7.293
70	images/sec: 152.0 +/- 0.1 (jitter = 0.6)	7.286
80	images/sec: 152.0 +/- 0.1 (jitter = 0.6)	7.216
90	images/sec: 152.0 +/- 0.1 (jitter = 0.6)	7.404
100	images/sec: 152.1 +/- 0.1 (jitter = 0.6)	7.360
----------------------------------------------------------------
total images/sec: 151.98
----------------------------------------------------------------

###Resnet50

python  ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32
Step	Img/sec	total_loss
1	images/sec: 224.5 +/- 0.0 (jitter = 0.0)	8.169
10	images/sec: 225.9 +/- 1.3 (jitter = 3.8)	7.593
20	images/sec: 226.1 +/- 0.9 (jitter = 3.0)	7.696
30	images/sec: 226.9 +/- 0.7 (jitter = 1.5)	7.753
40	images/sec: 227.4 +/- 0.6 (jitter = 1.5)	8.007
50	images/sec: 227.9 +/- 0.5 (jitter = 1.4)	7.520
60	images/sec: 228.1 +/- 0.5 (jitter = 1.4)	7.990
70	images/sec: 227.9 +/- 0.4 (jitter = 1.3)	8.027
80	images/sec: 227.1 +/- 0.5 (jitter = 1.6)	7.931
90	images/sec: 227.1 +/- 0.4 (jitter = 1.4)	7.851
100	images/sec: 226.7 +/- 0.5 (jitter = 1.6)	7.795
----------------------------------------------------------------
total images/sec: 226.50
----------------------------------------------------------------

####Resnet50 with FP16

1	images/sec: 309.3 +/- 0.0 (jitter = 0.0)	7.813
10	images/sec: 308.4 +/- 0.4 (jitter = 0.9)	8.172
20	images/sec: 308.3 +/- 0.4 (jitter = 1.2)	7.805
30	images/sec: 308.5 +/- 0.3 (jitter = 1.2)	7.897
40	images/sec: 308.4 +/- 0.3 (jitter = 1.3)	8.042
50	images/sec: 307.5 +/- 0.6 (jitter = 1.6)	7.960
60	images/sec: 307.6 +/- 0.5 (jitter = 1.5)	7.726
70	images/sec: 307.9 +/- 0.5 (jitter = 1.4)	7.875
80	images/sec: 307.5 +/- 0.5 (jitter = 1.5)	7.825
90	images/sec: 307.5 +/- 0.5 (jitter = 1.6)	7.724
100	images/sec: 307.6 +/- 0.5 (jitter = 1.7)	8.145
----------------------------------------------------------------
total images/sec: 307.29
----------------------------------------------------------------

###Resnet152

 python ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32
Step	Img/sec	total_loss
1	images/sec: 91.6 +/- 0.0 (jitter = 0.0)	8.999
10	images/sec: 91.2 +/- 0.1 (jitter = 0.2)	8.605
20	images/sec: 91.2 +/- 0.1 (jitter = 0.3)	8.592
30	images/sec: 90.9 +/- 0.1 (jitter = 0.4)	8.752
40	images/sec: 90.7 +/- 0.1 (jitter = 0.4)	8.607
50	images/sec: 90.7 +/- 0.1 (jitter = 0.4)	8.798
60	images/sec: 90.7 +/- 0.1 (jitter = 0.4)	8.670
70	images/sec: 90.6 +/- 0.1 (jitter = 0.4)	9.088
80	images/sec: 90.6 +/- 0.1 (jitter = 0.3)	8.885
90	images/sec: 90.7 +/- 0.1 (jitter = 0.3)	9.057
100	images/sec: 90.7 +/- 0.1 (jitter = 0.3)	8.767
----------------------------------------------------------------
total images/sec: 90.68
----------------------------------------------------------------

####Resnet152 with FP16

1	images/sec: 123.5 +/- 0.0 (jitter = 0.0)	9.183
10	images/sec: 122.1 +/- 0.6 (jitter = 0.8)	8.962
20	images/sec: 122.1 +/- 0.4 (jitter = 0.6)	8.808
30	images/sec: 121.7 +/- 0.4 (jitter = 0.8)	8.853
40	images/sec: 121.9 +/- 0.3 (jitter = 0.6)	9.003
50	images/sec: 122.1 +/- 0.2 (jitter = 0.5)	8.704
60	images/sec: 122.1 +/- 0.2 (jitter = 0.6)	8.862
70	images/sec: 122.1 +/- 0.2 (jitter = 0.5)	8.981
80	images/sec: 122.1 +/- 0.2 (jitter = 0.5)	8.838
90	images/sec: 122.2 +/- 0.2 (jitter = 0.5)	8.815
100	images/sec: 122.2 +/- 0.2 (jitter = 0.5)	8.645
----------------------------------------------------------------
total images/sec: 122.18
----------------------------------------------------------------

###AlexNet

Step	Img/sec	total_loss
1	images/sec: 530.6 +/- 0.0 (jitter = 0.0)	nan
10	images/sec: 530.0 +/- 1.9 (jitter = 4.0)	nan
20	images/sec: 530.4 +/- 1.9 (jitter = 4.5)	nan
30	images/sec: 529.1 +/- 1.3 (jitter = 2.2)	nan
40	images/sec: 529.6 +/- 1.2 (jitter = 2.4)	nan
50	images/sec: 528.8 +/- 1.1 (jitter = 2.3)	nan
60	images/sec: 529.0 +/- 1.0 (jitter = 2.0)	nan
70	images/sec: 529.3 +/- 0.9 (jitter = 2.6)	nan
80	images/sec: 529.8 +/- 0.8 (jitter = 3.3)	nan
90	images/sec: 530.1 +/- 0.7 (jitter = 4.3)	nan
100	images/sec: 530.3 +/- 0.7 (jitter = 4.7)	nan
----------------------------------------------------------------
total images/sec: 529.51
----------------------------------------------------------------

####AlexNet with FP16

Step	Img/sec	total_loss
1	images/sec: 1215.9 +/- 0.0 (jitter = 0.0)	nan
10	images/sec: 1256.6 +/- 9.2 (jitter = 35.3)	nan
20	images/sec: 1264.8 +/- 7.3 (jitter = 39.2)	nan
30	images/sec: 1272.3 +/- 5.5 (jitter = 25.9)	nan
40	images/sec: 1276.2 +/- 4.4 (jitter = 24.6)	nan
50	images/sec: 1274.5 +/- 4.2 (jitter = 26.0)	nan
60	images/sec: 1275.0 +/- 3.6 (jitter = 25.2)	nan
70	images/sec: 1275.8 +/- 3.2 (jitter = 23.0)	nan
80	images/sec: 1275.0 +/- 3.0 (jitter = 23.2)	nan
90	images/sec: 1276.1 +/- 2.7 (jitter = 22.8)	nan
100	images/sec: 1275.9 +/- 2.6 (jitter = 22.7)	nan
----------------------------------------------------------------
total images/sec: 1270.20
----------------------------------------------------------------

For some reason, this is the only result that seems notably faster than on ROCm2.1.

###VGG16

Step	Img/sec	total_loss
1	images/sec: 132.4 +/- 0.0 (jitter = 0.0)	7.296
10	images/sec: 132.6 +/- 0.1 (jitter = 0.4)	7.294
20	images/sec: 132.5 +/- 0.1 (jitter = 0.5)	7.294
30	images/sec: 132.4 +/- 0.1 (jitter = 0.7)	7.306
40	images/sec: 132.3 +/- 0.1 (jitter = 0.5)	7.231
50	images/sec: 132.2 +/- 0.1 (jitter = 0.5)	7.307
60	images/sec: 132.1 +/- 0.1 (jitter = 0.5)	7.281
70	images/sec: 132.0 +/- 0.1 (jitter = 0.6)	7.261
80	images/sec: 131.9 +/- 0.1 (jitter = 0.5)	7.291
90	images/sec: 131.9 +/- 0.1 (jitter = 0.5)	7.259
100	images/sec: 131.8 +/- 0.1 (jitter = 0.6)	7.273
----------------------------------------------------------------
total images/sec: 131.72
----------------------------------------------------------------

#####VGG16 with FP16

Step	Img/sec	total_loss
1	images/sec: 192.9 +/- 0.0 (jitter = 0.0)	7.268
10	images/sec: 194.1 +/- 0.3 (jitter = 1.2)	7.284
20	images/sec: 194.3 +/- 0.2 (jitter = 1.0)	7.267
30	images/sec: 194.3 +/- 0.2 (jitter = 0.9)	7.282
40	images/sec: 194.1 +/- 0.1 (jitter = 0.9)	7.263
50	images/sec: 194.0 +/- 0.1 (jitter = 1.0)	7.291
60	images/sec: 193.9 +/- 0.1 (jitter = 1.0)	7.218
70	images/sec: 193.9 +/- 0.1 (jitter = 1.0)	7.246
80	images/sec: 193.9 +/- 0.1 (jitter = 1.0)	7.277
90	images/sec: 193.8 +/- 0.1 (jitter = 1.0)	7.245
100	images/sec: 193.7 +/- 0.1 (jitter = 1.1)	7.287
----------------------------------------------------------------
total images/sec: 193.55
----------------------------------------------------------------

##VegaFE

###InceptionV3

1	images/sec: 66.8 +/- 0.0 (jitter = 0.0)	7.310
10	images/sec: 66.5 +/- 0.2 (jitter = 0.4)	7.339
20	images/sec: 66.4 +/- 0.1 (jitter = 0.3)	7.340
30	images/sec: 66.5 +/- 0.1 (jitter = 0.2)	7.291
40	images/sec: 66.6 +/- 0.1 (jitter = 0.5)	7.274
50	images/sec: 66.7 +/- 0.1 (jitter = 0.7)	7.246
60	images/sec: 66.8 +/- 0.1 (jitter = 0.6)	7.311
70	images/sec: 66.9 +/- 0.1 (jitter = 0.5)	7.275
80	images/sec: 66.9 +/- 0.1 (jitter = 0.4)	7.308
90	images/sec: 66.8 +/- 0.1 (jitter = 0.4)	7.316
100	images/sec: 66.7 +/- 0.1 (jitter = 0.5)	7.372
----------------------------------------------------------------
total images/sec: 66.72
----------------------------------------------------------------

####InceptionV3 with FP16

1	images/sec: 61.1 +/- 0.0 (jitter = 0.0)	7.406
10	images/sec: 61.0 +/- 0.1 (jitter = 0.3)	7.201
20	images/sec: 60.9 +/- 0.0 (jitter = 0.2)	7.289
30	images/sec: 60.9 +/- 0.0 (jitter = 0.2)	7.382
40	images/sec: 60.8 +/- 0.0 (jitter = 0.2)	7.443
50	images/sec: 60.8 +/- 0.0 (jitter = 0.2)	7.241
60	images/sec: 60.8 +/- 0.0 (jitter = 0.2)	7.320
70	images/sec: 60.7 +/- 0.0 (jitter = 0.2)	7.310
80	images/sec: 60.7 +/- 0.0 (jitter = 0.3)	7.262
90	images/sec: 60.7 +/- 0.0 (jitter = 0.3)	7.386
100	images/sec: 60.6 +/- 0.0 (jitter = 0.3)	7.374
----------------------------------------------------------------
total images/sec: 60.62
----------------------------------------------------------------

###Resnet50

Step	Img/sec	total_loss
1	images/sec: 126.7 +/- 0.0 (jitter = 0.0)	8.169
10	images/sec: 127.4 +/- 0.1 (jitter = 0.5)	7.593
20	images/sec: 127.4 +/- 0.2 (jitter = 0.5)	7.696
30	images/sec: 127.5 +/- 0.1 (jitter = 0.4)	7.753
40	images/sec: 127.5 +/- 0.1 (jitter = 0.5)	8.006
50	images/sec: 127.4 +/- 0.1 (jitter = 0.6)	7.520
60	images/sec: 127.5 +/- 0.1 (jitter = 0.6)	7.989
70	images/sec: 127.5 +/- 0.1 (jitter = 0.6)	8.028
80	images/sec: 127.5 +/- 0.1 (jitter = 0.6)	7.931
90	images/sec: 127.4 +/- 0.1 (jitter = 0.6)	7.850
100	images/sec: 127.4 +/- 0.1 (jitter = 0.6)	7.796
----------------------------------------------------------------
total images/sec: 127.32
----------------------------------------------------------------

#####Resnet50 with FP16

Step	Img/sec	total_loss
1	images/sec: 160.0 +/- 0.0 (jitter = 0.0)	7.823
10	images/sec: 161.0 +/- 0.2 (jitter = 1.0)	8.187
20	images/sec: 161.2 +/- 0.2 (jitter = 0.7)	7.801
30	images/sec: 161.0 +/- 0.1 (jitter = 0.8)	7.880
40	images/sec: 160.7 +/- 0.2 (jitter = 1.0)	8.048
50	images/sec: 160.5 +/- 0.2 (jitter = 1.1)	7.949
60	images/sec: 160.4 +/- 0.1 (jitter = 1.1)	7.727
70	images/sec: 160.3 +/- 0.1 (jitter = 1.1)	7.871
80	images/sec: 160.3 +/- 0.1 (jitter = 1.0)	7.826
90	images/sec: 160.2 +/- 0.1 (jitter = 1.0)	7.739
100	images/sec: 160.1 +/- 0.1 (jitter = 1.1)	8.152
----------------------------------------------------------------
total images/sec: 159.98
----------------------------------------------------------------

###Resnet152

Step	Img/sec	total_loss
1	images/sec: 60.4 +/- 0.0 (jitter = 0.0)	9.008
10	images/sec: 60.7 +/- 0.1 (jitter = 0.4)	8.577
20	images/sec: 60.8 +/- 0.1 (jitter = 0.3)	8.620
30	images/sec: 60.7 +/- 0.1 (jitter = 0.4)	8.702
40	images/sec: 60.1 +/- 0.2 (jitter = 0.6)	8.624
50	images/sec: 59.7 +/- 0.2 (jitter = 1.3)	8.804
60	images/sec: 59.3 +/- 0.2 (jitter = 2.1)	8.658
70	images/sec: 59.1 +/- 0.2 (jitter = 1.4)	9.081
80	images/sec: 58.9 +/- 0.2 (jitter = 1.1)	8.851
90	images/sec: 58.5 +/- 0.2 (jitter = 1.5)	9.021
100	images/sec: 58.2 +/- 0.2 (jitter = 2.2)	8.841
----------------------------------------------------------------
total images/sec: 58.21
----------------------------------------------------------------

#####Resnet152 with FP16

Step	Img/sec	total_loss
1	images/sec: 59.9 +/- 0.0 (jitter = 0.0)	9.191
10	images/sec: 60.2 +/- 0.2 (jitter = 0.4)	8.961
20	images/sec: 60.2 +/- 0.1 (jitter = 0.4)	8.863
30	images/sec: 60.2 +/- 0.1 (jitter = 0.4)	8.854
40	images/sec: 60.2 +/- 0.1 (jitter = 0.4)	8.994
50	images/sec: 60.2 +/- 0.1 (jitter = 0.4)	8.675
60	images/sec: 60.0 +/- 0.1 (jitter = 0.5)	8.864
70	images/sec: 59.7 +/- 0.1 (jitter = 0.6)	8.970
80	images/sec: 59.5 +/- 0.1 (jitter = 0.9)	8.899
90	images/sec: 59.2 +/- 0.1 (jitter = 1.2)	8.829
100	images/sec: 59.0 +/- 0.1 (jitter = 1.5)	8.709
----------------------------------------------------------------
total images/sec: 59.01
----------------------------------------------------------------

###AlexNet

Step	Img/sec	total_loss
1	images/sec: 487.2 +/- 0.0 (jitter = 0.0)	nan
10	images/sec: 483.0 +/- 1.1 (jitter = 2.5)	nan
20	images/sec: 481.7 +/- 0.8 (jitter = 4.5)	nan
30	images/sec: 480.6 +/- 0.6 (jitter = 2.8)	nan
40	images/sec: 481.5 +/- 0.6 (jitter = 5.2)	nan
50	images/sec: 481.8 +/- 0.5 (jitter = 4.3)	nan
60	images/sec: 482.2 +/- 0.5 (jitter = 2.5)	nan
70	images/sec: 482.3 +/- 0.4 (jitter = 2.3)	nan
80	images/sec: 482.5 +/- 0.4 (jitter = 2.3)	nan
90	images/sec: 482.6 +/- 0.4 (jitter = 2.1)	nan
100	images/sec: 482.6 +/- 0.3 (jitter = 2.2)	nan
----------------------------------------------------------------
total images/sec: 481.87
----------------------------------------------------------------

######AlexNet with FP16

Step	Img/sec	total_loss
1	images/sec: 650.5 +/- 0.0 (jitter = 0.0)	nan
10	images/sec: 665.2 +/- 2.7 (jitter = 8.1)	nan
20	images/sec: 668.7 +/- 1.6 (jitter = 3.8)	nan
30	images/sec: 669.6 +/- 1.2 (jitter = 4.6)	nan
40	images/sec: 670.1 +/- 1.0 (jitter = 4.1)	nan
50	images/sec: 669.1 +/- 0.9 (jitter = 4.7)	nan
60	images/sec: 668.7 +/- 0.9 (jitter = 4.6)	nan
70	images/sec: 668.3 +/- 0.8 (jitter = 5.4)	nan
80	images/sec: 668.4 +/- 0.7 (jitter = 5.7)	nan
90	images/sec: 668.4 +/- 0.7 (jitter = 5.5)	nan
100	images/sec: 668.5 +/- 0.6 (jitter = 5.4)	nan
----------------------------------------------------------------
total images/sec: 667.06
----------------------------------------------------------------

###VGG16

1	images/sec: 88.4 +/- 0.0 (jitter = 0.0)	7.342
10	images/sec: 88.3 +/- 0.2 (jitter = 0.4)	7.287
20	images/sec: 88.2 +/- 0.1 (jitter = 0.4)	7.265
30	images/sec: 88.2 +/- 0.1 (jitter = 0.5)	7.304
40	images/sec: 87.7 +/- 0.2 (jitter = 0.8)	7.266
50	images/sec: 87.0 +/- 0.2 (jitter = 1.2)	7.317
60	images/sec: 86.5 +/- 0.2 (jitter = 2.3)	7.262
70	images/sec: 86.3 +/- 0.2 (jitter = 2.6)	7.261
80	images/sec: 86.0 +/- 0.2 (jitter = 2.8)	7.273
90	images/sec: 85.4 +/- 0.3 (jitter = 2.9)	7.265
100	images/sec: 84.9 +/- 0.3 (jitter = 3.3)	7.287
----------------------------------------------------------------
total images/sec: 84.87
----------------------------------------------------------------

#####VGG16 with FP16

1	images/sec: 59.3 +/- 0.0 (jitter = 0.0)	7.283
10	images/sec: 59.3 +/- 0.1 (jitter = 0.1)	7.300
20	images/sec: 59.3 +/- 0.0 (jitter = 0.1)	7.249
30	images/sec: 59.3 +/- 0.0 (jitter = 0.2)	7.272
40	images/sec: 59.1 +/- 0.1 (jitter = 0.2)	7.280
50	images/sec: 58.9 +/- 0.1 (jitter = 0.3)	7.295
60	images/sec: 58.6 +/- 0.1 (jitter = 0.6)	7.206
70	images/sec: 58.4 +/- 0.1 (jitter = 0.8)	7.255
80	images/sec: 58.1 +/- 0.1 (jitter = 1.3)	7.291
90	images/sec: 57.9 +/- 0.1 (jitter = 1.8)	7.257
100	images/sec: 57.6 +/- 0.1 (jitter = 2.1)	7.296
----------------------------------------------------------------
total images/sec: 57.62
----------------------------------------------------------------

##RX570 16GB
In the RX570 measurements, FP16 behaved unstably or was even slower than FP32, so please treat these results as reference only.

###InceptionV3

1	images/sec: 48.0 +/- 0.0 (jitter = 0.0)	7.278
10	images/sec: 48.0 +/- 0.0 (jitter = 0.0)	7.330
20	images/sec: 48.0 +/- 0.0 (jitter = 0.0)	7.356
30	images/sec: 48.0 +/- 0.0 (jitter = 0.0)	7.296
40	images/sec: 47.9 +/- 0.0 (jitter = 0.0)	7.294
50	images/sec: 47.9 +/- 0.0 (jitter = 0.0)	7.322
60	images/sec: 47.9 +/- 0.0 (jitter = 0.0)	7.343
70	images/sec: 47.9 +/- 0.0 (jitter = 0.0)	7.265
80	images/sec: 47.9 +/- 0.0 (jitter = 0.0)	7.291
90	images/sec: 47.9 +/- 0.0 (jitter = 0.0)	7.343
100	images/sec: 47.9 +/- 0.0 (jitter = 0.0)	7.339
----------------------------------------------------------------
total images/sec: 47.90
----------------------------------------------------------------

#####InceptionV3 with FP16

1	images/sec: 27.2 +/- 0.0 (jitter = 0.0)	7.380
10	images/sec: 27.2 +/- 0.0 (jitter = 0.0)	7.210
20	images/sec: 27.2 +/- 0.0 (jitter = 0.0)	7.247
30	images/sec: 27.2 +/- 0.0 (jitter = 0.0)	7.356
40	images/sec: 27.2 +/- 0.0 (jitter = 0.0)	7.423
50	images/sec: 27.2 +/- 0.0 (jitter = 0.0)	7.271
60	images/sec: 27.2 +/- 0.0 (jitter = 0.0)	7.324
70	images/sec: 27.2 +/- 0.0 (jitter = 0.0)	7.309
80	images/sec: 27.2 +/- 0.0 (jitter = 0.0)	7.277
90	images/sec: 27.2 +/- 0.0 (jitter = 0.0)	7.399
100	images/sec: 27.2 +/- 0.0 (jitter = 0.0)	7.382
----------------------------------------------------------------
total images/sec: 27.24
----------------------------------------------------------------

###Resnet50

1	images/sec: 71.1 +/- 0.0 (jitter = 0.0)	8.169
10	images/sec: 71.1 +/- 0.0 (jitter = 0.1)	7.593
20	images/sec: 71.1 +/- 0.0 (jitter = 0.1)	7.696
30	images/sec: 71.1 +/- 0.0 (jitter = 0.1)	7.753
40	images/sec: 71.1 +/- 0.0 (jitter = 0.1)	8.007
50	images/sec: 71.1 +/- 0.0 (jitter = 0.1)	7.520
60	images/sec: 71.1 +/- 0.0 (jitter = 0.1)	7.989
70	images/sec: 71.1 +/- 0.0 (jitter = 0.1)	8.028
80	images/sec: 71.1 +/- 0.0 (jitter = 0.1)	7.931
90	images/sec: 71.1 +/- 0.0 (jitter = 0.1)	7.851
100	images/sec: 71.1 +/- 0.0 (jitter = 0.1)	7.798
----------------------------------------------------------------
total images/sec: 71.10
----------------------------------------------------------------

#####Resnet50 with FP16
With fp16 the run crashed partway through, so it could not be measured.

InternalError (see above for traceback): cuDNN launch failure : input shape ([32,128,28,28])
	 [[node tower_0/v/cg/resnet_v13/conv12/batchnorm12/FusedBatchNormV2 (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:473) ]]
	 [[node average_loss/Mean (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2907) ]]

###Resnet152

1	images/sec: 27.2 +/- 0.0 (jitter = 0.0)	9.031
10	images/sec: 27.3 +/- 0.0 (jitter = 0.0)	8.569
20	images/sec: 27.2 +/- 0.0 (jitter = 0.0)	8.583
30	images/sec: 27.3 +/- 0.0 (jitter = 0.0)	8.728
40	images/sec: 27.3 +/- 0.0 (jitter = 0.0)	8.636
50	images/sec: 27.3 +/- 0.0 (jitter = 0.0)	8.795
60	images/sec: 27.3 +/- 0.0 (jitter = 0.0)	8.688
70	images/sec: 27.3 +/- 0.0 (jitter = 0.0)	9.031
80	images/sec: 27.3 +/- 0.0 (jitter = 0.0)	8.847
90	images/sec: 27.3 +/- 0.0 (jitter = 0.0)	9.035
100	images/sec: 27.3 +/- 0.0 (jitter = 0.0)	8.852
----------------------------------------------------------------
total images/sec: 27.26
----------------------------------------------------------------

#####Resnet152 with FP16

This benchmark also failed to complete.

InternalError (see above for traceback): cuDNN launch failure : input shape ([32,512,28,28])
	 [[node tower_0/v/cg/resnet_v13/conv11/batchnorm11/FusedBatchNormV2 (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:473) ]]
	 [[node average_loss/Mean (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2907) ]]
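
When a sweep includes runs that can crash like the two above, it helps to record a failed run as "no result" instead of stopping the whole sweep. The following is a minimal sketch of such a wrapper. `run_benchmark` is a hypothetical helper, and it assumes that a crash shows up as a non-zero exit status or a missing `total images/sec` line, which this article does not confirm for every failure mode.

```python
import re
import subprocess
import sys

# Sketch: run one benchmark command and record a failed run as None.
# run_benchmark is a hypothetical helper; it assumes a crash shows up as a
# non-zero exit status or as a missing "total images/sec" line.

TOTAL_RE = re.compile(r"^total images/sec: ([\d.]+)$", re.MULTILINE)


def run_benchmark(argv, timeout=None):
    """Return the reported total images/sec, or None if the run did not finish."""
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    match = TOTAL_RE.search(proc.stdout)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    # Stand-in commands so the sketch runs without a GPU.
    ok = [sys.executable, "-c", "print('total images/sec: 71.10')"]
    bad = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_benchmark(ok), run_benchmark(bad))
```

In a real sweep, `argv` would be one of the `tf_cnn_benchmarks.py` command lines from the command section, run from the benchmark script directory.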

###AlexNet

1	images/sec: 354.2 +/- 0.0 (jitter = 0.0)	nan
10	images/sec: 354.4 +/- 0.3 (jitter = 0.8)	nan
20	images/sec: 354.3 +/- 0.2 (jitter = 0.6)	nan
30	images/sec: 354.2 +/- 0.2 (jitter = 0.6)	nan
40	images/sec: 354.2 +/- 0.1 (jitter = 0.4)	nan
50	images/sec: 354.1 +/- 0.1 (jitter = 0.5)	nan
60	images/sec: 354.1 +/- 0.1 (jitter = 0.5)	nan
70	images/sec: 354.1 +/- 0.1 (jitter = 0.5)	nan
80	images/sec: 354.1 +/- 0.1 (jitter = 0.5)	nan
90	images/sec: 354.1 +/- 0.1 (jitter = 0.5)	nan
100	images/sec: 354.1 +/- 0.1 (jitter = 0.5)	nan
----------------------------------------------------------------
total images/sec: 353.77
----------------------------------------------------------------

#####AlexNet with FP16

Step	Img/sec	total_loss
1	images/sec: 276.6 +/- 0.0 (jitter = 0.0)	nan
10	images/sec: 276.8 +/- 0.1 (jitter = 0.5)	nan
20	images/sec: 276.9 +/- 0.1 (jitter = 0.3)	nan
30	images/sec: 276.9 +/- 0.1 (jitter = 0.4)	nan
40	images/sec: 277.0 +/- 0.1 (jitter = 0.4)	nan
50	images/sec: 277.0 +/- 0.1 (jitter = 0.3)	nan
60	images/sec: 276.9 +/- 0.0 (jitter = 0.3)	nan
70	images/sec: 276.9 +/- 0.0 (jitter = 0.3)	nan
80	images/sec: 276.9 +/- 0.0 (jitter = 0.3)	nan
90	images/sec: 276.9 +/- 0.0 (jitter = 0.3)	nan
100	images/sec: 276.9 +/- 0.0 (jitter = 0.3)	nan
----------------------------------------------------------------
total images/sec: 276.70
----------------------------------------------------------------

###VGG16
When run in FP32, the benchmark crashed with a core dump and could not complete.

#####VGG16 with FP16

Step	Img/sec	total_loss
1	images/sec: 20.6 +/- 0.0 (jitter = 0.0)	7.262
10	images/sec: 20.6 +/- 0.0 (jitter = 0.0)	7.263
20	images/sec: 20.6 +/- 0.0 (jitter = 0.0)	7.281
30	images/sec: 20.6 +/- 0.0 (jitter = 0.0)	7.284
40	images/sec: 20.6 +/- 0.0 (jitter = 0.0)	7.266
50	images/sec: 20.6 +/- 0.0 (jitter = 0.0)	7.289
60	images/sec: 20.6 +/- 0.0 (jitter = 0.0)	7.219
70	images/sec: 20.6 +/- 0.0 (jitter = 0.0)	7.258
80	images/sec: 20.6 +/- 0.0 (jitter = 0.0)	7.305
90	images/sec: 20.6 +/- 0.0 (jitter = 0.0)	7.248
100	images/sec: 20.6 +/- 0.0 (jitter = 0.0)	7.296
----------------------------------------------------------------
total images/sec: 20.60
----------------------------------------------------------------
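
As a compact view of the detailed data, the sketch below computes the FP16 / FP32 throughput ratio for each GPU and model from the `total images/sec` values listed in this article. It assumes that the second block under each model is the FP16 run, and `None` marks a run that did not complete. `fp16_ratio` is a hypothetical helper name.

```python
# Sketch: FP16 / FP32 throughput ratio per GPU and model.
# Each pair is (FP32 total, FP16 total) in images/sec, taken from the
# detailed data in this article; None marks a run that did not complete.

TOTALS = {
    "Radeon VII": {
        "inception3": (122.81, 151.98),
        "resnet50": (226.50, 307.29),
        "resnet152": (90.68, 122.18),
        "alexnet": (529.51, 1270.20),
        "vgg16": (131.72, 193.55),
    },
    "Vega FE": {
        "inception3": (66.72, 60.62),
        "resnet50": (127.32, 159.98),
        "resnet152": (58.21, 59.01),
        "alexnet": (481.87, 667.06),
        "vgg16": (84.87, 57.62),
    },
    "RX570": {
        "inception3": (47.90, 27.24),
        "resnet50": (71.10, None),
        "resnet152": (27.26, None),
        "alexnet": (353.77, 276.70),
        "vgg16": (None, 20.60),
    },
}


def fp16_ratio(fp32, fp16):
    """Return fp16 / fp32, or None when either run is missing."""
    if fp32 is None or fp16 is None:
        return None
    return fp16 / fp32


if __name__ == "__main__":
    for gpu, models in TOTALS.items():
        for model, (fp32, fp16) in models.items():
            ratio = fp16_ratio(fp32, fp16)
            shown = "n/a" if ratio is None else "{:.2f}x".format(ratio)
            print("{:<10} {:<10} {}".format(gpu, model, shown))
```

A ratio below 1.0 means FP16 was slower than FP32 for that combination, as with VGG16 on the Vega FE and InceptionV3 on the RX570.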

That is everything for now.
