Benchmarking tensorflow-rocm on ROCm 2.4

https://qiita.com/_JG1WWK/items/355866c49cf867946b48
The commands follow the article linked above.
ROCm 2.4 has been released, so I'd like to try benchmarking it.
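The log format in the results below ("images/sec ... jitter", "total images/sec") matches what tf_cnn_benchmarks.py from the tensorflow/benchmarks repository prints, so the runs can probably be reproduced along the following lines. This is only a sketch under that assumption: the exact commands are in the linked article and are not repeated in this post, the model identifiers are the ones that script uses, and batch_size=64 is a placeholder because the value actually used is not stated here.

```python
# Sketch: build tf_cnn_benchmarks.py command lines for each model, with and
# without FP16. Assumes the script is in the current directory.

# Model identifiers as spelled by tf_cnn_benchmarks.py (an assumption; the
# post itself only gives the display names).
MODELS = ["inception3", "resnet50", "resnet152", "alexnet", "vgg16"]


def benchmark_command(model, use_fp16=False, batch_size=64, num_gpus=1):
    """Return the argv list for one benchmark run.

    batch_size=64 is a placeholder; the post does not state the value used.
    """
    argv = [
        "python",
        "tf_cnn_benchmarks.py",
        f"--model={model}",
        f"--batch_size={batch_size}",
        f"--num_gpus={num_gpus}",
    ]
    if use_fp16:
        argv.append("--use_fp16")
    return argv


if __name__ == "__main__":
    for model in MODELS:
        for fp16 in (False, True):
            print(" ".join(benchmark_command(model, use_fp16=fp16)))
```

Each printed line can be pasted into a shell, or the argv list can be passed to subprocess.run to capture the log for later parsing.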

Environment

I set up the environment with miniconda and pip.
The packages installed with pip are listed below.

pip list

$ pip list
Package              Version  
-------------------- ---------
absl-py              0.7.1    
asn1crypto           0.24.0   
astor                0.7.1    
attrdict             2.0.1    
audioread            2.1.6    
bcrypt               3.1.6    
beautifulsoup4       4.7.1    
bs4                  0.0.1    
certifi              2018.8.24
cffi                 1.12.3   
chardet              3.0.4    
cryptography         2.6.1    
cycler               0.10.0   
decorator            4.4.0    
deepspeech-gpu       0.4.1    
ds-ctcdecoder        0.5.0a7  
gast                 0.2.2    
grpcio               1.20.0   
h5py                 2.9.0    
idna                 2.8      
joblib               0.13.2   
Keras-Applications   1.0.7    
Keras-Preprocessing  1.0.9    
kiwisolver           1.1.0    
librosa              0.6.3    
llvmlite             0.28.0   
Markdown             3.1      
matplotlib           3.0.3    
mock                 2.0.0    
numba                0.43.1   
numpy                1.15.4   
pandas               0.24.2   
paramiko             2.4.2    
pbr                  5.1.3    
pip                  10.0.1   
progressbar2         3.39.3   
protobuf             3.7.1    
pyasn1               0.4.5    
pycparser            2.19     
PyNaCl               1.3.0    
pyparsing            2.4.0    
python-dateutil      2.8.0    
python-utils         2.3.0    
pytz                 2019.1   
pyxdg                0.26     
requests             2.21.0   
resampy              0.2.1    
scikit-learn         0.20.3   
scipy                1.2.1    
setuptools           40.2.0   
six                  1.12.0   
SoundFile            0.10.2   
soupsieve            1.9.1    
sox                  1.3.7    
tensorboard          1.13.1   
tensorflow-estimator 1.13.0   
tensorflow-rocm      1.13.3   
termcolor            1.1.0    
urllib3              1.24.3   
Werkzeug             0.15.2   
wheel                0.31.1 
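
When comparing against another machine, it can help to pull a few versions out of a listing like the one above instead of reading it by eye. The following is a minimal sketch that parses the two-column text printed by pip list; the function name parse_pip_list and the shortened sample are made up for illustration.

```python
# Sketch: turn the text output of `pip list` into a {package: version} dict.

SAMPLE = """\
$ pip list
Package              Version
-------------------- ---------
numpy                1.15.4
tensorboard          1.13.1
tensorflow-estimator 1.13.0
tensorflow-rocm      1.13.3
"""


def parse_pip_list(text):
    """Parse the two-column table printed by `pip list`."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the shell prompt line, the header row, the dashed rule and blanks.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0]] = parts[1]
    return packages


if __name__ == "__main__":
    versions = parse_pip_list(SAMPLE)
    print(versions["tensorflow-rocm"])
```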

Checking the ROCm version:

$ apt show rocm-libs -a
Package: rocm-libs
Version: 2.4.25
Priority: optional
Section: devel
Maintainer: Advanced Micro Devices Inc.
Installed-Size: 13.3 kB
Depends: rocfft, rocrand, hipblas, rocblas
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 770 B
APT-Manual-Installed: yes
APT-Sources: http://repo.radeon.com/rocm/apt/debian xenial/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack

Basic machine configuration

CPU Xeon E5-2603 v4
MB msi-x99 Gaming7
RAM DDR4-2400 32GB
GPU0 NVIDIA GTX1080Ti (for display output and CUDA)
GPU1 AMD Vega20 Radeon VII
OS Ubuntu 16.04.6 LTS, kernel version 4.15

ROCm TensorFlow benchmark

Measured with ROCm 2.4 + tensorflow-rocm 1.13.3

Graph

Screenshot from 2019-05-26 02-41-36.png

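Apologies that it is hard to read, but this graph shows the results per GPU and per model.
It shows that performance has improved reasonably well with ROCm 2.4.

The raw numbers follow below, in somewhat messy form. The graph conveys most of the picture, so there is no need to go through them in detail.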

InceptionV3

Step	Img/sec	total_loss
1	images/sec: 122.3 +/- 0.0 (jitter = 0.0)	7.285
10	images/sec: 122.3 +/- 0.1 (jitter = 0.4)	7.328
20	images/sec: 121.5 +/- 0.3 (jitter = 0.6)	7.370
30	images/sec: 120.8 +/- 0.3 (jitter = 2.5)	7.304
40	images/sec: 121.1 +/- 0.2 (jitter = 0.9)	7.263
50	images/sec: 121.0 +/- 0.2 (jitter = 0.9)	7.252
60	images/sec: 121.2 +/- 0.2 (jitter = 0.6)	7.350
70	images/sec: 121.3 +/- 0.2 (jitter = 0.5)	7.289
80	images/sec: 121.3 +/- 0.1 (jitter = 0.5)	7.304
90	images/sec: 121.4 +/- 0.1 (jitter = 0.5)	7.348
100	images/sec: 121.4 +/- 0.1 (jitter = 0.5)	7.356
----------------------------------------------------------------
total images/sec: 121.35
----------------------------------------------------------------
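
The result blocks in this post all share the same layout: one line per reported step with the running images/sec, its +/- uncertainty, the jitter and the loss, followed by a total images/sec line. A small parser makes it easier to compare runs. The sketch below assumes exactly that layout; the names STEP_RE, TOTAL_RE and parse_benchmark_log are made up for illustration, and the sample is a shortened copy of the InceptionV3 log above.

```python
import re

# One per-step line: "<step> images/sec: <rate> +/- <unc> (jitter = <j>) <loss>"
STEP_RE = re.compile(
    r"^(\d+)\s+images/sec:\s+([\d.]+)\s+\+/-\s+([\d.]+)\s+"
    r"\(jitter = ([\d.]+)\)\s+(\S+)$"
)
# The summary line at the end of a run.
TOTAL_RE = re.compile(r"^total images/sec:\s+([\d.]+)$")

SAMPLE = (
    "Step\tImg/sec\ttotal_loss\n"
    "1\timages/sec: 122.3 +/- 0.0 (jitter = 0.0)\t7.285\n"
    "100\timages/sec: 121.4 +/- 0.1 (jitter = 0.5)\t7.356\n"
    "----------------------------------------------------------------\n"
    "total images/sec: 121.35\n"
    "----------------------------------------------------------------\n"
)


def parse_benchmark_log(text):
    """Return (steps, total), where steps is a list of per-step dicts."""
    steps = []
    total = None
    for raw in text.splitlines():
        line = raw.strip()
        match = STEP_RE.match(line)
        if match:
            steps.append({
                "step": int(match.group(1)),
                "images_per_sec": float(match.group(2)),
                "uncertainty": float(match.group(3)),
                "jitter": float(match.group(4)),
                # float() also accepts "nan", which some runs report as the loss.
                "total_loss": float(match.group(5)),
            })
            continue
        match = TOTAL_RE.match(line)
        if match:
            total = float(match.group(1))
    return steps, total


if __name__ == "__main__":
    steps, total = parse_benchmark_log(SAMPLE)
    print(len(steps), total)
```

Header rows, the dashed rules and lines such as "Done warm up" match neither pattern and are simply skipped.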

InceptionV3 FP16

Step	Img/sec	total_loss
1	images/sec: 151.5 +/- 0.0 (jitter = 0.0)	7.391
10	images/sec: 152.6 +/- 0.1 (jitter = 0.4)	7.192
20	images/sec: 152.4 +/- 0.1 (jitter = 0.5)	7.257
30	images/sec: 152.0 +/- 0.1 (jitter = 1.0)	7.344
40	images/sec: 152.0 +/- 0.1 (jitter = 1.0)	7.441
50	images/sec: 152.1 +/- 0.1 (jitter = 0.9)	7.271
60	images/sec: 152.1 +/- 0.1 (jitter = 0.8)	7.302
70	images/sec: 152.0 +/- 0.1 (jitter = 0.8)	7.294
80	images/sec: 152.0 +/- 0.1 (jitter = 0.8)	7.240
90	images/sec: 152.0 +/- 0.1 (jitter = 0.8)	7.392
100	images/sec: 152.0 +/- 0.1 (jitter = 0.7)	7.341
----------------------------------------------------------------
total images/sec: 151.91
----------------------------------------------------------------

Resnet50 (Radeon VII)

1	images/sec: 241.5 +/- 0.0 (jitter = 0.0)	8.169
10	images/sec: 237.1 +/- 0.8 (jitter = 0.4)	7.593
20	images/sec: 242.7 +/- 1.4 (jitter = 6.9)	7.696
30	images/sec: 244.4 +/- 1.0 (jitter = 2.0)	7.753
40	images/sec: 245.4 +/- 0.8 (jitter = 1.7)	8.006
50	images/sec: 245.8 +/- 0.7 (jitter = 1.3)	7.520
60	images/sec: 246.2 +/- 0.6 (jitter = 1.2)	7.989
70	images/sec: 245.6 +/- 0.6 (jitter = 1.5)	8.027
80	images/sec: 244.5 +/- 0.6 (jitter = 1.9)	7.932
90	images/sec: 244.6 +/- 0.6 (jitter = 1.7)	7.850
100	images/sec: 245.0 +/- 0.5 (jitter = 1.4)	7.797
----------------------------------------------------------------
total images/sec: 244.83
----------------------------------------------------------------

Resnet50 FP16

Step	Img/sec	total_loss
1	images/sec: 345.4 +/- 0.0 (jitter = 0.0)	7.820
10	images/sec: 344.6 +/- 0.7 (jitter = 2.1)	8.168
20	images/sec: 344.5 +/- 0.4 (jitter = 1.5)	7.797
30	images/sec: 344.4 +/- 0.3 (jitter = 1.6)	7.880
40	images/sec: 343.4 +/- 0.8 (jitter = 1.4)	8.051
50	images/sec: 342.8 +/- 0.9 (jitter = 1.6)	7.962
60	images/sec: 343.0 +/- 0.8 (jitter = 1.4)	7.725
70	images/sec: 343.2 +/- 0.7 (jitter = 1.4)	7.865
80	images/sec: 343.0 +/- 0.7 (jitter = 1.5)	7.832
90	images/sec: 342.4 +/- 0.7 (jitter = 1.7)	7.728
100	images/sec: 342.7 +/- 0.7 (jitter = 1.6)	8.155
----------------------------------------------------------------
total images/sec: 342.37
----------------------------------------------------------------

With FP16, ROCm 2.3 managed about 300 images/sec, and ROCm 2.4 reaches 342 images/sec, so performance appears to have improved by roughly 14%.
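
The gain quoted above is plain arithmetic and can be checked in a couple of lines. This sketch uses the rounded ROCm 2.3 figure of 300 images/sec from the text and the measured ROCm 2.4 total of 342.37 images/sec; the helper name percent_gain is made up for illustration.

```python
def percent_gain(old, new):
    """Relative improvement of new over old, in percent."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    # ROCm 2.3 (rounded, from the text) vs ROCm 2.4 (measured) for Resnet50 FP16.
    print(f"{percent_gain(300.0, 342.37):.1f}%")  # prints 14.1%
```

Because the ROCm 2.3 baseline is only given as a round number, the result should be read as approximate.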

Resnet152

Step	Img/sec	total_loss
1	images/sec: 96.3 +/- 0.0 (jitter = 0.0)	9.006
10	images/sec: 96.5 +/- 0.1 (jitter = 0.3)	8.586
20	images/sec: 96.3 +/- 0.2 (jitter = 0.3)	8.583
30	images/sec: 96.3 +/- 0.1 (jitter = 0.3)	8.678
40	images/sec: 96.4 +/- 0.1 (jitter = 0.3)	8.629
50	images/sec: 96.3 +/- 0.1 (jitter = 0.3)	8.787
60	images/sec: 96.4 +/- 0.1 (jitter = 0.3)	8.666
70	images/sec: 96.4 +/- 0.1 (jitter = 0.3)	9.119
80	images/sec: 96.4 +/- 0.1 (jitter = 0.3)	8.899
90	images/sec: 96.4 +/- 0.1 (jitter = 0.3)	9.067
100	images/sec: 96.4 +/- 0.1 (jitter = 0.3)	8.864
----------------------------------------------------------------
total images/sec: 96.32
----------------------------------------------------------------

Resnet152 FP16

Step	Img/sec	total_loss
1	images/sec: 137.7 +/- 0.0 (jitter = 0.0)	9.155
10	images/sec: 136.7 +/- 0.8 (jitter = 0.5)	8.995
20	images/sec: 136.6 +/- 0.5 (jitter = 0.6)	8.828
30	images/sec: 136.7 +/- 0.5 (jitter = 0.5)	8.796
40	images/sec: 136.9 +/- 0.4 (jitter = 0.6)	9.025
50	images/sec: 136.7 +/- 0.3 (jitter = 0.6)	8.669
60	images/sec: 136.7 +/- 0.3 (jitter = 0.6)	8.895
70	images/sec: 136.7 +/- 0.3 (jitter = 0.6)	8.936
80	images/sec: 136.9 +/- 0.3 (jitter = 0.7)	8.872
90	images/sec: 136.8 +/- 0.3 (jitter = 0.7)	8.823
100	images/sec: 136.9 +/- 0.2 (jitter = 0.7)	8.706
----------------------------------------------------------------
total images/sec: 136.86
----------------------------------------------------------------

AlexNet

1	images/sec: 579.9 +/- 0.0 (jitter = 0.0)	nan
10	images/sec: 579.6 +/- 4.3 (jitter = 5.0)	nan
20	images/sec: 576.6 +/- 2.4 (jitter = 5.2)	nan
30	images/sec: 577.6 +/- 1.9 (jitter = 4.5)	nan
40	images/sec: 576.5 +/- 1.6 (jitter = 4.2)	nan
50	images/sec: 577.3 +/- 1.3 (jitter = 5.4)	nan
60	images/sec: 577.6 +/- 1.1 (jitter = 5.7)	nan
70	images/sec: 577.6 +/- 1.0 (jitter = 5.3)	nan
80	images/sec: 577.7 +/- 0.9 (jitter = 5.0)	nan
90	images/sec: 577.8 +/- 0.8 (jitter = 4.9)	nan
100	images/sec: 577.6 +/- 0.7 (jitter = 5.1)	nan
----------------------------------------------------------------
total images/sec: 576.46
----------------------------------------------------------------

AlexNet FP16

Done warm up
Step	Img/sec	total_loss
1	images/sec: 1347.0 +/- 0.0 (jitter = 0.0)	nan
10	images/sec: 1454.9 +/- 14.1 (jitter = 17.2)	nan
20	images/sec: 1459.4 +/- 8.0 (jitter = 23.1)	nan
30	images/sec: 1462.4 +/- 6.0 (jitter = 29.0)	nan
40	images/sec: 1465.3 +/- 4.9 (jitter = 27.8)	nan
50	images/sec: 1466.2 +/- 4.0 (jitter = 24.9)	nan
60	images/sec: 1465.8 +/- 3.6 (jitter = 24.9)	nan
70	images/sec: 1467.5 +/- 3.4 (jitter = 25.3)	nan
80	images/sec: 1467.8 +/- 3.2 (jitter = 26.2)	nan
90	images/sec: 1467.9 +/- 3.1 (jitter = 28.4)	nan
100	images/sec: 1466.6 +/- 3.3 (jitter = 27.6)	nan
----------------------------------------------------------------
total images/sec: 1459.36
----------------------------------------------------------------

ROCm 2.3 gave total images/sec: 1270, so performance has improved a little, by about 15%.

VGG16

Step	Img/sec	total_loss
1	images/sec: 136.5 +/- 0.0 (jitter = 0.0)	7.327
10	images/sec: 136.0 +/- 0.1 (jitter = 0.5)	7.300
20	images/sec: 136.2 +/- 0.1 (jitter = 0.6)	7.299
30	images/sec: 136.1 +/- 0.1 (jitter = 0.7)	7.290
40	images/sec: 136.1 +/- 0.1 (jitter = 0.6)	7.249
50	images/sec: 136.0 +/- 0.1 (jitter = 0.5)	7.282
60	images/sec: 135.8 +/- 0.1 (jitter = 0.7)	7.259
70	images/sec: 135.8 +/- 0.1 (jitter = 0.6)	7.272
80	images/sec: 135.7 +/- 0.1 (jitter = 0.7)	7.271
90	images/sec: 135.7 +/- 0.1 (jitter = 0.6)	7.248
100	images/sec: 135.6 +/- 0.1 (jitter = 0.6)	7.293
----------------------------------------------------------------
total images/sec: 135.57
----------------------------------------------------------------

VGG16 with FP16

Done warm up
Step	Img/sec	total_loss
1	images/sec: 201.4 +/- 0.0 (jitter = 0.0)	7.278
10	images/sec: 200.7 +/- 0.2 (jitter = 0.5)	7.326
20	images/sec: 200.2 +/- 0.2 (jitter = 0.8)	7.273
30	images/sec: 200.3 +/- 0.1 (jitter = 0.8)	7.295
40	images/sec: 200.2 +/- 0.1 (jitter = 0.7)	7.258
50	images/sec: 200.3 +/- 0.1 (jitter = 0.7)	7.278
60	images/sec: 200.2 +/- 0.1 (jitter = 0.6)	7.251
70	images/sec: 200.2 +/- 0.1 (jitter = 0.7)	7.247
80	images/sec: 200.2 +/- 0.1 (jitter = 0.6)	7.303
90	images/sec: 200.1 +/- 0.1 (jitter = 0.7)	7.266
100	images/sec: 200.1 +/- 0.1 (jitter = 0.7)	7.267
----------------------------------------------------------------
total images/sec: 199.95
----------------------------------------------------------------
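
To put the totals above side by side, the following sketch computes how much each model gains from FP16 on the Radeon VII. The numbers are the "total images/sec" values copied from the logs in this post; the non-FP16 runs are labelled "default" precision here because the post does not state it explicitly (presumably FP32). The names TOTALS and fp16_speedups are made up for illustration.

```python
# (default precision, FP16) total images/sec, copied from the logs above.
TOTALS = {
    "InceptionV3": (121.35, 151.91),
    "Resnet50": (244.83, 342.37),
    "Resnet152": (96.32, 136.86),
    "AlexNet": (576.46, 1459.36),
    "VGG16": (135.57, 199.95),
}


def fp16_speedups(totals):
    """Return {model: fp16_rate / default_rate}."""
    return {model: fp16 / default for model, (default, fp16) in totals.items()}


if __name__ == "__main__":
    for model, ratio in fp16_speedups(TOTALS).items():
        print(f"{model:<12} {ratio:.2f}x")
```

With these inputs the ratios come out at about 1.25x for InceptionV3, 1.40x for Resnet50, 1.42x for Resnet152, 2.53x for AlexNet and 1.47x for VGG16.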