Benchmarking tensorflow-rocm on ROCm 2.4

ROCm 2.4 has been released, so I ran the benchmarks again.
The commands follow the ones used in this earlier article:
https://qiita.com/_JG1WWK/items/355866c49cf867946b48
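
The exact commands are in the linked article and are not reproduced here. As a rough sketch only: the log format in the results below looks like the output of `tf_cnn_benchmarks.py` from the tensorflow/benchmarks repository, so the following assumes that script and its `--model`, `--batch_size`, `--num_gpus`, and `--use_fp16` flags. The batch size is not stated in this article, so the value 64 is a placeholder, and the helper name `build_benchmark_command` is hypothetical.

```python
# Sketch under assumptions: the benchmark script is assumed to be
# tf_cnn_benchmarks.py (tensorflow/benchmarks). The batch size is NOT given
# in this article; 64 is only a placeholder.
import shlex


def build_benchmark_command(model, use_fp16=False, batch_size=64,
                            script="tf_cnn_benchmarks.py"):
    """Return the argument list for one single-GPU benchmark run."""
    cmd = [
        "python", script,
        "--num_gpus=1",
        f"--batch_size={batch_size}",
        f"--model={model}",
    ]
    if use_fp16:
        # Boolean flag that switches the run to half precision.
        cmd.append("--use_fp16")
    return cmd


if __name__ == "__main__":
    # Model names as assumed to be registered in tf_cnn_benchmarks.
    for model in ["inception3", "resnet50", "resnet152", "alexnet", "vgg16"]:
        for fp16 in (False, True):
            print(shlex.join(build_benchmark_command(model, fp16)))
```

The function only builds the command line and does not run anything, so it can be checked against the linked article before any GPU time is spent.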

Environment

I set up the environment with miniconda and pip.
The packages installed via pip are listed below.

pip list

$ pip list
Package              Version  
-------------------- ---------
absl-py              0.7.1    
asn1crypto           0.24.0   
astor                0.7.1    
attrdict             2.0.1    
audioread            2.1.6    
bcrypt               3.1.6    
beautifulsoup4       4.7.1    
bs4                  0.0.1    
certifi              2018.8.24
cffi                 1.12.3   
chardet              3.0.4    
cryptography         2.6.1    
cycler               0.10.0   
decorator            4.4.0    
deepspeech-gpu       0.4.1    
ds-ctcdecoder        0.5.0a7  
gast                 0.2.2    
grpcio               1.20.0   
h5py                 2.9.0    
idna                 2.8      
joblib               0.13.2   
Keras-Applications   1.0.7    
Keras-Preprocessing  1.0.9    
kiwisolver           1.1.0    
librosa              0.6.3    
llvmlite             0.28.0   
Markdown             3.1      
matplotlib           3.0.3    
mock                 2.0.0    
numba                0.43.1   
numpy                1.15.4   
pandas               0.24.2   
paramiko             2.4.2    
pbr                  5.1.3    
pip                  10.0.1   
progressbar2         3.39.3   
protobuf             3.7.1    
pyasn1               0.4.5    
pycparser            2.19     
PyNaCl               1.3.0    
pyparsing            2.4.0    
python-dateutil      2.8.0    
python-utils         2.3.0    
pytz                 2019.1   
pyxdg                0.26     
requests             2.21.0   
resampy              0.2.1    
scikit-learn         0.20.3   
scipy                1.2.1    
setuptools           40.2.0   
six                  1.12.0   
SoundFile            0.10.2   
soupsieve            1.9.1    
sox                  1.3.7    
tensorboard          1.13.1   
tensorflow-estimator 1.13.0   
tensorflow-rocm      1.13.3   
termcolor            1.1.0    
urllib3              1.24.3   
Werkzeug             0.15.2   
wheel                0.31.1 

Checking the ROCm version:

$ apt show rocm-libs -a
Package: rocm-libs
Version: 2.4.25
Priority: optional
Section: devel
Maintainer: Advanced Micro Devices Inc.
Installed-Size: 13.3 kB
Depends: rocfft, rocrand, hipblas, rocblas
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 770 B
APT-Manual-Installed: yes
APT-Sources: http://repo.radeon.com/rocm/apt/debian xenial/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack

Basic machine configuration

CPU Xeon E5-2603 v4
MB msi-x99 Gaming7
RAM DDR4-2400 32GB
GPU0 NVIDIA GTX1080Ti (for display output and CUDA)
GPU1 AMD Vega20 Radeon VII
OS Ubuntu 16.04.6 LTS, kernel version 4.15

ROCm TensorFlow benchmark

Measured with ROCm 2.4 + tensorflow-rocm 1.13.3.

Graph

Screenshot from 2019-05-26 02-41-36.png

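Apologies that it is hard to read, but this graph shows the results per GPU and per model.
It shows that performance has improved by a fair margin with ROCm 2.4.

The raw numbers follow, in somewhat messy form. The graph already gives the overall picture, so there is no need to read through them unless you want the details.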

InceptionV3

Step    Img/sec total_loss
1   images/sec: 122.3 +/- 0.0 (jitter = 0.0)    7.285
10  images/sec: 122.3 +/- 0.1 (jitter = 0.4)    7.328
20  images/sec: 121.5 +/- 0.3 (jitter = 0.6)    7.370
30  images/sec: 120.8 +/- 0.3 (jitter = 2.5)    7.304
40  images/sec: 121.1 +/- 0.2 (jitter = 0.9)    7.263
50  images/sec: 121.0 +/- 0.2 (jitter = 0.9)    7.252
60  images/sec: 121.2 +/- 0.2 (jitter = 0.6)    7.350
70  images/sec: 121.3 +/- 0.2 (jitter = 0.5)    7.289
80  images/sec: 121.3 +/- 0.1 (jitter = 0.5)    7.304
90  images/sec: 121.4 +/- 0.1 (jitter = 0.5)    7.348
100 images/sec: 121.4 +/- 0.1 (jitter = 0.5)    7.356
----------------------------------------------------------------
total images/sec: 121.35
----------------------------------------------------------------
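
The logs in this article all share the same line format, so they can be turned into numbers with a small parser. This is a minimal sketch, not part of the original measurement. The function name `parse_log` and the dictionary keys are my own choices, and the field labelled `uncertainty` is simply the value printed after `+/-`.

```python
import re

# Matches progress lines such as:
#   "10  images/sec: 122.3 +/- 0.1 (jitter = 0.4)    7.328"
STEP_RE = re.compile(
    r"^\s*(?P<step>\d+)\s+images/sec:\s*(?P<rate>[\d.]+)"
    r"\s*\+/-\s*(?P<err>[\d.]+)"
    r"\s*\(jitter\s*=\s*(?P<jitter>[\d.]+)\)\s+(?P<loss>\S+)\s*$"
)
# Matches the summary line "total images/sec: 121.35".
TOTAL_RE = re.compile(r"^\s*total images/sec:\s*(?P<total>[\d.]+)\s*$")


def parse_log(text):
    """Return (list of per-step dicts, total images/sec or None)."""
    steps, total = [], None
    for line in text.splitlines():
        m = STEP_RE.match(line)
        if m:
            steps.append({
                "step": int(m["step"]),
                "images_per_sec": float(m["rate"]),
                "uncertainty": float(m["err"]),
                "jitter": float(m["jitter"]),
                "loss": float(m["loss"]),  # "nan" is accepted by float()
            })
            continue
        m = TOTAL_RE.match(line)
        if m:
            total = float(m["total"])
    return steps, total


sample = """\
Step    Img/sec total_loss
1   images/sec: 122.3 +/- 0.0 (jitter = 0.0)    7.285
10  images/sec: 122.3 +/- 0.1 (jitter = 0.4)    7.328
----------------------------------------------------------------
total images/sec: 121.35
----------------------------------------------------------------
"""
steps, total = parse_log(sample)
print(len(steps), total)
```

Header and separator lines match neither pattern and are skipped, so a whole log can be pasted in unchanged.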

InceptionV3 FP16

Step    Img/sec total_loss
1   images/sec: 151.5 +/- 0.0 (jitter = 0.0)    7.391
10  images/sec: 152.6 +/- 0.1 (jitter = 0.4)    7.192
20  images/sec: 152.4 +/- 0.1 (jitter = 0.5)    7.257
30  images/sec: 152.0 +/- 0.1 (jitter = 1.0)    7.344
40  images/sec: 152.0 +/- 0.1 (jitter = 1.0)    7.441
50  images/sec: 152.1 +/- 0.1 (jitter = 0.9)    7.271
60  images/sec: 152.1 +/- 0.1 (jitter = 0.8)    7.302
70  images/sec: 152.0 +/- 0.1 (jitter = 0.8)    7.294
80  images/sec: 152.0 +/- 0.1 (jitter = 0.8)    7.240
90  images/sec: 152.0 +/- 0.1 (jitter = 0.8)    7.392
100 images/sec: 152.0 +/- 0.1 (jitter = 0.7)    7.341
----------------------------------------------------------------
total images/sec: 151.91
----------------------------------------------------------------

Resnet50 (Radeon VII)

1   images/sec: 241.5 +/- 0.0 (jitter = 0.0)    8.169
10  images/sec: 237.1 +/- 0.8 (jitter = 0.4)    7.593
20  images/sec: 242.7 +/- 1.4 (jitter = 6.9)    7.696
30  images/sec: 244.4 +/- 1.0 (jitter = 2.0)    7.753
40  images/sec: 245.4 +/- 0.8 (jitter = 1.7)    8.006
50  images/sec: 245.8 +/- 0.7 (jitter = 1.3)    7.520
60  images/sec: 246.2 +/- 0.6 (jitter = 1.2)    7.989
70  images/sec: 245.6 +/- 0.6 (jitter = 1.5)    8.027
80  images/sec: 244.5 +/- 0.6 (jitter = 1.9)    7.932
90  images/sec: 244.6 +/- 0.6 (jitter = 1.7)    7.850
100 images/sec: 245.0 +/- 0.5 (jitter = 1.4)    7.797
----------------------------------------------------------------
total images/sec: 244.83
----------------------------------------------------------------

Resnet50 FP16

Step    Img/sec total_loss
1   images/sec: 345.4 +/- 0.0 (jitter = 0.0)    7.820
10  images/sec: 344.6 +/- 0.7 (jitter = 2.1)    8.168
20  images/sec: 344.5 +/- 0.4 (jitter = 1.5)    7.797
30  images/sec: 344.4 +/- 0.3 (jitter = 1.6)    7.880
40  images/sec: 343.4 +/- 0.8 (jitter = 1.4)    8.051
50  images/sec: 342.8 +/- 0.9 (jitter = 1.6)    7.962
60  images/sec: 343.0 +/- 0.8 (jitter = 1.4)    7.725
70  images/sec: 343.2 +/- 0.7 (jitter = 1.4)    7.865
80  images/sec: 343.0 +/- 0.7 (jitter = 1.5)    7.832
90  images/sec: 342.4 +/- 0.7 (jitter = 1.7)    7.728
100 images/sec: 342.7 +/- 0.7 (jitter = 1.6)    8.155
----------------------------------------------------------------
total images/sec: 342.37
----------------------------------------------------------------

With FP16, Resnet50 throughput went from 300 images/s on ROCm 2.3 to 342 images/s on ROCm 2.4, a gain of roughly 14%.
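
The percentage can be rechecked from the two numbers quoted here. This is a small arithmetic sketch. The ROCm 2.3 figure of 300 images/s is the rounded value from the text, so the result is only approximate, and the helper name `percent_gain` is my own.

```python
def percent_gain(old, new):
    """Relative throughput gain of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


rocm23_fp16 = 300.0    # images/s, rounded figure quoted in the text
rocm24_fp16 = 342.37   # images/s, total from the Resnet50 FP16 run above

print(f"{percent_gain(rocm23_fp16, rocm24_fp16):.1f}%")  # prints 14.1%
```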

Resnet152

Step    Img/sec total_loss
1   images/sec: 96.3 +/- 0.0 (jitter = 0.0) 9.006
10  images/sec: 96.5 +/- 0.1 (jitter = 0.3) 8.586
20  images/sec: 96.3 +/- 0.2 (jitter = 0.3) 8.583
30  images/sec: 96.3 +/- 0.1 (jitter = 0.3) 8.678
40  images/sec: 96.4 +/- 0.1 (jitter = 0.3) 8.629
50  images/sec: 96.3 +/- 0.1 (jitter = 0.3) 8.787
60  images/sec: 96.4 +/- 0.1 (jitter = 0.3) 8.666
70  images/sec: 96.4 +/- 0.1 (jitter = 0.3) 9.119
80  images/sec: 96.4 +/- 0.1 (jitter = 0.3) 8.899
90  images/sec: 96.4 +/- 0.1 (jitter = 0.3) 9.067
100 images/sec: 96.4 +/- 0.1 (jitter = 0.3) 8.864
----------------------------------------------------------------
total images/sec: 96.32
----------------------------------------------------------------

Resnet152 FP16

Step    Img/sec total_loss
1   images/sec: 137.7 +/- 0.0 (jitter = 0.0)    9.155
10  images/sec: 136.7 +/- 0.8 (jitter = 0.5)    8.995
20  images/sec: 136.6 +/- 0.5 (jitter = 0.6)    8.828
30  images/sec: 136.7 +/- 0.5 (jitter = 0.5)    8.796
40  images/sec: 136.9 +/- 0.4 (jitter = 0.6)    9.025
50  images/sec: 136.7 +/- 0.3 (jitter = 0.6)    8.669
60  images/sec: 136.7 +/- 0.3 (jitter = 0.6)    8.895
70  images/sec: 136.7 +/- 0.3 (jitter = 0.6)    8.936
80  images/sec: 136.9 +/- 0.3 (jitter = 0.7)    8.872
90  images/sec: 136.8 +/- 0.3 (jitter = 0.7)    8.823
100 images/sec: 136.9 +/- 0.2 (jitter = 0.7)    8.706
----------------------------------------------------------------
total images/sec: 136.86
----------------------------------------------------------------

AlexNet

1   images/sec: 579.9 +/- 0.0 (jitter = 0.0)    nan
10  images/sec: 579.6 +/- 4.3 (jitter = 5.0)    nan
20  images/sec: 576.6 +/- 2.4 (jitter = 5.2)    nan
30  images/sec: 577.6 +/- 1.9 (jitter = 4.5)    nan
40  images/sec: 576.5 +/- 1.6 (jitter = 4.2)    nan
50  images/sec: 577.3 +/- 1.3 (jitter = 5.4)    nan
60  images/sec: 577.6 +/- 1.1 (jitter = 5.7)    nan
70  images/sec: 577.6 +/- 1.0 (jitter = 5.3)    nan
80  images/sec: 577.7 +/- 0.9 (jitter = 5.0)    nan
90  images/sec: 577.8 +/- 0.8 (jitter = 4.9)    nan
100 images/sec: 577.6 +/- 0.7 (jitter = 5.1)    nan
----------------------------------------------------------------
total images/sec: 576.46
----------------------------------------------------------------

AlexNet FP16

Done warm up
Step    Img/sec total_loss
1   images/sec: 1347.0 +/- 0.0 (jitter = 0.0)   nan
10  images/sec: 1454.9 +/- 14.1 (jitter = 17.2) nan
20  images/sec: 1459.4 +/- 8.0 (jitter = 23.1)  nan
30  images/sec: 1462.4 +/- 6.0 (jitter = 29.0)  nan
40  images/sec: 1465.3 +/- 4.9 (jitter = 27.8)  nan
50  images/sec: 1466.2 +/- 4.0 (jitter = 24.9)  nan
60  images/sec: 1465.8 +/- 3.6 (jitter = 24.9)  nan
70  images/sec: 1467.5 +/- 3.4 (jitter = 25.3)  nan
80  images/sec: 1467.8 +/- 3.2 (jitter = 26.2)  nan
90  images/sec: 1467.9 +/- 3.1 (jitter = 28.4)  nan
100 images/sec: 1466.6 +/- 3.3 (jitter = 27.6)  nan
----------------------------------------------------------------
total images/sec: 1459.36
----------------------------------------------------------------

ROCm 2.3 gave total images/sec: 1270, so performance has improved a little here as well (about 15%).

VGG16

Step    Img/sec total_loss
1   images/sec: 136.5 +/- 0.0 (jitter = 0.0)    7.327
10  images/sec: 136.0 +/- 0.1 (jitter = 0.5)    7.300
20  images/sec: 136.2 +/- 0.1 (jitter = 0.6)    7.299
30  images/sec: 136.1 +/- 0.1 (jitter = 0.7)    7.290
40  images/sec: 136.1 +/- 0.1 (jitter = 0.6)    7.249
50  images/sec: 136.0 +/- 0.1 (jitter = 0.5)    7.282
60  images/sec: 135.8 +/- 0.1 (jitter = 0.7)    7.259
70  images/sec: 135.8 +/- 0.1 (jitter = 0.6)    7.272
80  images/sec: 135.7 +/- 0.1 (jitter = 0.7)    7.271
90  images/sec: 135.7 +/- 0.1 (jitter = 0.6)    7.248
100 images/sec: 135.6 +/- 0.1 (jitter = 0.6)    7.293
----------------------------------------------------------------
total images/sec: 135.57
----------------------------------------------------------------

VGG16 with FP16

Done warm up
Step    Img/sec total_loss
1   images/sec: 201.4 +/- 0.0 (jitter = 0.0)    7.278
10  images/sec: 200.7 +/- 0.2 (jitter = 0.5)    7.326
20  images/sec: 200.2 +/- 0.2 (jitter = 0.8)    7.273
30  images/sec: 200.3 +/- 0.1 (jitter = 0.8)    7.295
40  images/sec: 200.2 +/- 0.1 (jitter = 0.7)    7.258
50  images/sec: 200.3 +/- 0.1 (jitter = 0.7)    7.278
60  images/sec: 200.2 +/- 0.1 (jitter = 0.6)    7.251
70  images/sec: 200.2 +/- 0.1 (jitter = 0.7)    7.247
80  images/sec: 200.2 +/- 0.1 (jitter = 0.6)    7.303
90  images/sec: 200.1 +/- 0.1 (jitter = 0.7)    7.266
100 images/sec: 200.1 +/- 0.1 (jitter = 0.7)    7.267
----------------------------------------------------------------
total images/sec: 199.95
----------------------------------------------------------------
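
To summarize the totals above in one place, the sketch below computes the FP16-to-FP32 throughput ratio for each model. The numbers are copied from the `total images/sec` lines in this article. The dictionary layout and the function name `fp16_speedups` are my own, not part of the benchmark tooling.

```python
# (FP32 total, FP16 total) in images/sec, copied from the logs above.
RESULTS = {
    "InceptionV3": (121.35, 151.91),
    "Resnet50":    (244.83, 342.37),
    "Resnet152":   (96.32, 136.86),
    "AlexNet":     (576.46, 1459.36),
    "VGG16":       (135.57, 199.95),
}


def fp16_speedups(results):
    """Map each model to its FP16 / FP32 throughput ratio."""
    return {model: fp16 / fp32 for model, (fp32, fp16) in results.items()}


for model, ratio in fp16_speedups(RESULTS).items():
    print(f"{model:<12} {ratio:.2f}x")
# InceptionV3  1.25x
# Resnet50     1.40x
# Resnet152    1.42x
# AlexNet      2.53x
# VGG16        1.47x
```

AlexNet shows by far the largest ratio, but its loss is reported as nan in both runs, so that figure may deserve some caution.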