https://qiita.com/_JG1WWK/items/355866c49cf867946b48
The commands used here follow the article linked above.
ROCm 2.4 has been released, so I'm going to run the benchmarks on it.
Environment
I set up the environment with miniconda and pip.
The installed pip packages are as follows.
$ pip list
Package Version
-------------------- ---------
absl-py 0.7.1
asn1crypto 0.24.0
astor 0.7.1
attrdict 2.0.1
audioread 2.1.6
bcrypt 3.1.6
beautifulsoup4 4.7.1
bs4 0.0.1
certifi 2018.8.24
cffi 1.12.3
chardet 3.0.4
cryptography 2.6.1
cycler 0.10.0
decorator 4.4.0
deepspeech-gpu 0.4.1
ds-ctcdecoder 0.5.0a7
gast 0.2.2
grpcio 1.20.0
h5py 2.9.0
idna 2.8
joblib 0.13.2
Keras-Applications 1.0.7
Keras-Preprocessing 1.0.9
kiwisolver 1.1.0
librosa 0.6.3
llvmlite 0.28.0
Markdown 3.1
matplotlib 3.0.3
mock 2.0.0
numba 0.43.1
numpy 1.15.4
pandas 0.24.2
paramiko 2.4.2
pbr 5.1.3
pip 10.0.1
progressbar2 3.39.3
protobuf 3.7.1
pyasn1 0.4.5
pycparser 2.19
PyNaCl 1.3.0
pyparsing 2.4.0
python-dateutil 2.8.0
python-utils 2.3.0
pytz 2019.1
pyxdg 0.26
requests 2.21.0
resampy 0.2.1
scikit-learn 0.20.3
scipy 1.2.1
setuptools 40.2.0
six 1.12.0
SoundFile 0.10.2
soupsieve 1.9.1
sox 1.3.7
tensorboard 1.13.1
tensorflow-estimator 1.13.0
tensorflow-rocm 1.13.3
termcolor 1.1.0
urllib3 1.24.3
Werkzeug 0.15.2
wheel 0.31.1
Checking the ROCm version:
$ apt show rocm-libs -a
Package: rocm-libs
Version: 2.4.25
Priority: optional
Section: devel
Maintainer: Advanced Micro Devices Inc.
Installed-Size: 13.3 kB
Depends: rocfft, rocrand, hipblas, rocblas
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 770 B
APT-Manual-Installed: yes
APT-Sources: http://repo.radeon.com/rocm/apt/debian xenial/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack
Basic machine configuration
CPU Xeon E5-2603 v4
MB msi-x99 Gaming7
RAM DDR4-2400 32GB
GPU0 NVIDIA GTX1080Ti (used for display output and CUDA)
GPU1 AMD Vega20 RadeonⅦ
OS Ubuntu 16.04.6 LTS, kernel version 4.15
ROCm-TensorFlow benchmarks
Measured with ROCm 2.4 + tensorflow-rocm 1.13.3
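Graph
Sorry that it is hard to read; the graph shows the results per GPU and per model.
It shows that ROCm 2.4 delivers a fair performance improvement.
Below are the detailed numerical results, presented somewhat roughly. The graph conveys the overall picture, so there is no need to go through these numbers closely.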
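To redraw a graph like the one above from the raw logs, the step lines first have to be pulled out of the text. The following is a minimal Python sketch, assuming the line layout shown in the logs below (step number, `images/sec: X +/- Y (jitter = Z)`, then total_loss). The `StepRecord` and `parse_log` names are hypothetical labels chosen here; they are not part of tf_cnn_benchmarks.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# One step line as printed in the logs, e.g.
#   "10 images/sec: 122.3 +/- 0.1 (jitter = 0.4) 7.328"
# Field names below are our own labels, not names used by tf_cnn_benchmarks.
STEP_RE = re.compile(
    r"^(?P<step>\d+)\s+images/sec:\s+(?P<ips>[\d.]+)\s+\+/-\s+(?P<err>[\d.]+)"
    r"\s+\(jitter = (?P<jitter>[\d.]+)\)\s+(?P<loss>\S+)$"
)
# The summary line, e.g. "total images/sec: 121.35"
TOTAL_RE = re.compile(r"^total images/sec:\s+(?P<ips>[\d.]+)$")


class StepRecord(NamedTuple):
    step: int
    images_per_sec: float
    uncertainty: float  # the "+/-" figure
    jitter: float
    total_loss: float  # NaN when the log prints "nan" (the AlexNet runs)


def parse_log(text: str) -> Tuple[List[StepRecord], Optional[float]]:
    """Return the per-step records and the 'total images/sec' value (or None)."""
    steps: List[StepRecord] = []
    total: Optional[float] = None
    for raw in text.splitlines():
        line = raw.strip()
        m = STEP_RE.match(line)
        if m:
            steps.append(StepRecord(
                step=int(m.group("step")),
                images_per_sec=float(m.group("ips")),
                uncertainty=float(m.group("err")),
                jitter=float(m.group("jitter")),
                total_loss=float(m.group("loss")),  # float() accepts "nan"
            ))
            continue
        m = TOTAL_RE.match(line)
        if m:
            total = float(m.group("ips"))
    # Header lines, dashed separators and "Done warm up" match neither pattern.
    return steps, total


# Excerpt of the AlexNet (FP32) log in this article.
SAMPLE_LOG = """Step Img/sec total_loss
1 images/sec: 579.9 +/- 0.0 (jitter = 0.0) nan
100 images/sec: 577.6 +/- 0.7 (jitter = 5.1) nan
----------------------------------------------------------------
total images/sec: 576.46
----------------------------------------------------------------"""

steps, total = parse_log(SAMPLE_LOG)
print(len(steps), steps[-1].images_per_sec, total)
```

The parser only relies on the text layout, so the same function works for the FP16 logs; plotting is left to whatever tool was used for the graph.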
InceptionV3
Step Img/sec total_loss
1 images/sec: 122.3 +/- 0.0 (jitter = 0.0) 7.285
10 images/sec: 122.3 +/- 0.1 (jitter = 0.4) 7.328
20 images/sec: 121.5 +/- 0.3 (jitter = 0.6) 7.370
30 images/sec: 120.8 +/- 0.3 (jitter = 2.5) 7.304
40 images/sec: 121.1 +/- 0.2 (jitter = 0.9) 7.263
50 images/sec: 121.0 +/- 0.2 (jitter = 0.9) 7.252
60 images/sec: 121.2 +/- 0.2 (jitter = 0.6) 7.350
70 images/sec: 121.3 +/- 0.2 (jitter = 0.5) 7.289
80 images/sec: 121.3 +/- 0.1 (jitter = 0.5) 7.304
90 images/sec: 121.4 +/- 0.1 (jitter = 0.5) 7.348
100 images/sec: 121.4 +/- 0.1 (jitter = 0.5) 7.356
----------------------------------------------------------------
total images/sec: 121.35
----------------------------------------------------------------
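Each step line reports a throughput plus two spread figures. My understanding, which is an assumption not verified against the exact tf_cnn_benchmarks version used here, is that `+/-` is the standard error of the per-step speeds and `jitter` is the median absolute deviation of those speeds scaled by 1.4826. The sketch below computes those three statistics from per-step wall-clock times; `summarize` is a hypothetical helper, and the batch size used in these runs is not stated in the article, so the inputs in the usage line are illustrative only.

```python
import math
import statistics
from typing import List, Tuple


def summarize(batch_size: int, step_times: List[float]) -> Tuple[float, float, float]:
    """Return (mean images/sec, standard error, MAD-based jitter).

    Assumed definitions:
      mean    = batch_size / mean step time
      +/-     = population std of per-step speeds / sqrt(number of steps)
      jitter  = 1.4826 * median(|speed - median(speed)|)
    """
    speeds = [batch_size / t for t in step_times]
    mean_speed = batch_size / statistics.mean(step_times)
    std_err = statistics.pstdev(speeds) / math.sqrt(len(speeds))
    med = statistics.median(speeds)
    jitter = 1.4826 * statistics.median(abs(s - med) for s in speeds)
    return mean_speed, std_err, jitter


# Hypothetical batch size and step times (seconds), for illustration only.
mean_ips, err, jit = summarize(64, [0.5, 0.5, 1.0])
print(f"images/sec: {mean_ips:.1f} +/- {err:.1f} (jitter = {jit:.1f})")
```

The MAD-based jitter is robust to a single slow step, which is why it can stay small even when the `+/-` figure grows.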
InceptionV3 FP16
Step Img/sec total_loss
1 images/sec: 151.5 +/- 0.0 (jitter = 0.0) 7.391
10 images/sec: 152.6 +/- 0.1 (jitter = 0.4) 7.192
20 images/sec: 152.4 +/- 0.1 (jitter = 0.5) 7.257
30 images/sec: 152.0 +/- 0.1 (jitter = 1.0) 7.344
40 images/sec: 152.0 +/- 0.1 (jitter = 1.0) 7.441
50 images/sec: 152.1 +/- 0.1 (jitter = 0.9) 7.271
60 images/sec: 152.1 +/- 0.1 (jitter = 0.8) 7.302
70 images/sec: 152.0 +/- 0.1 (jitter = 0.8) 7.294
80 images/sec: 152.0 +/- 0.1 (jitter = 0.8) 7.240
90 images/sec: 152.0 +/- 0.1 (jitter = 0.8) 7.392
100 images/sec: 152.0 +/- 0.1 (jitter = 0.7) 7.341
----------------------------------------------------------------
total images/sec: 151.91
----------------------------------------------------------------
ResNet50 (RadeonⅦ)
Step Img/sec total_loss
1 images/sec: 241.5 +/- 0.0 (jitter = 0.0) 8.169
10 images/sec: 237.1 +/- 0.8 (jitter = 0.4) 7.593
20 images/sec: 242.7 +/- 1.4 (jitter = 6.9) 7.696
30 images/sec: 244.4 +/- 1.0 (jitter = 2.0) 7.753
40 images/sec: 245.4 +/- 0.8 (jitter = 1.7) 8.006
50 images/sec: 245.8 +/- 0.7 (jitter = 1.3) 7.520
60 images/sec: 246.2 +/- 0.6 (jitter = 1.2) 7.989
70 images/sec: 245.6 +/- 0.6 (jitter = 1.5) 8.027
80 images/sec: 244.5 +/- 0.6 (jitter = 1.9) 7.932
90 images/sec: 244.6 +/- 0.6 (jitter = 1.7) 7.850
100 images/sec: 245.0 +/- 0.5 (jitter = 1.4) 7.797
----------------------------------------------------------------
total images/sec: 244.83
----------------------------------------------------------------
ResNet50 FP16
Step Img/sec total_loss
1 images/sec: 345.4 +/- 0.0 (jitter = 0.0) 7.820
10 images/sec: 344.6 +/- 0.7 (jitter = 2.1) 8.168
20 images/sec: 344.5 +/- 0.4 (jitter = 1.5) 7.797
30 images/sec: 344.4 +/- 0.3 (jitter = 1.6) 7.880
40 images/sec: 343.4 +/- 0.8 (jitter = 1.4) 8.051
50 images/sec: 342.8 +/- 0.9 (jitter = 1.6) 7.962
60 images/sec: 343.0 +/- 0.8 (jitter = 1.4) 7.725
70 images/sec: 343.2 +/- 0.7 (jitter = 1.4) 7.865
80 images/sec: 343.0 +/- 0.7 (jitter = 1.5) 7.832
90 images/sec: 342.4 +/- 0.7 (jitter = 1.7) 7.728
100 images/sec: 342.7 +/- 0.7 (jitter = 1.6) 8.155
----------------------------------------------------------------
total images/sec: 342.37
----------------------------------------------------------------
With FP16, ResNet50 ran at about 300 images/sec on ROCm 2.3 and now reaches 342 images/sec, so performance improved by roughly 14%.
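The percentage comes from the ratio of the two throughputs. A minimal Python check, using the approximate ROCm 2.3 figure of 300 images/sec quoted in the text and the `total images/sec` value from the ResNet50 FP16 log; `relative_gain` is a hypothetical helper name.

```python
def relative_gain(old: float, new: float) -> float:
    """Percentage throughput gain when going from `old` to `new` images/sec."""
    return (new / old - 1.0) * 100.0


# ResNet50 FP16: ~300 images/sec on ROCm 2.3 (approximate, as quoted above)
# vs 342.37 images/sec on ROCm 2.4 (the "total images/sec" line).
gain = relative_gain(300.0, 342.37)
print(f"{gain:.1f}%")
```

Because the ROCm 2.3 number is itself rounded, the result should be read as "about 14%" rather than an exact figure.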
ResNet152
Step Img/sec total_loss
1 images/sec: 96.3 +/- 0.0 (jitter = 0.0) 9.006
10 images/sec: 96.5 +/- 0.1 (jitter = 0.3) 8.586
20 images/sec: 96.3 +/- 0.2 (jitter = 0.3) 8.583
30 images/sec: 96.3 +/- 0.1 (jitter = 0.3) 8.678
40 images/sec: 96.4 +/- 0.1 (jitter = 0.3) 8.629
50 images/sec: 96.3 +/- 0.1 (jitter = 0.3) 8.787
60 images/sec: 96.4 +/- 0.1 (jitter = 0.3) 8.666
70 images/sec: 96.4 +/- 0.1 (jitter = 0.3) 9.119
80 images/sec: 96.4 +/- 0.1 (jitter = 0.3) 8.899
90 images/sec: 96.4 +/- 0.1 (jitter = 0.3) 9.067
100 images/sec: 96.4 +/- 0.1 (jitter = 0.3) 8.864
----------------------------------------------------------------
total images/sec: 96.32
----------------------------------------------------------------
ResNet152 FP16
Step Img/sec total_loss
1 images/sec: 137.7 +/- 0.0 (jitter = 0.0) 9.155
10 images/sec: 136.7 +/- 0.8 (jitter = 0.5) 8.995
20 images/sec: 136.6 +/- 0.5 (jitter = 0.6) 8.828
30 images/sec: 136.7 +/- 0.5 (jitter = 0.5) 8.796
40 images/sec: 136.9 +/- 0.4 (jitter = 0.6) 9.025
50 images/sec: 136.7 +/- 0.3 (jitter = 0.6) 8.669
60 images/sec: 136.7 +/- 0.3 (jitter = 0.6) 8.895
70 images/sec: 136.7 +/- 0.3 (jitter = 0.6) 8.936
80 images/sec: 136.9 +/- 0.3 (jitter = 0.7) 8.872
90 images/sec: 136.8 +/- 0.3 (jitter = 0.7) 8.823
100 images/sec: 136.9 +/- 0.2 (jitter = 0.7) 8.706
----------------------------------------------------------------
total images/sec: 136.86
----------------------------------------------------------------
AlexNet
Step Img/sec total_loss
1 images/sec: 579.9 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 579.6 +/- 4.3 (jitter = 5.0) nan
20 images/sec: 576.6 +/- 2.4 (jitter = 5.2) nan
30 images/sec: 577.6 +/- 1.9 (jitter = 4.5) nan
40 images/sec: 576.5 +/- 1.6 (jitter = 4.2) nan
50 images/sec: 577.3 +/- 1.3 (jitter = 5.4) nan
60 images/sec: 577.6 +/- 1.1 (jitter = 5.7) nan
70 images/sec: 577.6 +/- 1.0 (jitter = 5.3) nan
80 images/sec: 577.7 +/- 0.9 (jitter = 5.0) nan
90 images/sec: 577.8 +/- 0.8 (jitter = 4.9) nan
100 images/sec: 577.6 +/- 0.7 (jitter = 5.1) nan
----------------------------------------------------------------
total images/sec: 576.46
----------------------------------------------------------------
AlexNet FP16
Done warm up
Step Img/sec total_loss
1 images/sec: 1347.0 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 1454.9 +/- 14.1 (jitter = 17.2) nan
20 images/sec: 1459.4 +/- 8.0 (jitter = 23.1) nan
30 images/sec: 1462.4 +/- 6.0 (jitter = 29.0) nan
40 images/sec: 1465.3 +/- 4.9 (jitter = 27.8) nan
50 images/sec: 1466.2 +/- 4.0 (jitter = 24.9) nan
60 images/sec: 1465.8 +/- 3.6 (jitter = 24.9) nan
70 images/sec: 1467.5 +/- 3.4 (jitter = 25.3) nan
80 images/sec: 1467.8 +/- 3.2 (jitter = 26.2) nan
90 images/sec: 1467.9 +/- 3.1 (jitter = 28.4) nan
100 images/sec: 1466.6 +/- 3.3 (jitter = 27.6) nan
----------------------------------------------------------------
total images/sec: 1459.36
----------------------------------------------------------------
ROCm 2.3 gave total images/sec: 1270, so performance improved by about 15% (1270 → 1459.36 images/sec).
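Another way to read the AlexNet FP16 gain is as average time per image. The following is a minimal Python sketch using the ROCm 2.3 figure (1270) and the ROCm 2.4 `total images/sec` value (1459.36). Note that this is throughput-derived time averaged over a batch, not per-image latency; `ms_per_image` is a hypothetical helper name.

```python
def ms_per_image(images_per_sec: float) -> float:
    """Average wall-clock time per image in milliseconds (1000 / throughput)."""
    return 1000.0 / images_per_sec


old, new = 1270.0, 1459.36  # AlexNet FP16 total images/sec: ROCm 2.3 vs 2.4
gain_pct = (new / old - 1.0) * 100.0
print(f"gain: {gain_pct:.1f}%")
print(f"{ms_per_image(old):.3f} ms/image -> {ms_per_image(new):.3f} ms/image")
```

A roughly 15% throughput gain corresponds to about a 13% reduction in average time per image, since 1/1.149 ≈ 0.87.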
VGG16
Step Img/sec total_loss
1 images/sec: 136.5 +/- 0.0 (jitter = 0.0) 7.327
10 images/sec: 136.0 +/- 0.1 (jitter = 0.5) 7.300
20 images/sec: 136.2 +/- 0.1 (jitter = 0.6) 7.299
30 images/sec: 136.1 +/- 0.1 (jitter = 0.7) 7.290
40 images/sec: 136.1 +/- 0.1 (jitter = 0.6) 7.249
50 images/sec: 136.0 +/- 0.1 (jitter = 0.5) 7.282
60 images/sec: 135.8 +/- 0.1 (jitter = 0.7) 7.259
70 images/sec: 135.8 +/- 0.1 (jitter = 0.6) 7.272
80 images/sec: 135.7 +/- 0.1 (jitter = 0.7) 7.271
90 images/sec: 135.7 +/- 0.1 (jitter = 0.6) 7.248
100 images/sec: 135.6 +/- 0.1 (jitter = 0.6) 7.293
----------------------------------------------------------------
total images/sec: 135.57
----------------------------------------------------------------
VGG16 FP16
Done warm up
Step Img/sec total_loss
1 images/sec: 201.4 +/- 0.0 (jitter = 0.0) 7.278
10 images/sec: 200.7 +/- 0.2 (jitter = 0.5) 7.326
20 images/sec: 200.2 +/- 0.2 (jitter = 0.8) 7.273
30 images/sec: 200.3 +/- 0.1 (jitter = 0.8) 7.295
40 images/sec: 200.2 +/- 0.1 (jitter = 0.7) 7.258
50 images/sec: 200.3 +/- 0.1 (jitter = 0.7) 7.278
60 images/sec: 200.2 +/- 0.1 (jitter = 0.6) 7.251
70 images/sec: 200.2 +/- 0.1 (jitter = 0.7) 7.247
80 images/sec: 200.2 +/- 0.1 (jitter = 0.6) 7.303
90 images/sec: 200.1 +/- 0.1 (jitter = 0.7) 7.266
100 images/sec: 200.1 +/- 0.1 (jitter = 0.7) 7.267
----------------------------------------------------------------
total images/sec: 199.95
----------------------------------------------------------------
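To summarize the FP32 and FP16 runs side by side, the `total images/sec` values from the logs above can be turned into FP16 speedup ratios. The following is a minimal Python sketch; the dictionary simply copies those totals (RadeonⅦ, ROCm 2.4, tensorflow-rocm 1.13.3), and `fp16_speedup` is a hypothetical helper name.

```python
from typing import Dict, Tuple

# "total images/sec" copied from the logs in this article: (FP32, FP16)
RESULTS: Dict[str, Tuple[float, float]] = {
    "InceptionV3": (121.35, 151.91),
    "ResNet50": (244.83, 342.37),
    "ResNet152": (96.32, 136.86),
    "AlexNet": (576.46, 1459.36),
    "VGG16": (135.57, 199.95),
}


def fp16_speedup(results: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Ratio of FP16 to FP32 throughput for each model."""
    return {name: fp16 / fp32 for name, (fp32, fp16) in results.items()}


for name, ratio in fp16_speedup(RESULTS).items():
    print(f"{name:12s} x{ratio:.2f}")
```

On these numbers, FP16 speeds up every model, from roughly 1.25x (InceptionV3) to about 2.5x (AlexNet).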