Benchmarking Tensorflow-rocm performance on "that graphics card"

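Does everyone remember "that graphics card"? Do you remember the 6,500-yen cards that brought a piece of hot news to Akihabara in the middle of winter?
With the release of ROCm 2.3, throughput improvements in Tensorflow-rocm were confirmed for a range of models on major architectures such as Vega20.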

https://qiita.com/_JG1WWK/items/355866c49cf867946b48
ROCm 2.3 has been released, so I ran benchmarks on Vega20, Vega10, and RX570

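This post is the same measurement on real hardware for "that graphics card", namely the RX470 8GB mining edition, running ROCm 2.3 + Tensorflow-rocm.
Apart from the GPU, the environment is identical to the one in the article above.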
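
The exact command lines are not given in this post. As a rough sketch, assuming the `tf_cnn_benchmarks.py` script that appears in the tracebacks further down was driven with its `--model`, `--batch_size`, `--num_gpus`, and `--use_fp16` flags, the set of runs could be generated as follows. The batch size of 32 is inferred from the `[32,512,28,28]` shape in the ResNet error message and is only an assumption for the other models; `build_command` and `all_commands` are hypothetical helpers, and nothing is executed here.

```python
# Model names as accepted by the --model flag of tf_cnn_benchmarks.
MODELS = ["inception3", "resnet50", "resnet152", "alexnet", "vgg16"]


def build_command(model, fp16=False, batch_size=32):
    """Build the argv list for one benchmark run (assumed flags, not executed)."""
    cmd = [
        "python",
        "tf_cnn_benchmarks.py",
        "--num_gpus=1",
        "--batch_size={}".format(batch_size),  # assumption: 32 for every model
        "--model={}".format(model),
    ]
    if fp16:
        cmd.append("--use_fp16")
    return cmd


def all_commands():
    """One default-precision run and one FP16 run per model."""
    return [build_command(m, fp16) for m in MODELS for fp16 in (False, True)]


for cmd in all_commands():
    print(" ".join(cmd))
```

Each printed line can be pasted into a shell inside the `scripts/tf_cnn_benchmarks` directory of the benchmarks checkout.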

[Figure: Tensorflowr-rocm.png]
Entries shown as 0 are benchmarks that could not run to completion. For ROCm, simply using a Vega20 looks like the better choice.
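
The "0 means the run did not finish" convention can be sketched as a small log parser. This is a minimal illustration, assuming each run's output was saved as text; `throughput_or_zero` is a hypothetical helper, and the two sample strings are excerpts from the logs in this post.

```python
import re

# Matches the summary line printed at the end of a completed run.
TOTAL_RE = re.compile(r"^total images/sec:\s*([0-9]+(?:\.[0-9]+)?)\s*$", re.MULTILINE)


def throughput_or_zero(log_text):
    """Return the reported total images/sec, or 0.0 if no summary line exists."""
    match = TOTAL_RE.search(log_text)
    return float(match.group(1)) if match else 0.0


completed = (
    "100 images/sec: 24.8 +/- 0.0 (jitter = 0.1) 7.366\n"
    "----------------------------------------------------------------\n"
    "total images/sec: 24.77\n"
    "----------------------------------------------------------------\n"
)
crashed = (
    "InternalError (see above for traceback): cuDNN launch failure : "
    "input shape ([32,512,28,28])\n"
)

print(throughput_or_zero(completed))  # 24.77
print(throughput_or_zero(crashed))    # 0.0
```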

InceptionV3

Step    Img/sec total_loss
1   images/sec: 24.9 +/- 0.0 (jitter = 0.0) 7.284
10  images/sec: 24.9 +/- 0.0 (jitter = 0.1) 7.310
20  images/sec: 24.9 +/- 0.0 (jitter = 0.1) 7.347
30  images/sec: 24.9 +/- 0.0 (jitter = 0.1) 7.299
40  images/sec: 24.8 +/- 0.0 (jitter = 0.1) 7.272
50  images/sec: 24.8 +/- 0.0 (jitter = 0.1) 7.294
60  images/sec: 24.8 +/- 0.0 (jitter = 0.1) 7.347
70  images/sec: 24.8 +/- 0.0 (jitter = 0.1) 7.298
80  images/sec: 24.8 +/- 0.0 (jitter = 0.1) 7.286
90  images/sec: 24.8 +/- 0.0 (jitter = 0.1) 7.300
100 images/sec: 24.8 +/- 0.0 (jitter = 0.1) 7.366
----------------------------------------------------------------
total images/sec: 24.77
----------------------------------------------------------------

InceptionV3 FP16

Step    Img/sec total_loss
1   images/sec: 19.0 +/- 0.0 (jitter = 0.0) 7.378
10  images/sec: 19.0 +/- 0.0 (jitter = 0.0) 7.201
20  images/sec: 19.0 +/- 0.0 (jitter = 0.1) 7.250
30  images/sec: 18.9 +/- 0.0 (jitter = 0.0) 7.347
40  images/sec: 18.9 +/- 0.0 (jitter = 0.1) 7.436
50  images/sec: 18.9 +/- 0.0 (jitter = 0.0) 7.257
60  images/sec: 18.9 +/- 0.0 (jitter = 0.0) 7.311
70  images/sec: 18.9 +/- 0.0 (jitter = 0.1) 7.317
80  images/sec: 18.9 +/- 0.0 (jitter = 0.1) 7.250
90  images/sec: 18.9 +/- 0.0 (jitter = 0.1) 7.390
100 images/sec: 18.9 +/- 0.0 (jitter = 0.1) 7.336
----------------------------------------------------------------
total images/sec: 18.95
----------------------------------------------------------------

Resnet50

1   images/sec: 47.1 +/- 0.0 (jitter = 0.0) 8.169
10  images/sec: 47.0 +/- 0.1 (jitter = 0.2) 7.593
20  images/sec: 47.0 +/- 0.0 (jitter = 0.1) 7.696
30  images/sec: 47.0 +/- 0.0 (jitter = 0.2) 7.753
40  images/sec: 46.9 +/- 0.0 (jitter = 0.2) 8.007
50  images/sec: 46.9 +/- 0.0 (jitter = 0.2) 7.520
60  images/sec: 46.9 +/- 0.0 (jitter = 0.2) 7.989
70  images/sec: 46.9 +/- 0.0 (jitter = 0.2) 8.028
80  images/sec: 46.9 +/- 0.0 (jitter = 0.2) 7.932
90  images/sec: 46.9 +/- 0.0 (jitter = 0.2) 7.852
100 images/sec: 46.9 +/- 0.0 (jitter = 0.2) 7.800
----------------------------------------------------------------
total images/sec: 46.88
----------------------------------------------------------------

Resnet50 FP16

InternalError (see above for traceback): cuDNN launch failure : input shape ([32,512,28,28])
     [[node tower_0/v/cg/resnet_v13/conv11/batchnorm11/FusedBatchNormV2 (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:473) ]]
     [[node main_fetch_group (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2083) ]]

The benchmark did not run to completion.

Resnet152

1   images/sec: 17.8 +/- 0.0 (jitter = 0.0) 9.024
10  images/sec: 17.8 +/- 0.0 (jitter = 0.0) 8.551
20  images/sec: 17.8 +/- 0.0 (jitter = 0.0) 8.632
30  images/sec: 17.8 +/- 0.0 (jitter = 0.0) 8.703
40  images/sec: 17.8 +/- 0.0 (jitter = 0.1) 8.633
50  images/sec: 17.8 +/- 0.0 (jitter = 0.1) 8.813
60  images/sec: 17.8 +/- 0.0 (jitter = 0.1) 8.653
70  images/sec: 17.8 +/- 0.0 (jitter = 0.1) 9.110
80  images/sec: 17.8 +/- 0.0 (jitter = 0.1) 8.834
90  images/sec: 17.8 +/- 0.0 (jitter = 0.1) 8.981
100 images/sec: 17.8 +/- 0.0 (jitter = 0.1) 8.813
----------------------------------------------------------------
total images/sec: 17.78
----------------------------------------------------------------

Resnet152 FP16

InternalError (see above for traceback): cuDNN launch failure : input shape ([32,512,28,28])
     [[node tower_0/v/cg/resnet_v13/conv11/batchnorm11/FusedBatchNormV2 (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:473) ]]
     [[node average_loss/Mean (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2907) ]]

On gfx803-family cores, running ResNet in FP16 appears to crash.

Alexnet

Step    Img/sec total_loss
1   images/sec: 288.6 +/- 0.0 (jitter = 0.0)    nan
10  images/sec: 288.7 +/- 0.1 (jitter = 0.4)    nan
20  images/sec: 288.6 +/- 0.1 (jitter = 0.3)    nan
30  images/sec: 288.7 +/- 0.1 (jitter = 0.3)    nan
40  images/sec: 288.7 +/- 0.1 (jitter = 0.3)    nan
50  images/sec: 288.1 +/- 0.6 (jitter = 0.3)    nan
60  images/sec: 288.1 +/- 0.5 (jitter = 0.3)    nan
70  images/sec: 287.1 +/- 0.8 (jitter = 0.4)    nan
80  images/sec: 286.5 +/- 0.8 (jitter = 0.4)    nan
90  images/sec: 286.3 +/- 0.8 (jitter = 0.4)    nan
100 images/sec: 285.6 +/- 0.9 (jitter = 0.5)    nan
----------------------------------------------------------------
total images/sec: 285.33
----------------------------------------------------------------

Alexnet FP16

Step    Img/sec total_loss
1   images/sec: 202.2 +/- 0.0 (jitter = 0.0)    nan
10  images/sec: 201.0 +/- 0.4 (jitter = 1.3)    nan
20  images/sec: 200.6 +/- 0.3 (jitter = 1.6)    nan
30  images/sec: 200.4 +/- 0.3 (jitter = 1.6)    nan
40  images/sec: 200.5 +/- 0.2 (jitter = 1.7)    nan
50  images/sec: 200.6 +/- 0.2 (jitter = 1.6)    nan
60  images/sec: 200.6 +/- 0.2 (jitter = 1.6)    nan
70  images/sec: 200.6 +/- 0.2 (jitter = 1.6)    nan
80  images/sec: 200.7 +/- 0.2 (jitter = 1.6)    nan
90  images/sec: 200.6 +/- 0.2 (jitter = 1.7)    nan
100 images/sec: 200.6 +/- 0.1 (jitter = 1.7)    nan
----------------------------------------------------------------
total images/sec: 200.44
----------------------------------------------------------------

VGG16

Thread 0x00007f6c0148b700 (most recent call first):
  File "/home/rocm2/miniconda3/envs/rocm2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1407 in _call_tf_sessionrun
  File "/home/rocm2/miniconda3/envs/rocm2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1319 in _run_fn
  File "/home/rocm2/miniconda3/envs/rocm2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1334 in _do_call
  File "/home/rocm2/miniconda3/envs/rocm2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1328 in _do_run
  File "/home/rocm2/miniconda3/envs/rocm2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1152 in _run
  File "/home/rocm2/miniconda3/envs/rocm2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 929 in run
  File "/home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 840 in benchmark_one_step
  File "/home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 2401 in benchmark_with_session
  File "/home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 2265 in _benchmark_graph
  File "/home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 2056 in _benchmark_train
  File "/home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1851 in run
  File "./tf_cnn_benchmarks.py", line 68 in main
  File "/home/rocm2/miniconda3/envs/rocm2/lib/python3.5/site-packages/absl/app.py", line 251 in _run_main
  File "/home/rocm2/miniconda3/envs/rocm2/lib/python3.5/site-packages/absl/app.py", line 300 in run
  File "./tf_cnn_benchmarks.py", line 72 in <module>
Aborted (core dumped)

The run ended with a core dump.

VGG16 FP16

1   images/sec: 12.5 +/- 0.0 (jitter = 0.0) 7.231
10  images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.290
20  images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.274
30  images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.261
40  images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.272
50  images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.286
60  images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.225
70  images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.246
80  images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.269
90  images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.243
100 images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.289
----------------------------------------------------------------
total images/sec: 12.39
----------------------------------------------------------------
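
To compare the two precisions at a glance, the totals reported above can be collected and the FP16-to-default ratio computed for the models where both runs finished. This is only a sketch over the numbers in this post; `None` marks a run that did not complete, and `fp16_ratio` is a hypothetical helper.

```python
# total images/sec as reported in this post; None = run did not complete.
results = {
    "InceptionV3": {"default": 24.77, "fp16": 18.95},
    "Resnet50": {"default": 46.88, "fp16": None},
    "Resnet152": {"default": 17.78, "fp16": None},
    "Alexnet": {"default": 285.33, "fp16": 200.44},
    "VGG16": {"default": None, "fp16": 12.39},
}


def fp16_ratio(entry):
    """FP16 throughput divided by default-precision throughput, if both exist."""
    if entry["default"] is None or entry["fp16"] is None:
        return None
    return entry["fp16"] / entry["default"]


for model, entry in results.items():
    ratio = fp16_ratio(entry)
    if ratio is None:
        print("{}: no comparison (one run did not complete)".format(model))
    else:
        print("{}: FP16 runs at {:.2f}x the default-precision throughput".format(model, ratio))
```

For the two models where both runs completed (InceptionV3 and Alexnet), the ratio is below 1, meaning FP16 was slower than the default precision on this card.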