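Does everyone remember "that graphics card" (例のグラボ)? Do you remember the 6,500-yen cards that brought some heat to Akihabara in the winter?
With the release of ROCm 2.3, throughput improvements were confirmed for various models on tensorflow-rocm with major architectures such as Vega20.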
https://qiita.com/_JG1WWK/items/355866c49cf867946b48
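"ROCm 2.3 has been released, so I benchmarked Vega20, Vega10, and RX570"
This post repeats the measurements from the article linked above on real hardware with "that graphics card", the RX470 8GB mining edition, using ROCm 2.3 + tensorflow-rocm.
Apart from the GPU, the environment is identical to the one in that article.
Entries shown as 0 are runs where the benchmark could not finish. For ROCm, simply using Vega20 looks like the better choice.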
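The post does not list the exact command line used for each run, so the following is only a minimal sketch of how the runs below could be launched. It assumes the standard `tf_cnn_benchmarks.py` flags `--model`, `--batch_size`, `--num_gpus`, and `--use_fp16`. The batch size of 32 is inferred from the input shape `[32,512,28,28]` in the Resnet error message further down and may not apply to every model. The helper name `build_benchmark_command` is hypothetical.

```python
def build_benchmark_command(model, use_fp16=False, batch_size=32, num_gpus=1):
    """Build an argument list for tf_cnn_benchmarks.py (hypothetical helper).

    batch_size=32 is an assumption inferred from the Resnet error message;
    the settings actually used in the article are not stated.
    """
    cmd = [
        "python",
        "./tf_cnn_benchmarks.py",
        "--model={}".format(model),
        "--batch_size={}".format(batch_size),
        "--num_gpus={}".format(num_gpus),
    ]
    if use_fp16:
        cmd.append("--use_fp16=True")
    return cmd


# Print the command for a Resnet50 FP16 run without executing it.
print(" ".join(build_benchmark_command("resnet50", use_fp16=True)))
```

Passing the resulting list to `subprocess.run` from the `tf_cnn_benchmarks` directory would start the run. The sketch only prints the command so that it can be checked first.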
InceptionV3
Step Img/sec total_loss
1 images/sec: 24.9 +/- 0.0 (jitter = 0.0) 7.284
10 images/sec: 24.9 +/- 0.0 (jitter = 0.1) 7.310
20 images/sec: 24.9 +/- 0.0 (jitter = 0.1) 7.347
30 images/sec: 24.9 +/- 0.0 (jitter = 0.1) 7.299
40 images/sec: 24.8 +/- 0.0 (jitter = 0.1) 7.272
50 images/sec: 24.8 +/- 0.0 (jitter = 0.1) 7.294
60 images/sec: 24.8 +/- 0.0 (jitter = 0.1) 7.347
70 images/sec: 24.8 +/- 0.0 (jitter = 0.1) 7.298
80 images/sec: 24.8 +/- 0.0 (jitter = 0.1) 7.286
90 images/sec: 24.8 +/- 0.0 (jitter = 0.1) 7.300
100 images/sec: 24.8 +/- 0.0 (jitter = 0.1) 7.366
----------------------------------------------------------------
total images/sec: 24.77
----------------------------------------------------------------
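Logs in this format are easy to process mechanically. The following is a minimal sketch that extracts the per-step throughput and the final `total images/sec` value from text shaped like the output above. The function name `parse_benchmark_log` and the dictionary keys are hypothetical, and the regular expressions assume exactly the line layout shown in this post.

```python
import re

# Step lines look like: "10 images/sec: 24.9 +/- 0.0 (jitter = 0.1) 7.310"
STEP_RE = re.compile(
    r"^(\d+)\s+images/sec:\s+([\d.]+)\s+\+/-\s+([\d.]+)"
    r"\s+\(jitter = ([\d.]+)\)\s+(\S+)$"
)
# The summary line looks like: "total images/sec: 24.77"
TOTAL_RE = re.compile(r"^total images/sec:\s+([\d.]+)$")


def parse_benchmark_log(text):
    """Return (steps, total) parsed from a tf_cnn_benchmarks-style log."""
    steps = []
    total = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        step_match = STEP_RE.match(line)
        if step_match:
            steps.append({
                "step": int(step_match.group(1)),
                "images_per_sec": float(step_match.group(2)),
                "uncertainty": float(step_match.group(3)),
                "jitter": float(step_match.group(4)),
                # float() also accepts the string "nan"
                "total_loss": float(step_match.group(5)),
            })
            continue
        total_match = TOTAL_RE.match(line)
        if total_match:
            total = float(total_match.group(1))
    return steps, total


sample = """\
1 images/sec: 24.9 +/- 0.0 (jitter = 0.0) 7.284
10 images/sec: 24.9 +/- 0.0 (jitter = 0.1) 7.310
----------------------------------------------------------------
total images/sec: 24.77
----------------------------------------------------------------
"""
steps, total = parse_benchmark_log(sample)
print(len(steps), total)
```

For a run that crashed before the summary line was printed, `total` stays `None`.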
InceptionV3 FP16
Step Img/sec total_loss
1 images/sec: 19.0 +/- 0.0 (jitter = 0.0) 7.378
10 images/sec: 19.0 +/- 0.0 (jitter = 0.0) 7.201
20 images/sec: 19.0 +/- 0.0 (jitter = 0.1) 7.250
30 images/sec: 18.9 +/- 0.0 (jitter = 0.0) 7.347
40 images/sec: 18.9 +/- 0.0 (jitter = 0.1) 7.436
50 images/sec: 18.9 +/- 0.0 (jitter = 0.0) 7.257
60 images/sec: 18.9 +/- 0.0 (jitter = 0.0) 7.311
70 images/sec: 18.9 +/- 0.0 (jitter = 0.1) 7.317
80 images/sec: 18.9 +/- 0.0 (jitter = 0.1) 7.250
90 images/sec: 18.9 +/- 0.0 (jitter = 0.1) 7.390
100 images/sec: 18.9 +/- 0.0 (jitter = 0.1) 7.336
----------------------------------------------------------------
total images/sec: 18.95
----------------------------------------------------------------
Resnet50
1 images/sec: 47.1 +/- 0.0 (jitter = 0.0) 8.169
10 images/sec: 47.0 +/- 0.1 (jitter = 0.2) 7.593
20 images/sec: 47.0 +/- 0.0 (jitter = 0.1) 7.696
30 images/sec: 47.0 +/- 0.0 (jitter = 0.2) 7.753
40 images/sec: 46.9 +/- 0.0 (jitter = 0.2) 8.007
50 images/sec: 46.9 +/- 0.0 (jitter = 0.2) 7.520
60 images/sec: 46.9 +/- 0.0 (jitter = 0.2) 7.989
70 images/sec: 46.9 +/- 0.0 (jitter = 0.2) 8.028
80 images/sec: 46.9 +/- 0.0 (jitter = 0.2) 7.932
90 images/sec: 46.9 +/- 0.0 (jitter = 0.2) 7.852
100 images/sec: 46.9 +/- 0.0 (jitter = 0.2) 7.800
----------------------------------------------------------------
total images/sec: 46.88
----------------------------------------------------------------
Resnet50 FP16
InternalError (see above for traceback): cuDNN launch failure : input shape ([32,512,28,28])
[[node tower_0/v/cg/resnet_v13/conv11/batchnorm11/FusedBatchNormV2 (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:473) ]]
[[node main_fetch_group (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2083) ]]
The benchmark did not run to completion.
Resnet152
1 images/sec: 17.8 +/- 0.0 (jitter = 0.0) 9.024
10 images/sec: 17.8 +/- 0.0 (jitter = 0.0) 8.551
20 images/sec: 17.8 +/- 0.0 (jitter = 0.0) 8.632
30 images/sec: 17.8 +/- 0.0 (jitter = 0.0) 8.703
40 images/sec: 17.8 +/- 0.0 (jitter = 0.1) 8.633
50 images/sec: 17.8 +/- 0.0 (jitter = 0.1) 8.813
60 images/sec: 17.8 +/- 0.0 (jitter = 0.1) 8.653
70 images/sec: 17.8 +/- 0.0 (jitter = 0.1) 9.110
80 images/sec: 17.8 +/- 0.0 (jitter = 0.1) 8.834
90 images/sec: 17.8 +/- 0.0 (jitter = 0.1) 8.981
100 images/sec: 17.8 +/- 0.0 (jitter = 0.1) 8.813
----------------------------------------------------------------
total images/sec: 17.78
----------------------------------------------------------------
Resnet152 FP16
InternalError (see above for traceback): cuDNN launch failure : input shape ([32,512,28,28])
[[node tower_0/v/cg/resnet_v13/conv11/batchnorm11/FusedBatchNormV2 (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:473) ]]
[[node average_loss/Mean (defined at /home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2907) ]]
On gfx803-family GPUs, running Resnet in FP16 appears to crash.
Alexnet
Step Img/sec total_loss
1 images/sec: 288.6 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 288.7 +/- 0.1 (jitter = 0.4) nan
20 images/sec: 288.6 +/- 0.1 (jitter = 0.3) nan
30 images/sec: 288.7 +/- 0.1 (jitter = 0.3) nan
40 images/sec: 288.7 +/- 0.1 (jitter = 0.3) nan
50 images/sec: 288.1 +/- 0.6 (jitter = 0.3) nan
60 images/sec: 288.1 +/- 0.5 (jitter = 0.3) nan
70 images/sec: 287.1 +/- 0.8 (jitter = 0.4) nan
80 images/sec: 286.5 +/- 0.8 (jitter = 0.4) nan
90 images/sec: 286.3 +/- 0.8 (jitter = 0.4) nan
100 images/sec: 285.6 +/- 0.9 (jitter = 0.5) nan
----------------------------------------------------------------
total images/sec: 285.33
----------------------------------------------------------------
Alexnet FP16
Step Img/sec total_loss
1 images/sec: 202.2 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 201.0 +/- 0.4 (jitter = 1.3) nan
20 images/sec: 200.6 +/- 0.3 (jitter = 1.6) nan
30 images/sec: 200.4 +/- 0.3 (jitter = 1.6) nan
40 images/sec: 200.5 +/- 0.2 (jitter = 1.7) nan
50 images/sec: 200.6 +/- 0.2 (jitter = 1.6) nan
60 images/sec: 200.6 +/- 0.2 (jitter = 1.6) nan
70 images/sec: 200.6 +/- 0.2 (jitter = 1.6) nan
80 images/sec: 200.7 +/- 0.2 (jitter = 1.6) nan
90 images/sec: 200.6 +/- 0.2 (jitter = 1.7) nan
100 images/sec: 200.6 +/- 0.1 (jitter = 1.7) nan
----------------------------------------------------------------
total images/sec: 200.44
----------------------------------------------------------------
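In both Alexnet runs the `total_loss` column reads `nan`, so the throughput numbers there may not come from a numerically healthy training run. The following is a minimal sketch of a check that flags such logs. The function name `loss_is_finite` is hypothetical, and the check assumes the step-line layout shown above, with the loss as the last column.

```python
import math


def loss_is_finite(log_text):
    """Return True if every step line ends with a finite loss value."""
    for line in log_text.splitlines():
        parts = line.split()
        # Step lines start with a step number followed by "images/sec:".
        is_step_line = (
            len(parts) >= 3
            and parts[0].isdigit()
            and parts[1] == "images/sec:"
        )
        if is_step_line and not math.isfinite(float(parts[-1])):
            return False
    return True


alexnet_excerpt = """\
Step Img/sec total_loss
1 images/sec: 202.2 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 201.0 +/- 0.4 (jitter = 1.3) nan
total images/sec: 200.44
"""
print(loss_is_finite(alexnet_excerpt))
```

Header and summary lines are skipped because they do not begin with a step number.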
VGG16
Thread 0x00007f6c0148b700 (most recent call first):
File "/home/rocm2/miniconda3/envs/rocm2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1407 in _call_tf_sessionrun
File "/home/rocm2/miniconda3/envs/rocm2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1319 in _run_fn
File "/home/rocm2/miniconda3/envs/rocm2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1334 in _do_call
File "/home/rocm2/miniconda3/envs/rocm2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1328 in _do_run
File "/home/rocm2/miniconda3/envs/rocm2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1152 in _run
File "/home/rocm2/miniconda3/envs/rocm2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 929 in run
File "/home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 840 in benchmark_one_step
File "/home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 2401 in benchmark_with_session
File "/home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 2265 in _benchmark_graph
File "/home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 2056 in _benchmark_train
File "/home/rocm2/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1851 in run
File "./tf_cnn_benchmarks.py", line 68 in main
File "/home/rocm2/miniconda3/envs/rocm2/lib/python3.5/site-packages/absl/app.py", line 251 in _run_main
File "/home/rocm2/miniconda3/envs/rocm2/lib/python3.5/site-packages/absl/app.py", line 300 in run
File "./tf_cnn_benchmarks.py", line 72 in <module>
Aborted (core dumped)
The run ended with a core dump.
VGG16 FP16
1 images/sec: 12.5 +/- 0.0 (jitter = 0.0) 7.231
10 images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.290
20 images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.274
30 images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.261
40 images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.272
50 images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.286
60 images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.225
70 images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.246
80 images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.269
90 images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.243
100 images/sec: 12.4 +/- 0.0 (jitter = 0.0) 7.289
----------------------------------------------------------------
total images/sec: 12.39
----------------------------------------------------------------
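To compare the runs at a glance, the totals reported above can be collected in one place. The following is a small sketch that computes the FP16-to-FP32 throughput ratio for each model. The numbers are copied from the logs in this post, `None` marks a run that did not finish, and the names `results` and `fp16_ratio` are hypothetical.

```python
# "total images/sec" values copied from the logs above.
# None marks a run that crashed before producing a total.
results = {
    "InceptionV3": {"fp32": 24.77, "fp16": 18.95},
    "Resnet50": {"fp32": 46.88, "fp16": None},
    "Resnet152": {"fp32": 17.78, "fp16": None},
    "Alexnet": {"fp32": 285.33, "fp16": 200.44},
    "VGG16": {"fp32": None, "fp16": 12.39},
}


def fp16_ratio(entry):
    """Return FP16 throughput divided by FP32 throughput, or None."""
    if entry["fp32"] is None or entry["fp16"] is None:
        return None
    return entry["fp16"] / entry["fp32"]


for model, entry in results.items():
    ratio = fp16_ratio(entry)
    if ratio is None:
        print("{}: no comparison (one run did not finish)".format(model))
    else:
        print("{}: FP16 runs at {:.0%} of FP32 throughput".format(model, ratio))
```

For the two models where both precisions completed (InceptionV3 and Alexnet), FP16 comes out slower than FP32 on this card.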