Testing the ROCm 2.4 + TensorFlow 2.0 alpha release (alpha0-config-v2) Docker container image

TensorFlow 2.0 apparently runs on ROCm too, albeit only as a Docker image for now, so let's give it a quick try.


The image should be available via docker pull under the tag rocm2.4-tf2.0-alpha0-config-v2.



docker image pull

  sudo docker pull rocm/tensorflow:rocm2.4-tf2.0-alpha0-config-v2

The output of docker image list should look something like this:

REPOSITORY          TAG                              IMAGE ID            CREATED             SIZE
rocm/tensorflow     rocm2.4-tf2.0-alpha0-config-v2   b1746f7b7f09        2 weeks ago         6.53GB

Next, start the container with docker run:

$ sudo docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/tensorflow:rocm2.4-tf2.0-alpha0-config-v2
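
The flags in the docker run command above are what expose the GPU to the container: /dev/kfd is the ROCm compute interface (Kernel Fusion Driver), /dev/dri holds the GPU's DRM device nodes, and --group-add video adds the container user to the group that usually has access to them. As a minimal sketch, the same invocation can be assembled in Python. The helper name build_docker_run_args is hypothetical, and only the flags shown above are used.

```python
import shlex

IMAGE = "rocm/tensorflow:rocm2.4-tf2.0-alpha0-config-v2"


def build_docker_run_args(image=IMAGE):
    """Return the docker run argument list from this article (hypothetical helper)."""
    return [
        "sudo", "docker", "run", "-it",
        "--device=/dev/kfd",     # ROCm compute interface (Kernel Fusion Driver)
        "--device=/dev/dri",     # DRM device nodes for the GPU
        "--group-add", "video",  # group that usually has access to those nodes
        image,
    ]


if __name__ == "__main__":
    # Only print the command here; hand the list to subprocess.run()
    # to actually start the container.
    print(" ".join(shlex.quote(arg) for arg in build_docker_run_args()))
```

Keeping the arguments in a list rather than a single string avoids shell quoting issues if the list is later passed to subprocess.run().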


root@a0662b5acae0:/root# ls
bazel-0.19.2-installer-linux-x86_64.sh  benchmarks  convnet-benchmarks  models  tensorflow


 cd /root/tensorflow

Next, answer the prompts of TensorFlow's configure script, enabling only ROCm support:

WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.19.2 installed.
Please specify the location of python. [Default is /usr/bin/python]: 

Found possible Python library paths:
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: y
ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: n
No CUDA support will be enabled for TensorFlow.

Do you wish to download a fresh release of clang? (Experimental) [y/N]: n
Clang will not be downloaded.

Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=haswell -Wno-sign-compare]: 

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
    --config=mkl            # Build with MKL support.
    --config=monolithic     # Config for mostly static monolithic build.
    --config=gdr            # Build with GDR support.
    --config=verbs          # Build with libverbs support.
    --config=ngraph         # Build with Intel nGraph support.
    --config=numa           # Build with NUMA support.
    --config=dynamic_kernels    # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
    --config=noaws          # Disable AWS S3 filesystem support.
    --config=nogcp          # Disable GCP support.
    --config=nohdfs         # Disable HDFS support.
    --config=noignite       # Disable Apache Ignite support.
    --config=nokafka        # Disable Apache Kafka support.
    --config=nonccl         # Disable NVIDIA NCCL support.
Configuration finished
/root/tensorflow# ./build_rocm_python3 
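
The yes/no answers given above can also be supplied through environment variables, so the configure step need not be interactive. The following is only a sketch: the variable names are the ones TensorFlow's configure script is understood to read in releases of this period, and they should be treated as assumptions to verify against the source tree in the image. The helper name configure_env is hypothetical.

```python
import os

# The yes/no answers from the prompts above, written as environment variables.
# Assumption: these names match the configure script shipped in the image.
CONFIGURE_ANSWERS = {
    "TF_ENABLE_XLA": "0",             # XLA JIT support: n
    "TF_NEED_OPENCL_SYCL": "0",       # OpenCL SYCL support: n
    "TF_NEED_ROCM": "1",              # ROCm support: y
    "TF_NEED_CUDA": "0",              # CUDA support: n
    "TF_DOWNLOAD_CLANG": "0",         # fresh clang download: n
    "TF_NEED_MPI": "0",               # MPI support: n
    "TF_SET_ANDROID_WORKSPACE": "0",  # Android WORKSPACE setup: n
}


def configure_env(base_env=None):
    """Return a copy of the environment with the answers applied (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(CONFIGURE_ANSWERS)
    return env


if __name__ == "__main__":
    env = configure_env()
    for name in sorted(CONFIGURE_ANSWERS):
        print("{}={}".format(name, env[name]))
```

The resulting dictionary could be passed as env= to subprocess.run(["./configure"], cwd="/root/tensorflow"). Prompts not covered here, such as the Python location and the optimization flags, would still be asked.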


Just to be sure, check the output of pip3 list.

 pip3 list
Package              Version               
-------------------- ----------------------
absl-py              0.7.1                 
astor                0.7.1                 
attrs                19.1.0                
backcall             0.1.0                 
bleach               3.1.0                 
chardet              2.3.0                 
decorator            4.4.0                 
defusedxml           0.6.0                 
entrypoints          0.3                   
enum34               1.1.6                 
gast                 0.2.2                 
google-pasta         0.1.6                 
grpcio               1.20.1                
h5py                 2.9.0                 
ipykernel            5.1.0                 
ipython              7.5.0                 
ipython-genutils     0.2.0                 
ipywidgets           7.4.2                 
jedi                 0.13.3                
Jinja2               2.10.1                
jsonschema           3.0.1                 
jupyter              1.0.0                 
jupyter-client       5.2.4                 
jupyter-console      6.0.0                 
jupyter-core         4.4.0                 
Keras-Applications   1.0.7                 
Keras-Preprocessing  1.0.9                 
Markdown             3.1                   
MarkupSafe           1.1.1                 
mistune              0.8.4                 
mock                 3.0.5                 
nbconvert            5.5.0                 
nbformat             4.4.0                 
notebook             5.7.8                 
numpy                1.16.3                
pandocfilters        1.4.2                 
parso                0.4.0                 
pexpect              4.7.0                 
pickleshare          0.7.5                 
pip                  19.1.1                
prometheus-client    0.6.0                 
prompt-toolkit       2.0.9                 
protobuf             3.7.1                 
ptyprocess           0.6.0                 
pycurl               7.43.0                
Pygments             2.4.0                 
pygobject            3.20.0                
pyrsistent           0.15.2                
python-apt           1.1.0b1+ubuntu0.16.4.2
python-dateutil      2.8.0                 
pyzmq                18.0.1                
qtconsole            4.4.4                 
requests             2.9.1                 
Send2Trash           1.5.0                 
setuptools           41.0.1                
six                  1.12.0                
ssh-import-id        5.5                   
tb-nightly           1.14.0a20190301       
tensorflow           2.0.0a0               
termcolor            1.1.0                 
terminado            0.8.2                 
testpath             0.4.2                 
tf-estimator-nightly 1.14.0.dev2019030115  
tornado              6.0.2                 
traitlets            4.3.2                 
unattended-upgrades  0.1                   
urllib3              1.13.1                
wcwidth              0.1.7                 
webencodings         0.5.1                 
Werkzeug             0.15.2                
wheel                0.29.0                
widgetsnbextension   3.4.2  
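
Besides reading the package list, the installation can be checked from Python itself. This is a minimal sketch, assuming it is run with python3 inside the container. It uses tf.test.is_gpu_available(), which exists in this TensorFlow generation but is deprecated in later releases. The helper name check_tensorflow is hypothetical.

```python
def check_tensorflow():
    """Report whether TensorFlow imports and whether it sees a GPU (hypothetical helper)."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is missing from this interpreter's site-packages.
        return {"installed": False, "version": None, "gpu": False}
    return {
        "installed": True,
        "version": tf.__version__,
        # Deprecated in later TensorFlow releases; assumed available here.
        "gpu": bool(tf.test.is_gpu_available()),
    }


if __name__ == "__main__":
    print(check_tensorflow())
```

Inside the container the reported version should correspond to the tensorflow 2.0.0a0 entry in the list above. If gpu comes back False, the --device flags passed to docker run are the first thing to check.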



# apt show rocm-libs -a
Package: rocm-libs
Version: 2.4.25
Status: install ok installed
Priority: optional
Section: devel
Maintainer: Advanced Micro Devices Inc.
Installed-Size: 13.3 kB
Depends: rocfft, rocrand, hipblas, rocblas
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: unknown
APT-Manual-Installed: yes
APT-Sources: /var/lib/dpkg/status
Description: Radeon Open Compute (ROCm) Runtime software stack


Figure 1.0: Benchmarks
[Image: (TF1.13.3)ROCm2.3+(TF1.13.3)ROCm2.4(RadeonⅦ)+(ROCm2.4+TF2.0 RadeonⅦ).png]



python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model inception3 --batch_size 32

Step    Img/sec total_loss
1   images/sec: 119.8 +/- 0.0 (jitter = 0.0)    7.415
10  images/sec: 119.7 +/- 0.1 (jitter = 0.2)    7.394
20  images/sec: 119.4 +/- 0.2 (jitter = 0.3)    7.324
30  images/sec: 119.0 +/- 0.3 (jitter = 0.4)    7.487
40  images/sec: 119.0 +/- 0.2 (jitter = 0.4)    7.353
50  images/sec: 119.1 +/- 0.2 (jitter = 0.4)    7.369
60  images/sec: 119.0 +/- 0.2 (jitter = 0.4)    7.433
70  images/sec: 118.9 +/- 0.2 (jitter = 0.4)    7.317
80  images/sec: 119.0 +/- 0.2 (jitter = 0.5)    7.357
90  images/sec: 118.9 +/- 0.2 (jitter = 0.5)    7.485
100 images/sec: 118.9 +/- 0.1 (jitter = 0.5)    7.431
total images/sec: 118.86


# python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model inception3 --batch_size 32 --use_fp16
Step    Img/sec total_loss
1   images/sec: 148.5 +/- 0.0 (jitter = 0.0)    7.388
10  images/sec: 148.7 +/- 0.3 (jitter = 0.9)    7.390
20  images/sec: 148.4 +/- 0.2 (jitter = 0.6)    7.454
30  images/sec: 148.4 +/- 0.2 (jitter = 0.7)    7.359
40  images/sec: 148.4 +/- 0.1 (jitter = 0.7)    7.338
50  images/sec: 147.9 +/- 0.3 (jitter = 0.8)    7.387
60  images/sec: 148.0 +/- 0.2 (jitter = 0.9)    7.356
70  images/sec: 148.0 +/- 0.2 (jitter = 0.9)    7.441
80  images/sec: 147.9 +/- 0.2 (jitter = 0.9)    7.423
90  images/sec: 147.9 +/- 0.2 (jitter = 0.9)    7.334
100 images/sec: 147.8 +/- 0.2 (jitter = 0.9)    7.308
total images/sec: 147.70


 python3  ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32
1   images/sec: 230.0 +/- 0.0 (jitter = 0.0)    8.458
10  images/sec: 232.5 +/- 0.7 (jitter = 2.8)    7.997
20  images/sec: 233.4 +/- 0.4 (jitter = 1.5)    8.260
30  images/sec: 233.4 +/- 0.3 (jitter = 1.2)    8.337
40  images/sec: 233.3 +/- 0.2 (jitter = 1.4)    8.197
50  images/sec: 232.3 +/- 0.5 (jitter = 1.6)    7.759
60  images/sec: 231.9 +/- 0.5 (jitter = 1.6)    8.059
70  images/sec: 231.9 +/- 0.5 (jitter = 1.6)    8.481
80  images/sec: 231.9 +/- 0.5 (jitter = 1.5)    8.279
90  images/sec: 232.1 +/- 0.4 (jitter = 1.7)    8.019
100 images/sec: 231.8 +/- 0.4 (jitter = 1.8)    8.009
total images/sec: 231.63


python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32 --use_fp16
1   images/sec: 325.0 +/- 0.0 (jitter = 0.0)    7.979
10  images/sec: 315.0 +/- 3.2 (jitter = 7.0)    8.049
20  images/sec: 315.9 +/- 1.9 (jitter = 6.6)    8.331
30  images/sec: 316.0 +/- 1.5 (jitter = 7.4)    8.063
40  images/sec: 313.8 +/- 1.7 (jitter = 7.5)    8.678
50  images/sec: 314.4 +/- 1.3 (jitter = 5.4)    8.290
60  images/sec: 314.7 +/- 1.1 (jitter = 4.9)    8.344
70  images/sec: 315.0 +/- 1.0 (jitter = 3.4)    8.160
80  images/sec: 315.2 +/- 0.9 (jitter = 3.3)    8.145
90  images/sec: 315.0 +/- 0.9 (jitter = 3.2)    8.418
100 images/sec: 315.2 +/- 0.8 (jitter = 2.9)    8.296
total images/sec: 314.90


 python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32
1   images/sec: 92.9 +/- 0.0 (jitter = 0.0) 9.936
10  images/sec: 91.8 +/- 0.8 (jitter = 0.4) 9.680
20  images/sec: 91.3 +/- 0.6 (jitter = 0.8) 9.762
30  images/sec: 91.4 +/- 0.4 (jitter = 0.8) 9.945
40  images/sec: 91.7 +/- 0.3 (jitter = 0.7) 9.962
50  images/sec: 91.9 +/- 0.3 (jitter = 0.6) 9.992
60  images/sec: 92.1 +/- 0.2 (jitter = 0.5) 10.279
70  images/sec: 92.1 +/- 0.2 (jitter = 0.5) 9.990
80  images/sec: 92.1 +/- 0.2 (jitter = 0.5) 9.969
90  images/sec: 92.1 +/- 0.2 (jitter = 0.5) 10.196
100 images/sec: 92.0 +/- 0.2 (jitter = 0.6) 10.034
total images/sec: 91.93


python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32 --use_fp16
1   images/sec: 110.6 +/- 0.0 (jitter = 0.0)    10.107
10  images/sec: 118.1 +/- 1.6 (jitter = 2.0)    9.862
20  images/sec: 119.7 +/- 0.9 (jitter = 1.4)    9.758
30  images/sec: 120.1 +/- 0.7 (jitter = 1.2)    10.020
40  images/sec: 120.3 +/- 0.6 (jitter = 1.2)    9.834
50  images/sec: 120.2 +/- 0.5 (jitter = 1.1)    9.973
60  images/sec: 120.4 +/- 0.5 (jitter = 1.2)    9.637
70  images/sec: 120.4 +/- 0.4 (jitter = 1.2)    9.880
80  images/sec: 120.2 +/- 0.4 (jitter = 1.2)    9.619
90  images/sec: 120.4 +/- 0.4 (jitter = 1.0)    10.036
100 images/sec: 120.3 +/- 0.4 (jitter = 0.9)    10.061
total images/sec: 120.28


 python3  ./tf_cnn_benchmarks.py --num_gpus=1 --model  alexnet --batch_size 32
1   images/sec: 517.4 +/- 0.0 (jitter = 0.0)    7.205
10  images/sec: 523.8 +/- 2.2 (jitter = 2.4)    7.205
20  images/sec: 526.5 +/- 2.3 (jitter = 4.4)    7.205
30  images/sec: 526.3 +/- 1.7 (jitter = 7.1)    7.205
40  images/sec: 527.3 +/- 1.6 (jitter = 7.1)    7.205
50  images/sec: 526.7 +/- 1.3 (jitter = 6.3)    7.205
60  images/sec: 527.5 +/- 1.3 (jitter = 6.5)    7.205
70  images/sec: 526.4 +/- 1.2 (jitter = 6.9)    7.205
80  images/sec: 526.5 +/- 1.1 (jitter = 6.9)    7.205
90  images/sec: 526.2 +/- 1.0 (jitter = 6.2)    7.205
100 images/sec: 526.3 +/- 0.9 (jitter = 6.6)    7.205
total images/sec: 525.37


python3  ./tf_cnn_benchmarks.py --num_gpus=1 --model  alexnet --batch_size 32 --use_fp16
1   images/sec: 1250.9 +/- 0.0 (jitter = 0.0)   nan
10  images/sec: 1268.1 +/- 7.5 (jitter = 28.0)  nan
20  images/sec: 1264.7 +/- 4.8 (jitter = 28.3)  nan
30  images/sec: 1261.6 +/- 6.0 (jitter = 25.2)  nan
40  images/sec: 1265.2 +/- 4.8 (jitter = 22.8)  nan
50  images/sec: 1265.3 +/- 4.2 (jitter = 22.8)  nan
60  images/sec: 1259.4 +/- 5.4 (jitter = 26.1)  nan
70  images/sec: 1255.0 +/- 5.6 (jitter = 26.0)  nan
80  images/sec: 1256.7 +/- 5.0 (jitter = 25.3)  nan
90  images/sec: 1257.6 +/- 4.5 (jitter = 25.8)  nan
100 images/sec: 1258.3 +/- 4.1 (jitter = 23.9)  nan
total images/sec: 1252.77
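
In the AlexNet FP16 run above, every step reports nan for total_loss, so the throughput figure is best read as raw speed rather than as evidence of numerically healthy training. As a small sketch, per-step lines in this format can be scanned for non-finite losses. The regular expression is written against the log lines shown in this article, and the names parse_steps and nan_steps are hypothetical.

```python
import math
import re

# Matches lines such as:
# "10  images/sec: 1268.1 +/- 7.5 (jitter = 28.0)  nan"
STEP_LINE = re.compile(
    r"^\s*(\d+)\s+images/sec:\s+([\d.]+)\s+\+/-\s+([\d.]+)\s+"
    r"\(jitter = ([\d.]+)\)\s+(\S+)\s*$"
)


def parse_steps(log_text):
    """Yield (step, images_per_sec, total_loss) for each per-step line."""
    for line in log_text.splitlines():
        match = STEP_LINE.match(line)
        if match:
            yield int(match.group(1)), float(match.group(2)), float(match.group(5))


def nan_steps(log_text):
    """Return the step numbers whose total_loss is NaN."""
    return [step for step, _, loss in parse_steps(log_text) if math.isnan(loss)]


SAMPLE = """\
1   images/sec: 1250.9 +/- 0.0 (jitter = 0.0)   nan
10  images/sec: 1268.1 +/- 7.5 (jitter = 28.0)  nan
total images/sec: 1252.77
"""

if __name__ == "__main__":
    print(nan_steps(SAMPLE))
```

The summary line starting with "total" is skipped on purpose, because the pattern requires a leading step number.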


 python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model vgg16  --batch_size 32
1   images/sec: 130.0 +/- 0.0 (jitter = 0.0)    7.289
10  images/sec: 130.3 +/- 0.1 (jitter = 0.5)    7.275
20  images/sec: 130.3 +/- 0.1 (jitter = 0.5)    7.215
30  images/sec: 130.4 +/- 0.1 (jitter = 0.5)    7.293
40  images/sec: 130.3 +/- 0.1 (jitter = 0.5)    7.249
50  images/sec: 130.3 +/- 0.1 (jitter = 0.5)    7.318
60  images/sec: 130.3 +/- 0.0 (jitter = 0.5)    7.272
70  images/sec: 130.3 +/- 0.0 (jitter = 0.4)    7.250
80  images/sec: 130.3 +/- 0.0 (jitter = 0.5)    7.264
90  images/sec: 130.2 +/- 0.0 (jitter = 0.5)    7.238
100 images/sec: 130.2 +/- 0.0 (jitter = 0.5)    7.240
total images/sec: 130.13


python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model vgg16  --batch_size 32 --use_fp16

1   images/sec: 192.6 +/- 0.0 (jitter = 0.0)    7.281
10  images/sec: 193.0 +/- 0.2 (jitter = 0.4)    7.308
20  images/sec: 193.2 +/- 0.2 (jitter = 1.0)    7.232
30  images/sec: 193.1 +/- 0.1 (jitter = 0.8)    7.287
40  images/sec: 193.0 +/- 0.1 (jitter = 0.6)    7.295
50  images/sec: 193.0 +/- 0.1 (jitter = 0.6)    7.265
60  images/sec: 193.0 +/- 0.1 (jitter = 0.6)    7.253
70  images/sec: 192.9 +/- 0.1 (jitter = 0.6)    7.264
80  images/sec: 192.9 +/- 0.1 (jitter = 0.5)    7.269
90  images/sec: 192.8 +/- 0.1 (jitter = 0.5)    7.270
100 images/sec: 192.8 +/- 0.1 (jitter = 0.5)    7.249
total images/sec: 192.66
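
To make the FP32 and FP16 runs easier to compare, the "total images/sec" values above can be turned into speedup ratios. The sketch below covers only the four models for which both runs are clearly labeled, and the numbers are copied from the logs in this article. The AlexNet FP16 run reported nan losses, so its ratio should be read with caution. The helper name fp16_speedup is hypothetical.

```python
# "total images/sec" values copied from the runs in this article.
TOTALS = {
    "inception3": {"fp32": 118.86, "fp16": 147.70},
    "resnet50":   {"fp32": 231.63, "fp16": 314.90},
    "alexnet":    {"fp32": 525.37, "fp16": 1252.77},  # FP16 loss was nan
    "vgg16":      {"fp32": 130.13, "fp16": 192.66},
}


def fp16_speedup(totals):
    """Return FP16 throughput divided by FP32 throughput for each model."""
    return {model: run["fp16"] / run["fp32"] for model, run in totals.items()}


if __name__ == "__main__":
    for model, ratio in sorted(fp16_speedup(TOTALS).items()):
        print("{:<11s} {:.2f}x".format(model, ratio))
```

All runs use a batch size of 32 on a single GPU, so the ratios describe this configuration only.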


An attempt at compiling TensorFlow upstream (1.9.0-rc0) with ROCm 1.8.2 (CIFAR10 11500 examples/sec @ VEGA56 80W, as of July 22, 2018)

Many thanks to syoyo; the article above was a very helpful reference.


