Testing the ROCm 2.4 + TensorFlow 2.0 alpha release (alpha0-config-v2) Docker container image

Overview

https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/issues/362
It looks like TensorFlow 2.0 now runs on ROCm as well, although only as a Docker image, so I gave it a quick try.

https://hub.docker.com/u/rocm
https://hub.docker.com/r/rocm/tensorflow/tags

The image should be available via docker pull under the tag rocm2.4-tf2.0-alpha0-config-v2.

Environment setup

I installed docker-ce by following https://qiita.com/tkyonezu/items/0f6da57eb2d823d2611d.

docker image pull

  sudo docker pull rocm/tensorflow:rocm2.4-tf2.0-alpha0-config-v2

The output of docker image list should look something like this:

REPOSITORY          TAG                              IMAGE ID            CREATED             SIZE
rocm/tensorflow     rocm2.4-tf2.0-alpha0-config-v2   b1746f7b7f09        2 weeks ago         6.53GB

Next, start the container with docker run:

$ sudo docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/tensorflow:rocm2.4-tf2.0-alpha0-config-v2
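The docker run command above passes the /dev/kfd and /dev/dri device nodes into the container. As a minimal sketch (not part of the original procedure), the following Python helper checks that those nodes exist on the host and assembles the same argument list. The function names missing_devices and docker_run_command are hypothetical, introduced here only for illustration.

```python
import os

# Device nodes, group and image tag taken from the docker run command above.
REQUIRED_DEVICES = ("/dev/kfd", "/dev/dri")
IMAGE = "rocm/tensorflow:rocm2.4-tf2.0-alpha0-config-v2"


def missing_devices(paths=REQUIRED_DEVICES):
    """Return the device paths that do not exist on this host."""
    return [p for p in paths if not os.path.exists(p)]


def docker_run_command(image=IMAGE, devices=REQUIRED_DEVICES, group="video"):
    """Build the argv list for the docker run invocation (without sudo)."""
    cmd = ["docker", "run", "-it"]
    for dev in devices:
        cmd.append("--device=" + dev)
    cmd += ["--group-add", group, image]
    return cmd


if __name__ == "__main__":
    missing = missing_devices()
    if missing:
        # A missing node usually means the ROCm kernel driver is not loaded.
        print("Missing device nodes:", ", ".join(missing))
    else:
        print(" ".join(docker_run_command()))
```

If either node is reported missing, the container would most likely start but not see the GPU, so it is worth checking the host driver first.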

The /root directory inside the container should look something like this:

root@a0662b5acae0:/root# ls
bazel-0.19.2-installer-linux-x86_64.sh  benchmarks  convnet-benchmarks  models  tensorflow

Next, build TensorFlow 2.0 for ROCm:

 cd /root/tensorflow
./configure

WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.19.2 installed.
Please specify the location of python. [Default is /usr/bin/python]: 


Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: y
ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: n
No CUDA support will be enabled for TensorFlow.

Do you wish to download a fresh release of clang? (Experimental) [y/N]: n
Clang will not be downloaded.

Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=haswell -Wno-sign-compare]: 


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
    --config=mkl            # Build with MKL support.
    --config=monolithic     # Config for mostly static monolithic build.
    --config=gdr            # Build with GDR support.
    --config=verbs          # Build with libverbs support.
    --config=ngraph         # Build with Intel nGraph support.
    --config=numa           # Build with NUMA support.
    --config=dynamic_kernels    # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
    --config=noaws          # Disable AWS S3 filesystem support.
    --config=nogcp          # Disable GCP support.
    --config=nohdfs         # Disable HDFS support.
    --config=noignite       # Disable Apache Ignite support.
    --config=nokafka        # Disable Apache Kafka support.
    --config=nonccl         # Disable NVIDIA NCCL support.
Configuration finished
/root/tensorflow# ./build_rocm_python3 

The build takes quite a while, so be patient.
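The answers typed into ./configure above can probably be supplied through environment variables instead, which helps when the build has to be repeated. The sketch below is an assumption rather than something verified in this container: the variable names (TF_NEED_ROCM and so on) are taken from TensorFlow's configure.py as I understand it and should be checked against the tree in /root/tensorflow. The helper names configure_env and run_configure are hypothetical.

```python
import os
import subprocess

# Answers mirroring the interactive session above ("1" = yes, "0" = no).
# ASSUMPTION: these variable names match configure.py in your source tree.
CONFIGURE_ANSWERS = {
    "TF_ENABLE_XLA": "0",
    "TF_NEED_OPENCL_SYCL": "0",
    "TF_NEED_ROCM": "1",
    "TF_NEED_CUDA": "0",
    "TF_DOWNLOAD_CLANG": "0",
    "TF_NEED_MPI": "0",
    "TF_SET_ANDROID_WORKSPACE": "0",
}


def configure_env(base_env=None):
    """Return a copy of the environment with the configure answers applied."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(CONFIGURE_ANSWERS)
    return env


def run_configure(source_dir="/root/tensorflow"):
    """Run ./configure in source_dir with the preset answers."""
    # Prompts not covered above (Python location, optimization flags)
    # may still be asked interactively.
    subprocess.run(["./configure"], cwd=source_dir, env=configure_env(), check=True)
```

Calling run_configure() inside the container would then replace most of the manual y/n answers. The build script itself is still run afterwards as shown above.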

Just to be sure, check the output of pip3 list.

 pip3 list
Package              Version               
-------------------- ----------------------
absl-py              0.7.1                 
astor                0.7.1                 
attrs                19.1.0                
backcall             0.1.0                 
bleach               3.1.0                 
chardet              2.3.0                 
decorator            4.4.0                 
defusedxml           0.6.0                 
entrypoints          0.3                   
enum34               1.1.6                 
gast                 0.2.2                 
google-pasta         0.1.6                 
grpcio               1.20.1                
h5py                 2.9.0                 
ipykernel            5.1.0                 
ipython              7.5.0                 
ipython-genutils     0.2.0                 
ipywidgets           7.4.2                 
jedi                 0.13.3                
Jinja2               2.10.1                
jsonschema           3.0.1                 
jupyter              1.0.0                 
jupyter-client       5.2.4                 
jupyter-console      6.0.0                 
jupyter-core         4.4.0                 
Keras-Applications   1.0.7                 
Keras-Preprocessing  1.0.9                 
Markdown             3.1                   
MarkupSafe           1.1.1                 
mistune              0.8.4                 
mock                 3.0.5                 
nbconvert            5.5.0                 
nbformat             4.4.0                 
notebook             5.7.8                 
numpy                1.16.3                
pandocfilters        1.4.2                 
parso                0.4.0                 
pexpect              4.7.0                 
pickleshare          0.7.5                 
pip                  19.1.1                
prometheus-client    0.6.0                 
prompt-toolkit       2.0.9                 
protobuf             3.7.1                 
ptyprocess           0.6.0                 
pycurl               7.43.0                
Pygments             2.4.0                 
pygobject            3.20.0                
pyrsistent           0.15.2                
python-apt           1.1.0b1+ubuntu0.16.4.2
python-dateutil      2.8.0                 
pyzmq                18.0.1                
qtconsole            4.4.4                 
requests             2.9.1                 
Send2Trash           1.5.0                 
setuptools           41.0.1                
six                  1.12.0                
ssh-import-id        5.5                   
tb-nightly           1.14.0a20190301       
tensorflow           2.0.0a0               
termcolor            1.1.0                 
terminado            0.8.2                 
testpath             0.4.2                 
tf-estimator-nightly 1.14.0.dev2019030115  
tornado              6.0.2                 
traitlets            4.3.2                 
unattended-upgrades  0.1                   
urllib3              1.13.1                
wcwidth              0.1.7                 
webencodings         0.5.1                 
Werkzeug             0.15.2                
wheel                0.29.0                
widgetsnbextension   3.4.2  

This confirms that tensorflow 2.0.0a0 has been installed.
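Besides the package list, it can be useful to confirm from Python that the installed build imports and sees the GPU. The following is a minimal sketch, assuming tf.test.is_gpu_available() reports the ROCm device in this build. The helper name tensorflow_status is hypothetical.

```python
def tensorflow_status():
    """Return (version, gpu_available), or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # tf.test.is_gpu_available() exists in TF 1.x and 2.0; later releases
    # deprecate it in favour of tf.config.list_physical_devices("GPU").
    return tf.__version__, tf.test.is_gpu_available()


if __name__ == "__main__":
    status = tensorflow_status()
    if status is None:
        print("TensorFlow is not installed in this Python environment")
    else:
        version, gpu = status
        print("TensorFlow", version, "GPU available:", gpu)
```

Run it with python3 inside the container, since the wheel was built for Python 3.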

The ROCm version is as follows:

# apt show rocm-libs -a
Package: rocm-libs
Version: 2.4.25
Status: install ok installed
Priority: optional
Section: devel
Maintainer: Advanced Micro Devices Inc.
Installed-Size: 13.3 kB
Depends: rocfft, rocrand, hipblas, rocblas
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: unknown
APT-Manual-Installed: yes
APT-Sources: /var/lib/dpkg/status
Description: Radeon Open Compute (ROCm) Runtime software stack

Benchmarks

Figure 1.0: Benchmarks
(image: (TF1.13.3)ROCm2.3+(TF1.13.3)ROCm2.4(RadeonⅦ)+(ROCm2.4+TF2.0 RadeonⅦ).png)

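Overall, using TF 2.0 did not give any noticeable performance gain in these benchmarks. Considering the effort of building it yourself every time, you are probably better off simply using 1.13.3.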

inception3

python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model inception3 --batch_size 32

Step    Img/sec total_loss
1   images/sec: 119.8 +/- 0.0 (jitter = 0.0)    7.415
10  images/sec: 119.7 +/- 0.1 (jitter = 0.2)    7.394
20  images/sec: 119.4 +/- 0.2 (jitter = 0.3)    7.324
30  images/sec: 119.0 +/- 0.3 (jitter = 0.4)    7.487
40  images/sec: 119.0 +/- 0.2 (jitter = 0.4)    7.353
50  images/sec: 119.1 +/- 0.2 (jitter = 0.4)    7.369
60  images/sec: 119.0 +/- 0.2 (jitter = 0.4)    7.433
70  images/sec: 118.9 +/- 0.2 (jitter = 0.4)    7.317
80  images/sec: 119.0 +/- 0.2 (jitter = 0.5)    7.357
90  images/sec: 118.9 +/- 0.2 (jitter = 0.5)    7.485
100 images/sec: 118.9 +/- 0.1 (jitter = 0.5)    7.431
----------------------------------------------------------------
total images/sec: 118.86
----------------------------------------------------------------
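Each run of tf_cnn_benchmarks.py ends with a "total images/sec" summary line like the one above. When collecting several runs, that figure can be extracted from a saved log with a small parser. This is a sketch that assumes the log format shown in this article. The name total_images_per_sec is hypothetical.

```python
import re

# Matches the summary line printed at the end of a run,
# for example "total images/sec: 118.86".
TOTAL_RE = re.compile(r"^total images/sec:\s*([0-9]+(?:\.[0-9]+)?)\s*$", re.MULTILINE)


def total_images_per_sec(log_text):
    """Return the total throughput from a benchmark log, or None if absent."""
    match = TOTAL_RE.search(log_text)
    return float(match.group(1)) if match else None


# Excerpt of the inception3 FP32 output shown above.
SAMPLE_LOG = """\
100 images/sec: 118.9 +/- 0.1 (jitter = 0.5)    7.431
----------------------------------------------------------------
total images/sec: 118.86
----------------------------------------------------------------
"""

if __name__ == "__main__":
    print(total_images_per_sec(SAMPLE_LOG))
```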

FP16

#python3  ./tf_cnn_benchmarks.py --num_gpus=1  --model inception3 --batch_size 32 --use_fp16
Step    Img/sec total_loss
1   images/sec: 148.5 +/- 0.0 (jitter = 0.0)    7.388
10  images/sec: 148.7 +/- 0.3 (jitter = 0.9)    7.390
20  images/sec: 148.4 +/- 0.2 (jitter = 0.6)    7.454
30  images/sec: 148.4 +/- 0.2 (jitter = 0.7)    7.359
40  images/sec: 148.4 +/- 0.1 (jitter = 0.7)    7.338
50  images/sec: 147.9 +/- 0.3 (jitter = 0.8)    7.387
60  images/sec: 148.0 +/- 0.2 (jitter = 0.9)    7.356
70  images/sec: 148.0 +/- 0.2 (jitter = 0.9)    7.441
80  images/sec: 147.9 +/- 0.2 (jitter = 0.9)    7.423
90  images/sec: 147.9 +/- 0.2 (jitter = 0.9)    7.334
100 images/sec: 147.8 +/- 0.2 (jitter = 0.9)    7.308
----------------------------------------------------------------
total images/sec: 147.70
----------------------------------------------------------------

Resnet50

 python3  ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32
1   images/sec: 230.0 +/- 0.0 (jitter = 0.0)    8.458
10  images/sec: 232.5 +/- 0.7 (jitter = 2.8)    7.997
20  images/sec: 233.4 +/- 0.4 (jitter = 1.5)    8.260
30  images/sec: 233.4 +/- 0.3 (jitter = 1.2)    8.337
40  images/sec: 233.3 +/- 0.2 (jitter = 1.4)    8.197
50  images/sec: 232.3 +/- 0.5 (jitter = 1.6)    7.759
60  images/sec: 231.9 +/- 0.5 (jitter = 1.6)    8.059
70  images/sec: 231.9 +/- 0.5 (jitter = 1.6)    8.481
80  images/sec: 231.9 +/- 0.5 (jitter = 1.5)    8.279
90  images/sec: 232.1 +/- 0.4 (jitter = 1.7)    8.019
100 images/sec: 231.8 +/- 0.4 (jitter = 1.8)    8.009
----------------------------------------------------------------
total images/sec: 231.63
----------------------------------------------------------------

FP16

python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32 --use_fp16
1   images/sec: 325.0 +/- 0.0 (jitter = 0.0)    7.979
10  images/sec: 315.0 +/- 3.2 (jitter = 7.0)    8.049
20  images/sec: 315.9 +/- 1.9 (jitter = 6.6)    8.331
30  images/sec: 316.0 +/- 1.5 (jitter = 7.4)    8.063
40  images/sec: 313.8 +/- 1.7 (jitter = 7.5)    8.678
50  images/sec: 314.4 +/- 1.3 (jitter = 5.4)    8.290
60  images/sec: 314.7 +/- 1.1 (jitter = 4.9)    8.344
70  images/sec: 315.0 +/- 1.0 (jitter = 3.4)    8.160
80  images/sec: 315.2 +/- 0.9 (jitter = 3.3)    8.145
90  images/sec: 315.0 +/- 0.9 (jitter = 3.2)    8.418
100 images/sec: 315.2 +/- 0.8 (jitter = 2.9)    8.296
----------------------------------------------------------------
total images/sec: 314.90
----------------------------------------------------------------

Resnet152

 python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32
1   images/sec: 92.9 +/- 0.0 (jitter = 0.0) 9.936
10  images/sec: 91.8 +/- 0.8 (jitter = 0.4) 9.680
20  images/sec: 91.3 +/- 0.6 (jitter = 0.8) 9.762
30  images/sec: 91.4 +/- 0.4 (jitter = 0.8) 9.945
40  images/sec: 91.7 +/- 0.3 (jitter = 0.7) 9.962
50  images/sec: 91.9 +/- 0.3 (jitter = 0.6) 9.992
60  images/sec: 92.1 +/- 0.2 (jitter = 0.5) 10.279
70  images/sec: 92.1 +/- 0.2 (jitter = 0.5) 9.990
80  images/sec: 92.1 +/- 0.2 (jitter = 0.5) 9.969
90  images/sec: 92.1 +/- 0.2 (jitter = 0.5) 10.196
100 images/sec: 92.0 +/- 0.2 (jitter = 0.6) 10.034
----------------------------------------------------------------
total images/sec: 91.93
----------------------------------------------------------------

FP16

python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32 --use_fp16
1   images/sec: 110.6 +/- 0.0 (jitter = 0.0)    10.107
10  images/sec: 118.1 +/- 1.6 (jitter = 2.0)    9.862
20  images/sec: 119.7 +/- 0.9 (jitter = 1.4)    9.758
30  images/sec: 120.1 +/- 0.7 (jitter = 1.2)    10.020
40  images/sec: 120.3 +/- 0.6 (jitter = 1.2)    9.834
50  images/sec: 120.2 +/- 0.5 (jitter = 1.1)    9.973
60  images/sec: 120.4 +/- 0.5 (jitter = 1.2)    9.637
70  images/sec: 120.4 +/- 0.4 (jitter = 1.2)    9.880
80  images/sec: 120.2 +/- 0.4 (jitter = 1.2)    9.619
90  images/sec: 120.4 +/- 0.4 (jitter = 1.0)    10.036
100 images/sec: 120.3 +/- 0.4 (jitter = 0.9)    10.061
----------------------------------------------------------------
total images/sec: 120.28
----------------------------------------------------------------

AlexNet

 python3  ./tf_cnn_benchmarks.py --num_gpus=1 --model  alexnet --batch_size 32
1   images/sec: 517.4 +/- 0.0 (jitter = 0.0)    7.205
10  images/sec: 523.8 +/- 2.2 (jitter = 2.4)    7.205
20  images/sec: 526.5 +/- 2.3 (jitter = 4.4)    7.205
30  images/sec: 526.3 +/- 1.7 (jitter = 7.1)    7.205
40  images/sec: 527.3 +/- 1.6 (jitter = 7.1)    7.205
50  images/sec: 526.7 +/- 1.3 (jitter = 6.3)    7.205
60  images/sec: 527.5 +/- 1.3 (jitter = 6.5)    7.205
70  images/sec: 526.4 +/- 1.2 (jitter = 6.9)    7.205
80  images/sec: 526.5 +/- 1.1 (jitter = 6.9)    7.205
90  images/sec: 526.2 +/- 1.0 (jitter = 6.2)    7.205
100 images/sec: 526.3 +/- 0.9 (jitter = 6.6)    7.205
----------------------------------------------------------------
total images/sec: 525.37
----------------------------------------------------------------

FP16

python3  ./tf_cnn_benchmarks.py --num_gpus=1 --model  alexnet --batch_size 32 --use_fp16
1   images/sec: 1250.9 +/- 0.0 (jitter = 0.0)   nan
10  images/sec: 1268.1 +/- 7.5 (jitter = 28.0)  nan
20  images/sec: 1264.7 +/- 4.8 (jitter = 28.3)  nan
30  images/sec: 1261.6 +/- 6.0 (jitter = 25.2)  nan
40  images/sec: 1265.2 +/- 4.8 (jitter = 22.8)  nan
50  images/sec: 1265.3 +/- 4.2 (jitter = 22.8)  nan
60  images/sec: 1259.4 +/- 5.4 (jitter = 26.1)  nan
70  images/sec: 1255.0 +/- 5.6 (jitter = 26.0)  nan
80  images/sec: 1256.7 +/- 5.0 (jitter = 25.3)  nan
90  images/sec: 1257.6 +/- 4.5 (jitter = 25.8)  nan
100 images/sec: 1258.3 +/- 4.1 (jitter = 23.9)  nan
----------------------------------------------------------------
total images/sec: 1252.77
----------------------------------------------------------------

VGG16

 python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model vgg16  --batch_size 32
1   images/sec: 130.0 +/- 0.0 (jitter = 0.0)    7.289
10  images/sec: 130.3 +/- 0.1 (jitter = 0.5)    7.275
20  images/sec: 130.3 +/- 0.1 (jitter = 0.5)    7.215
30  images/sec: 130.4 +/- 0.1 (jitter = 0.5)    7.293
40  images/sec: 130.3 +/- 0.1 (jitter = 0.5)    7.249
50  images/sec: 130.3 +/- 0.1 (jitter = 0.5)    7.318
60  images/sec: 130.3 +/- 0.0 (jitter = 0.5)    7.272
70  images/sec: 130.3 +/- 0.0 (jitter = 0.4)    7.250
80  images/sec: 130.3 +/- 0.0 (jitter = 0.5)    7.264
90  images/sec: 130.2 +/- 0.0 (jitter = 0.5)    7.238
100 images/sec: 130.2 +/- 0.0 (jitter = 0.5)    7.240
----------------------------------------------------------------
total images/sec: 130.13
----------------------------------------------------------------

FP16

python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model vgg16  --batch_size 32 --use_fp16

1   images/sec: 192.6 +/- 0.0 (jitter = 0.0)    7.281
10  images/sec: 193.0 +/- 0.2 (jitter = 0.4)    7.308
20  images/sec: 193.2 +/- 0.2 (jitter = 1.0)    7.232
30  images/sec: 193.1 +/- 0.1 (jitter = 0.8)    7.287
40  images/sec: 193.0 +/- 0.1 (jitter = 0.6)    7.295
50  images/sec: 193.0 +/- 0.1 (jitter = 0.6)    7.265
60  images/sec: 193.0 +/- 0.1 (jitter = 0.6)    7.253
70  images/sec: 192.9 +/- 0.1 (jitter = 0.6)    7.264
80  images/sec: 192.9 +/- 0.1 (jitter = 0.5)    7.269
90  images/sec: 192.8 +/- 0.1 (jitter = 0.5)    7.270
100 images/sec: 192.8 +/- 0.1 (jitter = 0.5)    7.249
----------------------------------------------------------------
total images/sec: 192.66
----------------------------------------------------------------
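To compare the FP32 and FP16 runs at a glance, the "total images/sec" figures reported above can be turned into speedup ratios. The sketch below only rearranges numbers already listed in this article. The ResNet-152 FP32 value is the one from the run under the Resnet152 heading. Note that the AlexNet FP16 run reported a total_loss of nan, so its throughput may not reflect a numerically valid training run.

```python
# (FP32, FP16) total images/sec per model, copied from the runs above.
TOTALS = {
    "inception3": (118.86, 147.70),
    "resnet50": (231.63, 314.90),
    "resnet152": (91.93, 120.28),
    "alexnet": (525.37, 1252.77),  # the FP16 run reported total_loss = nan
    "vgg16": (130.13, 192.66),
}


def fp16_speedups(totals=TOTALS):
    """Return {model: FP16 throughput / FP32 throughput}."""
    return {model: fp16 / fp32 for model, (fp32, fp16) in totals.items()}


if __name__ == "__main__":
    for model, ratio in sorted(fp16_speedups().items()):
        print("{:<12s} x{:.2f}".format(model, ratio))
```

These ratios compare FP16 against FP32 within the TF 2.0 build only. They say nothing about TF 2.0 versus 1.13.3, which is what the figure at the start of this section covers.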

References

https://qiita.com/syoyo/items/4a9c3e17969757ab5422
An attempt to compile TensorFlow upstream (1.9.0-rc0) on ROCm 1.8.2 (CIFAR10 11500 examples/sec @ VEGA56 80W, as of July 22, 2018)

Many thanks to syoyo; the article was a very helpful reference.
