
Testing the ROCm 2.4 + TensorFlow 2.0 alpha release (alpha0-config-v2) Docker container image

Posted at 2019-06-09

Overview

https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/issues/362
According to the issue above, TensorFlow 2.0 apparently runs on ROCm as well, at least as a Docker image, so I gave it a quick try.

https://hub.docker.com/u/rocm
https://hub.docker.com/r/rocm/tensorflow/tags

The image can apparently be pulled with docker pull using the tag rocm2.4-tf2.0-alpha0-config-v2.

Environment setup

To install docker-ce, I followed https://qiita.com/tkyonezu/items/0f6da57eb2d823d2611d

docker image pull

  sudo docker pull rocm/tensorflow:rocm2.4-tf2.0-alpha0-config-v2

The output of docker image list should look something like this:

REPOSITORY          TAG                              IMAGE ID            CREATED             SIZE
rocm/tensorflow     rocm2.4-tf2.0-alpha0-config-v2   b1746f7b7f09        2 weeks ago         6.53GB

Next, start the container with docker run:

$ sudo docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/tensorflow:rocm2.4-tf2.0-alpha0-config-v2
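
For scripted use, the same invocation can be assembled as an argument list. The sketch below is only an illustration: the helper names `build_docker_run_args` and `missing_devices` are hypothetical, and the code builds the command shown above without running it. `/dev/kfd` is the ROCm compute device node, `/dev/dri` holds the GPU render nodes, and `--group-add video` is what lets the container user open them.

```python
import os

IMAGE = "rocm/tensorflow:rocm2.4-tf2.0-alpha0-config-v2"
GPU_DEVICES = ["/dev/kfd", "/dev/dri"]


def build_docker_run_args(image=IMAGE, devices=GPU_DEVICES, group="video"):
    """Return the argv list for the docker run command used in this article."""
    args = ["sudo", "docker", "run", "-it"]
    for dev in devices:
        # Pass each GPU device node through to the container.
        args.append("--device=" + dev)
    args += ["--group-add", group, image]
    return args


def missing_devices(devices=GPU_DEVICES):
    """List device nodes that do not exist on this host (hypothetical pre-check)."""
    return [dev for dev in devices if not os.path.exists(dev)]


if __name__ == "__main__":
    print(" ".join(build_docker_run_args()))
    for dev in missing_devices():
        print("warning: %s not found; is the ROCm kernel driver loaded?" % dev)
```

Checking for the device nodes first is a design choice of this sketch, not something the image requires; it merely gives a clearer message than a failing docker run.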

The /root directory inside the container should look roughly like this:

root@a0662b5acae0:/root# ls
bazel-0.19.2-installer-linux-x86_64.sh  benchmarks  convnet-benchmarks  models  tensorflow

Next, build TensorFlow 2.0 for ROCm:

 cd /root/tensorflow
./configure

WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.19.2 installed.
Please specify the location of python. [Default is /usr/bin/python]: 


Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: y
ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: n
No CUDA support will be enabled for TensorFlow.

Do you wish to download a fresh release of clang? (Experimental) [y/N]: n
Clang will not be downloaded.

Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=haswell -Wno-sign-compare]: 


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
	--config=mkl         	# Build with MKL support.
	--config=monolithic  	# Config for mostly static monolithic build.
	--config=gdr         	# Build with GDR support.
	--config=verbs       	# Build with libverbs support.
	--config=ngraph      	# Build with Intel nGraph support.
	--config=numa        	# Build with NUMA support.
	--config=dynamic_kernels	# (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
	--config=noaws       	# Disable AWS S3 filesystem support.
	--config=nogcp       	# Disable GCP support.
	--config=nohdfs      	# Disable HDFS support.
	--config=noignite    	# Disable Apache Ignite support.
	--config=nokafka     	# Disable Apache Kafka support.
	--config=nonccl      	# Disable NVIDIA NCCL support.
Configuration finished
/root/tensorflow# ./build_rocm_python3

The build takes quite a while, so be patient.
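
The interactive answers given to ./configure above could probably also be supplied through environment variables. The sketch below is assumption-heavy: the variable names (TF_NEED_ROCM, TF_NEED_CUDA, and so on) are the ones TensorFlow's configure script is believed to read at this version, and they should be checked against configure.py in /root/tensorflow before relying on them. The helper `configure_env` is hypothetical and only builds the environment; it does not run anything.

```python
import os


def configure_env(base_env=None):
    """Build an environment mirroring the ./configure answers in this article.

    The variable names are assumptions; verify them against configure.py.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "TF_NEED_ROCM": "1",              # ROCm support: y
        "TF_NEED_CUDA": "0",              # CUDA support: n
        "TF_ENABLE_XLA": "0",             # XLA JIT support: n
        "TF_NEED_OPENCL_SYCL": "0",       # OpenCL SYCL support: n
        "TF_NEED_MPI": "0",               # MPI support: n
        "TF_DOWNLOAD_CLANG": "0",         # fresh clang download: n
        "TF_SET_ANDROID_WORKSPACE": "0",  # Android WORKSPACE: n
    })
    return env


if __name__ == "__main__":
    for key, value in sorted(configure_env({}).items()):
        print("%s=%s" % (key, value))
```

If the names hold, the resulting dictionary could be passed as `env=` to `subprocess.run(["./configure"], cwd="/root/tensorflow")`. Prompts not covered here, such as the Python location and the optimization flags, would still need an answer or their defaults.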

Just to be sure, check the output of pip3 list:

 pip3 list
Package              Version               
-------------------- ----------------------
absl-py              0.7.1                 
astor                0.7.1                 
attrs                19.1.0                
backcall             0.1.0                 
bleach               3.1.0                 
chardet              2.3.0                 
decorator            4.4.0                 
defusedxml           0.6.0                 
entrypoints          0.3                   
enum34               1.1.6                 
gast                 0.2.2                 
google-pasta         0.1.6                 
grpcio               1.20.1                
h5py                 2.9.0                 
ipykernel            5.1.0                 
ipython              7.5.0                 
ipython-genutils     0.2.0                 
ipywidgets           7.4.2                 
jedi                 0.13.3                
Jinja2               2.10.1                
jsonschema           3.0.1                 
jupyter              1.0.0                 
jupyter-client       5.2.4                 
jupyter-console      6.0.0                 
jupyter-core         4.4.0                 
Keras-Applications   1.0.7                 
Keras-Preprocessing  1.0.9                 
Markdown             3.1                   
MarkupSafe           1.1.1                 
mistune              0.8.4                 
mock                 3.0.5                 
nbconvert            5.5.0                 
nbformat             4.4.0                 
notebook             5.7.8                 
numpy                1.16.3                
pandocfilters        1.4.2                 
parso                0.4.0                 
pexpect              4.7.0                 
pickleshare          0.7.5                 
pip                  19.1.1                
prometheus-client    0.6.0                 
prompt-toolkit       2.0.9                 
protobuf             3.7.1                 
ptyprocess           0.6.0                 
pycurl               7.43.0                
Pygments             2.4.0                 
pygobject            3.20.0                
pyrsistent           0.15.2                
python-apt           1.1.0b1+ubuntu0.16.4.2
python-dateutil      2.8.0                 
pyzmq                18.0.1                
qtconsole            4.4.4                 
requests             2.9.1                 
Send2Trash           1.5.0                 
setuptools           41.0.1                
six                  1.12.0                
ssh-import-id        5.5                   
tb-nightly           1.14.0a20190301       
tensorflow           2.0.0a0               
termcolor            1.1.0                 
terminado            0.8.2                 
testpath             0.4.2                 
tf-estimator-nightly 1.14.0.dev2019030115  
tornado              6.0.2                 
traitlets            4.3.2                 
unattended-upgrades  0.1                   
urllib3              1.13.1                
wcwidth              0.1.7                 
webencodings         0.5.1                 
Werkzeug             0.15.2                
wheel                0.29.0                
widgetsnbextension   3.4.2

This confirms that tensorflow 2.0.0a0 has been installed.
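
Rather than scanning the long listing by eye, the version can be looked up programmatically. The following is a minimal sketch that parses the two-column text printed by pip3 list; the function name `parse_pip_list` and the shortened `SAMPLE` text are illustrative, with the rows copied from the listing above.

```python
def parse_pip_list(text):
    """Map package name -> version from the column output of `pip3 list`."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        versions[parts[0]] = parts[1]
    return versions


SAMPLE = """\
Package              Version
-------------------- ----------------------
tb-nightly           1.14.0a20190301
tensorflow           2.0.0a0
tf-estimator-nightly 1.14.0.dev2019030115
"""

if __name__ == "__main__":
    print(parse_pip_list(SAMPLE)["tensorflow"])
```

In practice the text would come from running pip3 list inside the container, for example through `subprocess.run(..., capture_output=True, text=True)`.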

The ROCm version is:

# apt show rocm-libs -a
Package: rocm-libs
Version: 2.4.25
Status: install ok installed
Priority: optional
Section: devel
Maintainer: Advanced Micro Devices Inc.
Installed-Size: 13.3 kB
Depends: rocfft, rocrand, hipblas, rocblas
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: unknown
APT-Manual-Installed: yes
APT-Sources: /var/lib/dpkg/status
Description: Radeon Open Compute (ROCm) Runtime software stack

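Benchmarks

Figure 1.0: Benchmarks

Overall, using TF 2.0 did not give noticeably higher performance in these benchmarks. Considering the effort of building it yourself each time, simply using 1.13.3 is probably good enough.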

inception3

python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model inception3 --batch_size 32

Step	Img/sec	total_loss
1	images/sec: 119.8 +/- 0.0 (jitter = 0.0)	7.415
10	images/sec: 119.7 +/- 0.1 (jitter = 0.2)	7.394
20	images/sec: 119.4 +/- 0.2 (jitter = 0.3)	7.324
30	images/sec: 119.0 +/- 0.3 (jitter = 0.4)	7.487
40	images/sec: 119.0 +/- 0.2 (jitter = 0.4)	7.353
50	images/sec: 119.1 +/- 0.2 (jitter = 0.4)	7.369
60	images/sec: 119.0 +/- 0.2 (jitter = 0.4)	7.433
70	images/sec: 118.9 +/- 0.2 (jitter = 0.4)	7.317
80	images/sec: 119.0 +/- 0.2 (jitter = 0.5)	7.357
90	images/sec: 118.9 +/- 0.2 (jitter = 0.5)	7.485
100	images/sec: 118.9 +/- 0.1 (jitter = 0.5)	7.431
----------------------------------------------------------------
total images/sec: 118.86
----------------------------------------------------------------
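
When several models are benchmarked in a row, it can help to pull the final throughput out of each log automatically. The sketch below assumes the log ends with a "total images/sec:" line as in the output above; the function name `total_images_per_sec` is illustrative and the `SAMPLE` text is the tail of the inception3 run.

```python
import re

# Matches the summary line printed at the end of a run.
TOTAL_RE = re.compile(r"^total images/sec:\s*([0-9.]+)\s*$", re.MULTILINE)


def total_images_per_sec(output):
    """Return the 'total images/sec' value from a benchmark log, or None."""
    match = TOTAL_RE.search(output)
    return float(match.group(1)) if match else None


SAMPLE = """\
90\timages/sec: 118.9 +/- 0.2 (jitter = 0.5)\t7.485
100\timages/sec: 118.9 +/- 0.1 (jitter = 0.5)\t7.431
----------------------------------------------------------------
total images/sec: 118.86
----------------------------------------------------------------
"""

if __name__ == "__main__":
    print(total_images_per_sec(SAMPLE))
```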

FP16

# python3  ./tf_cnn_benchmarks.py --num_gpus=1  --model inception3 --batch_size 32 --use_fp16
Step	Img/sec	total_loss
1	images/sec: 148.5 +/- 0.0 (jitter = 0.0)	7.388
10	images/sec: 148.7 +/- 0.3 (jitter = 0.9)	7.390
20	images/sec: 148.4 +/- 0.2 (jitter = 0.6)	7.454
30	images/sec: 148.4 +/- 0.2 (jitter = 0.7)	7.359
40	images/sec: 148.4 +/- 0.1 (jitter = 0.7)	7.338
50	images/sec: 147.9 +/- 0.3 (jitter = 0.8)	7.387
60	images/sec: 148.0 +/- 0.2 (jitter = 0.9)	7.356
70	images/sec: 148.0 +/- 0.2 (jitter = 0.9)	7.441
80	images/sec: 147.9 +/- 0.2 (jitter = 0.9)	7.423
90	images/sec: 147.9 +/- 0.2 (jitter = 0.9)	7.334
100	images/sec: 147.8 +/- 0.2 (jitter = 0.9)	7.308
----------------------------------------------------------------
total images/sec: 147.70
----------------------------------------------------------------

Resnet50

 python3  ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32

1	images/sec: 230.0 +/- 0.0 (jitter = 0.0)	8.458
10	images/sec: 232.5 +/- 0.7 (jitter = 2.8)	7.997
20	images/sec: 233.4 +/- 0.4 (jitter = 1.5)	8.260
30	images/sec: 233.4 +/- 0.3 (jitter = 1.2)	8.337
40	images/sec: 233.3 +/- 0.2 (jitter = 1.4)	8.197
50	images/sec: 232.3 +/- 0.5 (jitter = 1.6)	7.759
60	images/sec: 231.9 +/- 0.5 (jitter = 1.6)	8.059
70	images/sec: 231.9 +/- 0.5 (jitter = 1.6)	8.481
80	images/sec: 231.9 +/- 0.5 (jitter = 1.5)	8.279
90	images/sec: 232.1 +/- 0.4 (jitter = 1.7)	8.019
100	images/sec: 231.8 +/- 0.4 (jitter = 1.8)	8.009
----------------------------------------------------------------
total images/sec: 231.63
----------------------------------------------------------------

FP16

python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32 --use_fp16

1	images/sec: 325.0 +/- 0.0 (jitter = 0.0)	7.979
10	images/sec: 315.0 +/- 3.2 (jitter = 7.0)	8.049
20	images/sec: 315.9 +/- 1.9 (jitter = 6.6)	8.331
30	images/sec: 316.0 +/- 1.5 (jitter = 7.4)	8.063
40	images/sec: 313.8 +/- 1.7 (jitter = 7.5)	8.678
50	images/sec: 314.4 +/- 1.3 (jitter = 5.4)	8.290
60	images/sec: 314.7 +/- 1.1 (jitter = 4.9)	8.344
70	images/sec: 315.0 +/- 1.0 (jitter = 3.4)	8.160
80	images/sec: 315.2 +/- 0.9 (jitter = 3.3)	8.145
90	images/sec: 315.0 +/- 0.9 (jitter = 3.2)	8.418
100	images/sec: 315.2 +/- 0.8 (jitter = 2.9)	8.296
----------------------------------------------------------------
total images/sec: 314.90
----------------------------------------------------------------

Resnet152

 python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32

1	images/sec: 92.9 +/- 0.0 (jitter = 0.0)	9.936
10	images/sec: 91.8 +/- 0.8 (jitter = 0.4)	9.680
20	images/sec: 91.3 +/- 0.6 (jitter = 0.8)	9.762
30	images/sec: 91.4 +/- 0.4 (jitter = 0.8)	9.945
40	images/sec: 91.7 +/- 0.3 (jitter = 0.7)	9.962
50	images/sec: 91.9 +/- 0.3 (jitter = 0.6)	9.992
60	images/sec: 92.1 +/- 0.2 (jitter = 0.5)	10.279
70	images/sec: 92.1 +/- 0.2 (jitter = 0.5)	9.990
80	images/sec: 92.1 +/- 0.2 (jitter = 0.5)	9.969
90	images/sec: 92.1 +/- 0.2 (jitter = 0.5)	10.196
100	images/sec: 92.0 +/- 0.2 (jitter = 0.6)	10.034
----------------------------------------------------------------
total images/sec: 91.93
----------------------------------------------------------------

FP16

python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32 --use_fp16

1	images/sec: 110.6 +/- 0.0 (jitter = 0.0)	10.107
10	images/sec: 118.1 +/- 1.6 (jitter = 2.0)	9.862
20	images/sec: 119.7 +/- 0.9 (jitter = 1.4)	9.758
30	images/sec: 120.1 +/- 0.7 (jitter = 1.2)	10.020
40	images/sec: 120.3 +/- 0.6 (jitter = 1.2)	9.834
50	images/sec: 120.2 +/- 0.5 (jitter = 1.1)	9.973
60	images/sec: 120.4 +/- 0.5 (jitter = 1.2)	9.637
70	images/sec: 120.4 +/- 0.4 (jitter = 1.2)	9.880
80	images/sec: 120.2 +/- 0.4 (jitter = 1.2)	9.619
90	images/sec: 120.4 +/- 0.4 (jitter = 1.0)	10.036
100	images/sec: 120.3 +/- 0.4 (jitter = 0.9)	10.061
----------------------------------------------------------------
total images/sec: 120.28
----------------------------------------------------------------

Alexnet

 python3  ./tf_cnn_benchmarks.py --num_gpus=1 --model  alexnet --batch_size 32

1	images/sec: 517.4 +/- 0.0 (jitter = 0.0)	7.205
10	images/sec: 523.8 +/- 2.2 (jitter = 2.4)	7.205
20	images/sec: 526.5 +/- 2.3 (jitter = 4.4)	7.205
30	images/sec: 526.3 +/- 1.7 (jitter = 7.1)	7.205
40	images/sec: 527.3 +/- 1.6 (jitter = 7.1)	7.205
50	images/sec: 526.7 +/- 1.3 (jitter = 6.3)	7.205
60	images/sec: 527.5 +/- 1.3 (jitter = 6.5)	7.205
70	images/sec: 526.4 +/- 1.2 (jitter = 6.9)	7.205
80	images/sec: 526.5 +/- 1.1 (jitter = 6.9)	7.205
90	images/sec: 526.2 +/- 1.0 (jitter = 6.2)	7.205
100	images/sec: 526.3 +/- 0.9 (jitter = 6.6)	7.205
----------------------------------------------------------------
total images/sec: 525.37
----------------------------------------------------------------

FP16

python3  ./tf_cnn_benchmarks.py --num_gpus=1 --model  alexnet --batch_size 32 --use_fp16

1	images/sec: 1250.9 +/- 0.0 (jitter = 0.0)	nan
10	images/sec: 1268.1 +/- 7.5 (jitter = 28.0)	nan
20	images/sec: 1264.7 +/- 4.8 (jitter = 28.3)	nan
30	images/sec: 1261.6 +/- 6.0 (jitter = 25.2)	nan
40	images/sec: 1265.2 +/- 4.8 (jitter = 22.8)	nan
50	images/sec: 1265.3 +/- 4.2 (jitter = 22.8)	nan
60	images/sec: 1259.4 +/- 5.4 (jitter = 26.1)	nan
70	images/sec: 1255.0 +/- 5.6 (jitter = 26.0)	nan
80	images/sec: 1256.7 +/- 5.0 (jitter = 25.3)	nan
90	images/sec: 1257.6 +/- 4.5 (jitter = 25.8)	nan
100	images/sec: 1258.3 +/- 4.1 (jitter = 23.9)	nan
----------------------------------------------------------------
total images/sec: 1252.77
----------------------------------------------------------------
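
Note that the total_loss column in this run is nan at every step, so the throughput figure alone may not reflect a healthy training run. A minimal sketch for flagging such logs is shown below; the function name `non_finite_loss_steps` is hypothetical, and it assumes step rows have the layout seen above, with the loss as the last column.

```python
import math


def non_finite_loss_steps(output):
    """Return the step numbers whose total_loss column is nan or inf."""
    bad_steps = []
    for line in output.splitlines():
        parts = line.split()
        # Step rows look like: "<step> images/sec: <rate> +/- ... <loss>"
        if len(parts) < 3 or not parts[0].isdigit() or parts[1] != "images/sec:":
            continue
        try:
            loss = float(parts[-1])
        except ValueError:
            continue  # last column is not a number; ignore the row
        if not math.isfinite(loss):
            bad_steps.append(int(parts[0]))
    return bad_steps


SAMPLE = """\
1\timages/sec: 1250.9 +/- 0.0 (jitter = 0.0)\tnan
10\timages/sec: 1268.1 +/- 7.5 (jitter = 28.0)\tnan
----------------------------------------------------------------
total images/sec: 1252.77
----------------------------------------------------------------
"""

if __name__ == "__main__":
    print(non_finite_loss_steps(SAMPLE))
```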

VGG16

 python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model vgg16  --batch_size 32

1	images/sec: 130.0 +/- 0.0 (jitter = 0.0)	7.289
10	images/sec: 130.3 +/- 0.1 (jitter = 0.5)	7.275
20	images/sec: 130.3 +/- 0.1 (jitter = 0.5)	7.215
30	images/sec: 130.4 +/- 0.1 (jitter = 0.5)	7.293
40	images/sec: 130.3 +/- 0.1 (jitter = 0.5)	7.249
50	images/sec: 130.3 +/- 0.1 (jitter = 0.5)	7.318
60	images/sec: 130.3 +/- 0.0 (jitter = 0.5)	7.272
70	images/sec: 130.3 +/- 0.0 (jitter = 0.4)	7.250
80	images/sec: 130.3 +/- 0.0 (jitter = 0.5)	7.264
90	images/sec: 130.2 +/- 0.0 (jitter = 0.5)	7.238
100	images/sec: 130.2 +/- 0.0 (jitter = 0.5)	7.240
----------------------------------------------------------------
total images/sec: 130.13
----------------------------------------------------------------

FP16

python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model vgg16  --batch_size 32 --use_fp16

1	images/sec: 192.6 +/- 0.0 (jitter = 0.0)	7.281
10	images/sec: 193.0 +/- 0.2 (jitter = 0.4)	7.308
20	images/sec: 193.2 +/- 0.2 (jitter = 1.0)	7.232
30	images/sec: 193.1 +/- 0.1 (jitter = 0.8)	7.287
40	images/sec: 193.0 +/- 0.1 (jitter = 0.6)	7.295
50	images/sec: 193.0 +/- 0.1 (jitter = 0.6)	7.265
60	images/sec: 193.0 +/- 0.1 (jitter = 0.6)	7.253
70	images/sec: 192.9 +/- 0.1 (jitter = 0.6)	7.264
80	images/sec: 192.9 +/- 0.1 (jitter = 0.5)	7.269
90	images/sec: 192.8 +/- 0.1 (jitter = 0.5)	7.270
100	images/sec: 192.8 +/- 0.1 (jitter = 0.5)	7.249
----------------------------------------------------------------
total images/sec: 192.66
----------------------------------------------------------------
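
To compare the runs at a glance, the FP16 throughput can be divided by the FP32 throughput for each model. The sketch below uses the "total images/sec" values reported above and assumes that the 91.93 images/sec result under the Resnet152 heading is the FP32 run; the names `TOTALS` and `fp16_speedups` are illustrative.

```python
# (FP32 total images/sec, FP16 total images/sec), copied from the runs above.
TOTALS = {
    "inception3": (118.86, 147.70),
    "resnet50": (231.63, 314.90),
    "resnet152": (91.93, 120.28),  # FP32 value assumed, see the note above
    "alexnet": (525.37, 1252.77),
    "vgg16": (130.13, 192.66),
}


def fp16_speedups(totals):
    """Return FP16 / FP32 throughput ratio per model."""
    return {model: fp16 / fp32 for model, (fp32, fp16) in totals.items()}


if __name__ == "__main__":
    for model, ratio in fp16_speedups(TOTALS).items():
        print("%-10s %.2fx" % (model, ratio))
```

The Alexnet ratio should be read with caution, since its FP16 run reported nan for total_loss.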

References

https://qiita.com/syoyo/items/4a9c3e17969757ab5422
An attempt to compile TensorFlow upstream (1.9.0-rc0) with ROCm 1.8.2 (CIFAR10 11500 examples/sec @ VEGA56 80W, as of July 22, 2018)

Thank you, syoyo. Your article was a very helpful reference.
