Testing the ROCm 2.4 + TensorFlow 2.0 alpha release (alpha0-config-v2) Docker container image

Overview

https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/issues/362
According to the issue above, TensorFlow 2.0 now runs on ROCm as well, at least as a Docker image, so I gave it a quick try.

https://hub.docker.com/u/rocm
https://hub.docker.com/r/rocm/tensorflow/tags

The image should be available via docker pull under the tag rocm2.4-tf2.0-alpha0-config-v2.

Environment setup

To install docker-ce, I followed https://qiita.com/tkyonezu/items/0f6da57eb2d823d2611d

docker image pull

  sudo docker pull rocm/tensorflow:rocm2.4-tf2.0-alpha0-config-v2

The output of docker image list should look something like this:

REPOSITORY          TAG                              IMAGE ID            CREATED             SIZE
rocm/tensorflow     rocm2.4-tf2.0-alpha0-config-v2   b1746f7b7f09        2 weeks ago         6.53GB

Next, start the container with docker run:

$ sudo docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/tensorflow:rocm2.4-tf2.0-alpha0-config-v2
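
The flags in this command are easier to read when spelled out one per line. The sketch below rebuilds the same invocation as an argument list. It assumes the usual ROCm container setup, where /dev/kfd is the compute device node, /dev/dri holds the GPU render nodes, and the video group typically grants access to them. The helper name build_run_command is made up for illustration.

```python
import shlex

IMAGE = "rocm/tensorflow:rocm2.4-tf2.0-alpha0-config-v2"


def build_run_command(image=IMAGE):
    """Return the docker run invocation from the article as an argv list."""
    return [
        "sudo", "docker", "run",
        "-it",                   # interactive shell with a TTY
        "--device=/dev/kfd",     # ROCm compute (KFD) device node
        "--device=/dev/dri",     # DRM render nodes for the GPU
        "--group-add", "video",  # group that typically owns those nodes
        image,
    ]


if __name__ == "__main__":
    # Print the command as a single shell-quoted line.
    print(shlex.join(build_run_command()))
```

Keeping the command as a list also makes it straightforward to hand to subprocess.run without going through a shell.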

Inside the container, the /root directory should look roughly like this:

root@a0662b5acae0:/root# ls
bazel-0.19.2-installer-linux-x86_64.sh  benchmarks  convnet-benchmarks  models  tensorflow

Now build TensorFlow 2.0 for ROCm:

 cd /root/tensorflow
./configure

WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.19.2 installed.
Please specify the location of python. [Default is /usr/bin/python]: 


Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: y
ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: n
No CUDA support will be enabled for TensorFlow.

Do you wish to download a fresh release of clang? (Experimental) [y/N]: n
Clang will not be downloaded.

Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=haswell -Wno-sign-compare]: 


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
	--config=mkl         	# Build with MKL support.
	--config=monolithic  	# Config for mostly static monolithic build.
	--config=gdr         	# Build with GDR support.
	--config=verbs       	# Build with libverbs support.
	--config=ngraph      	# Build with Intel nGraph support.
	--config=numa        	# Build with NUMA support.
	--config=dynamic_kernels	# (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
	--config=noaws       	# Disable AWS S3 filesystem support.
	--config=nogcp       	# Disable GCP support.
	--config=nohdfs      	# Disable HDFS support.
	--config=noignite    	# Disable Apache Ignite support.
	--config=nokafka     	# Disable Apache Kafka support.
	--config=nonccl      	# Disable NVIDIA NCCL support.
Configuration finished
/root/tensorflow# ./build_rocm_python3 

The build takes quite a while, so be patient.
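
As far as I know, the interactive ./configure answers above can also be supplied through environment variables. The sketch below is only an illustration under that assumption: the variable names (TF_NEED_ROCM, TF_NEED_CUDA, and so on) are the ones I believe configure.py read around this release and should be verified against the checked-out source, and configure_env is a hypothetical helper.

```python
import os

# Answers given interactively in the article, mapped to the environment
# variables that configure.py is ASSUMED to read (verify in your checkout).
ANSWERS = {
    "TF_ENABLE_XLA": "0",             # XLA JIT support: n
    "TF_NEED_OPENCL_SYCL": "0",       # OpenCL SYCL support: n
    "TF_NEED_ROCM": "1",              # ROCm support: y
    "TF_NEED_CUDA": "0",              # CUDA support: n
    "TF_DOWNLOAD_CLANG": "0",         # fresh clang download: n
    "TF_NEED_MPI": "0",               # MPI support: n
    "TF_SET_ANDROID_WORKSPACE": "0",  # Android WORKSPACE setup: n
}


def configure_env(base=None):
    """Return a copy of the environment with the article's answers applied."""
    env = dict(os.environ if base is None else base)
    env.update(ANSWERS)
    return env


if __name__ == "__main__":
    env = configure_env()
    for name in sorted(ANSWERS):
        print(f"{name}={env[name]}")
```

The resulting mapping could be passed as env= to subprocess.run when launching ./configure from /root/tensorflow. Any prompt without a matching variable would presumably still be asked interactively.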

Just to be sure, check the output of pip3 list.

 pip3 list
Package              Version               
-------------------- ----------------------
absl-py              0.7.1                 
astor                0.7.1                 
attrs                19.1.0                
backcall             0.1.0                 
bleach               3.1.0                 
chardet              2.3.0                 
decorator            4.4.0                 
defusedxml           0.6.0                 
entrypoints          0.3                   
enum34               1.1.6                 
gast                 0.2.2                 
google-pasta         0.1.6                 
grpcio               1.20.1                
h5py                 2.9.0                 
ipykernel            5.1.0                 
ipython              7.5.0                 
ipython-genutils     0.2.0                 
ipywidgets           7.4.2                 
jedi                 0.13.3                
Jinja2               2.10.1                
jsonschema           3.0.1                 
jupyter              1.0.0                 
jupyter-client       5.2.4                 
jupyter-console      6.0.0                 
jupyter-core         4.4.0                 
Keras-Applications   1.0.7                 
Keras-Preprocessing  1.0.9                 
Markdown             3.1                   
MarkupSafe           1.1.1                 
mistune              0.8.4                 
mock                 3.0.5                 
nbconvert            5.5.0                 
nbformat             4.4.0                 
notebook             5.7.8                 
numpy                1.16.3                
pandocfilters        1.4.2                 
parso                0.4.0                 
pexpect              4.7.0                 
pickleshare          0.7.5                 
pip                  19.1.1                
prometheus-client    0.6.0                 
prompt-toolkit       2.0.9                 
protobuf             3.7.1                 
ptyprocess           0.6.0                 
pycurl               7.43.0                
Pygments             2.4.0                 
pygobject            3.20.0                
pyrsistent           0.15.2                
python-apt           1.1.0b1+ubuntu0.16.4.2
python-dateutil      2.8.0                 
pyzmq                18.0.1                
qtconsole            4.4.4                 
requests             2.9.1                 
Send2Trash           1.5.0                 
setuptools           41.0.1                
six                  1.12.0                
ssh-import-id        5.5                   
tb-nightly           1.14.0a20190301       
tensorflow           2.0.0a0               
termcolor            1.1.0                 
terminado            0.8.2                 
testpath             0.4.2                 
tf-estimator-nightly 1.14.0.dev2019030115  
tornado              6.0.2                 
traitlets            4.3.2                 
unattended-upgrades  0.1                   
urllib3              1.13.1                
wcwidth              0.1.7                 
webencodings         0.5.1                 
Werkzeug             0.15.2                
wheel                0.29.0                
widgetsnbextension   3.4.2  

This confirms that tensorflow 2.0.0a0 is installed.
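
The same check can be scripted instead of scanning the table by eye. The sketch below parses text in the two-column pip3 list format shown above. The function name installed_version and the shortened sample are illustrative only.

```python
def installed_version(pip_list_output, package):
    """Return the version string for `package` in `pip list` output, or None."""
    for line in pip_list_output.splitlines():
        parts = line.split()
        # Data rows have exactly two columns: package name and version.
        if len(parts) == 2 and parts[0].lower() == package.lower():
            return parts[1]
    return None


# A few rows copied from the listing in the article.
SAMPLE = """\
Package              Version
-------------------- ----------------------
tb-nightly           1.14.0a20190301
tensorflow           2.0.0a0
termcolor            1.1.0
"""

if __name__ == "__main__":
    print(installed_version(SAMPLE, "tensorflow"))
```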

The ROCm version is:

# apt show rocm-libs -a
Package: rocm-libs
Version: 2.4.25
Status: install ok installed
Priority: optional
Section: devel
Maintainer: Advanced Micro Devices Inc.
Installed-Size: 13.3 kB
Depends: rocfft, rocrand, hipblas, rocblas
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: unknown
APT-Manual-Installed: yes
APT-Sources: /var/lib/dpkg/status
Description: Radeon Open Compute (ROCm) Runtime software stack

Benchmarks

Figure 1.0: Benchmarks
(TF1.13.3)ROCm2.3+(TF1.13.3)ROCm2.4(RadeonⅦ)+(ROCm2.4+TF2.0 RadeonⅦ).png

Overall, switching to TF 2.0 did not give noticeably higher performance. Given the effort of building it every time, it is probably fine to just keep using 1.13.3.

inception3

python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model inception3 --batch_size 32

Step	Img/sec	total_loss
1	images/sec: 119.8 +/- 0.0 (jitter = 0.0)	7.415
10	images/sec: 119.7 +/- 0.1 (jitter = 0.2)	7.394
20	images/sec: 119.4 +/- 0.2 (jitter = 0.3)	7.324
30	images/sec: 119.0 +/- 0.3 (jitter = 0.4)	7.487
40	images/sec: 119.0 +/- 0.2 (jitter = 0.4)	7.353
50	images/sec: 119.1 +/- 0.2 (jitter = 0.4)	7.369
60	images/sec: 119.0 +/- 0.2 (jitter = 0.4)	7.433
70	images/sec: 118.9 +/- 0.2 (jitter = 0.4)	7.317
80	images/sec: 119.0 +/- 0.2 (jitter = 0.5)	7.357
90	images/sec: 118.9 +/- 0.2 (jitter = 0.5)	7.485
100	images/sec: 118.9 +/- 0.1 (jitter = 0.5)	7.431
----------------------------------------------------------------
total images/sec: 118.86
----------------------------------------------------------------
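
All of the logs in this section share the same layout, so they can be read programmatically. The sketch below is one possible parser, assuming each step line looks like the ones above (step number, images/sec with its +/- and jitter figures, then total_loss) and that the run ends with a "total images/sec" line. The name parse_log and the regular expressions are my own, not part of tf_cnn_benchmarks.

```python
import re

# One step line: "<step> images/sec: <rate> +/- <err> (jitter = <j>) <loss>"
STEP_RE = re.compile(
    r"^(\d+)\s+images/sec:\s+([\d.]+)\s+\+/-\s+([\d.]+)"
    r"\s+\(jitter = ([\d.]+)\)\s+(\S+)$"
)
TOTAL_RE = re.compile(r"^total images/sec:\s+([\d.]+)$")


def parse_log(text):
    """Return (steps, total) parsed from tf_cnn_benchmarks-style output."""
    steps, total = [], None
    for raw in text.splitlines():
        line = raw.strip()
        match = STEP_RE.match(line)
        if match:
            steps.append({
                "step": int(match.group(1)),
                "images_per_sec": float(match.group(2)),
                "loss": float(match.group(5)),  # float() also accepts "nan"
            })
            continue
        match = TOTAL_RE.match(line)
        if match:
            total = float(match.group(1))
    return steps, total


# Two step lines and the total copied from the inception3 run above.
SAMPLE = (
    "1\timages/sec: 119.8 +/- 0.0 (jitter = 0.0)\t7.415\n"
    "100\timages/sec: 118.9 +/- 0.1 (jitter = 0.5)\t7.431\n"
    "----------------------------------------------------------------\n"
    "total images/sec: 118.86\n"
)

if __name__ == "__main__":
    steps, total = parse_log(SAMPLE)
    print(len(steps), total)
```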

FP16

# python3  ./tf_cnn_benchmarks.py --num_gpus=1  --model inception3 --batch_size 32 --use_fp16
Step	Img/sec	total_loss
1	images/sec: 148.5 +/- 0.0 (jitter = 0.0)	7.388
10	images/sec: 148.7 +/- 0.3 (jitter = 0.9)	7.390
20	images/sec: 148.4 +/- 0.2 (jitter = 0.6)	7.454
30	images/sec: 148.4 +/- 0.2 (jitter = 0.7)	7.359
40	images/sec: 148.4 +/- 0.1 (jitter = 0.7)	7.338
50	images/sec: 147.9 +/- 0.3 (jitter = 0.8)	7.387
60	images/sec: 148.0 +/- 0.2 (jitter = 0.9)	7.356
70	images/sec: 148.0 +/- 0.2 (jitter = 0.9)	7.441
80	images/sec: 147.9 +/- 0.2 (jitter = 0.9)	7.423
90	images/sec: 147.9 +/- 0.2 (jitter = 0.9)	7.334
100	images/sec: 147.8 +/- 0.2 (jitter = 0.9)	7.308
----------------------------------------------------------------
total images/sec: 147.70
----------------------------------------------------------------

Resnet50

 python3  ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32
1	images/sec: 230.0 +/- 0.0 (jitter = 0.0)	8.458
10	images/sec: 232.5 +/- 0.7 (jitter = 2.8)	7.997
20	images/sec: 233.4 +/- 0.4 (jitter = 1.5)	8.260
30	images/sec: 233.4 +/- 0.3 (jitter = 1.2)	8.337
40	images/sec: 233.3 +/- 0.2 (jitter = 1.4)	8.197
50	images/sec: 232.3 +/- 0.5 (jitter = 1.6)	7.759
60	images/sec: 231.9 +/- 0.5 (jitter = 1.6)	8.059
70	images/sec: 231.9 +/- 0.5 (jitter = 1.6)	8.481
80	images/sec: 231.9 +/- 0.5 (jitter = 1.5)	8.279
90	images/sec: 232.1 +/- 0.4 (jitter = 1.7)	8.019
100	images/sec: 231.8 +/- 0.4 (jitter = 1.8)	8.009
----------------------------------------------------------------
total images/sec: 231.63
----------------------------------------------------------------

FP16

python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32 --use_fp16
1	images/sec: 325.0 +/- 0.0 (jitter = 0.0)	7.979
10	images/sec: 315.0 +/- 3.2 (jitter = 7.0)	8.049
20	images/sec: 315.9 +/- 1.9 (jitter = 6.6)	8.331
30	images/sec: 316.0 +/- 1.5 (jitter = 7.4)	8.063
40	images/sec: 313.8 +/- 1.7 (jitter = 7.5)	8.678
50	images/sec: 314.4 +/- 1.3 (jitter = 5.4)	8.290
60	images/sec: 314.7 +/- 1.1 (jitter = 4.9)	8.344
70	images/sec: 315.0 +/- 1.0 (jitter = 3.4)	8.160
80	images/sec: 315.2 +/- 0.9 (jitter = 3.3)	8.145
90	images/sec: 315.0 +/- 0.9 (jitter = 3.2)	8.418
100	images/sec: 315.2 +/- 0.8 (jitter = 2.9)	8.296
----------------------------------------------------------------
total images/sec: 314.90
----------------------------------------------------------------

Resnet152

 python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32
1	images/sec: 92.9 +/- 0.0 (jitter = 0.0)	9.936
10	images/sec: 91.8 +/- 0.8 (jitter = 0.4)	9.680
20	images/sec: 91.3 +/- 0.6 (jitter = 0.8)	9.762
30	images/sec: 91.4 +/- 0.4 (jitter = 0.8)	9.945
40	images/sec: 91.7 +/- 0.3 (jitter = 0.7)	9.962
50	images/sec: 91.9 +/- 0.3 (jitter = 0.6)	9.992
60	images/sec: 92.1 +/- 0.2 (jitter = 0.5)	10.279
70	images/sec: 92.1 +/- 0.2 (jitter = 0.5)	9.990
80	images/sec: 92.1 +/- 0.2 (jitter = 0.5)	9.969
90	images/sec: 92.1 +/- 0.2 (jitter = 0.5)	10.196
100	images/sec: 92.0 +/- 0.2 (jitter = 0.6)	10.034
----------------------------------------------------------------
total images/sec: 91.93
----------------------------------------------------------------

FP16

python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32 --use_fp16
1	images/sec: 110.6 +/- 0.0 (jitter = 0.0)	10.107
10	images/sec: 118.1 +/- 1.6 (jitter = 2.0)	9.862
20	images/sec: 119.7 +/- 0.9 (jitter = 1.4)	9.758
30	images/sec: 120.1 +/- 0.7 (jitter = 1.2)	10.020
40	images/sec: 120.3 +/- 0.6 (jitter = 1.2)	9.834
50	images/sec: 120.2 +/- 0.5 (jitter = 1.1)	9.973
60	images/sec: 120.4 +/- 0.5 (jitter = 1.2)	9.637
70	images/sec: 120.4 +/- 0.4 (jitter = 1.2)	9.880
80	images/sec: 120.2 +/- 0.4 (jitter = 1.2)	9.619
90	images/sec: 120.4 +/- 0.4 (jitter = 1.0)	10.036
100	images/sec: 120.3 +/- 0.4 (jitter = 0.9)	10.061
----------------------------------------------------------------
total images/sec: 120.28
----------------------------------------------------------------

Alexnet

 python3  ./tf_cnn_benchmarks.py --num_gpus=1 --model  alexnet --batch_size 32
1	images/sec: 517.4 +/- 0.0 (jitter = 0.0)	7.205
10	images/sec: 523.8 +/- 2.2 (jitter = 2.4)	7.205
20	images/sec: 526.5 +/- 2.3 (jitter = 4.4)	7.205
30	images/sec: 526.3 +/- 1.7 (jitter = 7.1)	7.205
40	images/sec: 527.3 +/- 1.6 (jitter = 7.1)	7.205
50	images/sec: 526.7 +/- 1.3 (jitter = 6.3)	7.205
60	images/sec: 527.5 +/- 1.3 (jitter = 6.5)	7.205
70	images/sec: 526.4 +/- 1.2 (jitter = 6.9)	7.205
80	images/sec: 526.5 +/- 1.1 (jitter = 6.9)	7.205
90	images/sec: 526.2 +/- 1.0 (jitter = 6.2)	7.205
100	images/sec: 526.3 +/- 0.9 (jitter = 6.6)	7.205
----------------------------------------------------------------
total images/sec: 525.37
----------------------------------------------------------------

FP16

python3  ./tf_cnn_benchmarks.py --num_gpus=1 --model  alexnet --batch_size 32 --use_fp16
1	images/sec: 1250.9 +/- 0.0 (jitter = 0.0)	nan
10	images/sec: 1268.1 +/- 7.5 (jitter = 28.0)	nan
20	images/sec: 1264.7 +/- 4.8 (jitter = 28.3)	nan
30	images/sec: 1261.6 +/- 6.0 (jitter = 25.2)	nan
40	images/sec: 1265.2 +/- 4.8 (jitter = 22.8)	nan
50	images/sec: 1265.3 +/- 4.2 (jitter = 22.8)	nan
60	images/sec: 1259.4 +/- 5.4 (jitter = 26.1)	nan
70	images/sec: 1255.0 +/- 5.6 (jitter = 26.0)	nan
80	images/sec: 1256.7 +/- 5.0 (jitter = 25.3)	nan
90	images/sec: 1257.6 +/- 4.5 (jitter = 25.8)	nan
100	images/sec: 1258.3 +/- 4.1 (jitter = 23.9)	nan
----------------------------------------------------------------
total images/sec: 1252.77
----------------------------------------------------------------
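
Note that every step of this FP16 run reports a total_loss of nan, so the throughput figure says nothing about whether training would actually converge. The log does not show why. Below is a small sketch for flagging such runs, assuming the loss is the last whitespace-separated field of each step line. has_nan_loss is a hypothetical helper.

```python
import math


def has_nan_loss(log_text):
    """Return True if any step line ends with a non-finite loss value."""
    for line in log_text.splitlines():
        fields = line.split()
        # Only look at per-step lines; skip blanks, rulers and the total line.
        if "images/sec:" not in fields or fields[0] == "total":
            continue
        try:
            loss = float(fields[-1])
        except ValueError:
            continue
        if not math.isfinite(loss):
            return True
    return False


if __name__ == "__main__":
    line = "1\timages/sec: 1250.9 +/- 0.0 (jitter = 0.0)\tnan"
    print(has_nan_loss(line))
```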

VGG16

 python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model vgg16  --batch_size 32
1	images/sec: 130.0 +/- 0.0 (jitter = 0.0)	7.289
10	images/sec: 130.3 +/- 0.1 (jitter = 0.5)	7.275
20	images/sec: 130.3 +/- 0.1 (jitter = 0.5)	7.215
30	images/sec: 130.4 +/- 0.1 (jitter = 0.5)	7.293
40	images/sec: 130.3 +/- 0.1 (jitter = 0.5)	7.249
50	images/sec: 130.3 +/- 0.1 (jitter = 0.5)	7.318
60	images/sec: 130.3 +/- 0.0 (jitter = 0.5)	7.272
70	images/sec: 130.3 +/- 0.0 (jitter = 0.4)	7.250
80	images/sec: 130.3 +/- 0.0 (jitter = 0.5)	7.264
90	images/sec: 130.2 +/- 0.0 (jitter = 0.5)	7.238
100	images/sec: 130.2 +/- 0.0 (jitter = 0.5)	7.240
----------------------------------------------------------------
total images/sec: 130.13
----------------------------------------------------------------

FP16

python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model vgg16  --batch_size 32 --use_fp16

1	images/sec: 192.6 +/- 0.0 (jitter = 0.0)	7.281
10	images/sec: 193.0 +/- 0.2 (jitter = 0.4)	7.308
20	images/sec: 193.2 +/- 0.2 (jitter = 1.0)	7.232
30	images/sec: 193.1 +/- 0.1 (jitter = 0.8)	7.287
40	images/sec: 193.0 +/- 0.1 (jitter = 0.6)	7.295
50	images/sec: 193.0 +/- 0.1 (jitter = 0.6)	7.265
60	images/sec: 193.0 +/- 0.1 (jitter = 0.6)	7.253
70	images/sec: 192.9 +/- 0.1 (jitter = 0.6)	7.264
80	images/sec: 192.9 +/- 0.1 (jitter = 0.5)	7.269
90	images/sec: 192.8 +/- 0.1 (jitter = 0.5)	7.270
100	images/sec: 192.8 +/- 0.1 (jitter = 0.5)	7.249
----------------------------------------------------------------
total images/sec: 192.66
----------------------------------------------------------------
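
To make the FP32 and FP16 results easier to compare, the "total images/sec" figures above can be reduced to speedup ratios. The sketch below only does that arithmetic on the reported numbers. It treats the 91.93 images/sec listed first under Resnet152 as the FP32 result, and the Alexnet FP16 figure should be read with caution because that run reported a nan loss. The names TOTALS and fp16_speedups are illustrative.

```python
# total images/sec reported in the article (batch size 32, one GPU),
# stored as (FP32, FP16) pairs.
TOTALS = {
    "inception3": (118.86, 147.70),
    "resnet50": (231.63, 314.90),
    "resnet152": (91.93, 120.28),
    "alexnet": (525.37, 1252.77),  # FP16 run reported nan loss
    "vgg16": (130.13, 192.66),
}


def fp16_speedups(totals=TOTALS):
    """Return FP16 throughput divided by FP32 throughput for each model."""
    return {model: fp16 / fp32 for model, (fp32, fp16) in totals.items()}


if __name__ == "__main__":
    for model, ratio in fp16_speedups().items():
        print(f"{model:<10} {ratio:.2f}x")
```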

References

https://qiita.com/syoyo/items/4a9c3e17969757ab5422
An attempt to compile TensorFlow upstream (1.9.0-rc0) on ROCm 1.8.2 (CIFAR10 11500 examples/sec @ VEGA56 80W, as of July 22, 2018)

Many thanks to syoyo. The article was very helpful.
