Overview
https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/issues/362
It looks like TensorFlow 2.0 now runs on ROCm as well, albeit only as a Docker image, so let's give it a quick try.
https://hub.docker.com/u/rocm
https://hub.docker.com/r/rocm/tensorflow/tags
rocm2.4-tf2.0-alpha0-config-v2
Running docker pull with this tag should fetch the image.
Environment setup
To install docker-ce, I followed https://qiita.com/tkyonezu/items/0f6da57eb2d823d2611d
docker image pull
sudo docker pull rocm/tensorflow:rocm2.4-tf2.0-alpha0-config-v2
The output of docker image list should look something like this:
REPOSITORY TAG IMAGE ID CREATED SIZE
rocm/tensorflow rocm2.4-tf2.0-alpha0-config-v2 b1746f7b7f09 2 weeks ago 6.53GB
Next, start a container with docker run:
$ sudo docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/tensorflow:rocm2.4-tf2.0-alpha0-config-v2
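For reference, the flags in this command do the following: --device=/dev/kfd exposes the ROCm compute interface, --device=/dev/dri exposes the GPU render nodes, and --group-add video adds the container user to the group that can access those devices. The sketch below assembles the same command from Python, which can be handy when scripting repeated runs. The helper name build_docker_run is hypothetical and is not part of Docker or ROCm.

```python
import shlex

IMAGE = "rocm/tensorflow:rocm2.4-tf2.0-alpha0-config-v2"


def build_docker_run(image=IMAGE, use_sudo=True):
    """Return the docker run command from this article as an argument list."""
    cmd = ["sudo"] if use_sudo else []
    cmd += [
        "docker", "run", "-it",
        "--device=/dev/kfd",      # ROCm compute (kernel fusion driver) interface
        "--device=/dev/dri",      # GPU direct rendering nodes
        "--group-add", "video",   # group with access to the GPU device files
        image,
    ]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(shlex.quote(arg) for arg in build_docker_run()))
```

Passing the returned list to subprocess.run would start the container; printing it first makes it easy to compare against the command above.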
Inside the container, the /root directory should look roughly like this:
root@a0662b5acae0:/root# ls
bazel-0.19.2-installer-linux-x86_64.sh benchmarks convnet-benchmarks models tensorflow
Now build TensorFlow 2.0 for ROCm:
cd /root/tensorflow
./configure
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.19.2 installed.
Please specify the location of python. [Default is /usr/bin/python]:
Found possible Python library paths:
/usr/local/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/dist-packages]
Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]: y
ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: n
No CUDA support will be enabled for TensorFlow.
Do you wish to download a fresh release of clang? (Experimental) [y/N]: n
Clang will not be downloaded.
Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=haswell -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=gdr # Build with GDR support.
--config=verbs # Build with libverbs support.
--config=ngraph # Build with Intel nGraph support.
--config=numa # Build with NUMA support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=noignite # Disable Apache Ignite support.
--config=nokafka # Disable Apache Kafka support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
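The answers given above can also be supplied through environment variables so that ./configure does not stop to ask. The sketch below assumes that the configure script in this image reads the same variable names as upstream TensorFlow's configure.py (TF_NEED_ROCM, TF_NEED_CUDA, and so on); that has not been verified against this particular branch. The names CONFIGURE_ANSWERS, configure_env and run_configure are hypothetical.

```python
import os
import subprocess

# Answers from the transcript above, keyed by the environment variable names
# used by upstream TensorFlow's configure.py (assumed to apply here as well).
CONFIGURE_ANSWERS = {
    "PYTHON_BIN_PATH": "/usr/bin/python",
    "PYTHON_LIB_PATH": "/usr/local/lib/python2.7/dist-packages",
    "TF_ENABLE_XLA": "0",
    "TF_NEED_OPENCL_SYCL": "0",
    "TF_NEED_ROCM": "1",
    "TF_NEED_CUDA": "0",
    "TF_DOWNLOAD_CLANG": "0",
    "TF_NEED_MPI": "0",
    "CC_OPT_FLAGS": "-march=haswell -Wno-sign-compare",
    "TF_SET_ANDROID_WORKSPACE": "0",
}


def configure_env(base=None):
    """Return a copy of the environment with the configure answers applied."""
    env = dict(os.environ if base is None else base)
    env.update(CONFIGURE_ANSWERS)
    return env


def run_configure(source_dir="/root/tensorflow"):
    """Run ./configure in source_dir with the answers preset (not called here)."""
    return subprocess.run(
        ["./configure"], cwd=source_dir, env=configure_env(), check=True
    )
```

Calling run_configure() inside the container would replace the interactive session; if any prompt still appears, the corresponding variable name is probably different in this branch.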
/root/tensorflow# ./build_rocm_python3
The build takes quite a while, so be patient.
Just to be sure, check the output of pip3 list.
pip3 list
Package Version
-------------------- ----------------------
absl-py 0.7.1
astor 0.7.1
attrs 19.1.0
backcall 0.1.0
bleach 3.1.0
chardet 2.3.0
decorator 4.4.0
defusedxml 0.6.0
entrypoints 0.3
enum34 1.1.6
gast 0.2.2
google-pasta 0.1.6
grpcio 1.20.1
h5py 2.9.0
ipykernel 5.1.0
ipython 7.5.0
ipython-genutils 0.2.0
ipywidgets 7.4.2
jedi 0.13.3
Jinja2 2.10.1
jsonschema 3.0.1
jupyter 1.0.0
jupyter-client 5.2.4
jupyter-console 6.0.0
jupyter-core 4.4.0
Keras-Applications 1.0.7
Keras-Preprocessing 1.0.9
Markdown 3.1
MarkupSafe 1.1.1
mistune 0.8.4
mock 3.0.5
nbconvert 5.5.0
nbformat 4.4.0
notebook 5.7.8
numpy 1.16.3
pandocfilters 1.4.2
parso 0.4.0
pexpect 4.7.0
pickleshare 0.7.5
pip 19.1.1
prometheus-client 0.6.0
prompt-toolkit 2.0.9
protobuf 3.7.1
ptyprocess 0.6.0
pycurl 7.43.0
Pygments 2.4.0
pygobject 3.20.0
pyrsistent 0.15.2
python-apt 1.1.0b1+ubuntu0.16.4.2
python-dateutil 2.8.0
pyzmq 18.0.1
qtconsole 4.4.4
requests 2.9.1
Send2Trash 1.5.0
setuptools 41.0.1
six 1.12.0
ssh-import-id 5.5
tb-nightly 1.14.0a20190301
tensorflow 2.0.0a0
termcolor 1.1.0
terminado 0.8.2
testpath 0.4.2
tf-estimator-nightly 1.14.0.dev2019030115
tornado 6.0.2
traitlets 4.3.2
unattended-upgrades 0.1
urllib3 1.13.1
wcwidth 0.1.7
webencodings 0.5.1
Werkzeug 0.15.2
wheel 0.29.0
widgetsnbextension 3.4.2
This confirms that tensorflow 2.0.0a0 has been installed.
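pip3 list only shows that the package is present. A quick check from Python can also confirm that the module imports and whether it sees the GPU. This is a minimal sketch: tensorflow_status is a hypothetical helper, and it relies on tf.test.is_gpu_available(), which exists in the 1.x and 2.0 APIs but is deprecated in later releases.

```python
import importlib.util


def tensorflow_status():
    """Return version and GPU visibility, or None if TensorFlow is not installed."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf  # imported lazily so the check works without TensorFlow

    return {
        "version": tf.__version__,
        # True when TensorFlow can see a GPU device (the ROCm GPU in this setup).
        "gpu_available": tf.test.is_gpu_available(),
    }


if __name__ == "__main__":
    print(tensorflow_status())
```

If gpu_available comes back False inside the container, the --device and --group-add options passed to docker run are the first thing to recheck.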
The ROCm version is:
# apt show rocm-libs -a
Package: rocm-libs
Version: 2.4.25
Status: install ok installed
Priority: optional
Section: devel
Maintainer: Advanced Micro Devices Inc.
Installed-Size: 13.3 kB
Depends: rocfft, rocrand, hipblas, rocblas
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: unknown
APT-Manual-Installed: yes
APT-Sources: /var/lib/dpkg/status
Description: Radeon Open Compute (ROCm) Runtime software stack
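Benchmarks
Overall, using TF 2.0 did not seem to give noticeably better performance. Considering the hassle of building it yourself every time, you are probably better off just using the regular 1.13.3 release.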
inception3
python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model inception3 --batch_size 32
Step Img/sec total_loss
1 images/sec: 119.8 +/- 0.0 (jitter = 0.0) 7.415
10 images/sec: 119.7 +/- 0.1 (jitter = 0.2) 7.394
20 images/sec: 119.4 +/- 0.2 (jitter = 0.3) 7.324
30 images/sec: 119.0 +/- 0.3 (jitter = 0.4) 7.487
40 images/sec: 119.0 +/- 0.2 (jitter = 0.4) 7.353
50 images/sec: 119.1 +/- 0.2 (jitter = 0.4) 7.369
60 images/sec: 119.0 +/- 0.2 (jitter = 0.4) 7.433
70 images/sec: 118.9 +/- 0.2 (jitter = 0.4) 7.317
80 images/sec: 119.0 +/- 0.2 (jitter = 0.5) 7.357
90 images/sec: 118.9 +/- 0.2 (jitter = 0.5) 7.485
100 images/sec: 118.9 +/- 0.1 (jitter = 0.5) 7.431
----------------------------------------------------------------
total images/sec: 118.86
----------------------------------------------------------------
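When running several models it is convenient to pull the summary figure out of the log automatically. The following is a minimal sketch that extracts the "total images/sec" value from output in the format shown above; the function name total_images_per_sec is hypothetical, and the sample text is copied from the tail of the run above.

```python
import re

# Matches the summary line printed at the end of a tf_cnn_benchmarks run.
TOTAL_RE = re.compile(
    r"^total images/sec:\s*([0-9]+(?:\.[0-9]+)?)\s*$", re.MULTILINE
)


def total_images_per_sec(log_text):
    """Return the total images/sec from benchmark output, or None if absent."""
    match = TOTAL_RE.search(log_text)
    return float(match.group(1)) if match else None


SAMPLE = """\
100 images/sec: 118.9 +/- 0.1 (jitter = 0.5) 7.431
----------------------------------------------------------------
total images/sec: 118.86
----------------------------------------------------------------
"""

if __name__ == "__main__":
    print(total_images_per_sec(SAMPLE))
```

The per-step lines start with the step number, so anchoring the pattern at the start of the line keeps them from being mistaken for the summary.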
FP16
# python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model inception3 --batch_size 32 --use_fp16
Step Img/sec total_loss
1 images/sec: 148.5 +/- 0.0 (jitter = 0.0) 7.388
10 images/sec: 148.7 +/- 0.3 (jitter = 0.9) 7.390
20 images/sec: 148.4 +/- 0.2 (jitter = 0.6) 7.454
30 images/sec: 148.4 +/- 0.2 (jitter = 0.7) 7.359
40 images/sec: 148.4 +/- 0.1 (jitter = 0.7) 7.338
50 images/sec: 147.9 +/- 0.3 (jitter = 0.8) 7.387
60 images/sec: 148.0 +/- 0.2 (jitter = 0.9) 7.356
70 images/sec: 148.0 +/- 0.2 (jitter = 0.9) 7.441
80 images/sec: 147.9 +/- 0.2 (jitter = 0.9) 7.423
90 images/sec: 147.9 +/- 0.2 (jitter = 0.9) 7.334
100 images/sec: 147.8 +/- 0.2 (jitter = 0.9) 7.308
----------------------------------------------------------------
total images/sec: 147.70
----------------------------------------------------------------
Resnet50
python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32
1 images/sec: 230.0 +/- 0.0 (jitter = 0.0) 8.458
10 images/sec: 232.5 +/- 0.7 (jitter = 2.8) 7.997
20 images/sec: 233.4 +/- 0.4 (jitter = 1.5) 8.260
30 images/sec: 233.4 +/- 0.3 (jitter = 1.2) 8.337
40 images/sec: 233.3 +/- 0.2 (jitter = 1.4) 8.197
50 images/sec: 232.3 +/- 0.5 (jitter = 1.6) 7.759
60 images/sec: 231.9 +/- 0.5 (jitter = 1.6) 8.059
70 images/sec: 231.9 +/- 0.5 (jitter = 1.6) 8.481
80 images/sec: 231.9 +/- 0.5 (jitter = 1.5) 8.279
90 images/sec: 232.1 +/- 0.4 (jitter = 1.7) 8.019
100 images/sec: 231.8 +/- 0.4 (jitter = 1.8) 8.009
----------------------------------------------------------------
total images/sec: 231.63
----------------------------------------------------------------
FP16
python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet50 --batch_size 32 --use_fp16
1 images/sec: 325.0 +/- 0.0 (jitter = 0.0) 7.979
10 images/sec: 315.0 +/- 3.2 (jitter = 7.0) 8.049
20 images/sec: 315.9 +/- 1.9 (jitter = 6.6) 8.331
30 images/sec: 316.0 +/- 1.5 (jitter = 7.4) 8.063
40 images/sec: 313.8 +/- 1.7 (jitter = 7.5) 8.678
50 images/sec: 314.4 +/- 1.3 (jitter = 5.4) 8.290
60 images/sec: 314.7 +/- 1.1 (jitter = 4.9) 8.344
70 images/sec: 315.0 +/- 1.0 (jitter = 3.4) 8.160
80 images/sec: 315.2 +/- 0.9 (jitter = 3.3) 8.145
90 images/sec: 315.0 +/- 0.9 (jitter = 3.2) 8.418
100 images/sec: 315.2 +/- 0.8 (jitter = 2.9) 8.296
----------------------------------------------------------------
total images/sec: 314.90
----------------------------------------------------------------
Resnet152
python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32
1 images/sec: 92.9 +/- 0.0 (jitter = 0.0) 9.936
10 images/sec: 91.8 +/- 0.8 (jitter = 0.4) 9.680
20 images/sec: 91.3 +/- 0.6 (jitter = 0.8) 9.762
30 images/sec: 91.4 +/- 0.4 (jitter = 0.8) 9.945
40 images/sec: 91.7 +/- 0.3 (jitter = 0.7) 9.962
50 images/sec: 91.9 +/- 0.3 (jitter = 0.6) 9.992
60 images/sec: 92.1 +/- 0.2 (jitter = 0.5) 10.279
70 images/sec: 92.1 +/- 0.2 (jitter = 0.5) 9.990
80 images/sec: 92.1 +/- 0.2 (jitter = 0.5) 9.969
90 images/sec: 92.1 +/- 0.2 (jitter = 0.5) 10.196
100 images/sec: 92.0 +/- 0.2 (jitter = 0.6) 10.034
----------------------------------------------------------------
total images/sec: 91.93
----------------------------------------------------------------
FP16
python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model resnet152 --batch_size 32 --use_fp16
1 images/sec: 110.6 +/- 0.0 (jitter = 0.0) 10.107
10 images/sec: 118.1 +/- 1.6 (jitter = 2.0) 9.862
20 images/sec: 119.7 +/- 0.9 (jitter = 1.4) 9.758
30 images/sec: 120.1 +/- 0.7 (jitter = 1.2) 10.020
40 images/sec: 120.3 +/- 0.6 (jitter = 1.2) 9.834
50 images/sec: 120.2 +/- 0.5 (jitter = 1.1) 9.973
60 images/sec: 120.4 +/- 0.5 (jitter = 1.2) 9.637
70 images/sec: 120.4 +/- 0.4 (jitter = 1.2) 9.880
80 images/sec: 120.2 +/- 0.4 (jitter = 1.2) 9.619
90 images/sec: 120.4 +/- 0.4 (jitter = 1.0) 10.036
100 images/sec: 120.3 +/- 0.4 (jitter = 0.9) 10.061
----------------------------------------------------------------
total images/sec: 120.28
----------------------------------------------------------------
AlexNet
python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model alexnet --batch_size 32
1 images/sec: 517.4 +/- 0.0 (jitter = 0.0) 7.205
10 images/sec: 523.8 +/- 2.2 (jitter = 2.4) 7.205
20 images/sec: 526.5 +/- 2.3 (jitter = 4.4) 7.205
30 images/sec: 526.3 +/- 1.7 (jitter = 7.1) 7.205
40 images/sec: 527.3 +/- 1.6 (jitter = 7.1) 7.205
50 images/sec: 526.7 +/- 1.3 (jitter = 6.3) 7.205
60 images/sec: 527.5 +/- 1.3 (jitter = 6.5) 7.205
70 images/sec: 526.4 +/- 1.2 (jitter = 6.9) 7.205
80 images/sec: 526.5 +/- 1.1 (jitter = 6.9) 7.205
90 images/sec: 526.2 +/- 1.0 (jitter = 6.2) 7.205
100 images/sec: 526.3 +/- 0.9 (jitter = 6.6) 7.205
----------------------------------------------------------------
total images/sec: 525.37
----------------------------------------------------------------
FP16
python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model alexnet --batch_size 32 --use_fp16
1 images/sec: 1250.9 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 1268.1 +/- 7.5 (jitter = 28.0) nan
20 images/sec: 1264.7 +/- 4.8 (jitter = 28.3) nan
30 images/sec: 1261.6 +/- 6.0 (jitter = 25.2) nan
40 images/sec: 1265.2 +/- 4.8 (jitter = 22.8) nan
50 images/sec: 1265.3 +/- 4.2 (jitter = 22.8) nan
60 images/sec: 1259.4 +/- 5.4 (jitter = 26.1) nan
70 images/sec: 1255.0 +/- 5.6 (jitter = 26.0) nan
80 images/sec: 1256.7 +/- 5.0 (jitter = 25.3) nan
90 images/sec: 1257.6 +/- 4.5 (jitter = 25.8) nan
100 images/sec: 1258.3 +/- 4.1 (jitter = 23.9) nan
----------------------------------------------------------------
total images/sec: 1252.77
----------------------------------------------------------------
VGG16
python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model vgg16 --batch_size 32
1 images/sec: 130.0 +/- 0.0 (jitter = 0.0) 7.289
10 images/sec: 130.3 +/- 0.1 (jitter = 0.5) 7.275
20 images/sec: 130.3 +/- 0.1 (jitter = 0.5) 7.215
30 images/sec: 130.4 +/- 0.1 (jitter = 0.5) 7.293
40 images/sec: 130.3 +/- 0.1 (jitter = 0.5) 7.249
50 images/sec: 130.3 +/- 0.1 (jitter = 0.5) 7.318
60 images/sec: 130.3 +/- 0.0 (jitter = 0.5) 7.272
70 images/sec: 130.3 +/- 0.0 (jitter = 0.4) 7.250
80 images/sec: 130.3 +/- 0.0 (jitter = 0.5) 7.264
90 images/sec: 130.2 +/- 0.0 (jitter = 0.5) 7.238
100 images/sec: 130.2 +/- 0.0 (jitter = 0.5) 7.240
----------------------------------------------------------------
total images/sec: 130.13
----------------------------------------------------------------
FP16
python3 ./tf_cnn_benchmarks.py --num_gpus=1 --model vgg16 --batch_size 32 --use_fp16
1 images/sec: 192.6 +/- 0.0 (jitter = 0.0) 7.281
10 images/sec: 193.0 +/- 0.2 (jitter = 0.4) 7.308
20 images/sec: 193.2 +/- 0.2 (jitter = 1.0) 7.232
30 images/sec: 193.1 +/- 0.1 (jitter = 0.8) 7.287
40 images/sec: 193.0 +/- 0.1 (jitter = 0.6) 7.295
50 images/sec: 193.0 +/- 0.1 (jitter = 0.6) 7.265
60 images/sec: 193.0 +/- 0.1 (jitter = 0.6) 7.253
70 images/sec: 192.9 +/- 0.1 (jitter = 0.6) 7.264
80 images/sec: 192.9 +/- 0.1 (jitter = 0.5) 7.269
90 images/sec: 192.8 +/- 0.1 (jitter = 0.5) 7.270
100 images/sec: 192.8 +/- 0.1 (jitter = 0.5) 7.249
----------------------------------------------------------------
total images/sec: 192.66
----------------------------------------------------------------
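To compare the runs at a glance, the totals reported above can be tabulated together with the FP16 to FP32 throughput ratio. This is a small sketch using only the numbers from this article; RESULTS and fp16_speedup are hypothetical names introduced for the example.

```python
# Total images/sec reported above, as (FP32, FP16) per model, batch size 32.
RESULTS = {
    "inception3": (118.86, 147.70),
    "resnet50": (231.63, 314.90),
    "resnet152": (91.93, 120.28),
    "alexnet": (525.37, 1252.77),
    "vgg16": (130.13, 192.66),
}


def fp16_speedup(fp32, fp16):
    """Return how many times faster the FP16 run was than the FP32 run."""
    return fp16 / fp32


if __name__ == "__main__":
    print("{:<11} {:>9} {:>9} {:>8}".format("model", "fp32", "fp16", "speedup"))
    for model, (fp32, fp16) in RESULTS.items():
        ratio = fp16_speedup(fp32, fp16)
        print("{:<11} {:>9.2f} {:>9.2f} {:>7.2f}x".format(model, fp32, fp16, ratio))
```

With these figures, FP16 is roughly 1.2x to 1.5x faster on the Inception, ResNet and VGG models and about 2.4x faster on AlexNet. Note that the AlexNet FP16 run reports nan for total_loss, so that throughput number should be treated with caution.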
References
https://qiita.com/syoyo/items/4a9c3e17969757ab5422
An attempt to compile TensorFlow upstream (1.9.0-rc0) with ROCm 1.8.2 (CIFAR10 11500 examples/sec @ VEGA56 80W, as of July 22, 2018)
Many thanks to syoyo; the article was a great help.