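Introduction
This article documents the steps up to running a TensorFlow sample on an AMD GPU.
I am repurposing a former mining machine to see whether a TensorFlow environment can be built on top of ROCm.
The previous article covered installing ROCm,
so this one covers installing TensorFlow and running a sample.
This article is the summary version.
The details are in "TensorFlow Environment Setup with an AMD GPU (TensorFlow Installation to Sample Run): Detailed Version".
Configuration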
CPU: Celeron G3930
GPU: Radeon Vega 56
Ubuntu: 18.04 LTS (Kernel 4.15)
ROCm Version: 2.1
Python: 3.6
TensorFlow: 1.12
Preparing to install TensorFlow
・Install the Python-related packages
sudo apt-get update && sudo apt-get install -y \
python3-numpy \
python3-dev \
python3-wheel \
python3-mock \
python3-future \
python3-pip \
python3-yaml \
python3-setuptools && \
sudo apt-get clean && \
sudo rm -rf /var/lib/apt/lists/*
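As a quick check that the packages above are visible to Python 3, a minimal sketch along these lines could be run. The import names in the list are my assumption about what each apt package provides (for example, python3-yaml as yaml). python3-dev ships headers rather than a module, so it is not listed. missing_modules is a hypothetical helper name, not part of any library.

```python
import importlib.util

# Assumed import names for the apt packages installed above.
EXPECTED_MODULES = ["numpy", "wheel", "mock", "future", "pip", "yaml", "setuptools"]


def missing_modules(names):
    """Return the module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    print("missing modules:", missing if missing else "none")
```

An empty result only means the modules can be found, not that their versions match what tensorflow-rocm expects.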
Installing TensorFlow
・Install tensorflow-rocm with pip
pip3 install tensorflow-rocm
Running TensorFlow
Checking that TensorFlow works → the import fails because a shared library is missing
python3
>>> import tensorflow
Traceback (most recent call last):
File "/home/tk/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/home/tk/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/home/tk/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/usr/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: libCXLActivityLogger.so: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/tk/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 24, in <module>
from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "/home/tk/.local/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "/home/tk/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/home/tk/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/home/tk/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/home/tk/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/usr/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: libCXLActivityLogger.so: cannot open shared object file: No such file or directory
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/errors
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
>>>
Searching turned up other people who had hit the same problem, so I followed their report and ran the following command.
$ sudo apt-get update && \
sudo apt-get install -y --allow-unauthenticated \
rocm-dkms rocm-dev rocm-libs \
rocm-device-libs \
hsa-ext-rocr-dev hsakmt-roct-dev hsa-rocr-dev \
rocm-opencl rocm-opencl-dev \
rocm-utils \
rocm-profiler cxlactivitylogger \
miopen-hip miopengemm
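Before importing TensorFlow again, it may help to check whether the dynamic loader can open the library named in the ImportError. The following is only a sketch: unloadable_libraries is a hypothetical helper, and it assumes the ROCm libraries are on the default loader search path. A failure here therefore does not prove that the TensorFlow import will fail, and a success does not cover every dependency.

```python
import ctypes


def unloadable_libraries(names):
    """Return (name, reason) pairs for libraries the dynamic loader cannot open."""
    failures = []
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError as exc:
            failures.append((name, str(exc)))
    return failures


if __name__ == "__main__":
    # libCXLActivityLogger.so is the library named in the ImportError above.
    failures = unloadable_libraries(["libCXLActivityLogger.so"])
    for name, reason in failures:
        print(f"{name}: {reason}")
    if not failures:
        print("all listed libraries could be opened")
```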
After these packages finished installing, I checked TensorFlow again → this time the import went through, with only deprecation warnings.
python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
WARNING:tensorflow:From /home/tk/.local/lib/python3.6/site-packages/tensorflow/python/ops/distributions/distribution.py:265: ReparameterizationType.__init__ (from tensorflow.python.ops.distributions.distribution) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
WARNING:tensorflow:From /home/tk/.local/lib/python3.6/site-packages/tensorflow/python/ops/distributions/bernoulli.py:169: RegisterKL.__init__ (from tensorflow.python.ops.distributions.kullback_leibler) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
>>>
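A successful import does not by itself show that TensorFlow can see the GPU. As a rough sketch, the devices can be listed with device_lib.list_local_devices() from TensorFlow 1.x. I am assuming here that the ROCm build reports the Radeon card with device_type "GPU". gpu_device_names and list_tensorflow_gpus are hypothetical helper names.

```python
def gpu_device_names(devices):
    """Return the names of entries whose device_type is "GPU"."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_tensorflow_gpus():
    """Return GPU device names seen by TensorFlow, or None if it cannot be imported."""
    try:
        # TensorFlow 1.x helper that enumerates the devices the runtime can see.
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return gpu_device_names(device_lib.list_local_devices())


if __name__ == "__main__":
    names = list_tensorflow_gpus()
    if names is None:
        print("TensorFlow could not be imported")
    else:
        print("GPU devices:", names if names else "none found")
```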
Running the sample
・Install Git
sudo apt install git
・Git clone: I cloned the repository as described on the official GitHub page.
cd ~
git clone https://github.com/tensorflow/models.git
・From this repository, I ran the CIFAR-10 tutorial as a benchmark.
cd ~/models/tutorials/image/cifar10
export HIP_VISIBLE_DEVICES=0
python3 ./cifar10_train.py
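When the training script is launched from Python rather than from the shell, the same device selection can be done by passing a modified environment to the child process. This is a minimal sketch. env_with_visible_gpus is a hypothetical helper, and the commented-out call assumes the current directory is the cifar10 tutorial directory.

```python
import os


def env_with_visible_gpus(device_ids, base_env=None):
    """Return a copy of the environment with HIP_VISIBLE_DEVICES set."""
    env = dict(os.environ if base_env is None else base_env)
    # Several devices are given as a comma-separated list, e.g. "0,1".
    env["HIP_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return env


if __name__ == "__main__":
    env = env_with_visible_gpus([0])
    print("HIP_VISIBLE_DEVICES =", env["HIP_VISIBLE_DEVICES"])
    # To start training with this environment (not executed here):
    # import subprocess
    # subprocess.run(["python3", "./cifar10_train.py"], env=env, check=True)
```

Working on a copy keeps the setting local to the child process instead of changing the environment of the calling script.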
・Result → it ran without problems.
2019-02-12 16:58:19.769456: step 7720, loss = 0.85 (3943.5 examples/sec; 0.032 sec/batch)
2019-02-12 16:58:20.132842: step 7730, loss = 0.87 (3522.4 examples/sec; 0.036 sec/batch)
2019-02-12 16:58:20.468507: step 7740, loss = 0.88 (3813.3 examples/sec; 0.034 sec/batch)
2019-02-12 16:58:20.791237: step 7750, loss = 1.07 (3966.2 examples/sec; 0.032 sec/batch)
2019-02-12 16:58:21.121733: step 7760, loss = 0.90 (3873.0 examples/sec; 0.033 sec/batch)
2019-02-12 16:58:21.487553: step 7770, loss = 0.83 (3499.0 examples/sec; 0.037 sec/batch)
2019-02-12 16:58:21.851375: step 7780, loss = 0.81 (3518.2 examples/sec; 0.036 sec/batch)
2019-02-12 16:58:22.170275: step 7790, loss = 0.87 (4013.8 examples/sec; 0.032 sec/batch)
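To compare throughput with other people's results, the examples/sec figures can be pulled out of log lines like these. The sketch below assumes the line format shown in the output above. parse_throughput is a hypothetical helper, and SAMPLE_LOG just repeats two of those lines.

```python
import re
from statistics import mean

# Matches the "step N, loss = X (Y examples/sec; Z sec/batch)" part of a log line.
STEP_LINE = re.compile(
    r"step (?P<step>\d+), loss = (?P<loss>[\d.]+) "
    r"\((?P<rate>[\d.]+) examples/sec; (?P<batch>[\d.]+) sec/batch\)"
)


def parse_throughput(log_text):
    """Return (step, examples_per_sec) pairs found in the training log."""
    return [
        (int(m.group("step")), float(m.group("rate")))
        for m in STEP_LINE.finditer(log_text)
    ]


SAMPLE_LOG = """\
2019-02-12 16:58:20.132842: step 7730, loss = 0.87 (3522.4 examples/sec; 0.036 sec/batch)
2019-02-12 16:58:20.468507: step 7740, loss = 0.88 (3813.3 examples/sec; 0.034 sec/batch)
"""

if __name__ == "__main__":
    rates = [rate for _, rate in parse_throughput(SAMPLE_LOG)]
    print(f"{len(rates)} steps, mean {mean(rates):.1f} examples/sec")
```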
・Checking the GPU load also shows that the GPU is actually being used.
$ sudo /opt/rocm/bin/rocm-smi -u
======================== ROCm System Management Interface ========================
================================================================================================
GPU[0] : Cannot get GPU use.
GPU[1] : Current GPU use: 64%
================================================================================================
======================== End of ROCm SMI Log ========================
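Since a single reading says little about how the load changes over time, the utilization value can be extracted from captured rocm-smi output and collected over several runs. This sketch assumes the "Current GPU use" line format shown above, which may differ in other ROCm versions. parse_gpu_use is a hypothetical helper, and it only parses text that has already been captured.

```python
import re

# Matches lines such as "GPU[1] : Current GPU use: 64%".
USE_LINE = re.compile(r"GPU\[(\d+)\]\s*: Current GPU use: (\d+)%")


def parse_gpu_use(smi_output):
    """Map GPU index to utilization percent for GPUs that report a value."""
    return {int(idx): int(pct) for idx, pct in USE_LINE.findall(smi_output)}


SAMPLE_OUTPUT = """\
GPU[0] : Cannot get GPU use.
GPU[1] : Current GPU use: 64%
"""

if __name__ == "__main__":
    for gpu, percent in sorted(parse_gpu_use(SAMPLE_OUTPUT).items()):
        print(f"GPU[{gpu}]: {percent}%")
```

GPUs that print "Cannot get GPU use." are simply left out of the result.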
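Summary
I managed to get through installing ROCm and TensorFlow and running a sample.
However, other people's benchmarks report higher throughput, and the GPU load appeared to fluctuate quite a bit, so I plan to look into parameter tuning next.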