Building a TensorFlow Environment on an AMD GPU (TensorFlow Installation to Running a Sample)

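Introduction

This article describes the steps up to running a TensorFlow sample on an AMD GPU.
The goal is to see whether a ROCm-based TensorFlow environment can be built on hardware repurposed from a mining machine.
The previous article covered installing ROCm,
so this one covers installing TensorFlow and running a sample.


This article is the summary version.
The details are covered in "Building a TensorFlow Environment on an AMD GPU (TensorFlow Installation to Running a Sample): Detailed Version".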


Configuration

CPU: Celeron G3930
GPU: Radeon Vega 56
Ubuntu: 18.04 LTS (Kernel 4.15)
ROCm Version: 2.1
Python: 3.6
TensorFlow: 1.12

Preparing to Install TensorFlow

・Install the required Python packages

sudo apt-get update && sudo apt-get install -y \
python3-numpy \
python3-dev \
python3-wheel \
python3-mock \
python3-future \
python3-pip \
python3-yaml \
python3-setuptools && \
sudo apt-get clean && \
sudo rm -rf /var/lib/apt/lists/*
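
One way to confirm that the packages above are visible to the interpreter is to look each module up without importing it. The sketch below is an illustration only and is not part of the original procedure; the names in `EXPECTED_MODULES` are my assumption of the import name each apt package provides (for example, `python3-yaml` providing `yaml`).

```python
import importlib.util

# Assumed import names for the apt packages installed above.
# python3-dev ships headers only, so it has no module to check.
EXPECTED_MODULES = ["numpy", "wheel", "mock", "future", "pip", "yaml", "setuptools"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    print("missing:", missing if missing else "none")
```

An empty result only means the modules can be located; it does not check their versions.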

Installing TensorFlow

・Install tensorflow-rocm with pip

pip3 install tensorflow-rocm

Running TensorFlow

Checking that TensorFlow works → the import fails because required libraries are missing

python3
>>> import tensorflow
Traceback (most recent call last):
  File "/home/tk/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/tk/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/tk/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libCXLActivityLogger.so: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tk/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/home/tk/.local/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/tk/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/home/tk/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/tk/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/tk/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libCXLActivityLogger.so: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.
>>> 

A search showed that other people had run into the same problem, so I followed their report and ran the following command.

$ sudo apt-get update && \
      sudo apt-get install -y --allow-unauthenticated \
      rocm-dkms rocm-dev rocm-libs \
      rocm-device-libs \
      hsa-ext-rocr-dev hsakmt-roct-dev hsa-rocr-dev \
      rocm-opencl rocm-opencl-dev \
      rocm-utils \
      rocm-profiler cxlactivitylogger \
      miopen-hip miopengemm
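
The last line of the traceback names the shared object the loader could not find, which the `cxlactivitylogger` package in the command above presumably supplies. As a small illustration (the helper name `missing_shared_object` is hypothetical, not something from TensorFlow or ROCm), the library name can be pulled out of such a message like this:

```python
import re

# Matches the loader message produced when a shared object cannot be found.
_MISSING_SO = re.compile(r"(\S+\.so[\w.]*): cannot open shared object file")


def missing_shared_object(message):
    """Return the library name from a 'cannot open shared object file' message, or None."""
    match = _MISSING_SO.search(message)
    return match.group(1) if match else None


# Final line of the traceback shown earlier in this article.
SAMPLE = (
    "ImportError: libCXLActivityLogger.so: cannot open shared object file: "
    "No such file or directory"
)

if __name__ == "__main__":
    print("missing library:", missing_shared_object(SAMPLE))
```

If the import fails again after installing, the same check on the new message shows which library is still missing.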

After everything finished installing, I checked TensorFlow again → this time the import at least worked.

python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
WARNING:tensorflow:From /home/tk/.local/lib/python3.6/site-packages/tensorflow/python/ops/distributions/distribution.py:265: ReparameterizationType.__init__ (from tensorflow.python.ops.distributions.distribution) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
WARNING:tensorflow:From /home/tk/.local/lib/python3.6/site-packages/tensorflow/python/ops/distributions/bernoulli.py:169: RegisterKL.__init__ (from tensorflow.python.ops.distributions.kullback_leibler) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
>>> 
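
A successful import does not by itself show that TensorFlow can see the GPU. The following is a minimal sketch for checking that, assuming the TF 1.x module `tensorflow.python.client.device_lib` is present in the tensorflow-rocm build and that the Radeon card is reported with device type `"GPU"`; neither assumption was verified in the original article.

```python
def list_gpu_devices():
    """Return the names of GPU devices TensorFlow reports, or None if TensorFlow is unavailable."""
    try:
        # Internal but widely used TF 1.x helper module (assumed present here).
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    # Assumption: the ROCm build labels AMD GPUs with device_type "GPU".
    return [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]


if __name__ == "__main__":
    devices = list_gpu_devices()
    if devices is None:
        print("TensorFlow could not be imported")
    else:
        print("GPU devices:", devices if devices else "none found")
```

An empty list would suggest that TensorFlow falls back to the CPU, which is worth ruling out before running a benchmark.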

Running the Sample

・Install Git

sudo apt install git

・Git clone: I cloned the repository as described on the official GitHub page.

cd ~
git clone https://github.com/tensorflow/models.git

・From this repository, I ran the CIFAR10 benchmark.

cd ~/models/tutorials/image/cifar10
export HIP_VISIBLE_DEVICES=0
python3 ./cifar10_train.py
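
The same launch can be scripted, which is handy when trying different device selections. This is a sketch under assumptions: `hip_env` and `run_training` are hypothetical helper names, and the script is expected to be run from the `cifar10` directory used above.

```python
import os
import subprocess
import sys


def hip_env(device_ids, base_env=None):
    """Return a copy of the environment with HIP_VISIBLE_DEVICES set to the given GPU indices."""
    env = dict(os.environ if base_env is None else base_env)
    env["HIP_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return env


def run_training(script="cifar10_train.py", device_ids=(0,)):
    """Run the training script with the current interpreter, restricted to the given GPUs."""
    return subprocess.run([sys.executable, script], env=hip_env(device_ids), check=True)
```

Calling `run_training()` corresponds to the `export` and `python3` lines above; the variable is set only for the child process, so the shell environment is left unchanged.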

・Result → it ran without problems.

2019-02-12 16:58:19.769456: step 7720, loss = 0.85 (3943.5 examples/sec; 0.032 sec/batch)
2019-02-12 16:58:20.132842: step 7730, loss = 0.87 (3522.4 examples/sec; 0.036 sec/batch)
2019-02-12 16:58:20.468507: step 7740, loss = 0.88 (3813.3 examples/sec; 0.034 sec/batch)
2019-02-12 16:58:20.791237: step 7750, loss = 1.07 (3966.2 examples/sec; 0.032 sec/batch)
2019-02-12 16:58:21.121733: step 7760, loss = 0.90 (3873.0 examples/sec; 0.033 sec/batch)
2019-02-12 16:58:21.487553: step 7770, loss = 0.83 (3499.0 examples/sec; 0.037 sec/batch)
2019-02-12 16:58:21.851375: step 7780, loss = 0.81 (3518.2 examples/sec; 0.036 sec/batch)
2019-02-12 16:58:22.170275: step 7790, loss = 0.87 (4013.8 examples/sec; 0.032 sec/batch)
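
To compare throughput with other people's results, it helps to average the examples/sec figure over many steps rather than read single lines. The sketch below parses lines in the format shown above; the function names are hypothetical, and `SAMPLE` is three lines copied from this log.

```python
import re

# Pattern for the per-step lines printed by cifar10_train.py in the log above.
LOG_LINE = re.compile(
    r"step (\d+), loss = ([\d.]+) \(([\d.]+) examples/sec; ([\d.]+) sec/batch\)"
)


def parse_training_log(text):
    """Return (step, loss, examples_per_sec, sec_per_batch) tuples found in the log text."""
    return [
        (int(step), float(loss), float(eps), float(spb))
        for step, loss, eps, spb in LOG_LINE.findall(text)
    ]


def mean_throughput(records):
    """Average examples/sec over the parsed records."""
    return sum(r[2] for r in records) / len(records)


SAMPLE = """\
2019-02-12 16:58:20.132842: step 7730, loss = 0.87 (3522.4 examples/sec; 0.036 sec/batch)
2019-02-12 16:58:20.468507: step 7740, loss = 0.88 (3813.3 examples/sec; 0.034 sec/batch)
2019-02-12 16:58:20.791237: step 7750, loss = 1.07 (3966.2 examples/sec; 0.032 sec/batch)
"""

if __name__ == "__main__":
    records = parse_training_log(SAMPLE)
    print("steps parsed:", len(records))
    print("mean examples/sec:", round(mean_throughput(records), 1))
```

Redirecting the training output to a file and passing its contents to `parse_training_log` gives an average over the whole run.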

・The GPU load also indicates that the training is really running on the GPU.

$ sudo /opt/rocm/bin/rocm-smi -u
========================        ROCm System Management Interface        ========================
================================================================================================
GPU[0]      : Cannot get GPU use.
GPU[1]      : Current GPU use: 64%
================================================================================================
========================               End of ROCm SMI Log              ========================
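
Because the load fluctuates, a single reading says little; sampling `rocm-smi -u` repeatedly and parsing the result is one option. The sketch below handles only the two line shapes that appear in the output above. Other rocm-smi versions may print a different format, and `parse_gpu_use` is a hypothetical helper name.

```python
import re

GPU_LINE = re.compile(r"GPU\[(\d+)\]\s*:\s*(.*)")
USE_VALUE = re.compile(r"Current GPU use:\s*(\d+)%")


def parse_gpu_use(output):
    """Map GPU index to utilization percent, or None where no value was reported."""
    usage = {}
    for line in output.splitlines():
        gpu = GPU_LINE.match(line.strip())
        if not gpu:
            continue  # header and separator lines
        value = USE_VALUE.search(gpu.group(2))
        usage[int(gpu.group(1))] = int(value.group(1)) if value else None
    return usage


# The two GPU lines from the rocm-smi output shown above.
SAMPLE = """\
GPU[0]      : Cannot get GPU use.
GPU[1]      : Current GPU use: 64%
"""

if __name__ == "__main__":
    print(parse_gpu_use(SAMPLE))
```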

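Summary

ROCm and TensorFlow are now installed and the sample runs from start to finish.
However, other people's benchmarks report higher throughput, and the GPU load appeared to fluctuate considerably, so I plan to look into parameter tuning next.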

ymiura17