Running Chainer/CuPy on AMD GPUs (from installation to MNIST)

Last updated at 2019-12-08, posted at 2018-12-26

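(Updated 2019/01/12)
At present, operations used in modern neural networks, such as convolution, do not work: running them fails with an error from the compiler for HIP, the intermediate language.
I am willing to spare no effort in helping with BLAS and other work needed for CuPy's HIP support.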

TL;DR

With the December 25 update to CuPy, Chainer w/ CuPy now runs on the RadeonOpenCompute environment as well.

Introduction

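Chainer, the deep learning framework provided by Preferred Networks, and CuPy, its NumPy-compatible numerical computing library, were being developed with the aim of supporting RadeonOpenCompute (ROCm), the GPGPU environment for AMD GPUs, in Version 5. For various reasons, that support was pushed back from Version 5 to Version 6.
On December 25, the CuPy branch[^1] maintained by Okuta of PFN was updated, making it possible not only to compile CuPy on ROCm but also to run it.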

Installing CuPy

Installing CuPy requires the following steps.

Install the required libraries
Set the environment variables

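First, install BLAS, the linear algebra library that forms the backbone of CuPy's implementation, and the linear algebra library for sparse matrices. On ROCm, these are hipblas and hipsparse, installed as follows.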

$ sudo apt install hipblas hipsparse

Next, set the environment variables needed to compile and run CuPy. The example below is for a Radeon RX Vega; change the value given to HCC_AMDGPU_TARGET= to match the GPU you use (for example, gfx803 for the RX 5X0).

export HCC_AMDGPU_TARGET=gfx900
export __HIP_PLATFORM_HCC__

export ROCM_HOME=/opt/rocm
export CUPY_INSTALL_USE_HIP=1
export PATH=$ROCM_HOME/bin:$PATH

If needed, add the environment variables above to your bashrc or similar.
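A small check before the build can catch a variable that was never set. The following is a minimal sketch, not part of CuPy: the helper name `missing_rocm_env` is hypothetical, and the list of names is simply the variables that the export lines above assign a value to.

```python
import os

# Variables that the export lines above assign a value to.
REQUIRED_VARS = ("HCC_AMDGPU_TARGET", "ROCM_HOME", "CUPY_INSTALL_USE_HIP")


def missing_rocm_env(env=None):
    """Return the names in REQUIRED_VARS that are unset or empty in env."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_rocm_env()
    if missing:
        print("Not set: " + ", ".join(missing))
    else:
        print("All build variables are set.")
```

Running it in the same shell that will run the build shows whether the exports actually reached the environment.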

Next, download the ROCm-enabled branch of CuPy, then build and install it from inside the repository.

git clone https://github.com/okuta/cupy -b support-hip && cd cupy
python setup.py install

Installing Chainer

Once you have built CuPy in its directory and installed the ROCm-enabled CuPy, install Chainer as well.

git clone https://github.com/chainer/chainer && cd chainer
python setup.py install
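After both installs, it can be worth confirming that Python can actually locate the two packages. This is a minimal sketch using only the standard library; the helper name `is_importable` is hypothetical, and it only checks that the module can be found, not that the GPU backend works.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("cupy", "chainer"):
        status = "found" if is_importable(name) else "NOT found"
        print(name + ": " + status)
```

Run it from a directory other than the cloned repositories; otherwise the local source directory of the same name may be picked up instead of the installed package.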

Testing: up to MNIST

From here, let's test whether CuPy is installed and working. First, run a CuPy example to check that CuPy itself works.

$ python cupy/examples/gemm/sgemm.py
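If the example script is not at hand, a much smaller matrix product can serve as a rough equivalent. The sketch below is not the sgemm.py example; it multiplies two 2x2 matrices with `cupy.dot` and compares the result with the product worked out by hand. The function name and the status strings are my own choices for illustration.

```python
def gpu_matmul_smoke_test():
    """Multiply two 2x2 matrices with CuPy and compare with the known product."""
    try:
        import cupy as cp
    except ImportError:
        return "skipped: cupy is not importable"
    try:
        a = cp.array([[1.0, 2.0], [3.0, 4.0]], dtype=cp.float32)
        b = cp.array([[5.0, 6.0], [7.0, 8.0]], dtype=cp.float32)
        product = cp.dot(a, b).tolist()  # tolist() copies the result to the host
    except Exception as exc:  # e.g. no usable device or a kernel compile failure
        return "error: {}".format(exc)
    expected = [[19.0, 22.0], [43.0, 50.0]]
    return "ok" if product == expected else "mismatch: {}".format(product)


if __name__ == "__main__":
    print(gpu_matmul_smoke_test())
```

A result other than "ok" does not say where the problem lies, but the error text usually points at either the import, the device, or the kernel compiler.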

Once you have confirmed that CuPy ran the GEMM sample code without printing an error, check that it also works under Chainer.
As an example, here we go as far as running MNIST.

$ python chainer/examples/mnist/train_mnist.py

(Addendum) Running Chainer+CuPy on the GPU

The machine I ran this on had a reasonably fast CPU, so I did not notice at first, but the command above does not run MNIST on the GPU.

You need to specify the GPU that CuPy uses with the -g (args) option. So, enabling the GPU and running it...

$ python train_mnist.py -g 0
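Why the earlier run stayed on the CPU can be illustrated with a simplified sketch of a device option. This is not the code of train_mnist.py; it assumes the convention that the option defaults to a negative ID and that a negative ID means the CPU, and the function name `parse_device` is hypothetical.

```python
import argparse


def parse_device(argv):
    """Parse a -g/--gpu option; a negative ID is treated here as 'use the CPU'."""
    parser = argparse.ArgumentParser(description="device selection sketch")
    parser.add_argument("-g", "--gpu", type=int, default=-1,
                        help="GPU ID (a negative value means CPU)")
    args = parser.parse_args(argv)
    return "cpu" if args.gpu < 0 else "gpu:{}".format(args.gpu)


if __name__ == "__main__":
    print(parse_device([]))           # no option given
    print(parse_device(["-g", "0"]))  # same option as in the command above
```

Under that assumption, omitting the option silently selects the CPU, which matches what happened in the first run.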

Now, running train_mnist.py with -g 0 gives...

Traceback (most recent call last):
  File "train_mnist.py", line 147, in <module>
    main()
  File "train_mnist.py", line 143, in main
    trainer.run()
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/chainer/training/trainer.py", line 336, in run
    entry.extension(self)
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/chainer/reporter.py", line 109, in scope
    yield
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/chainer/training/trainer.py", line 336, in run
    entry.extension(self)
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/chainer/training/extensions/evaluator.py", line 180, in __call__
    result = self.evaluate()
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/chainer/training/extensions/evaluator.py", line 245, in evaluate
    eval_func(in_arrays)
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/chainer/reporter.py", line 260, in report_scope
    yield
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/chainer/training/extensions/evaluator.py", line 241, in evaluate
    eval_func(*in_arrays)
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/chainer/link.py", line 287, in __call__
    out = forward(*args, **kwargs)
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/chainer/links/model/classifier.py", line 143, in forward
    self.y = self.predictor(*args, **kwargs)
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/chainer/link.py", line 287, in __call__
    out = forward(*args, **kwargs)
  File "train_mnist.py", line 27, in forward
    h1 = F.relu(self.l1(x))
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/chainer/link.py", line 287, in __call__
    out = forward(*args, **kwargs)
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/chainer/links/connection/linear.py", line 182, in forward
    self._initialize_params(in_size)
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/chainer/links/connection/linear.py", line 127, in _initialize_params
    self.W.initialize((self.out_size, in_size))  # type: ignore
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/chainer/variable.py", line 1799, in initialize
    self.initializer, shape, xp, device=device)
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/chainer/initializers/__init__.py", line 74, in generate_array
    initializer(array)
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/chainer/initializers/normal.py", line 89, in __call__
    Normal(s, rng=self.rng)(array)
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/chainer/initializers/normal.py", line 47, in __call__
    array[...] = device.xp.random.normal(**args)
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/cupy-7.0.0rc1-py3.5-linux-x86_64.egg/cupy/random/distributions.py", line 502, in normal
    cupy.multiply(x, scale, out=x)
  File "cupy/core/_kernel.pyx", line 890, in cupy.core._kernel.ufunc.__call__
  File "cupy/core/_kernel.pyx", line 913, in cupy.core._kernel.ufunc._get_ufunc_kernel
  File "cupy/core/_kernel.pyx", line 658, in cupy.core._kernel._get_ufunc_kernel
  File "cupy/core/_kernel.pyx", line 61, in cupy.core._kernel._get_simple_elementwise_kernel
  File "cupy/core/carray.pxi", line 175, in cupy.core.core.compile_with_cache
RuntimeError: Failed to auto-detect CUDA root directory. Please specify `CUDA_PATH` environment variable if you are using CUDA v9.0, v9.1 or versions not yet supported by CuPy.

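The error says that the CUDA root directory could not be detected and asks for it to be specified through an environment variable.
This is a rather dirty approach, but let's point CUDA_PATH at the ROCm directory and run it again.
(The ROCm version of PyTorch also uses this, shall we say, time-honored trick during installation.)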

$ CUDA_PATH=/opt/rocm python train_mnist.py -g 0
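The same workaround can be applied from inside a script instead of on the command line. The sketch below is an assumption-laden illustration: the helper name is hypothetical, it falls back to /opt/rocm when ROCM_HOME is unset, and it assumes that setting the variable before cupy is first imported is early enough for CuPy to see it.

```python
import os


def apply_cuda_path_workaround(env=None, default_root="/opt/rocm"):
    """Point CUDA_PATH at the ROCm root unless CUDA_PATH is already set."""
    if env is None:
        env = os.environ
    rocm_root = env.get("ROCM_HOME", default_root)
    env.setdefault("CUDA_PATH", rocm_root)  # leaves an existing value untouched
    return env["CUDA_PATH"]


if __name__ == "__main__":
    # Call this before importing cupy or chainer (see the assumption above).
    print("CUDA_PATH =", apply_cuda_path_workaround())
```

Using `setdefault` keeps an explicitly exported CUDA_PATH in place, so the script does not override a value chosen on the command line.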

Here is the result of running with CUDA_PATH set.

Traceback (most recent call last):
  File "/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/cupy-7.0.0rc1-py3.5-linux-x86_64.egg/cupy/cuda/compiler.py", line 459, in _run_hipcc
    env=env)
  File "/usr/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.5/subprocess.py", line 708, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['hipcc', '--genco', '--targets=gfx900', '--flags="-I/mnt/storage/ubuntu_data/PyML3/lib/python3.5/site-packages/cupy-7.0.0rc1-py3.5-linux-x86_64.egg/cupy/core/include -I /opt/rocm/include"', '/tmp/tmp_6oe44if/kern.cpp', '-o', '/tmp/tmp_6oe44if/kern.hsaco']' returned non-zero exit status 2

During handling of the above exception, another exception occurred:

(snip)
1 warning generated.
ld: cannot find -lhsa-runtime64
clang-10: error: linker command failed with exit code 1 (use -v to see invocation)
Compilation failed.
Died at /opt/rocm/bin/hipcc line 360.

It ended with an error.
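The linker could not find the hsa-runtime64 library on its search path, so a first diagnostic step is to check whether the file exists at all. The sketch below assumes that the library, if installed, sits somewhere under ROCM_HOME (falling back to /opt/rocm); the helper name `find_shared_library` is hypothetical, and finding the file does not by itself fix the linker search path.

```python
import os
from pathlib import Path


def find_shared_library(root, stem):
    """Return sorted paths under root whose file name starts with lib<stem>.so."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(str(p) for p in root_path.rglob("lib{}.so*".format(stem)))


if __name__ == "__main__":
    rocm_root = os.environ.get("ROCM_HOME", "/opt/rocm")
    hits = find_shared_library(rocm_root, "hsa-runtime64")
    if hits:
        print("\n".join(hits))
    else:
        print("no libhsa-runtime64.so* under " + rocm_root)
```

An empty result suggests a missing package; a non-empty one suggests that the directory is simply not being passed to the linker.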
That's all for today.

More to be added below.

[^1]: https://github.com/okuta/cupy/tree/support-hip
