
An unofficial from-source build of GPU-enabled TensorFlow on macOS

Posted at 2018-05-13

Walkthrough

I decided to run my TensorFlow experiments in a Python 3.5 venv environment.
In a suitable working directory, run

python -m venv P
source P/bin/activate
pip install --upgrade tensorflow

or something along those lines.
pip then told me to upgrade itself, and my own scripts rely on a few other packages, so I ran

pip install --upgrade pip
pip install --upgrade numpy
pip install --upgrade matplotlib
pip install --upgrade opencv-python

and so on (upgrading pip itself could probably be done first).

With that, my own scripts ran, but on closer inspection they were not using the GPU. The obvious next step seemed to be:

pip install --upgrade tensorflow-gpu

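However, the package that command installs is rather old. Installing the release wheel directly by URL looked like the right approach: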

pip install --upgrade https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.8.0-py3-none-any.whl

(1.8 is the latest version at the time of writing.) It turns out, though, that GPU use on macOS is no longer officially supported.

So I had no choice but to try building tensorflow myself.
The steps below are based on https://qiita.com/74th/items/fc6ebb684c23f3655e7c.

bazel

bazel is used to build tensorflow.

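When installing bazel from MacPorts, there is a step after compilation that creates a local git repository and commits to it in order to generate the bash completion.
If git has never been configured on the machine, this step fails and git prints the following message: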

Run

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

The proper fix is probably to edit the Portfile (/opt/local/var/macports/sources/rsync.macports.org/release/tarballs/ports/devel/bazel/Portfile) and add the same settings, without --global,

git config user.email "you@example.com"
git config user.name "Your Name"

right after the git init.
Alternatively, once the build has failed, cd into the build directory, run the git config commands above there, and let port install retry.

While the bash completion is being generated, the load average shoots up.

At that point one java process never exited, and bazel kept waiting for it, so the installation never finished.
Looking up the process ID of that stuck java process and killing it with sudo kill let the rest of the MacPorts installation run to completion.
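The stuck process can be found by scanning the process list. The following is only a sketch: the helper names find_pids and list_processes are made up for this note, and it assumes a ps that accepts -A and -o pid=,command= (true of the macOS and Linux versions I know of). It sticks to Python 3.5 compatible calls to match the venv used here.

```python
import subprocess


def find_pids(ps_output, needle="java"):
    """Return PIDs of processes whose command line contains `needle`."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)  # "PID COMMAND ..."
        if len(parts) == 2 and parts[0].isdigit() and needle in parts[1]:
            pids.append(int(parts[0]))
    return pids


def list_processes():
    """Return `ps` output as text, one 'PID COMMAND' line per process."""
    # "pid=" and "command=" suppress the header line.
    result = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "command="],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return result.stdout


# Typical use (not executed here):
#   for pid in find_pids(list_processes()):
#       print(pid)   # inspect each candidate before running `sudo kill <pid>`
```

Matching on the substring "java" is deliberately loose, so the candidates should be checked by eye before anything is killed.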

tensorflow

git clone https://github.com/tensorflow/tensorflow.git -b v1.8.0

Clone the v1.8.0 tag with the command above. Then, still inside the venv from earlier, run

./configure

Answer n to basically every [y/n] question, except CUDA support, which gets y.
The python from the venv should be the one that gets selected.
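Before running ./configure it may be worth confirming that the interpreter on PATH really is the venv's. A minimal sketch, where the function name in_virtualenv is my own and the check relies on how the standard venv module sets sys.prefix:

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix points at the interpreter it was created from.
    base = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base


print(sys.executable)  # the path ./configure should offer as its default
print("venv active:", in_virtualenv())
```

Run with the same python command that ./configure will see. If it reports that no venv is active, re-run source P/bin/activate first.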

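cuDNN is required. For macOS there is a cuDNN 7.0.5 build for CUDA 9.1, so use that. Extracting the tarball in /usr/local/ would normally be enough, but because cuda/include/ is a symlink, cudnn.h alone fails to land in place. Extract the tarball in some other directory and copy the files over by hand.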
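The "extract elsewhere, then copy" step could be scripted roughly as below. This is a sketch rather than what I actually ran: install_from_tarball is a hypothetical helper, and the tarball path and destination root are whatever applies on your machine. Copying file by file means a symlinked directory at the destination (such as cuda/include) is simply followed.

```python
import os
import shutil
import tarfile
import tempfile


def install_from_tarball(tarball, dest_root):
    """Extract `tarball` into a scratch directory, then copy every file
    to the same relative path under `dest_root`."""
    copied = []
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(tarball) as tar:
            tar.extractall(scratch)  # only do this with a tarball you trust
        for dirpath, _dirs, files in os.walk(scratch):
            rel = os.path.relpath(dirpath, scratch)
            target_dir = os.path.normpath(os.path.join(dest_root, rel))
            # exist_ok also accepts a symlink that points at a directory
            os.makedirs(target_dir, exist_ok=True)
            for name in files:
                target = os.path.join(target_dir, name)
                shutil.copy2(os.path.join(dirpath, name), target)
                copied.append(target)
    return sorted(copied)
```

Writing under /usr/local needs root, so the real call would run under sudo. Symlinks inside the archive end up as copies of the files they point to, which is acceptable for a one-off install.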

Answer the remaining questions, such as version numbers, as appropriate.

bazel build --config=cuda --config=opt --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package

The command above performs the build.

CUDA runtime environment

At the time of writing, the latest CUDA is 9.1 and the latest Xcode is 9.3, but nvcc from CUDA 9.1 cannot use the clang from Xcode 9.3, which was released later (nvcc's version check rejects it).
So I downloaded Command Line Tools (macOS 10.13) for Xcode 9.2 from https://developer.apple.com/download/more/, installed it, and selected it with

sudo xcode-select --switch /Library/Developer/CommandLineTools
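Whether the selected clang is one nvcc will accept can be estimated from its version banner. The sketch below is illustrative only. The threshold is an assumption based on what I observed (Xcode 9.2's Apple LLVM 9.0 is accepted, Xcode 9.3's 9.1 is not), not a value read out of nvcc, and the function names are my own.

```python
import re

# Assumption for illustration: CUDA 9.1's nvcc accepts Apple LLVM up to 9.0.x
# (Xcode 9.2) and rejects 9.1.x (Xcode 9.3), as observed in this article.
MAX_SUPPORTED = (9, 0)


def apple_llvm_version(version_text):
    """Extract (major, minor) from `clang --version` style output, or None."""
    m = re.search(r"Apple LLVM (?:version )?(\d+)\.(\d+)", version_text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def nvcc_would_accept(version_text, max_supported=MAX_SUPPORTED):
    """Guess whether nvcc's host compiler check passes for this clang."""
    version = apple_llvm_version(version_text)
    return version is not None and version <= max_supported
```

Feed it the output of clang --version. The same banner also shows up in the Python startup message when Python was built with that compiler.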

After switching, nvcc works, but for some reason running a CUDA sample fails like this:

./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

In other words, the runtime reports that the installed CUDA driver is too old for it.

In the end, this was only resolved by downloading the nvidia web driver 387.10.10.10.30.107 from http://www.xlr8yourmac.com/archives/apr18/NvidiaCUDAandDriverUpdates.htm and installing it. Reinstalling CUDA, which reinstalls both the runtime libraries and the CUDA driver, did not fix it, and neither did updating to 387.178 from the CUDA pane in System Preferences.
The installation procedure is described at https://devtalk.nvidia.com/default/topic/1025945/mac-cuda-driver-fully-compatible-with-macos-high-sierra-10-13-error-quot-update-required-quot-solved-/?offset=173.

Keeping a working CUDA environment on macOS really is hard work.

tensorflow build

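Build with the bazel command shown earlier.
You will run into a problem with the protobuf archive, so patch tensorflow/workspace.bzl as follows (the eigen_archive entry does not need changing).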

diff --git a/tensorflow/workspace.bzl b/tensorflow/workspace.bzl
index 48728ac131..268e4fe2e6 100644
--- a/tensorflow/workspace.bzl
+++ b/tensorflow/workspace.bzl
@@ -330,11 +330,11 @@ def tf_workspace(path_prefix="", tf_repo_name=""):
   tf_http_archive(
       name = "protobuf_archive",
       urls = [
-          "https://mirror.bazel.build/github.com/google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a.tar.gz",
-          "https://github.com/google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a.tar.gz",
+          "https://mirror.bazel.build/github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz",
+          "https://github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz",
       ],
-      sha256 = "846d907acf472ae233ec0882ef3a2d24edbbe834b80c305e867ac65a1f2c59e3",
-      strip_prefix = "protobuf-396336eb961b75f03b25824fe86cf6490fb75e3a",
+      sha256 = "eb16b33431b91fe8cee479575cee8de202f3626aaf00d9bf1783c6e62b4ffbc7",
+      strip_prefix = "protobuf-50f552646ba1de79e07562b41f3999fe036b4fd0",
   )

   # We need to import the protobuf library under the names com_google_protobuf
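The sha256 field in such an entry has to match the archive exactly. When swapping in a different URL, the digest of a downloaded file can be computed by hand and compared. A minimal sketch using only the standard library, with sha256_of being a name chosen for this example:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is what goes into sha256 = "..." in workspace.bzl.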

TensorFlow 1.8 appears to require nvidia's NCCL library, but the current version (2.2) has no macOS build.
An older 1.3.4 build is available at the URL below. Download it, extract it into /usr/local/cuda/, and create a symlink in tensorflow/third_party/nccl/ that points to /usr/local/cuda/include/nccl.h.
https://storage.googleapis.com/74thopen/tensorflow_osx/nccl_osx_1.3.4.tar.gz
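The symlink step amounts to an ln -s. As a sketch, the same thing in Python looks like this, with link_header as a hypothetical helper and the two paths taken from the text above:

```python
import os


def link_header(header, link_dir):
    """Create link_dir/<basename of header> as a symlink to `header`."""
    link = os.path.join(link_dir, os.path.basename(header))
    if os.path.islink(link):
        os.remove(link)  # replace a stale link left by an earlier attempt
    os.symlink(header, link)
    return link


# Intended use for this article (not executed here):
#   link_header("/usr/local/cuda/include/nccl.h", "tensorflow/third_party/nccl")
```

Only an existing symlink is replaced. If a regular file with the same name is already there, os.symlink raises instead of overwriting it.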

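Next, apply this step from the Qiita article:

Remove the string __align(sizeof(T))__ from the following three source files.
tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
tensorflow/core/kernels/split_lib_gpu.cu.cc
tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc

(The string to remove is actually __align__(sizeof(T)); the article has it slightly wrong.)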
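Editing three files by hand is easy enough, but the removal can also be scripted. The following is a sketch under the assumption that the string appears literally as __align__(sizeof(T)) in those files. strip_align and strip_all are names invented here.

```python
import os

NEEDLE = "__align__(sizeof(T))"
FILES = [
    "tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc",
    "tensorflow/core/kernels/split_lib_gpu.cu.cc",
    "tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc",
]


def strip_align(path, needle=NEEDLE):
    """Delete every occurrence of `needle` in the file; return how many."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    count = text.count(needle)
    if count:  # leave the file untouched when there is nothing to remove
        with open(path, "w", encoding="utf-8") as f:
            f.write(text.replace(needle, ""))
    return count


def strip_all(repo_root):
    """Apply strip_align to the three files, relative to the repository root."""
    return {rel: strip_align(os.path.join(repo_root, rel)) for rel in FILES}
```

Calling strip_all(".") from the top of the tensorflow checkout returns the number of removals per file, and git diff shows exactly what changed.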

The article says to run the bazel build as follows, but as described below this ends in a runtime error.

export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib
export LD_LIBRARY_PATH=$DYLD_LIBRARY_PATH
export PATH=$DYLD_LIBRARY_PATH:$PATH
bazel build --config=cuda --config=opt --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package

ERROR: /Users/shun/pworks/tensorflow/tensorflow/python/BUILD:1435:1: Executing genrule //tensorflow/python:control_flow_ops_pygenrule failed (Aborted): bash failed: error executing command /bin/bash bazel-out/host/genfiles/tensorflow/python/control_flow_ops_pygenrule.genrule_script.sh
dyld: Library not loaded: @rpath/libcudart.9.1.dylib
  Referenced from: /private/var/tmp/_bazel_shun/8a2b32240072a9a79dbc7048c24218fa/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/gen_control_flow_ops_py_wrappers_cc
  Reason: image not found

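The dynamic loader is failing to open /usr/local/cuda/lib/libcudart.9.1.dylib. The dylib lives in a non-standard location, which is why DYLD_LIBRARY_PATH points at it, but that variable is not being passed through properly.
The --action_env arguments tell bazel to pick up this environment variable, but the variable never reaches child processes in the first place (it is visible only inside the shell itself).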

bash-3.2$ export DYLD_LIBRARY_PATH=AAA
bash-3.2$ echo $DYLD_LIBRARY_PATH
AAA
bash-3.2$ env|grep DYLD_LIBRARY_PATH
bash-3.2$ cat A.sh
# !/bin/sh

echo $DYLD_LIBRARY_PATH
bash-3.2$ /bin/sh A.sh

bash-3.2$ exit
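The same experiment can be scripted. This sketch (survives_exec is a made-up name) sets a variable in the environment of a child /bin/sh and reports whether the child can see it. It assumes a POSIX shell at /bin/sh.

```python
import os
import subprocess


def survives_exec(name, value="probe", shell="/bin/sh"):
    """Pass `name` to a child `shell` and report whether the child sees it."""
    env = dict(os.environ)
    env[name] = value
    result = subprocess.run(
        [shell, "-c", 'printf %s "$' + name + '"'],
        env=env, stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return result.stdout == value


# Typical use (not executed here):
#   survives_exec("SOME_PLAIN_VAR")      # an ordinary variable is passed on
#   survives_exec("DYLD_LIBRARY_PATH")   # compare this on the machine in question
```

Comparing an ordinary variable with DYLD_LIBRARY_PATH on the same machine shows whether only the DYLD_ variable is being dropped, which is what the transcript above suggests.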

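This is probably caused by SIP (System Integrity Protection), which strips DYLD_* variables when protected binaries such as /bin/sh are launched. It looks like nothing can be done about it without disabling SIP.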

https://github.com/tensorflow/tensorflow/issues/6729#issuecomment-279483546
https://rcmdnk.com/blog/2015/10/10/computer-mac/

However, the immediate problem is only the library loading during the bazel build, and
according to https://developer.apple.com/library/content/documentation/DeveloperTools/Conceptual/DynamicLibraries/100-Articles/UsingDynamicLibraries.html a library placed in ~/lib might be found without disabling SIP.

So I copied /usr/local/cuda/lib/libcudart.9.1.dylib into $HOME/lib and carried on, but that did not work.

Building with SIP disabled

Following https://rcmdnk.com/blog/2015/10/10/computer-mac/#sipの無効化, boot while holding command-R, open Terminal from the Utilities menu, and run

csrutil disable

Then reboot.
Building with bazel in this state gets past the earlier library loading failure, but then stops with:

ld: library not found for -lgomp
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.

This is a linker error: the gomp library cannot be found.

Following https://metakermit.com/2017/compiling-tensorflow-with-gpu-support-on-a-macbook-pro/, change third_party/gpus/cuda/BUILD.tpl as follows:

diff --git a/third_party/gpus/cuda/BUILD.tpl b/third_party/gpus/cuda/BUILD.tpl
index 2a37c65bc7..43446dd99b 100644
--- a/third_party/gpus/cuda/BUILD.tpl
+++ b/third_party/gpus/cuda/BUILD.tpl
@@ -110,7 +110,7 @@ cc_library(
         ".",
         "cuda/include",
     ],
-    linkopts = ["-lgomp"],
+    #linkopts = ["-lgomp"],
     linkstatic = 1,
     visibility = ["//visibility:public"],
 )

That is, the linkopts line with -lgomp is commented out.

After all of the fixes so far, bazel printed:

Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
  bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 7071.413s, Critical Path: 220.29s
INFO: Build completed successfully, 5588 total actions

The build ran to completion, taking just under two hours.

To install it, first create a package for pip as follows.

./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Sat May 5 22:59:14 JST 2018 : === Using tmpdir: /var/folders/k2/crc7tsdx0yx1bgbrntyr67pw0000gn/T/tmp.XXXXXXXXXX.WILzRiUZ
~/pworks/tensorflow/bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles ~/pworks/tensorflow
~/pworks/tensorflow
/var/folders/k2/crc7tsdx0yx1bgbrntyr67pw0000gn/T/tmp.XXXXXXXXXX.WILzRiUZ ~/pworks/tensorflow
Sat May 5 22:59:29 JST 2018 : === Building wheel
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help

error: invalid command 'bdist_wheel'

As shown, this ends in an error because bdist_wheel is not available. The fix is:

pip install --upgrade wheel

Installing wheel beforehand turned out to be mandatory.
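A quick pre-flight check before running build_pip_package would have caught this. A minimal sketch, where has_module is a name made up here and the package list reflects only what this article ran into:

```python
import importlib.util


def has_module(name):
    """Return True if the top-level module `name` can be imported."""
    return importlib.util.find_spec(name) is not None


# Packages that the wheel-building step relied on in this article.
missing = [m for m in ("setuptools", "wheel") if not has_module(m)]
if missing:
    print("install first: pip install --upgrade " + " ".join(missing))
else:
    print("setuptools and wheel are available")
```

Run it with the venv's python, since that is the interpreter setup.py runs under.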

./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Sat May 5 23:01:41 JST 2018 : === Using tmpdir: /var/folders/k2/crc7tsdx0yx1bgbrntyr67pw0000gn/T/tmp.XXXXXXXXXX.cJJuECUm
~/pworks/tensorflow/bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles ~/pworks/tensorflow
~/pworks/tensorflow
/var/folders/k2/crc7tsdx0yx1bgbrntyr67pw0000gn/T/tmp.XXXXXXXXXX.cJJuECUm ~/pworks/tensorflow
Sat May 5 23:01:55 JST 2018 : === Building wheel
warning: no files found matching '*.dll' under directory '*'
warning: no files found matching '*.lib' under directory '*'
warning: no files found matching '*' under directory 'tensorflow/aux-bin'
warning: no files found matching '*.h' under directory 'tensorflow/include/tensorflow'
warning: no files found matching '*' under directory 'tensorflow/include/Eigen'
warning: no files found matching '*' under directory 'tensorflow/include/external'
warning: no files found matching '*.h' under directory 'tensorflow/include/google'
warning: no files found matching '*' under directory 'tensorflow/include/third_party'
warning: no files found matching '*' under directory 'tensorflow/include/unsupported'
~/pworks/tensorflow
Sat May 5 23:02:22 JST 2018 : === Output wheel file is in: /tmp/tensorflow_pkg

Install the resulting package with:

pip install /tmp/tensorflow_pkg/tensorflow-1.8.0-cp35-cp35m-macosx_10_13_x86_64.whl

This installs the self-built TensorFlow 1.8.0 into the venv.
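The wheel's file name encodes what it was built for, which is worth a glance before installing it somewhere else. The sketch below splits the name into its standard fields. parse_wheel_name is a helper invented for this note and it ignores the optional build tag.

```python
def parse_wheel_name(filename):
    """Split '<name>-<version>-<python>-<abi>-<platform>.whl' into fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) != 5:  # this sketch does not handle the optional build tag
        raise ValueError("unexpected wheel name: " + filename)
    keys = ("name", "version", "python", "abi", "platform")
    return dict(zip(keys, parts))


info = parse_wheel_name("tensorflow-1.8.0-cp35-cp35m-macosx_10_13_x86_64.whl")
print(info["python"], info["platform"])
```

Here cp35 means CPython 3.5 and macosx_10_13_x86_64 is the build machine's platform, so pip will refuse this wheel under a different Python version.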

Verifying that it works

python
Python 3.5.5 (default, Mar 29 2018, 16:22:58) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2018-05-05 23:15:25.266979: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
2018-05-05 23:15:25.267212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GT 650M major: 3 minor: 0 memoryClockRate(GHz): 0.9
pciBusID: 0000:01:00.0
totalMemory: 1023.69MiB freeMemory: 683.31MiB
2018-05-05 23:15:25.267236: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-05 23:15:25.608525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-05 23:15:25.608554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-05-05 23:15:25.608570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-05-05 23:15:25.608715: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 421 MB memory) -> physical GPU (device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0, compute capability: 3.0)
>>> print(sess.run(hello))
2018-05-05 23:15:35.846650: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
b'Hello, TensorFlow!'
>>> ^D

For some reason the error

tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered

is logged, yet print() works correctly.

>>> import tensorflow as tf
>>> a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
>>> b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
>>> c = tf.matmul(a, b)
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2018-05-05 23:42:04.506645: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
2018-05-05 23:42:04.506881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GT 650M major: 3 minor: 0 memoryClockRate(GHz): 0.9
pciBusID: 0000:01:00.0
totalMemory: 1023.69MiB freeMemory: 715.20MiB
2018-05-05 23:42:04.506903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-05 23:42:04.870686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-05 23:42:04.870715: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-05-05 23:42:04.870723: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-05-05 23:42:04.870826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 453 MB memory) -> physical GPU (device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0, compute capability: 3.0)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0, compute capability: 3.0
2018-05-05 23:42:04.881471: I tensorflow/core/common_runtime/direct_session.cc:284] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0, compute capability: 3.0

>>> print(sess.run(c))
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2018-05-05 23:43:01.452572: I tensorflow/core/common_runtime/placer.cc:886] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-05-05 23:43:01.452593: I tensorflow/core/common_runtime/placer.cc:886] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-05-05 23:43:01.452605: I tensorflow/core/common_runtime/placer.cc:886] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
2018-05-05 23:43:01.452797: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
[[22. 28.]
 [49. 64.]]
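To confirm that the GPU result is numerically right, the same product can be computed without TensorFlow. This is a plain Python sketch, with matmul being a small helper written for this check:

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # shape [2, 3], as in the session
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # shape [3, 2]
print(matmul(a, b))  # [[22.0, 28.0], [49.0, 64.0]]
```

The values agree with what sess.run(c) printed, so the "not registered" message from grappler does not appear to affect the computation itself.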

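Having got this far, I fed it an AutoEncoder I wrote about two years ago and was reminded that TensorFlow eats an enormous amount of memory. Doing this on a MacBook looks reckless, so I will run the actual experiments on a Linux server.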
