- MacBook Pro (15-inch, 2016)
- macOS High Sierra 10.13.6 (17G65)
- GeForce GTX 1080 (external GPU enclosure GV-N1080IXEB-8GD)
- NVIDIA web Driver 387.
- CUDA Driver 410.130
- Xcode 8.3.2
- python 3.6.6/2.7.15 (pyenv, pyenv-virtualenv)
- CUDA Toolkit 10.0
- cuDNN v7.3.1 (Sept 28, 2018), for CUDA 10.0
- NCCL 2.3.5
[ Environment setup ]
- [ install Homebrew ]
- /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
- [ install pyenv, pyenv-virtualenv ]
- brew install pyenv
- brew install pyenv-virtualenv
- [ install python 3.6.6 ]
- pyenv install 3.6.6
- pyenv virtualenv 3.6.6 tensorflow_python366
- pyenv activate tensorflow_python366
- Install the required modules with pip install
If you use the system (native) Python instead of pyenv, add sudo to the install commands as needed.
- [ download Tensorflow source code ]
- git clone https://github.com/tensorflow/tensorflow
- cd ./tensorflow
- git checkout r1.10 (r1.11 also built successfully with the same steps.)
- At this point, confirm that the TensorFlow cifar10 example runs.
- [ CUDA Toolkit 10.0 Download & install ]
- [ Download cuDNN v7.3.1 (Sept 28, 2018), for CUDA 10.0 ]
- cuDNN v7.3.1 Library for OSX
- https://developer.nvidia.com/rdp/cudnn-download
- tar xvf cudnn-10.0-osx-x64-v7.3.1.20.tar (run from ~/Downloads)
- sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
- sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib/
- sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*
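To confirm that the header copied above really is cuDNN 7.3.1, the version macros in cudnn.h can be read back. This is a minimal sketch, not part of the original procedure: the helper name `parse_cudnn_version` is hypothetical, and it assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL` as plain integer `#define`s, as cuDNN 7.x headers do.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) read from the text of cudnn.h, or None."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 7".
        pattern = r"^\s*#\s*define\s+" + macro + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    # Path taken from the copy commands above.
    header = Path("/usr/local/cuda/include/cudnn.h")
    if header.exists():
        print(parse_cudnn_version(header.read_text(errors="ignore")))
    else:
        print("cudnn.h not found at", header)
```

If the printed tuple does not match the version entered later in ./configure, the copy step probably picked up a different cuDNN download.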
- [ Download NCCL v2.3.5, for CUDA 10.0, Sept 25, 2018 ]
- https://developer.nvidia.com/nccl/nccl-download
- NCCL 2.3.5 O/S agnostic and CUDA 10.0
- tar xvf nccl_2.3.5-2+cuda10.0_x86_64.txz (run from ~/Downloads)
- cd nccl_2.3.5-2+cuda10.0_x86_64/lib
- sudo mv * /Developer/NVIDIA/CUDA-10.0/lib/
- cd ../include
- sudo mv nccl.h /Developer/NVIDIA/CUDA-10.0/include/
- cd tensorflow/third_party/nccl/
- ln -s /Developer/NVIDIA/CUDA-10.0/include/nccl.h
- Note: third_party/nccl/nccl_configure.bzl documents the following variables, but setting them causes an error because the library cannot be found:
`nccl_configure` depends on the following environment variables:
* `TF_NCCL_VERSION`: The NCCL version.
* `NCCL_INSTALL_PATH`: The installation path of the NCCL library.
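Because the NCCL header is found through a symlink rather than through `NCCL_INSTALL_PATH`, a dangling link is an easy mistake to make. The sketch below checks both ends of the link. The function name `nccl_header_status` and its two arguments are hypothetical; the directory layout is the one used in the steps above.

```python
import os


def nccl_header_status(cuda_root, tf_root):
    """Report whether nccl.h exists under cuda_root and whether the
    third_party/nccl/nccl.h symlink under tf_root resolves."""
    header = os.path.join(cuda_root, "include", "nccl.h")
    link = os.path.join(tf_root, "third_party", "nccl", "nccl.h")
    return {
        "header_present": os.path.isfile(header),
        "link_present": os.path.islink(link),
        # os.path.exists() follows symlinks, so a dangling link gives False.
        "link_resolves": os.path.exists(link),
    }


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains the clone.
    print(nccl_header_status("/Developer/NVIDIA/CUDA-10.0", "tensorflow"))
```

All three values should be True before running ./configure; `link_present` True with `link_resolves` False means the link points at a path that does not exist.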
- [ Command Line Tools for Xcode 8.3.2 ]
- https://developer.apple.com/download/more/
- CommandLineToolsforXcode8.3.2.dmg
Code modifications
Remove __align__(sizeof(T)) from the following files:
- tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
- tensorflow/core/kernels/split_lib_gpu.cu.cc
- tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
Remove linkopts = ["-lgomp"] from:
- tensorflow/third_party/gpus/cuda/BUILD.tpl
constexpr Variant() noexcept = default; // remove "constexpr" from this line in:
- tensorflow/core/framework/variant.h
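The two C++ source edits above are plain text substitutions, so they can be scripted when the checkout is rebuilt from scratch. The following is a sketch under assumptions, not the author's method: the function names are hypothetical, the file paths are taken relative to the root of the clone, and the `-lgomp` edit in BUILD.tpl is left as a manual step.

```python
import os

ALIGN_ATTR = "__align__(sizeof(T))"
ALIGN_FILES = [
    "tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc",
    "tensorflow/core/kernels/split_lib_gpu.cu.cc",
    "tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc",
]
VARIANT_FILE = "tensorflow/core/framework/variant.h"


def strip_align_attribute(source):
    """Remove every __align__(sizeof(T)), plus one trailing space if present."""
    return source.replace(ALIGN_ATTR + " ", "").replace(ALIGN_ATTR, "")


def drop_variant_constexpr(source):
    """Drop constexpr from the defaulted Variant() constructor only."""
    return source.replace(
        "constexpr Variant() noexcept = default;",
        "Variant() noexcept = default;",
    )


def patch_file(path, transform):
    """Rewrite path in place with transform(text); return True if it changed."""
    with open(path, encoding="utf-8") as handle:
        original = handle.read()
    patched = transform(original)
    if patched != original:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(patched)
    return patched != original


def patch_tree(repo_root):
    """Apply both edits under repo_root; return the relative paths that changed."""
    jobs = [(rel, strip_align_attribute) for rel in ALIGN_FILES]
    jobs.append((VARIANT_FILE, drop_variant_constexpr))
    changed = []
    for rel, transform in jobs:
        path = os.path.join(repo_root, rel)
        # Files that are missing are skipped rather than treated as errors.
        if os.path.isfile(path) and patch_file(path, transform):
            changed.append(rel)
    return changed
```

Calling `patch_tree(".")` from the root of the clone returns the list of files it modified; running it a second time returns an empty list, since the substitutions no longer match.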
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib:/Developer/NVIDIA/CUDA-10.0/lib
export PATH=/Developer/NVIDIA/CUDA-10.0/bin${PATH:+:${PATH}}
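A mistyped entry in these variables only shows up much later, as a missing library at build or import time. As a quick sanity check, the sketch below lists entries that are not existing directories. The helper name `missing_directories` is hypothetical, and the check assumes colon-separated lists as used on macOS.

```python
import os


def missing_directories(path_list):
    """Return the entries of a colon-separated path list that are not
    existing directories. Empty entries are ignored."""
    entries = [entry for entry in path_list.split(":") if entry]
    return [entry for entry in entries if not os.path.isdir(entry)]


if __name__ == "__main__":
    for name in ("DYLD_LIBRARY_PATH", "PATH"):
        for entry in missing_directories(os.environ.get(name, "")):
            print(name + ": missing directory " + entry)
```

No output means every listed directory exists as seen by the Python process that ran the check.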
configure & build
- Disable SIP (System Integrity Protection)
- ./configure
Do you wish to build TensorFlow with CUDA support? [y/N]: Y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 10.0
Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.3.1
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,7.0]: 6.1
- bazel build --config=cuda --config=opt --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
INFO: Elapsed time: 4855.958s, Critical Path: 245.05s
INFO: 5073 processes: 5073 local.
INFO: Build completed successfully, 5262 total actions
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-1.10.1-cp36-cp36m-macosx_10_13_x86_64.whl
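Before starting a long training run, it can be worth checking that the freshly installed wheel sees the GPU at all. The sketch below lists the GPU devices TensorFlow reports through `device_lib.list_local_devices()`. The helper name `gpu_device_names` is hypothetical; it only relies on each device entry having `name` and `device_type` attributes.

```python
def gpu_device_names(devices):
    """Return the names of the entries whose device_type is "GPU"."""
    return [device.name for device in devices if device.device_type == "GPU"]


if __name__ == "__main__":
    try:
        # Imported here so the helper stays usable without TensorFlow installed.
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not importable in this environment")
    else:
        print(gpu_device_names(device_lib.list_local_devices()))
```

An empty list suggests that the wheel was built without CUDA support or that the CUDA libraries could not be loaded at import time.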
cd models/tutorials/image/cifar10
python cifar10_train.py
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
2018-10-05 23:04:13.815498: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:858] OS X does not support NUMA - returning NUMA node zero
2018-10-05 23:04:13.815687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:46:00.0
totalMemory: 8.00GiB freeMemory: 5.71GiB
2018-10-05 23:04:13.815706: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-10-05 23:04:14.187784: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-05 23:04:14.187826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-10-05 23:04:14.187831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2018-10-05 23:04:14.188000: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5481 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:46:00.0, compute capability: 6.1)
2018-10-05 23:04:17.923348: step 0, loss = 4.68 (139.6 examples/sec; 0.917 sec/batch)
2018-10-05 23:04:18.492055: step 10, loss = 4.60 (2250.7 examples/sec; 0.057 sec/batch)
2018-10-05 23:04:18.914545: step 20, loss = 4.62 (3029.7 examples/sec; 0.042 sec/batch)
2018-10-05 23:04:19.332597: step 30, loss = 4.60 (3061.8 examples/sec; 0.042 sec/batch)
2018-10-05 23:04:19.756255: step 40, loss = 4.38 (3021.3 examples/sec; 0.042 sec/batch)
2018-10-05 23:04:20.198415: step 50, loss = 4.42 (2894.9 examples/sec; 0.044 sec/batch)
2018-10-05 23:04:20.622042: step 60, loss = 4.40 (3021.5 examples/sec; 0.042 sec/batch)
2018-10-05 23:04:21.040028: step 70, loss = 4.21 (3062.3 examples/sec; 0.042 sec/batch)
2018-10-05 23:04:21.449060: step 80, loss = 4.19 (3129.3 examples/sec; 0.041 sec/batch)
2018-10-05 23:04:21.868983: step 90, loss = 4.12 (3048.2 examples/sec; 0.042 sec/batch)
2018-10-05 23:04:22.424941: step 100, loss = 4.13 (2302.3 examples/sec; 0.056 sec/batch)
2018-10-05 23:04:22.865714: step 110, loss = 4.14 (2904.0 examples/sec; 0.044 sec/batch)
2018-10-05 23:04:23.308738: step 120, loss = 4.08 (2889.2 examples/sec; 0.044 sec/batch)
2018-10-05 23:04:23.761966: step 130, loss = 3.85 (2824.2 examples/sec; 0.045 sec/batch)
2018-10-05 23:04:24.201389: step 140, loss = 4.00 (2912.9 examples/sec; 0.044 sec/batch)
2018-10-05 23:04:24.621869: step 150, loss = 3.95 (3044.1 examples/sec; 0.042 sec/batch)
2018-10-05 23:04:25.062640: step 160, loss = 3.93 (2904.0 examples/sec; 0.044 sec/batch)
2018-10-05 23:04:25.516130: step 170, loss = 3.92 (2822.6 examples/sec; 0.045 sec/batch)
2018-10-05 23:04:25.967363: step 180, loss = 3.88 (2836.7 examples/sec; 0.045 sec/batch)
2018-10-05 23:04:26.433383: step 190, loss = 3.72 (2746.7 examples/sec; 0.047 sec/batch)
2018-10-05 23:04:27.021799: step 200, loss = 3.90 (2175.3 examples/sec; 0.059 sec/batch)
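To compare runs (for example r1.10 against r1.11), the throughput figures can be pulled out of a log like the one above. This is a small sketch that assumes the `step N, loss = X (Y examples/sec; Z sec/batch)` line format shown in the output; the function names are hypothetical.

```python
import re

STEP_LINE = re.compile(
    r"step (\d+), loss = ([\d.]+) \(([\d.]+) examples/sec; ([\d.]+) sec/batch\)"
)


def parse_steps(log_text):
    """Return (step, loss, examples_per_sec) for every step line in the log."""
    return [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)))
        for m in STEP_LINE.finditer(log_text)
    ]


def mean_throughput(log_text, skip_first=1):
    """Average examples/sec, skipping the first skip_first step lines,
    which include start-up cost. Returns None if nothing is left."""
    rates = [rate for _, _, rate in parse_steps(log_text)][skip_first:]
    return sum(rates) / len(rates) if rates else None
```

Step 0 is skipped by default because it reports 139.6 examples/sec, far below the later steps.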
With Python 2.7.15 the build asked for mock, so I installed it.
Otherwise it built the same way. (I needed the Python 2.7 build for NVIDIA's e-learning course.)
pip install mock