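Information as of 05/04/2019.
For the latest support status, see https://www.tensorflow.org/install/gpu#software_requirements
I am using Ubuntu 16.04; adapt the commands to your own environment as needed.
Motivation
As of 05/04/2019, the official TensorFlow packages did not support CUDA 10.1 + cuDNN 7.5.
The build did not succeed by following the official documentation alone, so this also serves as a memo of what was needed.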
CUDA
Official: https://developer.nvidia.com/cuda-toolkit-archive
- Download the deb (local) installer that matches your OS from the site above
- Install it
- Reboot
For Ubuntu 16.04
If CUDA is already installed
sudo apt-get --purge remove 'nvidia-*' -y
sudo apt-get --purge remove 'cuda-*' -y
dpkg -l | grep '^rc' | awk '{print $2}' | sudo xargs dpkg --purge
sudo apt-get autoremove -y
sudo apt-get autoclean -y
sudo rm -rf /usr/local/cuda*
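Because the purge above is destructive, it may be worth listing which installed packages would match before running it. The following is a minimal sketch, assuming the usual `dpkg -l` layout (status in the first column, package name in the second); the helper name `list_nvidia_cuda_pkgs` is hypothetical.

```shell
# Hypothetical helper: read `dpkg -l` output on stdin and print the names of
# installed ("ii") packages whose name starts with nvidia- or cuda-.
list_nvidia_cuda_pkgs() {
  awk '$1 == "ii" && $2 ~ /^(nvidia|cuda)-/ { print $2 }'
}

# Only query dpkg where it is available (Debian/Ubuntu).
if command -v dpkg >/dev/null 2>&1; then
  dpkg -l | list_nvidia_cuda_pkgs
fi
```

An empty result suggests there is nothing to purge and the removal commands can be skipped.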
Common steps (in all cases)
cd ~
wget https://developer.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1604-10-1-local-10.1.105-418.39_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604-10-1-local-10.1.105-418.39_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-1-local-10.1.105-418.39/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
sudo reboot
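After the reboot, a quick check that the toolkit and driver are in place can save time later. This is a sketch only: it assumes the toolkit landed in /usr/local/cuda-10.1 (the path used in the rest of this note), and the helper name `nvcc_release` is hypothetical.

```shell
# Hypothetical helper: extract the release number (e.g. 10.1) from the text
# that `nvcc --version` prints, read on stdin.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Assumed install location; PATH may not include it yet at this point.
NVCC=/usr/local/cuda-10.1/bin/nvcc
if [ -x "$NVCC" ]; then
  echo "CUDA toolkit release: $("$NVCC" --version | nvcc_release)"
else
  echo "nvcc not found at $NVCC"
fi

# The driver side can be checked with nvidia-smi, if it is on PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi || echo "nvidia-smi failed; the driver may not be loaded"
fi
```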
cuDNN
Official: https://developer.nvidia.com/rdp/cudnn-archive
- Download and install the following two packages listed under "cuDNN *, for CUDA 10.1":
  - cuDNN Runtime Library for Ubuntu16.04 (Deb)
  - cuDNN Developer Library for Ubuntu16.04 (Deb)
- Add the CUDA directories to PATH
For Ubuntu 16.04
(in the directory containing the two packages)
sudo dpkg -i libcudnn*.deb
vim ~/.bashrc   # add the two lines marked with + below (sudo is not needed for your own ~/.bashrc)
+ export PATH=/usr/local/cuda-10.1/bin:${PATH}
+ export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:${LD_LIBRARY_PATH}
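To confirm which cuDNN version was actually installed, the version macros in the header can be read directly. A minimal sketch, assuming the Deb packages expose the header at /usr/include/cudnn.h (an assumption; adjust the path if yours differs). The helper name `cudnn_version_from_header` is hypothetical.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCH from a cudnn.h-style header.
cudnn_version_from_header() {
  awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
       $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
       $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
       END { print major "." minor "." patch }' "$1"
}

# Assumed header location for the Deb packages; adjust if yours differs.
CUDNN_HEADER=/usr/include/cudnn.h
if [ -f "$CUDNN_HEADER" ]; then
  echo "cuDNN version: $(cudnn_version_from_header "$CUDNN_HEADER")"
else
  echo "cudnn.h not found at $CUDNN_HEADER"
fi

# After reloading ~/.bashrc, nvcc should resolve through PATH.
command -v nvcc || echo "nvcc is not on PATH yet (try: source ~/.bashrc)"
```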
Building TensorFlow
Official: https://www.tensorflow.org/install/source
Build prerequisites
sudo apt install python-dev python-pip # or python3-dev python3-pip
pip install -U --user pip six numpy wheel setuptools mock
pip install -U --user keras_applications==1.0.6 --no-deps
pip install -U --user keras_preprocessing==1.0.5 --no-deps
Bazel
Official: https://docs.bazel.build/versions/master/install-ubuntu.html#install-with-installer-ubuntu
- Check the required Bazel version in the build environment table (tensorflow-1.13.1 used Bazel 0.19.2)
- Download the installer from https://github.com/bazelbuild/bazel/releases
  - the 0.19.2 installer
  - searching from the tag list is the quickest way to find it
- Make it executable and run it
- Add it to PATH
For Ubuntu 16.04
cd ~
wget https://github.com/bazelbuild/bazel/releases/download/0.19.2/bazel-0.19.2-installer-linux-x86_64.sh
chmod +x bazel-0.19.2-installer-linux-x86_64.sh
./bazel-0.19.2-installer-linux-x86_64.sh --user
export PATH="$PATH:$HOME/bin"
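Since the TensorFlow build is sensitive to the Bazel version, it may help to confirm what is actually on PATH before going further. The sketch below parses the "Build label" line printed by `bazel version`; the helper name `bazel_build_label` and the variable `EXPECTED_BAZEL` are hypothetical.

```shell
# Hypothetical helper: print the "Build label" value from `bazel version`
# output read on stdin.
bazel_build_label() {
  sed -n 's/^Build label: //p'
}

EXPECTED_BAZEL=0.19.2   # version used for tensorflow-1.13.1 in this note
if command -v bazel >/dev/null 2>&1; then
  actual=$(bazel version 2>/dev/null | bazel_build_label)
  if [ "$actual" = "$EXPECTED_BAZEL" ]; then
    echo "bazel $actual OK"
  else
    echo "bazel $actual found, expected $EXPECTED_BAZEL"
  fi
else
  echo "bazel is not on PATH"
fi
```

Note that the `export PATH` line above only affects the current shell; adding the same line to ~/.bashrc keeps it across sessions.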
Cloning TensorFlow
Check which TensorFlow version to build in the branch list of the official repository:
https://github.com/tensorflow/tensorflow/branches
For tensorflow-1.13
git clone https://github.com/tensorflow/tensorflow.git -b r1.13
cd tensorflow
Build configuration
./configure
The script asks its questions interactively; answer them to match your environment.
Example
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.19.0 installed.
Please specify the location of python. [Default is /home/node1/anaconda3/bin/python]:
Found possible Python library paths:
/home/node1/anaconda3/lib/python3.7/site-packages
Please input the desired Python library path to use. Default is [/home/node1/anaconda3/lib/python3.7/site-packages]
Do you wish to build TensorFlow with XLA JIT support? [Y/n]:
XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: Y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10.0]: 10.1
Please specify the location where CUDA 10.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-10.1
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]:
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.1]:
Do you wish to build TensorFlow with TensorRT support? [y/N]:
No TensorRT support will be enabled for TensorFlow.
Please specify the locally installed NCCL version you want to use. [Default is to use https://github.com/nvidia/nccl]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]:
Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=gdr # Build with GDR support.
--config=verbs # Build with libverbs support.
--config=ngraph # Build with Intel nGraph support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=noignite # Disable Apacha Ignite support.
--config=nokafka # Disable Apache Kafka support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
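The same answers can probably be supplied up front through environment variables, which makes the configuration repeatable. This is a sketch under assumptions: the variable names are the ones I understand TensorFlow's configure.py to read, but they are not verified against this exact branch, and prompts not covered here (for example Python paths and NCCL) would still be asked interactively.

```shell
# Answers mirroring the interactive session above, passed as environment
# variables. Names are assumed from TensorFlow's configure.py; verify them
# against the checkout you are building.
export TF_ENABLE_XLA=1
export TF_NEED_OPENCL_SYCL=0
export TF_NEED_ROCM=0
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=10.1
export CUDA_TOOLKIT_PATH=/usr/local/cuda-10.1
export TF_CUDNN_VERSION=7
export CUDNN_INSTALL_PATH=/usr/local/cuda-10.1
export TF_NEED_TENSORRT=0
export TF_CUDA_COMPUTE_CAPABILITIES=6.1
export TF_CUDA_CLANG=0
export GCC_HOST_COMPILER_PATH=/usr/bin/gcc
export TF_NEED_MPI=0
export CC_OPT_FLAGS="-march=native -Wno-sign-compare"
export TF_SET_ANDROID_WORKSPACE=0

# Run configure only from inside a TensorFlow checkout.
if [ -x ./configure ]; then
  ./configure
else
  echo "run this from the tensorflow source directory"
fi
```

The compute capability 6.1 is simply the default shown in the example session; set it to match your GPU.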
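Additional changes
Adjust the installation for the library layout changes in CUDA 10.1, following https://github.com/tensorflow/tensorflow/issues/26150#issuecomment-469058265. The commands below copy libcublas from /usr/lib/x86_64-linux-gnu into the CUDA directory and add .so.10.1 symlinks for libcusolver, libcurand, and libcufft, which ship as .so.10.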
sudo cp /usr/lib/x86_64-linux-gnu/libcublas.so.10.1.0.105 /usr/local/cuda-10.1/lib64/
sudo ln -s /usr/local/cuda-10.1/lib64/libcublas.so.10.1.0.105 /usr/local/cuda-10.1/lib64/libcublas.so.10.1
sudo ln -s /usr/local/cuda-10.1/lib64/libcublas.so.10.1 /usr/local/cuda-10.1/lib64/libcublas.so
sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcusolver.so.10 /usr/local/cuda-10.1/lib64/libcusolver.so.10.1
sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcurand.so.10 /usr/local/cuda-10.1/lib64/libcurand.so.10.1
sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcufft.so.10 /usr/local/cuda-10.1/lib64/libcufft.so.10.1
The following was sometimes needed and sometimes not:
sudo cp /usr/include/cublas_v2.h /usr/local/cuda-10.1/include/
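Before starting the long build, it may be worth checking that every library name created by the steps above actually resolves. A minimal sketch; the helper name `missing_cuda_libs` is hypothetical, and the list of names is taken from the commands in this section.

```shell
# Hypothetical helper: print each library name the steps above are meant to
# provide that cannot be resolved inside the given directory.
missing_cuda_libs() {
  for lib in libcublas.so.10.1 libcublas.so libcusolver.so.10.1 \
             libcurand.so.10.1 libcufft.so.10.1; do
    # -e follows symlinks, so a dangling link is reported as missing too.
    [ -e "$1/$lib" ] || echo "$lib"
  done
}

missing=$(missing_cuda_libs /usr/local/cuda-10.1/lib64)
if [ -z "$missing" ]; then
  echo "all expected libraries are present"
else
  echo "missing: $missing"
fi
```

If a link already exists, `ln -s` fails with "File exists"; `ln -sf` replaces it instead.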
Build
Bazel build
This takes roughly an hour, so be patient.
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
Building the pip package
The --project_name option lets you give the package a custom name.
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg --project_name tensorflow_gpu_1.13_cuda_10.1
Install
pip install /tmp/tensorflow_pkg/tensorflow_gpu_1.13_cuda_10.1-1.13.1-cp37-cp37m-linux_x86_64.whl
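The exact wheel file name depends on the project name, the TensorFlow version, and the Python version, so it can be easier to look it up than to type it. The sketch below finds the newest wheel in the output directory and then checks that the installed package can see the GPU; the helper name `newest_wheel` is hypothetical. The import is run from /tmp because importing tensorflow from inside the source tree fails.

```shell
# Hypothetical helper: print the most recently modified .whl in a directory,
# or nothing if there is none.
newest_wheel() {
  ls -t "$1"/*.whl 2>/dev/null | head -n 1
}

wheel=$(newest_wheel /tmp/tensorflow_pkg)
if [ -n "$wheel" ]; then
  echo "wheel to install: $wheel"
fi

# After installing, verify the import from outside the source tree.
if (cd /tmp && python -c "import tensorflow") >/dev/null 2>&1; then
  (cd /tmp && python -c "import tensorflow as tf; print(tf.__version__); print(tf.test.is_gpu_available())")
fi
```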