
Installing TensorFlow

Last updated at 2016-09-04. Posted at 2016-08-29.

Environment:
CPU: Core i7 6700K
GPU: GTX 1070
SSD: 240GB
HDD: 1TB
Motherboard: ASUS H170-PRO
OS: Ubuntu 14.04 LTS
Python: 2.7.6
CUDA: 8.0 RC
cuDNN: 5.1
etc.

In the previous three posts I installed Ubuntu 14.04, CUDA, Chainer, DQN, and LIS in turn.
http://qiita.com/masataka46/items/94417a5974dba810e7b8
http://qiita.com/masataka46/items/fddef236cb211ef3f145
http://qiita.com/masataka46/items/125c7900ec8ca83f6eb2
However, Unity does not start up properly, so the final LIS setup is still incomplete. I am moving on anyway.

Next up is installing TensorFlow.

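Installing TensorFlow

I follow the official documentation at
https://www.tensorflow.org/versions/r0.10/get_started/os_setup.html#download-and-setup
to install it.

First, the documentation says: "The GPU version (Linux only) works best with Cuda Toolkit 7.5 and cuDNN v4. other versions are supported (Cuda toolkit >= 7.0 and cuDNN 6.5(v2), 7.0(v3), v5) only when installing from sources." Since my setup is CUDA 8.0 RC with cuDNN 5.1 rather than 7.5 and v4, that means building from source. This looks like it will be a hassle.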
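As a rough illustration of that rule, the following Python sketch decides whether a given CUDA/cuDNN pair matches the prebuilt GPU binary or calls for a source build. The helper name `needs_source_build` and the way versions are encoded are assumptions made for this example; the only facts it relies on are the ones in the quoted sentence.

```python
# Hypothetical helper: it encodes only the rule quoted from the r0.10 docs,
# namely that the prebuilt GPU binary targets CUDA 7.5 with cuDNN v4.
PREBUILT_CUDA = "7.5"
PREBUILT_CUDNN_MAJOR = "4"


def needs_source_build(cuda_version, cudnn_major):
    """Return True when the pair differs from the prebuilt combination."""
    matches_prebuilt = (
        cuda_version == PREBUILT_CUDA
        and str(cudnn_major) == PREBUILT_CUDNN_MAJOR
    )
    return not matches_prebuilt


# The combination the prebuilt binary is documented to work best with.
print(needs_source_build("7.5", 4))
# The combination used in this article (CUDA 8.0 RC, cuDNN 5.x).
print(needs_source_build("8.0", 5))
```

The check is deliberately strict: anything other than the documented pair is treated as a source build, which is a simplification of what the documentation actually promises.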

Getting the source from GitHub

First, clone the source from GitHub.

git clone https://github.com/tensorflow/tensorflow

Installing Bazel

Next, install a tool called Bazel, following the official site
https://www.bazel.io/versions/master/docs/install.html

Java JDK 8 appears to be required as a dependency, so I installed it following
http://qiita.com/niusounds/items/1f32dcd6fa1f57ade98a

sudo apt-get install python-software-properties
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

Next, it seems the Bazel URI has to be added as a package source.

echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://storage.googleapis.com/bazel-apt/doc/apt-key.pub.gpg | sudo apt-key add -

Now Bazel can finally be installed. I also upgrade it right afterwards.

sudo apt-get update && sudo apt-get install bazel
sudo apt-get upgrade bazel

Reading the TensorFlow site more carefully, it says to install Bazel with the installer. Well, I will not worry about that and carry on.

Installing the other dependencies

I type the command exactly as shown on the site. numpy and some of the others should already be installed, but never mind.

sudo apt-get install python-numpy swig python-dev python-wheel

The site then describes how to install CUDA and cuDNN, but both are already installed here, so I skip that part.

Running configure

Run the configure script in the root of the tensorflow tree.

./configure
Please specify the location of python. [Default is /usr/bin/python]: 
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
No Google Cloud Platform support will be enabled for TensorFlow
Found possible Python library paths:
  /home/ohmasa/python-codec/src
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/home/ohmasa/python-codec/src]
/usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 
Please specify the location where CUDA  toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-8.0
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 
Please specify the location where cuDNN  library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-8.0]: 
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
Default is: "3.5,5.2": 3.5,6.1

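It asked quite a lot of questions.

First, I was not sure which Python library path to choose. Checking where Cython and numpy live suggested /usr/local/lib/..., so I entered that.

Next, whether to enter the CUDA SDK version. It is 8.0, but leaving the field empty uses the system default, so I just pressed Enter.

Next, the CUDA location. The default is /usr/local/cuda, but the directory on my PATH is /usr/local/cuda-8.0, so I tried that.

Finally, the CUDA compute capability. According to NVIDIA's site it is 6.1 for this GPU, but the example format is 3.5,5.2. For now I entered 3.5,6.1.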
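To make the compute capability answer a little more concrete, here is a minimal Python sketch that parses the comma-separated list and checks whether a given GPU's capability is in it. The function names are hypothetical, and the exact-match check is an assumption for illustration only; it is not how configure validates its input.

```python
def parse_capabilities(text):
    """Parse a string such as '3.5,6.1' into [(3, 5), (6, 1)]."""
    capabilities = []
    for item in text.split(","):
        major, minor = item.strip().split(".")
        capabilities.append((int(major), int(minor)))
    return capabilities


def covers_device(text, device_capability):
    """Return True if the list explicitly contains the device's capability."""
    return device_capability in parse_capabilities(text)


# 6.1 is the value looked up on NVIDIA's site for the GTX 1070 (see above).
GTX_1070 = (6, 1)

print(covers_device("3.5,5.2", GTX_1070))  # the default answer
print(covers_device("3.5,6.1", GTX_1070))  # the answer entered here
```

Each extra entry in the list lengthens the build, as the prompt itself warns, so listing only the capabilities of the GPUs actually in use is probably the sensible choice.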

The result:

All external dependencies fetched successfully.
Configuration finished

was displayed, so perhaps it worked?

Building

Next, build the GPU version.

bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer

When I ran this, I got

/home/ohmasa/tensorflow/tensorflow/tensorflow/core/kernels/BUILD:878:1: undeclared inclusion(s) in rule '//tensorflow/core/kernels:crop_and_resize_op_gpu':
this rule is missing dependency declarations for the following files included by 'tensorflow/core/kernels/crop_and_resize_op_gpu.cu.cc':
  '/usr/local/cuda-8.0/include/cuda_runtime.h'
  '/usr/local/cuda-8.0/include/host_config.h'
  '/usr/local/cuda-8.0/include/builtin_types.h'

followed by many more files under /usr/local/cuda-8.0/include/, and finally

nvcc warning : option '--relaxed-constexpr' has been deprecated and replaced by option '--expt-relaxed-constexpr'.
nvcc warning : option '--relaxed-constexpr' has been deprecated and replaced by option '--expt-relaxed-constexpr'.
Target //tensorflow/cc:tutorials_example_trainer failed to build
Use --verbose_failures to see the command lines of failed build steps.

was printed. Did it fail because something is wrong with the CUDA path?

So I ran ./configure again with the CUDA location set to the default /usr/local/cuda and rebuilt. That failed as well.
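One thing that may be worth ruling out at this point is whether the two paths are really different toolkits. On many CUDA installs /usr/local/cuda is a symlink to the versioned directory, but that is an assumption about this machine, not something shown in the output above. A small Python sketch for checking it (the function names are made up for this example):

```python
import os


def resolve_cuda_root(path="/usr/local/cuda"):
    """Return the real directory behind `path`, following any symlinks."""
    return os.path.realpath(path)


def same_toolkit(path_a, path_b):
    """Return True when both paths resolve to the same directory."""
    return os.path.realpath(path_a) == os.path.realpath(path_b)


print(resolve_cuda_root())
print(same_toolkit("/usr/local/cuda", "/usr/local/cuda-8.0"))
```

If both paths resolve to the same directory, switching between them in configure would not be expected to change the build result.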

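This time I set the CUDA path back to /usr/local/cuda-8.0 and explicitly specified the CUDA version as 8.0. nvcc printed a lot of warnings along the way, but the build somehow finished without errors. Entering 5.0 or 5.1 as the cuDNN version was rejected, so I left that field empty. These are the final inputs.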

ohmasa@ohmasa-com:~/tensorflow/tensorflow$ ./configure
Please specify the location of python. [Default is /usr/bin/python]: 
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
No Google Cloud Platform support will be enabled for TensorFlow
Found possible Python library paths:
  /home/ohmasa/python-codec/src
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/home/ohmasa/python-codec/src]
/usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-8.0
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5.0
Please specify the location where cuDNN 5.0 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-8.0]: 
Invalid path to cuDNN  toolkit. Neither of the following two files can be found:
/usr/local/cuda-8.0/lib64/libcudnn.so.5.0
/usr/local/cuda-8.0/libcudnn.so.5.0
.5.0
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5.1
Please specify the location where cuDNN 5.1 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-8.0]: 
Invalid path to cuDNN  toolkit. Neither of the following two files can be found:
/usr/local/cuda-8.0/lib64/libcudnn.so.5.1
/usr/local/cuda-8.0/libcudnn.so.5.1
.5.1
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 
Please specify the location where cuDNN  library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-8.0]: 
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
Default is: "3.5,5.2": 3.5,6.1
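The "Invalid path to cuDNN toolkit" messages in the transcript name the two files configure looked for. The following Python sketch rebuilds those candidate paths so they can be checked by hand. The function names are hypothetical, and the behaviour for an empty version (looking for an unversioned libcudnn.so) is an assumption, although it is consistent with the libcudnn.so that the MNIST test later reports opening.

```python
import os


def cudnn_candidates(cuda_root, version=""):
    """Return the two library paths reported in the configure error message."""
    # Assumption: an empty version means the unversioned libcudnn.so.
    name = "libcudnn.so." + version if version else "libcudnn.so"
    return [
        os.path.join(cuda_root, "lib64", name),
        os.path.join(cuda_root, name),
    ]


def find_cudnn(cuda_root, version=""):
    """Return the first candidate that exists on disk, or None."""
    for path in cudnn_candidates(cuda_root, version):
        if os.path.exists(path):
            return path
    return None


for version in ("5.0", "5.1", ""):
    print(repr(version), find_cudnn("/usr/local/cuda-8.0", version))
```

Listing the files in the lib64 directory by hand would show which versioned names actually exist and therefore which version string, if any, configure would accept.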

Finally, run the example.

bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu

This printed

000003/000000 lambda = 2.000000 x = [0.894427 -0.447214] y = [1.788854 -0.894427]
000000/000006 lambda = 2.000000 x = [0.894427 -0.447214] y = [1.788854 -0.894427]
000009/000001 lambda =      nan x = [     nan      nan] y = [     nan      nan]
000000/000007 lambda =      nan x = [     nan      nan] y = [     nan      nan]
000007/000002 lambda =      nan x = [     nan      nan] y = [     nan      nan]
000007/000002 lambda =      nan x = [     nan      nan] y = [     nan      nan]
000000/000007 lambda =      nan x = [     nan      nan] y = [     nan      nan]
000000/000007 lambda =      nan x = [     nan      nan] y = [     nan      nan]

with lots of nan values, but I will not worry about it.
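For reference, the example trainer appears to run power iteration: it repeatedly multiplies a vector by a small fixed matrix and normalizes the result to estimate the largest eigenvalue. The pure-Python sketch below does the same thing. The matrix A and the starting vector are assumptions (A is taken from a reading of the example's source, which is not shown in this article), but the sketch does converge to the lambda, x, and y values in the first two lines of the output above.

```python
import math

# Assumed matrix; its eigenvalues are 2 and 1.
A = [[3.0, 2.0], [-1.0, 0.0]]


def matvec(matrix, vector):
    """Multiply a 2x2 matrix by a 2-element vector."""
    return [
        matrix[0][0] * vector[0] + matrix[0][1] * vector[1],
        matrix[1][0] * vector[0] + matrix[1][1] * vector[1],
    ]


def power_iteration(x, steps=100):
    """Return (lambda, x, y) after repeatedly applying x <- A x / |A x|."""
    for _ in range(steps):
        y = matvec(A, x)
        norm = math.hypot(y[0], y[1])
        x = [y[0] / norm, y[1] / norm]
    y = matvec(A, x)
    return math.hypot(y[0], y[1]), x, y


def has_nan(values):
    """Return True if any value is nan."""
    return any(math.isnan(v) for v in values)


lam, x, y = power_iteration([1.0, 0.0])
print("lambda = %f x = [%f %f] y = [%f %f]" % (lam, x[0], x[1], y[0], y[1]))
print("contains nan:", has_nan(x + y + [lam]))
```

If this reading is right, a healthy run keeps printing lambda = 2, so the nan lines may point to a problem in the GPU build rather than something harmless. That is only a guess and would need checking against a CPU-only run.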

Installing the pip package

Next, apparently you create a pip package and then install it.

bazel build -c opt //tensorflow/tools/pip_package:build_pip_package

After running that, I noticed the GPU-version command just below it on the site, so I ran

bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

as well. Then I typed one more command whose purpose I do not really understand.

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

This printed

pip_package /tmp/tensorflow_pkg
2016年 9月 5日 月曜日 05:39:57 JST : === Using tmpdir: /tmp/tmp.4VdylwFmKc
/tmp/tmp.4VdylwFmKc ~/tensorflow/tensorflow
2016年 9月 5日 月曜日 05:39:57 JST : === Building wheel
~/tensorflow/tensorflow
2016年 9月 5日 月曜日 05:40:22 JST : === Output wheel file is in: /tmp/tensorflow_pkg

and so on. Finally, running

sudo pip install /tmp/tensorflow_pkg/tensorflow-0.10.0rc0-cp27-none-linux_x86_64.whl

produced warnings yet again, but it reported "successfully installed", so I will not worry about it.
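The wheel file name itself records what was built. Wheel names follow the pattern name-version-python tag-ABI tag-platform tag, with an optional build tag after the version. The Python sketch below splits the name used above into those fields; the function name and the dictionary keys are choices made for this example.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %s" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # An optional build tag sits between the version and the python tag.
        name, version, build, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    else:
        raise ValueError("unexpected wheel name: %s" % filename)
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("tensorflow-0.10.0rc0-cp27-none-linux_x86_64.whl")
for key in ("name", "version", "python", "abi", "platform"):
    print(key, "=", info[key])
```

Here cp27 indicates a build for CPython 2.7, which matches the Python 2.7.6 listed in the environment, and linux_x86_64 is the platform it was built on.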

Testing

To check whether the installation worked, I test it as instructed on the site.

cd tensorflow/models/image/mnist
python convolutional.py

This displayed the following.

I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so.8.0 locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties: 
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.797
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.56GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
Initialized!
Step 0 (epoch 0.00), 6.4 ms
Minibatch loss: 12.054, learning rate: 0.010000
Minibatch error: 90.6%
Validation error: 84.6%
Step 100 (epoch 0.12), 6.1 ms
Minibatch loss: 3.300, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 6.9%
Step 200 (epoch 0.23), 5.7 ms
Minibatch loss: 3.469, learning rate: 0.010000
Minibatch error: 12.5%
Validation error: 3.8%
Step 300 (epoch 0.35), 5.8 ms
Minibatch loss: 3.214, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 3.1%
Step 400 (epoch 0.47), 5.9 ms
Minibatch loss: 3.183, learning rate: 0.010000
Minibatch error: 7.8%
Validation error: 2.7%
・・・

I am a little concerned that the device is shown as 0, but GeForce GTX 1070 is listed, so presumably the GPU really is being used?
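Device numbers are zero-based, so device 0 simply means the first (here the only) GPU. As a small illustration, the Python sketch below pulls the device index and name out of the "Creating TensorFlow device" line in the log above. The regular expression is written against that one line only and is an assumption about the log format, not a documented interface.

```python
import re

# Pattern written for the log line shown in this article; other TensorFlow
# versions may format the line differently.
DEVICE_LINE = re.compile(
    r"Creating TensorFlow device \((/gpu:\d+)\) -> "
    r"\(device: (\d+), name: ([^,]+), pci bus id: ([^)]+)\)"
)


def parse_device_line(line):
    """Return a dict describing the GPU, or None if the line does not match."""
    match = DEVICE_LINE.search(line)
    if match is None:
        return None
    return {
        "tf_device": match.group(1),
        "index": int(match.group(2)),
        "name": match.group(3),
        "pci_bus_id": match.group(4),
    }


log_line = (
    "I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] "
    "Creating TensorFlow device (/gpu:0) -> (device: 0, "
    "name: GeForce GTX 1070, pci bus id: 0000:01:00.0)"
)
print(parse_device_line(log_line))
```

A line like this in the log means TensorFlow registered the GTX 1070 as /gpu:0, which suggests the GPU build is being used.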
