
Installing TensorFlow (GPU version) on Ubuntu

Posted at 2015-11-16

Overview

A record of installing TensorFlow, the open-source machine intelligence library released by Google, on Ubuntu.
It covers everything up to running CIFAR-10 training with CUDA enabled.

Official site: http://www.tensorflow.org
Git: https://tensorflow.googlesource.com/tensorflow

Machine configuration

  • OS: Ubuntu 14.04 LTS (64bit)
  • Shell: Ubuntu's default Bash
  • Python 2.7.6


  • CPU: i7-3770K CPU @ 3.50GHz
  • DDR3 32GB (8GB x 4)
  • Motherboard: ASUSTeK P8H77-V
  • GPU: Nvidia GTX970 (ASUSTek)
  • Storage: SSD 128GB (DSSDA-120G-J25C)

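Setup procedure

This mostly follows the official instructions, but I am recording the steps in the order I actually performed them.

(1). Getting the source tree from Git

# Only needed if Git is not installed yet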
$ sudo apt-get install git

$ git clone --recurse-submodules https://github.com/tensorflow/tensorflow

(2). Installing CUDA Toolkit 7.0

Version 7.5 caused problems in the later steps, so I installed 7.0.

Download the Ubuntu 14.04 DEB (10KB) network installer (cuda-repo-ubuntu1404-7-0-local_7.0-28_amd64.deb) from the page below and install it.

https://developer.nvidia.com/cuda-toolkit-70

$ sudo dpkg -i cuda-repo-ubuntu1404-7-0-local_7.0-28_amd64.deb
$ sudo apt-get update
$ sudo apt-get install cuda-7-0 

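(3). Installing cuDNN 6.5

Downloading cuDNN requires registering on the Nvidia site (and, as far as I remember, I had to wait 2 or 3 days for the registration to complete).

Download cuDNN v2 Library for Linux (cudnn-6.5-linux-x64-v2.tgz) from the page below and install it.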

https://developer.nvidia.com/rdp/cudnn-archive

$ tar xvzf cudnn-6.5-linux-x64-v2.tgz 
$ sudo cp cudnn-6.5-linux-x64-v2/cudnn.h /usr/local/cuda/include
$ sudo cp cudnn-6.5-linux-x64-v2/libcudnn* /usr/local/cuda/lib64

Reboot at this point.
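The copy step above can be checked before rebooting. The following is a minimal sketch, not part of the original procedure: the helper name `missing_cudnn_files` is hypothetical, and it assumes the default `/usr/local/cuda` location used in this article.

```python
import os


def missing_cudnn_files(cuda_root):
    """Return the expected cuDNN paths that are absent under cuda_root.

    Hypothetical helper: it only checks that cudnn.h and at least one
    libcudnn* file exist; it does not verify versions.
    """
    missing = []
    header = os.path.join(cuda_root, "include", "cudnn.h")
    if not os.path.isfile(header):
        missing.append(header)

    lib_dir = os.path.join(cuda_root, "lib64")
    libs = []
    if os.path.isdir(lib_dir):
        libs = [n for n in os.listdir(lib_dir) if n.startswith("libcudnn")]
    if not libs:
        missing.append(os.path.join(lib_dir, "libcudnn*"))
    return missing


if __name__ == "__main__":
    # Assumption: CUDA lives at the default path used in this article.
    for path in missing_cudnn_files("/usr/local/cuda"):
        print("missing: " + path)
```

If the script prints nothing, both the header and at least one library file are where the `cp` commands were supposed to put them.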

(4). Installing virtualenv and creating the container

# Install
$ sudo apt-get install python-pip python-dev python-virtualenv

# Create the container (virtualenv environment)
$ virtualenv --system-site-packages ~/tensorflow-GPU

Edit ~/tensorflow-GPU/bin/activate
and append the following two lines at the end:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda
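Once the environment is activated, it can be useful to confirm that both variables actually took effect. This is a small sketch under the assumption that the paths are the ones shown above; the function name `cuda_env_problems` is made up for illustration.

```python
import os

# Assumption: the library directory exported in bin/activate above.
CUDA_LIB_DIR = "/usr/local/cuda/lib64"


def cuda_env_problems(env, lib_dir=CUDA_LIB_DIR):
    """Return a list of human-readable problems found in the given environment."""
    problems = []
    # LD_LIBRARY_PATH is colon-separated; empty entries are ignored.
    entries = [e for e in env.get("LD_LIBRARY_PATH", "").split(":") if e]
    if lib_dir not in entries:
        problems.append("LD_LIBRARY_PATH does not contain " + lib_dir)
    if not env.get("CUDA_HOME"):
        problems.append("CUDA_HOME is not set")
    return problems


if __name__ == "__main__":
    for problem in cuda_env_problems(os.environ):
        print(problem)
```

Passing the environment in as a dictionary keeps the check easy to try with made-up values before running it against `os.environ`.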

(5). Installing TensorFlow

Run the following once; it only needs to be repeated if the CUDA library path changes.

# Move to the source directory obtained in (1)
$ cd ~/tensorflow

# Grant execute permission
$ chmod +x ./configure

$ ./configure
Do you wish to bulid TensorFlow with GPU support? [y/n] y
GPU support will be enabled for TensorFlow

Please specify the location where CUDA 7.0 toolkit is installed. Refer to
README.md for more details. [default is: /usr/local/cuda]: /usr/local/cuda

Please specify the location where CUDNN 6.5 V2 library is installed. Refer to
README.md for more details. [default is: /usr/local/cuda]: /usr/local/cuda

Setting up Cuda include
Setting up Cuda lib64
Setting up Cuda bin
Setting up Cuda nvvm
Configuration finished

Activating the container
From now on, whenever you open a new terminal and want to work in the tensorflow-GPU container, activate it first as follows:

$ cd ~/tensorflow-GPU
$ source bin/activate

Install the GPU version of TensorFlow

(tensorflow-GPU) $ pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl
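The wheel file name itself says which interpreter it targets. As a rough illustration (the helper `parse_wheel_name` is hypothetical and follows the standard wheel naming layout of distribution, version, optional build tag, Python tag, ABI tag, platform tag), the name can be split like this:

```python
def parse_wheel_name(filename):
    """Split a wheel file name into its standard dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    # Five fields normally, six when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel name: " + filename)
    return {
        "distribution": parts[0],
        "version": parts[1],
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


info = parse_wheel_name("tensorflow-0.5.0-cp27-none-linux_x86_64.whl")
print(info["python_tag"] + " / " + info["platform_tag"])
```

Here the Python tag `cp27` means CPython 2.7, which matches the Python 2.7.6 listed in the machine configuration, and the platform tag matches 64-bit Linux.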

Trying it out

(1). MNIST

Running MNIST produced an error (as of 2015/11/15), so I made the following changes.

(tensorflow-GPU) $ cd ~/tensorflow/tensorflow/g3doc/tutorials/mnist/
# Rename the file that will be replaced
(tensorflow-GPU) $ mv mnist.py mnist_org.py
# Fetch the older version from the repository
(tensorflow-GPU) $ wget https://raw.githubusercontent.com/tensorflow/tensorflow/1d76583411038767f673a0c96174c80eaf9ff42f/tensorflow/g3doc/tutorials/mnist/mnist.py

Change lines 23 and 24 of fully_connected_feed.py as follows:

#from tensorflow.g3doc.tutorials.mnist import input_data
#from tensorflow.g3doc.tutorials.mnist import mnist
import input_data
import mnist

Run it:

(tensorflow-GPU) $ python fully_connected_feed.py
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 8
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:888] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:88] Found device 0 with properties: 
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.22GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:112] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:122] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:47] Setting region size to 3144105984
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 8
Step 0: loss = 2.34 (0.300 sec)
Step 100: loss = 2.13 (0.002 sec)
Step 200: loss = 1.90 (0.002 sec)
Step 300: loss = 1.52 (0.002 sec)
Step 400: loss = 1.22 (0.002 sec)
Step 500: loss = 0.84 (0.002 sec)
Step 600: loss = 0.82 (0.002 sec)
Step 700: loss = 0.68 (0.002 sec)
Step 800: loss = 0.71 (0.002 sec)
Step 900: loss = 0.51 (0.002 sec)
Training Data Eval:
  Num examples: 55000  Num correct: 47651  Precision @ 1: 0.8664
Validation Data Eval:
  Num examples: 5000  Num correct: 4363  Precision @ 1: 0.8726
Test Data Eval:
  Num examples: 10000  Num correct: 8745  Precision @ 1: 0.8745
Step 1000: loss = 0.46 (0.002 sec)
Step 1100: loss = 0.44 (0.038 sec)
Step 1200: loss = 0.52 (0.002 sec)
Step 1300: loss = 0.43 (0.002 sec)
Step 1400: loss = 0.64 (0.002 sec)
Step 1500: loss = 0.34 (0.002 sec)
Step 1600: loss = 0.41 (0.002 sec)
Step 1700: loss = 0.34 (0.002 sec)
Step 1800: loss = 0.30 (0.002 sec)
Step 1900: loss = 0.35 (0.002 sec)
Training Data Eval:
  Num examples: 55000  Num correct: 49286  Precision @ 1: 0.8961
Validation Data Eval:
  Num examples: 5000  Num correct: 4529  Precision @ 1: 0.9058
Test Data Eval:
  Num examples: 10000  Num correct: 9012  Precision @ 1: 0.9012
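To follow the loss without reading the whole log by eye, the `Step N: loss = ...` lines can be extracted with a regular expression. This is only a sketch; the function name `parse_steps` is hypothetical, and the pattern assumes the exact line format shown in the output above.

```python
import re

# Matches lines such as "Step 100: loss = 2.13 (0.002 sec)".
STEP_RE = re.compile(r"^Step (\d+): loss = ([0-9.]+) \(([0-9.]+) sec\)")


def parse_steps(log_text):
    """Return a list of (step, loss, seconds) tuples found in the log text."""
    rows = []
    for line in log_text.splitlines():
        match = STEP_RE.match(line.strip())
        if match:
            rows.append((int(match.group(1)),
                         float(match.group(2)),
                         float(match.group(3))))
    return rows


# A few lines copied from the output above.
sample = """Step 0: loss = 2.34 (0.300 sec)
Step 100: loss = 2.13 (0.002 sec)
Step 1900: loss = 0.35 (0.002 sec)"""

rows = parse_steps(sample)
first, last = rows[0], rows[-1]
print("loss went from %.2f at step %d to %.2f at step %d"
      % (first[1], first[0], last[1], last[0]))
```

Lines that do not match the pattern, such as the evaluation summaries and the TensorFlow info messages, are simply skipped.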

(2). CIFAR-10
Run it:

(tensorflow-GPU) $ cd ~/tensorflow/tensorflow/models/image/cifar10/
(tensorflow-GPU) $ python cifar10_train.py
>> Downloading cifar-10-binary.tar.gz 100.0%
Succesfully downloaded cifar-10-binary.tar.gz 170052171 bytes.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 8
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:888] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:88] Found device 0 with properties: 
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.20GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:112] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:122] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:47] Setting region size to 3120906240
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 8
2015-11-17 02:14:46.611756: step 0, loss = 4.68 (6.9 examples/sec; 18.481 sec/batch)
2015-11-17 02:14:49.068440: step 10, loss = 4.65 (562.6 examples/sec; 0.228 sec/batch)
2015-11-17 02:14:51.224980: step 20, loss = 4.65 (617.0 examples/sec; 0.207 sec/batch)
2015-11-17 02:14:53.375918: step 30, loss = 4.62 (664.1 examples/sec; 0.193 sec/batch)
2015-11-17 02:14:55.513463: step 40, loss = 4.60 (610.3 examples/sec; 0.210 sec/batch)
2015-11-17 02:14:57.696431: step 50, loss = 4.58 (615.1 examples/sec; 0.208 sec/batch)
2015-11-17 02:14:59.877955: step 60, loss = 4.57 (567.3 examples/sec; 0.226 sec/batch)
2015-11-17 02:15:02.101614: step 70, loss = 4.55 (621.1 examples/sec; 0.206 sec/batch)
2015-11-17 02:15:04.593141: step 80, loss = 4.52 (490.3 examples/sec; 0.261 sec/batch)
2015-11-17 02:15:06.983452: step 90, loss = 4.52 (641.4 examples/sec; 0.200 sec/batch)
2015-11-17 02:15:09.232584: step 100, loss = 4.50 (563.8 examples/sec; 0.227 sec/batch)
2015-11-17 02:15:11.783752: step 110, loss = 4.48 (538.0 examples/sec; 0.238 sec/batch)
2015-11-17 02:15:13.997070: step 120, loss = 4.46 (589.4 examples/sec; 0.217 sec/batch)
2015-11-17 02:15:16.458028: step 130, loss = 4.45 (467.8 examples/sec; 0.274 sec/batch)
2015-11-17 02:15:19.128071: step 140, loss = 4.42 (581.1 examples/sec; 0.220 sec/batch)
2015-11-17 02:15:21.491835: step 150, loss = 4.40 (568.2 examples/sec; 0.225 sec/batch)
2015-11-17 02:15:23.962043: step 160, loss = 4.39 (635.4 examples/sec; 0.201 sec/batch)
...

Incidentally, running the same training with the CPU version takes roughly twice as long per batch, so the GPU does appear to be speeding things up.
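A comparison like this is easier to make from an average than from individual lines. The sketch below assumes the log format shown above; the helper `mean_sec_per_batch` is hypothetical. It skips step 0 because that line's 18.481 sec/batch is far larger than any later step and would dominate the mean.

```python
import re

# Matches "... step 10, loss = 4.65 (562.6 examples/sec; 0.228 sec/batch)".
CIFAR_RE = re.compile(
    r"step (\d+), loss = ([0-9.]+) "
    r"\(([0-9.]+) examples/sec; ([0-9.]+) sec/batch\)")


def mean_sec_per_batch(log_text, skip_first=True):
    """Average the sec/batch column, optionally ignoring step 0."""
    values = []
    for line in log_text.splitlines():
        match = CIFAR_RE.search(line)
        if match:
            values.append((int(match.group(1)), float(match.group(4))))
    if skip_first:
        values = [v for v in values if v[0] != 0]
    if not values:
        return None
    return sum(seconds for _, seconds in values) / len(values)


# Three lines copied from the output above.
sample = """2015-11-17 02:14:46.611756: step 0, loss = 4.68 (6.9 examples/sec; 18.481 sec/batch)
2015-11-17 02:14:49.068440: step 10, loss = 4.65 (562.6 examples/sec; 0.228 sec/batch)
2015-11-17 02:14:51.224980: step 20, loss = 4.65 (617.0 examples/sec; 0.207 sec/batch)"""

gpu_mean = mean_sec_per_batch(sample)
print("mean sec/batch: %.4f" % gpu_mean)
```

Running the same function over a log captured from a CPU-only run would give the second number needed for the comparison.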

Building tutorials_example_trainer for GPU

Install bazel from a regular shell, not from inside the container environment.

# Install the required packages
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
$ sudo apt-get install pkg-config zip g++ zlib1g-dev unzip

# Download the bazel installer
$ wget https://github.com/bazelbuild/bazel/releases/download/0.1.1/bazel-0.1.1-installer-linux-x86_64.sh

# Install
$ chmod +x bazel-0.1.1-installer-linux-x86_64.sh 
$ ./bazel-0.1.1-installer-linux-x86_64.sh --user

Edit ~/.bashrc and append the following at the end:

export PATH="$PATH:$HOME/bin"
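After opening a new shell, it is worth confirming that `bazel` is actually reachable through the updated PATH. A minimal sketch follows, assuming the installer placed the `bazel` launcher in `$HOME/bin` as the PATH line above implies; `find_on_path` is a hypothetical helper that mimics how the shell searches PATH.

```python
import os


def find_on_path(name, path_value):
    """Return the first executable called name in path_value, or None."""
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        # Must be a regular file that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    found = find_on_path("bazel", os.environ.get("PATH", ""))
    print(found if found else "bazel not found on PATH")
```

If this prints "bazel not found on PATH", the most likely cause is that ~/.bashrc has not been re-read in the current terminal.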

Create a file at the following path:
~/tensorflow/third_party/gpus/cuda/cuda.config

with the following contents:

CUDA_TOOLKIT_PATH="/usr/local/cuda"
CUDNN_INSTALL_PATH="/usr/local/cuda"

If you installed with the default settings, the symbolic link /usr/local/cuda (=> /usr/local/cuda-7.0) should exist.
If you changed the install path, adjust these values accordingly.

Run ./configure before building:
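To double-check what the file points at, the two settings can be read back and resolved. This is only an illustrative sketch: `parse_cuda_config` is a hypothetical helper that treats the file as simple KEY="value" lines, which is the format shown above, and it is not how the TensorFlow build itself reads the file.

```python
import os


def parse_cuda_config(text):
    """Parse KEY="value" lines into a dictionary, ignoring blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


# Same contents as the cuda.config file described above.
sample = ('CUDA_TOOLKIT_PATH="/usr/local/cuda"\n'
          'CUDNN_INSTALL_PATH="/usr/local/cuda"\n')

settings = parse_cuda_config(sample)
for key in sorted(settings):
    # realpath follows symbolic links such as /usr/local/cuda.
    print(key + " -> " + os.path.realpath(settings[key]))
```

On a default install, the resolved path should show the versioned directory behind the symbolic link mentioned above.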

$ cd ~/tensorflow-GPU
$ source bin/activate
(tensorflow-GPU) $ cd ~/tensorflow
(tensorflow-GPU) $ ./configure
Do you wish to bulid TensorFlow with GPU support? [y/n] y
GPU support will be enabled for TensorFlow

Please specify the location where CUDA 7.0 toolkit is installed. Refer to
README.md for more details. [default is: /usr/local/cuda]: /usr/local/cuda

Please specify the location where CUDNN 6.5 V2 library is installed. Refer to
README.md for more details. [default is: /usr/local/cuda]: /usr/local/cuda

Setting up Cuda include
Setting up Cuda lib64
Setting up Cuda bin
Setting up Cuda nvvm
Configuration finished

Build

(tensorflow-GPU) $ bazel build -c opt --config=cuda tensorflow/cc:tutorials_example_trainer

The build took about 10 minutes to complete.
