[Ubuntu Server 16.04 LTS] Building a PC for Machine Learning, Part 5 (Running the TensorFlow Tutorial)

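This post gets tensorflow-gpu up and running.

Running on the GPU requires CUDA, so if you have not installed it yet, see the post below.

[Ubuntu Server 16.04 LTS] Building a PC for Machine Learning, Part 2 (Installing the GTX-1080Ti Driver and CUDA 9.0)

If you have not installed Python at all yet, see the post below.

[Ubuntu Server 16.04 LTS] Building a PC for Machine Learning, Part 4 (Installing Python)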


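Creating the virtual environment

First, create a virtual environment with tensorflow-gpu 1.0.1 installed.

Note: tensorflow-gpu is pinned to 1.0.1 only to match the environment on the MacBook Air I normally use. There is no deeper reason.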

$ conda create -n tensorflow-1.0.1 tensorflow-gpu=1.0.1

Packages other than tensorflow-gpu are resolved to the latest versions compatible with it. conda asks for confirmation before installing; if the plan looks fine, continue.

Fetching package metadata ...........

Solving package specifications: .

Package plan for installation in environment /home/hoge/.pyenv/versions/anaconda3-5.0.0/envs/tensorflow-1.0.1:

The following NEW packages will be INSTALLED:

ca-certificates: 2017.08.26-h1d4fec5_0
certifi: 2017.7.27.1-py36h8b7b77e_0
cudatoolkit: 7.5-2
cudnn: 5.1-0
intel-openmp: 2018.0.0-h15fc484_7
libedit: 3.1-heed3624_0
libffi: 3.2.1-h4deb6c0_3
libgcc-ng: 7.2.0-h7cc24e2_2
libprotobuf: 3.4.0-0
libstdcxx-ng: 7.2.0-h7a57d05_2
mkl: 2018.0.0-hb491cac_4
ncurses: 6.0-h06874d7_1
numpy: 1.13.3-py36ha12f23b_0
openssl: 1.0.2m-h8cfc7e7_0
pip: 9.0.1-py36h6c6f9ce_4
protobuf: 3.4.0-py36_0
python: 3.6.3-h0ef2715_3
readline: 7.0-hac23ff0_3
setuptools: 36.5.0-py36he42e2e1_0
six: 1.11.0-py36h372c433_1
sqlite: 3.20.1-h6d8b0f3_1
tensorflow-gpu: 1.0.1-py36_4
tk: 8.6.7-h5979e9b_1
wheel: 0.29.0-py36he7f4e38_1
xz: 5.2.3-h2bcbf08_1
zlib: 1.2.11-hfbfcf68_1

Proceed ([y]/n)? y

openssl-1.0.2m 100% |##########################| Time: 0:00:00 11.22 MB/s
python-3.6.3-h 100% |##########################| Time: 0:00:01 25.71 MB/s
numpy-1.13.3-p 100% |##########################| Time: 0:00:00 17.57 MB/s
six-1.11.0-py3 100% |##########################| Time: 0:00:00 12.91 MB/s
protobuf-3.4.0 100% |##########################| Time: 0:00:00 23.47 MB/s
tensorflow-gpu 100% |##########################| Time: 0:00:03 21.53 MB/s
pip-9.0.1-py36 100% |##########################| Time: 0:00:00 19.19 MB/s
#
# To activate this environment, use:
# > source activate tensorflow-1.0.1
#
# To deactivate an active environment, use:
# > source deactivate
#

Next, check that TensorFlow works correctly.

First, switch to the virtual environment created above.

$ source activate tensorflow-1.0.1

Try it with the following simple program.

import tensorflow as tf

# Build a constant string tensor, then evaluate it in a session (TensorFlow 1.x API).
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

Below is the output from actually running it.

(tensorflow-1.0.1) $ python

Python 3.5.4 |Anaconda, Inc.| (default, Oct 13 2017, 11:22:58)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.7.5 locally
>>> hello = tf.constant('Hello, Tensorflow')
>>> sess = tf.Session()
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.683
pciBusID 0000:02:00.0
Total memory: 10.91GiB
Free memory: 10.75GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0)
>>> print(sess.run(hello))
b'Hello, Tensorflow'

It ran without problems, and the GPU appears to be recognized.
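Besides reading the startup log, the devices TensorFlow can see may also be listed from Python. The following is only a minimal sketch and not part of the original steps: it assumes that `tensorflow.python.client.device_lib.list_local_devices()` is available in the installed TensorFlow build, and `gpu_device_names` is a hypothetical helper name chosen for illustration. If TensorFlow cannot be imported, it simply returns an empty list.

```python
def gpu_device_names():
    """Return the names of GPU devices visible to TensorFlow (empty if none)."""
    try:
        # Assumption: this module path exists in the installed TensorFlow build.
        from tensorflow.python.client import device_lib
    except ImportError:
        return []  # TensorFlow is not installed in this environment
    # Keep only the entries whose device type is reported as "GPU".
    return [d.name for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]


gpu_names = gpu_device_names()
print(gpu_names if gpu_names else "No GPU visible to TensorFlow")
```

An empty result inside the activated environment would suggest that TensorFlow is falling back to the CPU, in which case the CUDA and driver setup is worth rechecking.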


Trying the MNIST tutorial

Finally, run the tutorial and compare the result with running it on the MacBook Air.

Here are the specs of each machine.

| | Ubuntu (machine learning PC) | MacBook Air |
|---|---|---|
| CPU | Intel Core i7-7700K 4.2GHz | Intel Core i5 1.6GHz |
| Memory | 32GB | 8GB |
| GPU | GTX 1080 Ti 11GB | Intel HD Graphics 6000 1536 MB |

See here for details on the tutorial itself.

First, fetch the program files from GitHub.

$ git clone https://github.com/tensorflow/tensorflow.git

$ cd tensorflow/tensorflow/examples/tutorials/mnist

You could run it as is, but I want to measure the processing time, so make a small modification first.

$ vim mnist_deep.py

Add timing code at the start and end of the main function. For the code to add, see "[Python] Measuring and displaying the time a process takes".
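As a rough illustration of that change, the sketch below records a start time at the top of `main()` and prints the difference at the end, using the same `elapsed_time:...[sec]` format that appears in the output further down. It is a sketch under assumptions rather than the exact code from the referenced post: `run_training` is a hypothetical stand-in for the existing body of `main()` in mnist_deep.py, and the real `main()` has a different signature.

```python
import time


def run_training():
    """Hypothetical stand-in for the existing body of main() in mnist_deep.py."""
    time.sleep(0.1)


def main():
    start = time.time()                 # added at the start of main()

    run_training()                      # the original tutorial code stays here

    elapsed_time = time.time() - start  # added at the end of main()
    print("elapsed_time:{0}[sec]".format(elapsed_time))
    return elapsed_time


elapsed = main()
```

`time.time()` measures wall-clock time, so the figure includes everything that happens inside `main()`, such as data loading and graph construction, not only the training steps.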

Run the program.

$ python mnist_deep.py

I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.7.5 locally
test
Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz
Saving graph to: /tmp/tmpr3vt07e8
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.683
pciBusID 0000:02:00.0
Total memory: 10.91GiB
Free memory: 10.75GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0)
step 0, training accuracy 0.14
step 100, training accuracy 0.82
step 200, training accuracy 0.96

: (omitted)

step 19900, training accuracy 1
test accuracy 0.9915
elapsed_time:121.55799126625061[sec]

Elapsed time: 121 seconds

For comparison, run it on the MacBook Air I normally use.

$ python mnist_deep.py

Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz
Saving graph to: /var/folders/hk/134y674s3t989cbc01hm54r40000gn/T/tmppqu54tya
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
step 0, training accuracy 0.08
step 100, training accuracy 0.88
step 200, training accuracy 0.9

: (omitted)

step 19900, training accuracy 1
test accuracy 0.9908
elapsed_time:4494.314824104309[sec]

Elapsed time: 4494 seconds

That is about 1 hour 15 minutes, which is quite a long time.
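To put the two measurements side by side, the arithmetic can be sketched as follows. The two constants are the `elapsed_time` values printed in the runs above; the variable names are only illustrative.

```python
# elapsed_time values copied from the two runs above (seconds)
gpu_seconds = 121.55799126625061   # Ubuntu machine with the GTX 1080 Ti
cpu_seconds = 4494.314824104309    # MacBook Air, CPU only

speedup = cpu_seconds / gpu_seconds   # how many times faster the GPU run was
cpu_minutes = cpu_seconds / 60        # MacBook Air run expressed in minutes

print("speedup: {:.1f}x".format(speedup))
print("MacBook Air run: {:.1f} min".format(cpu_minutes))
```

For this single pair of runs, the GPU machine finished roughly 37 times faster. Each figure comes from one run only, so it is best read as a rough indication rather than a benchmark.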

If you are going to work on machine learning seriously, a dedicated PC really is essential.