
TensorFlow now officially supports Windows and can use the GPU, so I tried it

Last updated at 2019-03-03, posted at 2016-11-30

Google has announced that TensorFlow now officially supports Windows and can use the GPU there.
https://developers.googleblog.com/2016/11/tensorflow-0-12-adds-support-for-windows.html

Setup environment

OS) Windows 10 Pro
GPU) NVIDIA GeForce GTX 960

I will try TensorFlow in this environment. Wherever possible, I install things on the D: drive.

Installing CUDA and cuDNN

CUDA Toolkit 8.0

Operating System: Windows
Architecture: x86_64
Version: 10
Installer Type: exe (network)

I just ran the installer as usual.

cuDNN v5.1

I registered a developer account, agreed to the terms of use, and downloaded it.

https://developer.nvidia.com/rdp/cudnn-download
Download cuDNN v5.1 Library for Windows 10, then copy the contents of the cuda folder inside the zip into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0.
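As a quick sanity check after copying, the following minimal Python sketch reports which of the expected DLLs are missing from the toolkit's bin folder. The DLL names are taken from the "successfully opened CUDA library" lines in the MNIST log later in this article. The assumption that they all live in the bin subfolder, and the helper name missing_dlls, are mine.

```python
from pathlib import Path

# DLL names as they appear in the TensorFlow start-up log in this article.
# nvcuda.dll is left out because it ships with the display driver,
# not with the CUDA Toolkit or cuDNN.
EXPECTED_DLLS = [
    "cublas64_80.dll",
    "cudnn64_5.dll",
    "cufft64_80.dll",
    "curand64_80.dll",
]


def missing_dlls(bin_dir, names=EXPECTED_DLLS):
    """Return the names from `names` that are not files inside `bin_dir`."""
    bin_dir = Path(bin_dir)
    return [name for name in names if not (bin_dir / name).is_file()]


if __name__ == "__main__":
    # Assumed location: the "bin" subfolder of the CUDA v8.0 install directory.
    cuda_bin = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0") / "bin"
    print("Missing:", missing_dlls(cuda_bin))
```

An empty list means every DLL in the list was found. If cudnn64_5.dll is the only one reported, the cuDNN copy step is the likely culprit.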

Setting up Python 3.5

On Windows, using Anaconda makes things easier later on, so I recommend that instead. (Added 2017/01/08)
The details are described at http://tilfin.hatenablog.com/entry/2017/01/08/220556.

The download offered on the top page is the 32-bit build, so download "Windows x86-64 executable installer" from https://www.python.org/downloads/windows/ and install that instead.
* Note: I installed it to D:\Python35.

The 3.5 installer also added the environment variables and installed pip for me.
Just in case, I also upgraded pip from PowerShell.
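Because the wheels installed later in this article are tagged cp35 and win_amd64, it can be worth confirming that the interpreter on the PATH really is the 64-bit 3.5 build. The following is a minimal sketch; the function names are hypothetical, and the check only mirrors the wheel tag rather than any official compatibility rule.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def matches_cp35_win_amd64(major, minor, bits):
    """True when the version and bitness match the cp35 / win_amd64 wheel tag."""
    return (major, minor, bits) == (3, 5, 64)


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    bits = interpreter_bits()
    print("Python %d.%d, %d-bit" % (major, minor, bits))
    print("Matches cp35 win_amd64 wheel tag:", matches_cp35_win_amd64(major, minor, bits))
```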

Installing virtualenv

Install the module that creates a separate working folder for the runtime environment.

PS D:\> pip install --upgrade virtualenv
Collecting virtualenv
  Downloading virtualenv-15.1.0-py2.py3-none-any.whl (1.8MB)
    100% |################################| 1.8MB 646kB/s
Installing collected packages: virtualenv
Successfully installed virtualenv-15.1.0

Installing the TensorFlow packages

Install tensorflow and tensorflow-gpu with pip from PowerShell.
https://pypi.python.org/pypi/tensorflow

tensorflow

PS D:\> pip install tensorflow
Processing y:\tensorflow-0.12.0rc0-cp35-cp35m-win_amd64.whl
Collecting six>=1.10.0 (from tensorflow==0.12.0rc0)
  Using cached six-1.10.0-py2.py3-none-any.whl
Collecting protobuf==3.1.0 (from tensorflow==0.12.0rc0)
  Downloading protobuf-3.1.0-py2.py3-none-any.whl (339kB)
    100% |################################| 348kB 2.2MB/s
Collecting wheel>=0.26 (from tensorflow==0.12.0rc0)
  Using cached wheel-0.29.0-py2.py3-none-any.whl
Collecting numpy>=1.11.0 (from tensorflow==0.12.0rc0)
  Downloading numpy-1.11.2-cp35-none-win_amd64.whl (7.6MB)
    100% |################################| 7.6MB 179kB/s
Requirement already satisfied (use --upgrade to upgrade): setuptools in d:\python35\lib\site-packages (from protobuf==3.1.0->tensorflow==0.12.0rc0)
Installing collected packages: six, protobuf, wheel, numpy, tensorflow
Successfully installed numpy-1.11.2 protobuf-3.1.0 six-1.10.0 tensorflow-0.12.0rc0 wheel-0.29.0

tensorflow-gpu

PS D:\> pip install tensorflow-gpu
Collecting tensorflow-gpu
  Downloading tensorflow_gpu-0.12.0rc0-cp35-cp35m-win_amd64.whl (32.5MB)
    100% |################################| 32.5MB 40kB/s
Requirement already satisfied: wheel>=0.26 in d:\python35\lib\site-packages (from tensorflow-gpu)
Requirement already satisfied: numpy>=1.11.0 in d:\python35\lib\site-packages (from tensorflow-gpu)
Requirement already satisfied: six>=1.10.0 in d:\python35\lib\site-packages (from tensorflow-gpu)
Requirement already satisfied: protobuf==3.1.0 in d:\python35\lib\site-packages (from tensorflow-gpu)
Requirement already satisfied: setuptools in d:\python35\lib\site-packages (from protobuf==3.1.0->tensorflow-gpu)
Installing collected packages: tensorflow-gpu
Successfully installed tensorflow-gpu-0.12.0rc0
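To see whether the installed package can find the GPU before running any samples, a sketch like the following may help. It assumes that tensorflow.python.client.device_lib.list_local_devices() is available in the installed version (I have not verified this against 0.12.0rc0 specifically), and the wrapper name list_tensorflow_devices is hypothetical. It returns None when TensorFlow cannot be imported.

```python
def list_tensorflow_devices():
    """Return [(name, device_type), ...] for local devices, or None if
    TensorFlow cannot be imported in this interpreter."""
    try:
        # Assumption: this internal module path exists in the installed version.
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [(d.name, d.device_type) for d in device_lib.list_local_devices()]


if __name__ == "__main__":
    devices = list_tensorflow_devices()
    if devices is None:
        print("TensorFlow could not be imported")
    else:
        for name, device_type in devices:
            print(device_type, name)
```

If only a CPU entry is listed, the CUDA and cuDNN DLLs are the first thing I would re-check.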

Running it

Create a working folder at D:\tensorflow with virtualenv.

PS D:\> virtualenv --system-site-packages D:\tensorflow
Using base prefix 'd:\\python35'
New python executable in D:\tensorflow\Scripts\python.exe
Installing setuptools, pip, wheel...done.

Preparing the training and classification samples

In any convenient location, run:
git clone --recurse-submodules https://github.com/tensorflow/tensorflow
I normally run Linux in VirtualBox and share files over SMB, so I cloned the repository there.
Copy tensorflow/tensorflow/models so that it ends up as D:\tensorflow\models.

Trying MNIST

Let's try the handwritten digit recognition program.

PS D:\> cd tensorflow\models\image\mnist

PS D:\tensorflow\models\image\mnist> python convolutional.py

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data\train-images-idx3-ubyte.gz
Extracting data\train-labels-idx1-ubyte.gz
Extracting data\t10k-images-idx3-ubyte.gz
Extracting data\t10k-labels-idx1-ubyte.gz
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 960
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.64GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:586] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0.  Your kernel may not have been built with NUMA support.
Initialized!
Step 0 (epoch 0.00), 50.9 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
Step 100 (epoch 0.12), 12.1 ms
Minibatch loss: 3.226, learning rate: 0.010000
Minibatch error: 4.7%
Validation error: 7.3%
Step 200 (epoch 0.23), 12.0 ms
Minibatch loss: 3.404, learning rate: 0.010000
Minibatch error: 10.9%
(omitted)
Minibatch loss: 1.609, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 1.0%
Test error: 0.8%

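It finished fairly quickly, so the GPU is presumably doing its job. That confirms things run correctly, but next I'll try ImageNet, whose results are easier to interpret.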
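The epoch and learning-rate figures in the MNIST log can be reproduced with a little arithmetic. The sketch below assumes the sample uses a batch size of 64, a training set of 55000 images, and a staircase exponential decay of 0.95 per epoch from a base rate of 0.01. These constants are my recollection of convolutional.py, not something shown in this article, and step 8500 is a hypothetical step in the tenth epoch because the log above elides the later steps.

```python
# Assumed constants (not shown in the log itself).
BASE_LEARNING_RATE = 0.01  # consistent with "learning rate: 0.010000" at step 0
DECAY_RATE = 0.95          # assumed per-epoch decay factor
BATCH_SIZE = 64            # assumed minibatch size
TRAIN_SIZE = 55000         # assumed number of training images


def epoch_at(step):
    """Fractional epoch reached after `step` minibatches."""
    return step * BATCH_SIZE / TRAIN_SIZE


def learning_rate_at(step):
    """Staircase exponential decay: the rate drops once per completed epoch."""
    completed_epochs = (step * BATCH_SIZE) // TRAIN_SIZE
    return BASE_LEARNING_RATE * DECAY_RATE ** completed_epochs


print("%.2f" % epoch_at(100))           # 0.12, as in "Step 100 (epoch 0.12)"
print("%.2f" % epoch_at(200))           # 0.23, as in "Step 200 (epoch 0.23)"
print("%.6f" % learning_rate_at(0))     # 0.010000
print("%.6f" % learning_rate_at(8500))  # 0.006302, the final rate in the log
```

Under these assumptions, the final learning rate of 0.006302 corresponds to nine completed epochs, since 0.01 x 0.95^9 is about 0.006302.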

Trying ImageNet

This is the ImageNet example, which analyzes an image and guesses what it shows.

PS D:\> cd \tensorflow\models\image\imagenet

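First, some preparation: run python .\classify_image.py with no arguments. As the output shows, this first run also downloads the Inception model.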

PS D:\tensorflow\models\image\imagenet> python .\classify_image.py
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
>> Downloading inception-2015-12-05.tgz 100.0%
Successfully downloaded inception-2015-12-05.tgz 88931400 bytes.
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 960
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.64GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:586] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0.  Your kernel may not have been built with NUMA support.
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_def_util.cc:332] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\bfc_allocator.cc:217] Ran out of memory trying to allocate 1.91GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89233)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00859)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00264)
custard apple (score = 0.00141)
earthstar (score = 0.00107)

I placed a photo of Mt. Fuji at M:\fuji.jpg.
Classify it with python .\classify_image.py --image_file M:\fuji.jpg.

PS D:\tensorflow\models\image\imagenet> python .\classify_image.py --image_file M:\fuji.jpg
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 960
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.64GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:586] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0.  Your kernel may not have been built with NUMA support.
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_def_util.cc:332] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\bfc_allocator.cc:217] Ran out of memory trying to allocate 1.91GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
volcano (score = 0.91087)
fire screen, fireguard (score = 0.00192)
alp (score = 0.00162)
lakeside, lakeshore (score = 0.00130)
geyser (score = 0.00077)

volcano (score = 0.91087): it was recognized as a volcano. Incidentally, the photo showed a snow-capped Mt. Fuji.
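When the result lines are buried in start-up logging, it can be handy to pull out just the predictions. The following is a minimal sketch that parses lines of the form "labels (score = x)" as printed above. The function name parse_predictions and the regular expression are my own, not part of classify_image.py.

```python
import re

# Matches lines such as "fire screen, fireguard (score = 0.00192)".
SCORE_LINE = re.compile(r"^(?P<labels>.+) \(score = (?P<score>[0-9.]+)\)$")


def parse_predictions(output):
    """Return [(list_of_labels, score), ...] for every prediction line in `output`."""
    results = []
    for line in output.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match:
            labels = match.group("labels").split(", ")
            results.append((labels, float(match.group("score"))))
    return results


sample = """volcano (score = 0.91087)
fire screen, fireguard (score = 0.00192)
alp (score = 0.00162)"""

best_labels, best_score = max(parse_predictions(sample), key=lambda p: p[1])
print(best_labels[0], best_score)  # volcano 0.91087
```

Log lines that start with I, W, or E do not end in a "(score = ...)" suffix, so they are skipped.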

W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\bfc_allocator.cc:217] Ran out of memory trying to allocate 1.91GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

This warning appeared, so more GPU memory would presumably help.
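The numbers in the log make the shortfall concrete: the device reported 1.64GiB free, while the allocator tried to obtain 1.91GiB. The sketch below extracts both figures and prints the difference. It only compares values printed in the log and says nothing about how the allocator behaves internally; the function name gib_values is hypothetical.

```python
import re


def gib_values(log_text):
    """Return (free_gib, requested_gib) parsed from the log, or None for a missing value."""
    free = re.search(r"Free memory: ([0-9.]+)GiB", log_text)
    # \s+ also tolerates a line break between "allocate" and the number.
    wanted = re.search(r"trying to allocate\s+([0-9.]+)GiB", log_text)
    return (
        float(free.group(1)) if free else None,
        float(wanted.group(1)) if wanted else None,
    )


log = (
    "Total memory: 2.00GiB\n"
    "Free memory: 1.64GiB\n"
    "Ran out of memory trying to allocate 1.91GiB.\n"
)
free_gib, requested_gib = gib_values(log)
print(round(requested_gib - free_gib, 2))  # 0.27
```

So the request exceeded the reported free memory by roughly 0.27GiB on this 2GiB card.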

In any case, it all worked without any particular trouble, so please give it a try yourself.

Cross-posted from tilfin's note
