Help us understand the problem. What is going on with this article?

TensorFlowが正式にWindowsサポートしてGPUが使えたので試してみた

More than 1 year has passed since last update.

Google から正式に Tensorflow が Windows 対応して GPU が使えるとのアナウンスがありました。

https://developers.googleblog.com/2016/11/tensorflow-0-12-adds-support-for-windows.html

セットアップ環境

  • OS) Windows 10 Pro
  • GPU) NVIDIA GeForce GTX 960

この環境で TensorFlow を試しみたいと思います。なるべくDドライブにセットアップするようにしています。

CUDA と cuDNN をインストール

CUDA Toolkit 8.0

https://developer.nvidia.com/cuda-downloads

  • Operating System: Windows
  • Architecture: x86_64
  • Version: 10
  • Installer Type: exe (network)

普通にインストーラを実行しました。

cuDNN v5.1

開発者アカウントを登録して利用規約に同意してダウンロードしました。

https://developer.nvidia.com/rdp/cudnn-download
cuDNN v5.1 Library for Windows 10 を落として zip 内の cuda フォルダ内を C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0 にコピー展開します。

Python 3.5 のセットアップ

Windows の場合 Anaconda を使った方が後々楽なのでこちらを奨めます。 (2017/01/08 追記)
http://tilfin.hatenablog.com/entry/2017/01/08/220556 内に記載してます。

普通にトップページからダウンロードすると 32bit 版なので、https://www.python.org/downloads/windows/ から Download Windows x86-64 executable installer をダウンロードしてインストールします。
※ なお、 D:\Python35 にインストールしました。

3.5系は環境変数の追加や pip も同時に入れてくれました。
PowerShell から一応 pip のアップグレードもしました。

virtualenv をインストール

実行環境用の作業フォルダを作れるモジュールを入れます。

PS D:\> pip install --upgrade virtualenv
Collecting virtualenv
  Downloading virtualenv-15.1.0-py2.py3-none-any.whl (1.8MB)
    100% |################################| 1.8MB 646kB/s
Installing collected packages: virtualenv
Successfully installed virtualenv-15.1.0

TensorFlow のパッケージをインストール

PowerShell から pip で tensorflow と tensorflow-gpu を入れます。
https://pypi.python.org/pypi/tensorflow

tensorflow

PS D:\> pip install tensorflow
Processing y:\tensorflow-0.12.0rc0-cp35-cp35m-win_amd64.whl
Collecting six>=1.10.0 (from tensorflow==0.12.0rc0)
  Using cached six-1.10.0-py2.py3-none-any.whl
Collecting protobuf==3.1.0 (from tensorflow==0.12.0rc0)
  Downloading protobuf-3.1.0-py2.py3-none-any.whl (339kB)
    100% |################################| 348kB 2.2MB/s
Collecting wheel>=0.26 (from tensorflow==0.12.0rc0)
  Using cached wheel-0.29.0-py2.py3-none-any.whl
Collecting numpy>=1.11.0 (from tensorflow==0.12.0rc0)
  Downloading numpy-1.11.2-cp35-none-win_amd64.whl (7.6MB)
    100% |################################| 7.6MB 179kB/s
Requirement already satisfied (use --upgrade to upgrade): setuptools in d:\python35\lib\site-packages (from protobuf==3
1.0->tensorflow==0.12.0rc0)
Installing collected packages: six, protobuf, wheel, numpy, tensorflow
Successfully installed numpy-1.11.2 protobuf-3.1.0 six-1.10.0 tensorflow-0.12.0rc0 wheel-0.29.0

tensorflow-gpu

PS D:\> pip install tensorflow-gpu
Collecting tensorflow-gpu
  Downloading tensorflow_gpu-0.12.0rc0-cp35-cp35m-win_amd64.whl (32.5MB)
    100% |################################| 32.5MB 40kB/s
Requirement already satisfied: wheel>=0.26 in d:\python35\lib\site-packages (from tensorflow-gpu)
Requirement already satisfied: numpy>=1.11.0 in d:\python35\lib\site-packages (from tensorflow-gpu)
Requirement already satisfied: six>=1.10.0 in d:\python35\lib\site-packages (from tensorflow-gpu)
Requirement already satisfied: protobuf==3.1.0 in d:\python35\lib\site-packages (from tensorflow-gpu)
Requirement already satisfied: setuptools in d:\python35\lib\site-packages (from protobuf==3.1.0->tensorflow-gpu)
Installing collected packages: tensorflow-gpu
Successfully installed tensorflow-gpu-0.12.0rc0

実行開始

virtualenv で D:\tensorflow に作業フォルダを作ります。

PS D:\> virtualenv --system-site-packages D:\tensorflow
Using base prefix 'd:\\python35'
New python executable in D:\tensorflow\Scripts\python.exe
Installing setuptools, pip, wheel...done.

学習解析サンプルを用意する

適当なところで
git clone --recurse-submodules https://github.com/tensorflow/tensorflow
します。自分は普段 VirtualBox の Linux を動かしていて SMB でファイル共有するのでそちらでクローンしました。
tensorflow/tensorflow/modelsD:\tensorflow\models となるようにコピーします。

MNIST を試す

手書き数字の解析プログラムを試してみます。

PS D:\> cd tensorflow\tensorflow\models\image\mnist
PS D:\tensorflow\models\image\mnist> python convolutional.py
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfu
lly opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfu
lly opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfu
lly opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfu
lly opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfu
lly opened CUDA library curand64_80.dll locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data\train-images-idx3-ubyte.gz
Extracting data\train-labels-idx1-ubyte.gz
Extracting data\t10k-images-idx3-ubyte.gz
Extracting data\t10k-labels-idx1-ubyte.gz
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] F
ound device 0 with properties:
name: GeForce GTX 960
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.64GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] D
MA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0
:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] C
reating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:586] C
ould not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0.  Your kernel may not have been bu
ilt with NUMA support.
Initialized!
Step 0 (epoch 0.00), 50.9 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
Step 100 (epoch 0.12), 12.1 ms
Minibatch loss: 3.226, learning rate: 0.010000
Minibatch error: 4.7%
Validation error: 7.3%
Step 200 (epoch 0.23), 12.0 ms
Minibatch loss: 3.404, learning rate: 0.010000
Minibatch error: 10.9%
(省略)
Minibatch loss: 1.609, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 1.0%
Test error: 0.8%

割と早く終わったので GPU が効いているのでしょう。上手く動いたことは確認できましたが、わかりやすい ImageNet を次に試します。

ImageNet を試す

画像を解析して何の画かを当てる ImageNet です。

PS D:> cd \tensorflow\models\imagenet

まず準備です。 python .\classify_image.py を実行します。

PS D:\tensorflow\models\image\imagenet> python .\classify_image.py
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_8
0.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.
dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80
.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll
 locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_8
0.dll locally
>> Downloading inception-2015-12-05.tgz 100.0%
Successfully downloaded inception-2015-12-05.tgz 88931400 bytes.
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 960
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.64GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0)
 -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:586] Could not identify NUMA node of /jo
b:localhost/replica:0/task:0/gpu:0, defaulting to 0.  Your kernel may not have been built with NUMA support.
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_def_util.cc:332] Op BatchNormWithGlobalNormalization is depr
ecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\bfc_allocator.cc:217] Ran out of memory trying to allocate
 1.91GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89233)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00859)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00264)
custard apple (score = 0.00141)
earthstar (score = 0.00107)

適当に M:\fuji.jpg に富士山の写真をおきました。
python .\classify_image.py --image_file M:\fuji.jpg で解析させます。

PS D:\tensorflow\models\image\imagenet> python .\classify_image.py --image_file M:\fuji.jpg
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_8
0.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.
dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80
.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll
 locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_8
0.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 960
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.64GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0)
 -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:586] Could not identify NUMA node of /jo
b:localhost/replica:0/task:0/gpu:0, defaulting to 0.  Your kernel may not have been built with NUMA support.
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_def_util.cc:332] Op BatchNormWithGlobalNormalization is depr
ecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\bfc_allocator.cc:217] Ran out of memory trying to allocate
 1.91GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
volcano (score = 0.91087)
fire screen, fireguard (score = 0.00192)
alp (score = 0.00162)
lakeside, lakeshore (score = 0.00130)
geyser (score = 0.00077)

volcano (score = 0.91087) 火山と認識されましたね。ちなみに写真は雪化粧してる富士山でした。

W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\bfc_allocator.cc:217] Ran out of memory trying to allocate
 1.91GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

と警告が出ていたのでもっとメモリがあるといいのでしょう。

とりあえず特に嵌らずに動いたのでみなさんもお試しください。

tilfin's note よりクロスポスト

tilfin
Why not register and get more from Qiita?
  1. We will deliver articles that match you
    By following users and tags, you can catch up information on technical fields that you are interested in as a whole
  2. you can read useful information later efficiently
    By "stocking" the articles you like, you can search right away
Comments
No comments
Sign up for free and join this conversation.
If you already have a Qiita account
Why do not you register as a user and use Qiita more conveniently?
You need to log in to use this function. Qiita can be used more conveniently after logging in.
You seem to be reading articles frequently this month. Qiita can be used more conveniently after logging in.
  1. We will deliver articles that match you
    By following users and tags, you can catch up information on technical fields that you are interested in as a whole
  2. you can read useful information later efficiently
    By "stocking" the articles you like, you can search right away
ユーザーは見つかりませんでした