

Running TensorFlow (GPU version) with nvidia-docker

Posted at 2016-12-15

About this article

  • These are my notes from getting TensorFlow to run under Docker

Background

  • I'm happy to have gotten my hands on a deep-learning machine with two TITAN X cards
  • I got the OS to recognize the cards by following this article

Environment

  • NVIDIA TITAN X (Pascal) * 2
  • Ubuntu 14.04

Installing regular Docker

  • First, install Docker the normal way
  • Install the dev libraries and such beforehand as needed
  • Install it with the following:
$ wget -qO- https://get.docker.com/ | sh
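Before moving on, you can script a quick check that the install actually worked. This is a minimal sketch, not part of the original setup: it assumes Python 3.7+ on the host, and `missing_commands` and `docker_server_version` are hypothetical helper names. It looks for the `docker` binary on PATH and asks the daemon for its version via `docker version --format '{{.Server.Version}}'`.

```python
# Hypothetical host-side sanity check after installing Docker (assumes Python 3.7+).
import shutil
import subprocess


def missing_commands(names, which=shutil.which):
    """Return the commands in `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


def docker_server_version():
    """Return the Docker daemon's version string, or None if it cannot be reached."""
    try:
        result = subprocess.run(
            ["docker", "version", "--format", "{{.Server.Version}}"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # docker is missing, the daemon is down, or we lack permission on its socket
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    missing = missing_commands(["docker"])
    if missing:
        print("not on PATH:", ", ".join(missing))
    else:
        print("docker daemon version:", docker_server_version())
```

If the version comes back as `None` even though `docker` is on PATH, the daemon is not running or your user cannot access its socket.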

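Installing nvidia-docker

  • To make the host's GPUs visible inside Docker containers, using nvidia-docker, a kind of wrapper around docker, apparently keeps you from getting stuck
  • I based the setup on the instructions here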
$ wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-rc.3/nvidia-docker_1.0.0.rc.3-1_amd64.deb
$ sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
  • On top of that, things didn't work until I created a volume with the command below (I don't know exactly why)
$ nvidia-docker volume create
  • Check that nvidia-docker works
  • The GPUs are recognized inside the Docker container
$ nvidia-docker run nvidia/cuda nvidia-smi

Thu Dec 15 08:57:16 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.20                 Driver Version: 375.20                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 0000:01:00.0     Off |                  N/A |
| 23%   32C    P8    16W / 250W |      8MiB / 12218MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 0000:02:00.0     Off |                  N/A |
| 23%   30C    P8    17W / 250W |      8MiB / 12221MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
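The table above is meant for human eyes. To check the GPUs from a script instead, `nvidia-smi` also has a CSV query mode (`--query-gpu=...` together with `--format=csv,noheader,nounits`). Below is a minimal sketch, assuming Python 3.7+ on the host; `query_gpus` and `parse_gpu_csv` are hypothetical helper names, and `--rm` is added so the throwaway container is removed afterwards.

```python
# Hypothetical helper: list GPUs visible inside an nvidia-docker container (Python 3.7+).
import csv
import io
import subprocess

FIELDS = ["index", "name", "memory.total"]  # memory.total is reported in MiB


def parse_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader,nounits`."""
    gpus = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        index, name, memory_mib = row
        gpus.append({"index": int(index), "name": name,
                     "memory_total_mib": int(memory_mib)})
    return gpus


def query_gpus(image="nvidia/cuda"):
    """Run nvidia-smi in a throwaway container and return the parsed GPU list."""
    cmd = ["nvidia-docker", "run", "--rm", image, "nvidia-smi",
           "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader,nounits"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)


if __name__ == "__main__":
    try:
        for gpu in query_gpus():
            print(gpu)
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not query GPUs:", exc)
```

On this machine you would expect two entries, one per TITAN X.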

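Official Docker images

  • Now I finally want to run TensorFlow on Docker, but which image should I use?
  • The existing official images/Dockerfiles install Jupyter Notebook and start it on launch
  • I want to work directly in a shell
    • You can still get in with docker exec, though...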

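Writing my own Dockerfile

  • In the end, I wrote my own Dockerfile
  • The base image is FROM nvidia/cuda:8.0-cudnn5-devel, to match the official Dockerfile
    • This image comes with NVIDIA's official deep-learning libraries preinstalled
    • No more of that routine for CUDA, cuDNN and the like: registering and logging in on NVIDIA's site, downloading, copying to the server, installing...!
  • The main setup part is based on the steps here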
Dockerfile
FROM nvidia/cuda:8.0-cudnn5-devel

MAINTAINER Shota Onishi <s-onishi@example.com>

# RUN steps execute as root by default, so sudo is not needed (and may not be installed)
RUN apt-get -y update
RUN apt-get -y upgrade
RUN DEBIAN_FRONTEND=noninteractive apt-get -y install build-essential git python-pip libfreetype6-dev libxft-dev libncurses-dev libopenblas-dev gfortran python-matplotlib libblas-dev liblapack-dev libatlas-base-dev python-dev python-pydot linux-headers-generic linux-image-extra-virtual unzip python-numpy swig python-pandas python-sklearn wget curl pkg-config zip g++ zlib1g-dev libcurl3-dev
RUN pip install -U pip

# Oracle Java 8 for Bazel; the debconf lines pre-accept the license prompt
RUN apt-get -y install software-properties-common python-software-properties
RUN add-apt-repository -y ppa:webupd8team/java
RUN apt-get -y update
RUN echo debconf shared/accepted-oracle-license-v1-1 select true | debconf-set-selections
RUN echo debconf shared/accepted-oracle-license-v1-1 seen true | debconf-set-selections
RUN apt-get -y install oracle-java8-installer

# Bazel from its apt repository (curl is installed above for the key download)
RUN echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | tee /etc/apt/sources.list.d/bazel.list
RUN curl https://storage.googleapis.com/bazel-apt/doc/apt-key.pub.gpg | apt-key add -
RUN apt-get -y update
RUN apt-get -y install bazel
RUN apt-get -y upgrade bazel

# TensorFlow 0.11.0 GPU wheel for Python 2.7
RUN pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.11.0-cp27-none-linux_x86_64.whl

CMD /bin/bash
  • I added DEBIAN_FRONTEND=noninteractive because the install failed with an error without it

Building the Docker image, starting the container, and connecting

  • Build and start it as usual
$ nvidia-docker build -t s-onishi/tensorflow:v0_11 .
$ nvidia-docker run --name tensorflow_011_20161215 -v /share:/share -i -t s-onishi/tensorflow:v0_11 /bin/bash
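Because the container name embeds the date (`tensorflow_011_20161215`), it can be handy to generate these commands rather than retype them. A minimal sketch, assuming Python 3 on the host; `build_command`, `run_command` and `as_shell` are hypothetical helpers that only reproduce the two commands above, with the same image tag, `/share` volume and `-i -t` flags.

```python
# Hypothetical helpers that rebuild the nvidia-docker commands used in this article.
import datetime
import shlex


def build_command(image, context="."):
    """argv for `nvidia-docker build -t IMAGE CONTEXT`."""
    return ["nvidia-docker", "build", "-t", image, context]


def run_command(image, base_name, day, share_dir="/share"):
    """argv for an interactive container named BASE_NAME_YYYYMMDD with SHARE_DIR mounted."""
    name = "{}_{}".format(base_name, day.strftime("%Y%m%d"))
    return [
        "nvidia-docker", "run", "--name", name,
        "-v", "{0}:{0}".format(share_dir),  # same path on the host and in the container
        "-i", "-t", image, "/bin/bash",
    ]


def as_shell(argv):
    """Quote an argv list so it can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    image = "s-onishi/tensorflow:v0_11"
    print(as_shell(build_command(image)))
    print(as_shell(run_command(image, "tensorflow_011", datetime.date.today())))
```

Printing instead of executing keeps the script harmless; pass the same lists to `subprocess.check_call` if you want it to run them directly.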

Checking that TensorFlow recognizes the GPUs

  • Attach to the Docker container
$ docker attach tensorflow_011_20161215
  • Start the Python REPL inside the container
# python
Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
  • Ooh
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
  • Looks like it fully recognizes them
>>> sess=tf.Session()
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties: 
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:01:00.0
Total memory: 11.93GiB
Free memory: 11.78GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x1df29f0
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 1 with properties: 
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:02:00.0
Total memory: 11.93GiB
Free memory: 11.78GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 1 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 1:   Y Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: TITAN X (Pascal), pci bus id: 0000:02:00.0)
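Reading this log by eye works, but the same check can be automated by capturing TensorFlow's stderr inside the container, e.g. `python -c 'import tensorflow as tf; tf.Session()' 2> tf_log.txt`, and scanning it. A minimal sketch, assuming the TensorFlow 0.11 log wording shown above (other versions phrase these lines differently); `summarize_tf_log`, `missing_libs` and the file name `check_tf_log.py` are hypothetical, and the expected library set is simply the five libraries in the log above. It uses only syntax that runs on both the container's Python 2.7 and Python 3.

```python
# Hypothetical checker for TensorFlow 0.11's GPU start-up log (Python 2.7 and 3).
from __future__ import print_function

import re
import sys

# The five CUDA libraries reported as opened in the article's log.
EXPECTED_LIBS = {"libcublas.so", "libcudnn.so", "libcufft.so",
                 "libcuda.so.1", "libcurand.so"}

LIB_RE = re.compile(r"successfully opened CUDA library (\S+) locally")
DEVICE_RE = re.compile(
    r"Creating TensorFlow device \((/gpu:\d+)\) -> \(device: (\d+), name: ([^,]+),")


def summarize_tf_log(text):
    """Return (opened CUDA libraries, [(tf_device, gpu_index, gpu_name), ...])."""
    libs = LIB_RE.findall(text)
    devices = [(dev, int(idx), name) for dev, idx, name in DEVICE_RE.findall(text)]
    return libs, devices


def missing_libs(libs):
    """Expected CUDA libraries that the log never reported as opened, sorted."""
    return sorted(EXPECTED_LIBS - set(libs))


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python check_tf_log.py tf_log.txt")
    else:
        with open(sys.argv[1]) as f:
            libs, devices = summarize_tf_log(f.read())
        print("missing CUDA libraries: {}".format(missing_libs(libs) or "none"))
        for dev, idx, name in devices:
            print("{} -> GPU {} ({})".format(dev, idx, name))
```

With both cards working, the script should report no missing libraries and one `/gpu:N` line per TITAN X.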

Closing thoughts

  • Now even if Python 3, OpenCV, Anaconda and the like end up in a tangled mess, I can just throw the container away, which takes the pressure off!