
Running VisemeNet_tensorflow with NVIDIA Docker2 + CUDA 8.0


Last updated at 2019-02-21, posted at 2019-02-21

Installing Docker

I followed the article below to install Docker. Many thanks to its author.
https://qiita.com/tkyonezu/items/0f6da57eb2d823d2611d

$ sudo apt install curl
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

$ sudo apt-get update
$ sudo apt-get install docker-ce
$ docker version 
Client:
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        6247962
 Built:             Sun Feb 10 04:13:50 2019
 OS/Arch:           linux/amd64
 Experimental:      false
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.39/version: dial unix /var/run/docker.sock: connect: permission denied

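Docker is now installed. The permission denied message at the end only means that the current user cannot reach the Docker daemon socket yet. Either run docker with sudo, or add the user to the docker group, which is done later in this article.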
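The permission denied message at the end of the docker version output usually means that the current user is not in the docker group. Below is a small sketch for checking this. has_group is a hypothetical helper defined here for illustration, not a Docker command.

```shell
# Return success if the space-separated group list in $2 contains
# the group named in $1 (exact match, not a substring match).
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# id -nG prints the current user's group names separated by spaces.
if has_group docker "$(id -nG)"; then
    echo "docker group: yes"
else
    echo "docker group: no (use sudo, or join the group and log in again)"
fi
```

A new group membership only takes effect for new login sessions, so the check can still report "no" right after usermod until you log in again.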

Setting up NVIDIA Docker2

Next comes NVIDIA Docker2.

The official NVIDIA repository has a quick start, so I follow it as written.
https://github.com/NVIDIA/nvidia-docker

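If an older version of NVIDIA-Docker (1.0) is already installed, its GPU containers and the package itself have to be removed first:

$ docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
$ sudo apt-get purge -y nvidia-docker

This is a fresh install, so neither command is needed here.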

$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
OK
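$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list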
$ sudo apt-get update
$ sudo apt-get install -y nvidia-docker2
$ sudo pkill -SIGHUP dockerd
$ nvidia-docker --version
Docker version 18.09.2, build 6247962
$ sudo groupadd docker
$ sudo usermod -aG docker $USER

The installation is now complete. Log out and back in once so that the docker group membership takes effect.
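A side note on the distribution variable set in the commands above: NVIDIA's quick start uses it to pick the apt source list that matches the running distribution. The following is a minimal sketch of how that URL is put together. nvidia_docker_list_url is a hypothetical helper name introduced here for illustration, and taking the os-release path as a parameter is only there to make the function easy to try out.

```shell
# Build the nvidia-docker apt source list URL from an os-release file.
# $1: path to the os-release file (defaults to /etc/os-release).
nvidia_docker_list_url() {
    (
        # Source the file in a subshell so ID and VERSION_ID do not
        # leak into the calling shell.
        . "${1:-/etc/os-release}"
        echo "https://nvidia.github.io/nvidia-docker/${ID}${VERSION_ID}/nvidia-docker.list"
    )
}

# Usage on the host:
#   nvidia_docker_list_url
#   curl -s -L "$(nvidia_docker_list_url)" | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
```

On Ubuntu 16.04 the ID and VERSION_ID fields combine to ubuntu16.04, which is the directory name used in the URL.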

Now let's run NVIDIA-Docker.

First, a quick check that it works:

$ sudo docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
Unable to find image 'nvidia/cuda:latest' locally
latest: Pulling from nvidia/cuda
38e2e6cd5626: Pull complete 
705054bc3f5b: Pull complete 
c7051e069564: Pull complete 
7308e914506c: Pull complete 
5260e5fce42c: Pull complete 
8e2b19e62adb: Pull complete 
9b3d4105edff: Pull complete 
4c87e0decc6a: Pull complete 
Digest: sha256:948ad13aa11d94cb42e7e193059fbd7460ef7268bfac97738925978e1336e3bd
Status: Downloaded newer image for nvidia/cuda:latest
Tue Feb 19 11:32:31 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:05:00.0  On |                  N/A |
|  0%   51C    P8    19W / 300W |    241MiB / 11175MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Now it is finally time to create the container.
It is started from the CUDA 8.0 + cuDNN 5 image:

$ sudo nvidia-docker run --name test -it nvidia/cuda:8.0-cudnn5-runtime-ubuntu16.04

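Next, install Miniconda. From here on, the work is done inside the container. The wget command is not available there, so install wget first and then download the .sh installer.
bzip2 is a package needed to extract Miniconda.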

# apt update 
# apt upgrade
# apt install bzip2
# apt install wget
# apt install git
# cd ~
# wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# touch .bashrc
# bash Miniconda3-latest-Linux-x86_64.sh 
# source .bashrc
# conda -V
conda 4.5.12

The conda setup is now complete.

Running VisemeNet

# git clone https://github.com/yzhou359/VisemeNet_tensorflow.git  
# cd ./VisemeNet_tensorflow/
# conda create -n visnet_CUDA python=3.5
# conda activate visnet_CUDA
# pip install --ignore-installed --upgrade https://download.tensorflow.google.cn/linux/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-linux_x86_64.whl
# pip install matplotlib
# pip install numpy
# pip install python_speech_features
# pip install scipy

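All the required modules are now installed, but the pretrained model VisemeNet_tensorflow_pretrain_biwi_model still has to be downloaded from Dropbox:
https://www.dropbox.com/sh/7nbqgwv0zz8pbk9/AAAghy76GVYDLqPKdANcyDuba?dl=0

This time I download all of the files to the host machine first and then send them from the host into the container.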

pretrain_biwi.ckpt.data-00000-of-00001
pretrain_biwi.ckpt.index  
pretrain_biwi.ckpt.meta

On the host machine:

$ docker  cp ./checkpoint   630c6a3f2fa5:/root
$ docker  cp ./pretrain_biwi.ckpt.data-00000-of-00001   630c6a3f2fa5:/root
$ docker  cp ./pretrain_biwi.ckpt.index   630c6a3f2fa5:/root
$ docker  cp ./pretrain_biwi.ckpt.meta   630c6a3f2fa5:/root

Inside the container, ~/ corresponds to /root, so all that is left is to mv the files above into /root/VisemeNet_tensorflow/data/ckpt/pretrain_biwi.

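The docker cp commands send the files into the container. The hexadecimal string in the second argument is the container ID, which can be looked up on the host machine as follows: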

$ docker ps
CONTAINER ID        IMAGE                                        COMMAND             CREATED             STATUS              PORTS               NAMES
630c6a3f2fa5        nvidia/cuda:8.0-cudnn5-runtime-ubuntu16.04   "/bin/bash"         4 hours ago         Up 3 hours                              test

The CONTAINER ID column is the value to use. The container name (test here) also works in its place.
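The docker cp steps above can also be wrapped in a loop that copies the files straight into the checkpoint directory, using the container name instead of the ID. This is only a sketch under a few assumptions: copy_checkpoint is a hypothetical helper name, the four files sit in the current directory on the host, and the destination directory already exists in the container (docker cp reports an error for a destination ending in / that does not exist).

```shell
# Copy the pretrained checkpoint files into a container.
# $1: container name or ID
# $2: destination directory inside the container (must already exist)
copy_checkpoint() {
    container=$1
    dest=$2
    for f in checkpoint \
             pretrain_biwi.ckpt.data-00000-of-00001 \
             pretrain_biwi.ckpt.index \
             pretrain_biwi.ckpt.meta; do
        # Stop at the first failed copy.
        docker cp "./$f" "$container:$dest/" || return 1
    done
}

# Usage on the host, from the directory holding the downloaded files:
#   copy_checkpoint test /root/VisemeNet_tensorflow/data/ckpt/pretrain_biwi
```

Copying directly into the final directory removes the separate mv step inside the container.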

With all of that in place, run the test script inside the container:

# python ./main_test.py
Warning: dir data/csv/visemenet_intro/ already exist! Continue program...
Warning: dir data/csv/visemenet_intro/test/ already exist! Continue program...

==================== Processing file data/test_audio/visemenet_intro.wav ====================
FPS: 25
WARNING:root:frame length (1103) is greater than FFT size (512), frame will be truncated. Increase NFFT to avoid.
WARNING:root:frame length (1103) is greater than FFT size (512), frame will be truncated. Increase NFFT to avoid.
WARNING:root:frame length (1103) is greater than FFT size (512), frame will be truncated. Increase NFFT to avoid.
Load #Clip 0/1, wav (1484, 65)
Save test - wav file as shape of (1484, 24)
2019-02-20 18:26:59.071332: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2019-02-20 18:26:59.071373: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2019-02-20 18:26:59.071385: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2019-02-20 18:26:59.071394: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2019-02-20 18:26:59.071406: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2019-02-20 18:26:59.234969: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties: 
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.6325
pciBusID 0000:05:00.0
Total memory: 10.91GiB
Free memory: 9.84GiB
2019-02-20 18:26:59.235015: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 
2019-02-20 18:26:59.235028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y 
2019-02-20 18:26:59.235045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0)
Warning: dir data/output_viseme/ already exist! Continue program...
Model loaded: data/ckpt/pretrain_biwi
data/csv/visemenet_intro/
Loading wav_raw.txt file in data/csv/visemenet_intro/
===================== TEST/CV CHUNK - data/csv/visemenet_intro/ ======================
Load Chunk 1, size 1484, total_size 1484 (1.00)
Warning: dir data/output_viseme/visemenet_intro already exist! Continue program...
Load Chunk 2, size 0, total_size 1484 (1.00)
Finish forward testing.
Create Viseme parameter in data/output_viseme/visemenet_i/mayaparam_viseme.txt
Done.

This confirms that it runs.

So far this ran on CUDA 8.0 + GPU TensorFlow 1.1.0. Next, I try a few other versions such as 1.4.0.

# conda create -n visnet_CUDA2  python=3.5
# conda deactivate
# conda activate visnet_CUDA2 
# pip install tensorflow==1.4.0
# pip install python_speech_features
# pip install matplotlib
# pip install scipy
# python ./main_test.py

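I tested with the steps above (note that the plain tensorflow package is the CPU-only build), but it produced a large number of errors and did not work.

Let's try 1.4.1 as well: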

# conda create -n visnet_CUDA3  python=3.5
# conda deactivate
# conda activate visnet_CUDA3
# pip install tensorflow-gpu==1.4.1
# pip install python_speech_features
# pip install matplotlib
# pip install scipy
# python ./main_test.py

That did not work either.

Summary

It only runs with tensorflow 1.1.0.
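Since only one TensorFlow version worked, it can help to check the installed version before launching the test script. The following is a minimal sketch, not something from the VisemeNet repository: tf_version and require_tf are hypothetical helper names, and the python on PATH is assumed to be the one from the activated conda environment.

```shell
# Print the installed TensorFlow version, or nothing if the import fails.
tf_version() {
    python -c 'import tensorflow as tf; print(tf.__version__)' 2>/dev/null
}

# Succeed only when the installed version equals the expected one.
# $1: expected version (defaults to 1.1.0)
require_tf() {
    expected=${1:-1.1.0}
    actual=$(tf_version)
    if [ "$actual" = "$expected" ]; then
        echo "tensorflow $actual: ok"
    else
        echo "tensorflow ${actual:-not importable}, expected $expected" >&2
        return 1
    fi
}

# Usage, inside the activated conda environment:
#   require_tf 1.1.0 && python ./main_test.py
```

The guard only compares version strings. It does not verify that the CUDA and cuDNN libraries in the container match what that TensorFlow build expects.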

Addendum: testing in a cuDNN 6.0 environment as well

$ nvidia-docker run --name dnn6 --rm -i -t nvidia/cuda:8.0-cudnn6-runtime-ubuntu16.04 /bin/bash

This starts a CUDA 8.0 + cuDNN 6 container.

# apt update 
# apt upgrade
# apt install bzip2
# apt install wget
# apt install git
# cd ~
# wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# touch .bashrc
# bash Miniconda3-latest-Linux-x86_64.sh 
# source .bashrc
# conda -V      
conda 4.5.1

From here, repeat the same steps as before.

# git clone https://github.com/yzhou359/VisemeNet_tensorflow.git

After cloning the project from git, cp the required files from the host machine:

$ docker cp ./checkpoint   dnn6:/root/VisemeNet_tensorflow/data/ckpt/pretrain_biwi
$ docker cp ./pretrain_biwi.ckpt.index   dnn6:/root/VisemeNet_tensorflow/data/ckpt/pretrain_biwi
$ docker cp ./pretrain_biwi.ckpt.meta   dnn6:/root/VisemeNet_tensorflow/data/ckpt/pretrain_biwi
$ docker cp ./pretrain_biwi.ckpt.data-00000-of-00001   dnn6:/root/VisemeNet_tensorflow/data/ckpt/pretrain_biwi

Something along these lines should do.

# cd ./VisemeNet_tensorflow/
# conda create -n cudnntest python=3.5
# conda activate cudnntest
# pip install tensorflow-gpu==1.4.1
# pip install python_speech_features
# pip install matplotlib
# pip install scipy
# python ./main_test.py 
# pip list
Package                Version  
---------------------- ---------
bleach                 1.5.0    
certifi                2018.8.24
cycler                 0.10.0  
enum34                 1.1.6    
html5lib               0.9999999
kiwisolver             1.0.1    
Markdown               3.0.1    
matplotlib             3.0.2    
numpy                  1.16.1   
pip                    10.0.1   
protobuf               3.6.1    
pyparsing              2.3.1    
python-dateutil        2.8.0    
python-speech-features 0.6      
scipy                  1.2.1    
setuptools             40.2.0   
six                    1.12.0   
tensorflow-gpu         1.4.1    
tensorflow-tensorboard 0.4.0    
Werkzeug               0.14.1   
wheel                  0.31.1

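As a result, this also produced a large number of errors, mostly "not found in checkpoint", meaning that the graph asked for variables that the pretrained checkpoint does not contain.
I also tried tensorflow 1.1.0 in this container, but that did not run either.
So in the end I arrived at the conclusion that cuda 8.0 + cudnn 5.0 + tensorflow 1.1.0, as recommended, is the stable combination.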
