Automating Keras and NVIDIA Environment Setup with Ansible

Posted at 2018-09-18

Introduction

  • I have been setting up Keras environments over and over on AWS EC2 and on-premises machines.
  • With a plan to grow to 10–100 machines, I automated the setup with Ansible.
  • The steps follow the official Install TensorFlow on Ubuntu and TensorFlow GPU support guides.

Overview of the automated setup

  • Update the IP addresses in hosts and run ansible-playbook, and the whole environment is built automatically!
  • That said, I recommend skimming the walkthrough below to understand the flow before running it; a dry-run example follows the commands below.
$ vim hosts
[keras]
xxx.xxx.xxx.xxx
yyy.yyy.yyy.yyy
zzz.zzz.zzz.zzz

$ ansible-playbook keras.yaml
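
Before touching real machines, the playbook can be validated and previewed with Ansible's standard options (a minimal sketch; note that check mode cannot fully simulate tasks such as the .deb downloads):

$ ansible-playbook keras.yaml --syntax-check
$ ansible-playbook keras.yaml --check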

Prerequisites

  • Ansible basics are not covered here
  • Ubuntu Server 16.04 LTS
  • Python3
  • Tesla K80

Walkthrough

NVIDIA driver

  • TensorFlow requires driver version 384.x or higher
  • The nvidia-384 package provided for Ubuntu Server 16.04 LTS is used; a quick way to confirm the installed driver is shown below
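
After the playbook has run and the server has been rebooted, the installed driver version can be confirmed with either of the following (standard driver/nvidia-smi queries; the exact version string depends on the package actually installed):

$ cat /proc/driver/nvidia/version
$ nvidia-smi --query-gpu=driver_version --format=csv,noheader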

CUDA

  • CUDA 9.0 is required
  • Package names are version-pinned, e.g. cuda-cublas-9-0, so that only the CUDA 9.0 components are installed even though the repository also carries newer releases; the pin can be checked as shown below
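
A quick way to verify which version was installed and what the repository would otherwise offer (standard apt/dpkg tooling, not part of the playbook itself):

$ apt-cache policy cuda-cublas-9-0
$ dpkg -l | grep '^ii.*cuda-'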

cuDNN

  • libcudnn7 is installed from the NVIDIA machine learning repository added in the playbook, pinned to 7.2.1.38-1+cuda9.0 to match CUDA 9.0

NCCL

  • NCCL (the NVIDIA Collective Communications Library) provides multi-GPU, multi-node communication primitives
  • It is listed as optional in the TensorFlow GPU support guide, but I installed it to try it out
  • This package can also be installed from the machine learning repository mentioned above; a version check is shown below
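
Once the playbook has run, the pinned cuDNN and NCCL packages can be verified with standard dpkg queries (a minimal check, not part of the playbook):

$ dpkg -l libcudnn7 libnccl2
$ dpkg-query -W -f='${Package} ${Version}\n' libcudnn7 libnccl2 libnvinfer4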

TensorRT

  • libnvinfer4, the TensorRT runtime library, is installed from the nvinfer-runtime-trt repository added in the playbook; the TensorFlow GPU support guide also lists it as optional, but it is included here

Ansible

keras.yaml

  • A modified rc.local is copied over to apply the NVIDIA driver settings; it is explained in its own section below
---
- hosts: keras
  become: true
  tasks:
    - name: Install nvidia driver
      apt:
        name: nvidia-384
        update_cache: yes
    - name: Copy rc.local
      copy:
        src: rc.local
        dest: /etc/rc.local
        mode: 0755
    - name: Add nvidia key
      apt_key:
        url: http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
    - name: Install nvidia repos
      apt:
        deb: "{{ item }}"
      with_items:
        - http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.2.148-1_amd64.deb
        - http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb
        - http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.0_1-1_amd64.deb
    - name: Install nvidia CUDA, cuDNN, CUPTI, NCCL, TensorRT
      apt:
        name: "{{ item }}"
        update_cache: yes
      with_items:
        - cuda-command-line-tools-9-0
        - cuda-cublas-9-0
        - cuda-cufft-9-0
        - cuda-curand-9-0
        - cuda-cusolver-9-0
        - cuda-cusparse-9-0
        - libcudnn7=7.2.1.38-1+cuda9.0
        - libnccl2=2.2.13-1+cuda9.0
        - libnvinfer4=4.1.2-1+cuda9.0
    - name: Install python3-pip
      apt:
        name: python3-pip
    - name: Install TensorFlow, Keras
      pip:
        name: "{{ item }}"
      with_items:
        - tensorflow-gpu
        - keras
    - name: Download Keras MNIST CNN
      get_url:
        url: https://raw.githubusercontent.com/keras-team/keras/master/examples/mnist_cnn.py
        dest: /home/ubuntu/mnist_cnn.py
        mode: 0644
        owner: ubuntu
        group: ubuntu

rc.local

  • nvidia-smi -pm 1 enables persistence mode so that the settings persist
  • nvidia-smi --auto-boost-default=0 disables the auto boost feature
  • nvidia-smi -ac 2505,875 sets the K80's application clocks (memory, graphics) to their maximum frequencies
  • Values for the V100 and P100 are included for reference; a way to look up supported clocks is shown after the script
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.

nvidia-smi -pm 1
nvidia-smi --auto-boost-default=0
# V100
#nvidia-smi -ac 877,1530
# P100
#nvidia-smi -ac 715,1328
# K80
nvidia-smi -ac 2505,875

exit 0
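
The clock pair passed to -ac depends on the GPU model. If you need values for a different card, nvidia-smi can list what the hardware supports (standard nvidia-smi query options):

$ nvidia-smi -q -d SUPPORTED_CLOCKS
$ nvidia-smi -q -d CLOCK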

/etc/ansible/ansible.cfg

  • The following setting is changed (under the [defaults] section); an alternative that avoids editing the global file is shown below
host_key_checking = False
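
If you prefer not to touch the global /etc/ansible/ansible.cfg, the same behavior can be set per run with a standard Ansible environment variable:

$ ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook keras.yaml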

/etc/ansible/hosts

  • Replace xxx.xxx.xxx.xxx etc. with the actual IP addresses of your machines
  • The Python interpreter is set to Python 3
  • The remote user is set to ubuntu; a quick connectivity check follows the inventory below
[keras]
xxx.xxx.xxx.xxx
yyy.yyy.yyy.yyy
zzz.zzz.zzz.zzz

[keras:vars]
ansible_python_interpreter=/usr/bin/python3
ansible_user=ubuntu
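
With the inventory in place, connectivity, the remote user, and the Python 3 interpreter setting can all be verified before running the full playbook (a minimal sanity check using Ansible's built-in ping module):

$ ansible keras -m ping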

ansible-playbook

  • Run the automated setup
$ ansible-playbook keras.yaml

Verification

  • After rebooting the server, verify the setup as follows; a quick GPU visibility check comes first, then the Keras MNIST CNN example
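
As a first check that TensorFlow can see the GPU, the TensorFlow 1.x device-listing API can be called from the command line (this matches the tensorflow-gpu versions installed above):

$ python3 -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"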

Keras MNIST CNN

$ python3 mnist_cnn.py
Using TensorFlow backend.
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
11493376/11490434 [==============================] - 4s 0us/step
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2018-09-18 16:45:15.263356: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-18 16:45:15.412813: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-09-18 16:45:15.413209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8755
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-09-18 16:45:15.413237: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-18 16:45:16.921143: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-18 16:45:16.921187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2018-09-18 16:45:16.921202: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2018-09-18 16:45:16.922123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10757 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
60000/60000 [==============================] - 12s 200us/step - loss: 0.2574 - acc: 0.9199 - val_loss: 0.0546 - val_acc: 0.9826
Epoch 2/12
60000/60000 [==============================] - 8s 129us/step - loss: 0.0871 - acc: 0.9745 - val_loss: 0.0403 - val_acc: 0.9858
Epoch 3/12
60000/60000 [==============================] - 8s 129us/step - loss: 0.0665 - acc: 0.9806 - val_loss: 0.0358 - val_acc: 0.9885
Epoch 4/12
60000/60000 [==============================] - 8s 129us/step - loss: 0.0550 - acc: 0.9834 - val_loss: 0.0311 - val_acc: 0.9896
Epoch 5/12
60000/60000 [==============================] - 8s 130us/step - loss: 0.0481 - acc: 0.9856 - val_loss: 0.0313 - val_acc: 0.9896
Epoch 6/12
60000/60000 [==============================] - 8s 130us/step - loss: 0.0425 - acc: 0.9869 - val_loss: 0.0279 - val_acc: 0.9908
Epoch 7/12
60000/60000 [==============================] - 8s 130us/step - loss: 0.0373 - acc: 0.9884 - val_loss: 0.0272 - val_acc: 0.9905
Epoch 8/12
60000/60000 [==============================] - 8s 130us/step - loss: 0.0351 - acc: 0.9892 - val_loss: 0.0248 - val_acc: 0.9918
Epoch 9/12
60000/60000 [==============================] - 8s 130us/step - loss: 0.0316 - acc: 0.9902 - val_loss: 0.0270 - val_acc: 0.9918
Epoch 10/12
60000/60000 [==============================] - 8s 130us/step - loss: 0.0304 - acc: 0.9911 - val_loss: 0.0251 - val_acc: 0.9916
Epoch 11/12
60000/60000 [==============================] - 8s 130us/step - loss: 0.0292 - acc: 0.9910 - val_loss: 0.0259 - val_acc: 0.9914
Epoch 12/12
60000/60000 [==============================] - 8s 129us/step - loss: 0.0272 - acc: 0.9917 - val_loss: 0.0301 - val_acc: 0.9915
Test loss: 0.03007666835412515
Test accuracy: 0.9915

nvidia-smi

  • Check GPU utilization and memory usage with nvidia-smi
$ watch -n 0.1 nvidia-smi
Every 0.1s: nvidia-smi                                    Tue Sep 18 16:45:35 2018

Tue Sep 18 16:45:35 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   70C    P0   126W / 149W |  10959MiB / 11439MiB |     78%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      3238      C   python3                                    10946MiB |
+-----------------------------------------------------------------------------+

Conclusion

  • Until now, installing cuDNN meant creating an NVIDIA account, logging in, downloading, and installing by hand; thanks to the machine learning repository, the whole thing can now be automated
  • It was a pleasant discovery that NCCL and TensorRT no longer need manual steps either
  • Next, I plan to build a Horovod environment that takes advantage of NCCL