Building a deep learning environment on CentOS 7

  • Fujitsu PRIMERGY RX200S6
    • XEON E5630*2
    • 1333RDIMM 12GB
    • SAS HDD 146GB*4 (RAID10)
    • NVIDIA GT710

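A 1U or 2U rack-mount server cannot take a full-height GPU that occupies two slots, so a tower or 4U server would really be the right choice. This setup is strictly a test of the environment build. With this GPU, performance is almost the same as running on the CPUs (two Westmere-EP Xeons, 4C/8T each), so there is no practical benefit to using the GPU.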


$ cat /etc/redhat-release 
CentOS Linux release 7.6.1810 (Core)
$ uname -a
Linux y1-rx200s6 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux




# lsmod | grep nouveau
nouveau 1869689 0
mxm_wmi 13021 1 nouveau
video 24538 1 nouveau
wmi 21636 2 mxm_wmi,nouveau
drm_kms_helper 179394 2 mgag200,nouveau
ttm  114635 2 mgag200,nouveau
drm  429744 5 ttm,drm_kms_helper,mgag200,nouveau
i2c_algo_bit  13413 3 igb,mgag200,nouveau
# nano /etc/modprobe.d/blacklist-nouveau.conf
# cat /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0

# dracut --force
# reboot
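
After the reboot it is worth confirming that the blacklist took effect before running the NVIDIA installer. The following is a minimal sketch, not part of the original procedure, that looks for a module name in the text of /proc/modules (the same data that lsmod prints). The helper name `is_module_loaded` is made up for this illustration.

```python
from pathlib import Path


def is_module_loaded(name, proc_modules_text):
    """Return True if `name` appears as a loaded kernel module.

    Each line of /proc/modules starts with the module name, which is
    the same name that lsmod shows in its first column.
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    modules_file = Path("/proc/modules")  # exists on Linux only
    if modules_file.exists():
        loaded = is_module_loaded("nouveau", modules_file.read_text())
        print("nouveau is still loaded" if loaded else "nouveau is not loaded")
```

Only the first field of each line is compared, so a module that merely lists nouveau as a dependency is not counted.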


# yum -y install kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc make
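
The kernel packages are pinned to the running kernel through $(uname -r), because the NVIDIA installer builds its module against that kernel's source tree. As a hedged illustration, the sketch below builds the same package names in Python. `kernel_build_packages` is a hypothetical helper, not something the original procedure uses.

```python
import platform


def kernel_build_packages(release):
    """Return the yum package names that match a kernel release string."""
    return [f"kernel-devel-{release}", f"kernel-headers-{release}"]


if __name__ == "__main__":
    # platform.release() reports the same string as `uname -r`.
    print(" ".join(kernel_build_packages(platform.release())))
```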


# lspci | grep VGA
04:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)
08:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05)


# wget http://jp.download.nvidia.com/XFree86/Linux-x86_64/418.43/NVIDIA-Linux-x86_64-418.43.run


# bash NVIDIA-Linux-x86_64-418.43.run
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 418.43................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................


# nvidia-smi

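+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43       Driver Version: 418.43                              |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:04:00.0 N/A |                  N/A |
| 50%   28C    P0    N/A /  N/A |      0MiB /   980MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+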

The "Not Supported" entry is a little worrying, but it appears to be normal for a low-end card.
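
For a check that is easier to script than the table, nvidia-smi can also print selected fields as CSV. The sketch below is an assumption-laden example and not part of the original steps: it runs `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` and splits each row. The helper name `parse_gpu_rows` is invented here.

```python
import csv
import io
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"]


def parse_gpu_rows(csv_text):
    """Turn 'name, memory' CSV rows into (name, memory) tuples."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [(row[0].strip(), row[1].strip()) for row in reader if len(row) >= 2]


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):  # do nothing when the driver is absent
        result = subprocess.run(
            QUERY, stdout=subprocess.PIPE, universal_newlines=True
        )
        if result.returncode == 0:
            for name, memory in parse_gpu_rows(result.stdout):
                print(name, memory)
```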


# wget https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.1.105-1.x86_64.rpm
# rpm -Uvh cuda-repo-rhel7-10.1.105-1.x86_64.rpm
# sed -i -e "s/enabled=1/enabled=0/g" /etc/yum.repos.d/cuda.repo
# yum --enablerepo=cuda,epel install cuda-10-1 xorg-x11-drv-nvidia dkms gcc make

# nano /etc/profile.d/cuda101.sh
# cat /etc/profile.d/cuda101.sh
export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
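
The `${PATH:+:${PATH}}` form in these two lines appends ":" and the old value only when the variable is already set and non-empty. That avoids a trailing colon, which would add an empty entry that is interpreted as the current directory. The following sketch mirrors that expansion in Python purely as an explanation; `prepend_path` is a hypothetical name.

```python
import os


def prepend_path(new_dir, current):
    """Mimic NEW${VAR:+:${VAR}} from the shell.

    The ':' separator and the old value are appended only when the
    old value is set and non-empty, so no stray trailing ':' appears.
    """
    if current:
        return f"{new_dir}:{current}"
    return new_dir


if __name__ == "__main__":
    print(prepend_path("/usr/local/cuda-10.1/bin", os.environ.get("PATH")))
```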


# reboot


$ cuda-install-samples-10.1.sh ./
Copying samples to ./NVIDIA_CUDA-10.1_Samples now...
Finished copying samples.

$ cd NVIDIA_CUDA-10.1_Samples/1_Utilities/deviceQuery
$ make

$ ./deviceQuery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 710"
CUDA Driver Version / Runtime Version 10.1 / 10.1
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory:  981 MBytes (1028587520 bytes)
( 1) Multiprocessors, (192) CUDA Cores/MP:  192 CUDA Cores
GPU Max Clock rate: 954 MHz (0.95 GHz)
Memory Clock rate:  800 Mhz
Memory Bus Width: 64-bit
L2 Cache Size:  524288 bytes
Maximum Texture Dimension Size (x,y,z)  1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory:  65536 bytes
Total amount of shared memory per block:  49152 bytes
Total number of registers available per block: 65536
Warp size:  32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block:  1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment:  512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels:  No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping:  Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID:  0 / 4 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS




$ git clone https://github.com/yyuu/pyenv.git ~/.pyenv
$ echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
$ echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
$ echo 'eval "$(pyenv init -)"' >> ~/.bashrc
$ source ~/.bashrc


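(Previously I installed Anaconda3-5.3.1 and downgraded only Python with $ conda install python=3.6. When I tried to rebuild the environment in August 2019, however, conda updated itself to 4.7.10 and Anaconda started misbehaving from that moment, so I decided to install anaconda3-4.3.1 from the start. conda 4.7.11 has been released as of this edit, so the problem may have been fixed.)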

$ pyenv install anaconda3-4.3.1
$ pyenv rehash
$ pyenv global anaconda3-4.3.1
$ echo 'export PATH="$PYENV_ROOT/versions/anaconda3-4.3.1/bin/:$PATH"' >> ~/.bashrc
$ source ~/.bashrc


$ tar zxf cudnn-10.1-linux-x64-v7.6.2.24.tgz
$ sudo cp -a cuda/include/* /usr/local/cuda/include/
$ sudo cp -a cuda/lib64/* /usr/local/cuda/lib64/
$ sudo ldconfig

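(Downloading cuDNN requires registering as a member on the NVIDIA site. It used to be installable without registration via $ conda install cudnn, but because conda 4.7.10 partly malfunctions, as noted earlier, I installed it manually this time.)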
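
One way to confirm the manual copy is to read the version macros from the installed header. This is a sketch under the assumption that the header sits at /usr/local/cuda/include/cudnn.h (the copy destination used above) and defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as cuDNN 7 headers do. `parse_cudnn_version` is a hypothetical helper.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from cudnn.h text, or None if missing."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            rf"^\s*#define\s+{key}\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    header = Path("/usr/local/cuda/include/cudnn.h")  # assumed location
    if header.exists():
        print(parse_cudnn_version(header.read_text(errors="replace")))
```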

$ sudo ln -s /usr/local/cuda/include/crt/math_functions.hpp /usr/local/cuda/include/math_functions.hpp
$ sudo ln -s /usr/lib64/libcublas.so. /usr/local/cuda-10.1/lib64/libcublas.so.
$ sudo ln -s /usr/local/cuda-10.1/lib64/libcublas.so. /usr/local/cuda-10.1/lib64/libcublas.so.10.1
$ sudo ln -s /usr/local/cuda-10.1/lib64/libcublas.so.10.1 /usr/local/cuda-10.1/lib64/libcublas.so
$ sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcusolver.so.10.1.105 /usr/local/cuda-10.1/lib64/libcusolver.so.10.1
$ sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcurand.so.10.1.105 /usr/local/cuda-10.1/lib64/libcurand.so.10.1
$ sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcufft.so.10.1.105 /usr/local/cuda-10.1/lib64/libcufft.so.10.1
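
ln -s does not check that its target exists, so a typo in any of these commands leaves a dangling link that only shows up later when a library fails to load. The sketch below is not from the original procedure. It lists symlinks in a directory whose targets do not resolve, and `dangling_symlinks` is a made-up helper name.

```python
import os


def dangling_symlinks(directory):
    """Return sorted names of symlinks in `directory` whose target is missing."""
    broken = []
    for entry in os.scandir(directory):
        # os.path.exists follows the link, so it is False for a dead target.
        if entry.is_symlink() and not os.path.exists(entry.path):
            broken.append(entry.name)
    return sorted(broken)


if __name__ == "__main__":
    lib_dir = "/usr/local/cuda-10.1/lib64"  # directory used in the commands above
    if os.path.isdir(lib_dir):
        print(dangling_symlinks(lib_dir))
```

An empty list means every link in the directory points at an existing file.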


$ conda install tensorflow-gpu


$ python
Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 12:22:00) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()
2019-08-04 15:00:29.442682: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2019-08-04 15:00:29.442754: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2019-08-04 15:00:29.612838: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-04 15:00:29.613333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: GeForce GT 710
major: 3 minor: 5 memoryClockRate (GHz) 0.954
pciBusID 0000:04:00.0
Total memory: 980.94MiB
Free memory: 958.69MiB
2019-08-04 15:00:29.613394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2019-08-04 15:00:29.613416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2019-08-04 15:00:29.613452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 710, pci bus id: 0000:04:00.0)
[name: "/cpu:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 10112316233564255216
, name: "/gpu:0"
device_type: "GPU"
memory_limit: 769327104
locality {
  bus_id: 1
}
incarnation: 15150163897772535855
physical_device_desc: "device: 0, name: GeForce GT 710, pci bus id: 0000:04:00.0"
]
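
The device list is long to read by eye, so a script can simply filter it for GPU entries. The following is a minimal sketch that works on any objects exposing `name` and `device_type`, which the entries returned by device_lib.list_local_devices() do. The function name `gpu_device_names` and the stand-in objects are assumptions made for illustration.

```python
from types import SimpleNamespace


def gpu_device_names(devices):
    """Return the names of entries whose device_type is "GPU"."""
    return [d.name for d in devices if d.device_type == "GPU"]


if __name__ == "__main__":
    # Stand-ins for the objects TensorFlow returns; with TensorFlow
    # installed, pass device_lib.list_local_devices() instead.
    fake_devices = [
        SimpleNamespace(name="/cpu:0", device_type="CPU"),
        SimpleNamespace(name="/gpu:0", device_type="GPU"),
    ]
    print(gpu_device_names(fake_devices))
```

An empty result would mean TensorFlow fell back to the CPU only.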


$ conda install keras=2.0.8


Clone the Keras repository with git and try running the mnist_cnn.py example.

$ git clone https://github.com/fchollet/keras.git
$ cd keras/examples
$ python mnist_cnn.py
Using TensorFlow backend.
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
11493376/11490434 [==============================] - 5s 0us/step
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples


60000/60000 [==============================] - 76s 1ms/step - loss: 0.0271 - acc: 0.9917 - val_loss: 0.0344 - val_acc: 0.9881
Test loss: 0.03438230963442984
Test accuracy: 0.9881
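
To compare this run against a CPU-only run, the epoch time can be converted into throughput. The sketch below only does that arithmetic with the figures from the log (60000 samples in a reported 76 s, which Keras has rounded). `samples_per_second` is a hypothetical helper.

```python
def samples_per_second(num_samples, epoch_seconds):
    """Return training throughput for one epoch."""
    if epoch_seconds <= 0:
        raise ValueError("epoch_seconds must be positive")
    return num_samples / epoch_seconds


if __name__ == "__main__":
    rate = samples_per_second(60000, 76)
    print(f"{rate:.0f} samples/s, {1000 / rate:.2f} ms per sample")
```

Running the same calculation on the epoch time from a CPU-only run gives a directly comparable number.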



