0.Intro
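As listed under "Hardware environment" later in this article, my machine-learning box is an old machine with a first-generation Core i7. That CPU does not support AVX2 or FMA, so from a certain TensorFlow version onward I have to build TensorFlow from source.
TensorFlow builds are very picky about the environment, and this time as well I ran into a build error once. Since I did manage to build TensorFlow 2.2, this entry publishes the procedure starting right after the OS installation.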
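To see whether a given machine is in the same situation, the CPU flags can be inspected directly. The following is a minimal sketch, assuming a Linux system that exposes /proc/cpuinfo; `has_cpu_flag` is a hypothetical helper name, not something from TensorFlow.

```shell
# Return 0 when a CPU flag (e.g. avx, avx2, fma) is listed in a cpuinfo-style file.
# $1 = flag name, $2 = file to read (defaults to /proc/cpuinfo).
has_cpu_flag() {
    grep -Eq "^flags[[:space:]]*:.*[[:space:]]$1([[:space:]]|\$)" "${2:-/proc/cpuinfo}"
}

# Report the three instruction sets relevant to this article.
for flag in avx avx2 fma; do
    if has_cpu_flag "$flag" 2>/dev/null; then
        echo "$flag: supported"
    else
        echo "$flag: not supported"
    fi
done
```

If any of these come back as not supported, a from-source build like the one below is probably the safer route.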
1.Environment setup
・Installing the development tools
sudo apt-get install build-essential libssl-dev libbz2-dev libreadline-dev libsqlite3-dev zip unzip nkf
・Disabling the Nouveau driver
With an NVIDIA graphics card, a driver called nouveau is used by default.
How to check:
lsmod | grep -i nouveau
It may conflict with the NVIDIA driver, so disable it.
Create /etc/modprobe.d/blacklist-nouveau.conf and write the following settings in it.
blacklist nouveau
options nouveau modeset=0
After adding the kernel module to the blacklist, regenerate the initramfs so the change takes effect.
sudo update-initramfs -u
Reboot
sudo reboot
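The Nouveau steps above can be sketched as two small helpers. Both function names are hypothetical; the file content is the two lines shown earlier, and the target path is left as a parameter so the output can be inspected in a scratch file before it is copied into /etc/modprobe.d with sudo.

```shell
# Write the two blacklist lines to the file given as $1.
write_nouveau_blacklist() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$1"
}

# Read lsmod-style output on stdin; return 0 when a nouveau module is listed.
nouveau_loaded() {
    grep -qi '^nouveau'
}

# Example usage (commented out because it touches system state):
# write_nouveau_blacklist /tmp/blacklist-nouveau.conf
# sudo cp /tmp/blacklist-nouveau.conf /etc/modprobe.d/blacklist-nouveau.conf
# lsmod | nouveau_loaded && echo "nouveau is still loaded"
```

After the reboot, `lsmod | nouveau_loaded` should fail, which means the module is no longer loaded.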
・Installing CUDA Toolkit 10.1 update2 (from the CUDA Toolkit Archive)
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
$ sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
$ sudo apt-key add /var/cuda-repo-10-1-local-10.1.243-418.87.00/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get -y install cuda
・Installing cuDNN
$ tar -xzvf cudnn-10.1-linux-x64-v7.6.5.32.tgz
Copy the following files into the CUDA Toolkit directory, and change the file permissions.
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Install the runtime library, for example:
$ sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb
Install the developer library, for example:
$ sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
Install the code samples and the cuDNN Library User Guide, for example:
$ sudo dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.1_amd64.deb
・Installing libcupti-dev with apt-get
sudo apt-get install libcupti-dev
・Setting the paths
export PATH=/usr/local/cuda-10.1/bin:${PATH}
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:${LD_LIBRARY_PATH}
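The article does not say where these two exports are placed. Assuming they should survive a reboot, one option is to append them to ~/.bashrc. The sketch below uses a hypothetical helper, `append_once`, so that running it repeatedly does not duplicate the lines.

```shell
# Append a line to a file only if that exact line is not already present.
# $1 = line to add, $2 = target file (~/.bashrc is an assumption, not from the article).
append_once() {
    touch "$2"
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

# Example usage (commented out because it edits your shell profile):
# append_once 'export PATH=/usr/local/cuda-10.1/bin:${PATH}' ~/.bashrc
# append_once 'export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:${LD_LIBRARY_PATH}' ~/.bashrc
```

The single quotes keep `${PATH}` unexpanded, so the variable is resolved each time the shell starts rather than once at write time.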
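Reboot once at this point.
After that, run the samples bundled with CUDA and cuDNN to verify that everything works.
How to run the mnist sample that ships with cuDNN: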
$ ar vx libcudnn7-doc_7.6.5.32-1+cuda10.1_amd64.deb
$ tar Jxvf data.tar.xz
$ cd ~/cudnn_samples_v7/mnistCUDNN
Run make in that directory and execute the resulting binary.
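For reference, here is a sketch of the same check when the samples come from the installed libcudnn7-doc package instead of being unpacked by hand. It assumes the package placed them under /usr/src/cudnn_samples_v7 and that they are copied to a writable directory first. Neither the copy step nor the helper name `run_mnist_cudnn` appears in the article.

```shell
# Copy the cuDNN samples to a writable location, then build and run mnistCUDNN.
# $1 = samples directory (assumed default: /usr/src/cudnn_samples_v7)
# $2 = destination parent directory (assumed default: $HOME)
run_mnist_cudnn() {
    src="${1:-/usr/src/cudnn_samples_v7}"
    dest="${2:-$HOME}"
    if [ ! -d "$src/mnistCUDNN" ]; then
        echo "cuDNN samples not found under $src" >&2
        return 1
    fi
    cp -r "$src" "$dest"/ || return 1
    # Subshell so the caller's working directory is left unchanged.
    ( cd "$dest/$(basename "$src")/mnistCUDNN" && make clean && make && ./mnistCUDNN )
}

# Example usage:
# run_mnist_cudnn
```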
・Installing Anaconda
bash Anaconda3-2020.07-Linux-x86_64.sh
・Creating a Python 3.7 environment and installing the required modules
conda create -n ml_env python=3.7
conda activate ml_env
conda install six
conda install mock
conda install scikit-learn
conda install -c conda-forge keras-applications
conda install -c conda-forge keras-preprocessing
pip install numpy==1.18.0
The key point is the last line, which replaces the latest numpy with 1.18.0. With the latest version at the time, 1.19.1, the TensorFlow build failed in its final stage with:
ERROR: /home/aptx4869/github/tensorflow/tensorflow/python/tools/BUILD:281:1 C++ compilation of rule '//tensorflow/python:bfloat16_lib' failed (Exit 1)
INFO: Elapsed time: 20.016s, Critical Path: 7.56s
INFO: 0 processes.
FAILED: Build did NOT complete successfully
https://github.com/tensorflow/tensorflow/issues/40688
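Digging around from this issue, numpy appears to be the cause. The problem seems to lie in 1.19.0, and the thread suggests it was fixed in 1.19.1, but that did not work for me. So I reran the build with 1.18.0, for which successful builds have been reported.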
Since this is a conda environment, I expected to be able to swap the version with
conda install numpy=1.18.0
but the version pin did not work, so I replaced it with pip instead.
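Because the failure only shows up near the end of the build, it may be worth checking the numpy version before starting. The sketch below is a hypothetical guard. The article only confirms that 1.18.0 works, so accepting the whole 1.18.x series is an assumption.

```shell
# Return 0 when the given numpy version string is in the 1.18.x series.
# Only 1.18.0 is confirmed by the article; other 1.18.x releases are assumed to behave the same.
numpy_ok_for_tf22_build() {
    case "$1" in
        1.18|1.18.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage inside the activated ml_env environment:
# ver=$(python -c 'import numpy; print(numpy.__version__)')
# numpy_ok_for_tf22_build "$ver" || echo "numpy $ver may break the build; try: pip install numpy==1.18.0" >&2
```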
・Installing bazel
configure.py in the tensorflow-2.2 source contains
_TF_BAZELRC_FILENAME = '.tf_configure.bazelrc'
_TF_WORKSPACE_ROOT = ''
_TF_BAZELRC = ''
_TF_CURRENT_BAZEL_VERSION = None
_TF_MIN_BAZEL_VERSION = '2.0.0'
_TF_MAX_BAZEL_VERSION = '2.0.0'
so I downloaded and installed 2.0.0.
bash bazel-2.0.0-installer-linux-x86_64.sh
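The supported bazel range can also be read out of configure.py from the command line, which helps when repeating this procedure for another TensorFlow release. This is a minimal sketch; `tf_bazel_version` is a hypothetical helper, and it assumes the variables are written as single-quoted string assignments like the ones quoted above.

```shell
# Print the value assigned to a _TF_*_BAZEL_VERSION variable in configure.py.
# $1 = variable name, $2 = path to configure.py
tf_bazel_version() {
    sed -n "s/^$1 = '\\(.*\\)'\$/\\1/p" "$2"
}

# Example usage from the source directory:
# tf_bazel_version _TF_MIN_BAZEL_VERSION configure.py
# tf_bazel_version _TF_MAX_BAZEL_VERSION configure.py
```

When both values are the same, as they are for 2.2.0, exactly that bazel version is the one to install.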
・Unpacking the tensorflow-2.2 source
Download the tensorflow-2.2.0 source from GitHub and extract it:
$ unzip tensorflow-2.2.0.zip
・Compiling tensorflow-2.2
(ml_env) XXXX@XXXX:~/tensorflow-2.2.0$ ~/tensorflow-2.2.0/configure
configures the build. I answered y to
Do you wish to build TensorFlow with CUDA support? [y/N]:
and 6.1 to
Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 6.1]:
and just pressed Enter for every other question.
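(ml_env) XXXX@XXXX:~/tensorflow-2.2.0$ ~/bin/bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
starts the build.
The first time I ran it, it failed while fetching the various dependencies, before compilation had even started. Running the same command again got through to compilation without problems.
The build took roughly _5 hours_ to finish.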
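Since rerunning the same command was enough to get past the fetch error, the rerun can be automated. The sketch below is a generic retry wrapper. `retry` is a hypothetical helper, and it assumes the failure is transient, as it was in my case. A genuine compile error will simply fail again on every attempt.

```shell
# Run a command up to $1 times, stopping at the first success.
# Usage: retry <max_attempts> <command> [args...]
retry() {
    attempts=$1
    shift
    n=1
    while :; do
        "$@" && return 0
        [ "$n" -ge "$attempts" ] && return 1
        echo "attempt $n of $attempts failed, retrying" >&2
        n=$((n + 1))
    done
}

# Example usage from the source directory:
# retry 2 ~/bin/bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
```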
・Creating the package
~/tensorflow-2.2.0/bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
・Installing the package
pip install /tmp/tensorflow_pkg/tensorflow-2.2.0-cp37-cp37m-linux_x86_64.whl
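build_pip_package writes the wheel into the directory passed as its argument, and the exact file name depends on the Python version and platform. As a small sketch, the hypothetical helper `newest_wheel` below looks the file up instead of requiring the name to be typed by hand.

```shell
# Print the path of the most recently modified .whl file in directory $1.
# Returns non-zero when the directory contains no wheel.
newest_wheel() {
    whl=$(ls -t "$1"/*.whl 2>/dev/null | head -n 1)
    [ -n "$whl" ] && printf '%s\n' "$whl"
}

# Example usage:
# pip install "$(newest_wheel /tmp/tensorflow_pkg)"
```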
・Verifying the installation
Python 3.7.9 (default, Aug 31 2020, 12:42:55)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.test.gpu_device_name()
2020-09-10 22:56:01.627824: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2672785000 Hz
2020-09-10 22:56:01.638981: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b8bdf41c00 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-10 22:56:01.639018: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-09-10 22:56:01.665330: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-09-10 22:56:01.901801: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-10 22:56:01.902813: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b8bdfae070 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-10 22:56:01.902856: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1060 6GB, Compute Capability 6.1
2020-09-10 22:56:01.912217: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-10 22:56:01.913353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: GeForce GTX 1060 6GB computeCapability: 6.1
coreClock: 1.7845GHz coreCount: 10 deviceMemorySize: 5.93GiB deviceMemoryBandwidth: 178.99GiB/s
2020-09-10 22:56:01.928570: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-09-10 22:56:02.112599: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-09-10 22:56:02.211270: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-09-10 22:56:02.253332: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-09-10 22:56:02.434494: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-09-10 22:56:02.472776: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-09-10 22:56:02.843482: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-10 22:56:02.843662: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-10 22:56:02.844592: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-10 22:56:02.845390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-09-10 22:56:02.856635: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-09-10 22:56:02.868992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-10 22:56:02.869015: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-09-10 22:56:02.869030: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-09-10 22:56:02.887511: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-10 22:56:02.888358: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-10 22:56:02.889182: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:0 with 5637 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:03:00.0, compute capability: 6.1)
'/device:GPU:0'
>>>
Hardware environment
[CPU]
(ml_env) XXXX@XXXX:~$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
stepping : 5
microcode : 0x1d
cpu MHz : 1603.722
cache size : 8192 KB
physical id : 0
[Mem]
(ml_env) XXXX@XXXX:~$ cat /proc/meminfo
MemTotal: 20543584 kB
MemFree: 19978580 kB
MemAvailable: 20031036 kB
Buffers: 34636 kB
Cached: 277696 kB
[GPU]
(ml_env) XXXX@XXXX:~$ lspci | grep -i nvidia
03:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)
Version list
[OS]
(ml_env) XXXX@XXXX:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.5 LTS"
[CUDA]
```
(ml_env) XXXX@XXXX:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
```
__[cudnn]__
```
(ml_env) XXXX@XXXX:~$ cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2
# define CUDNN_MAJOR 7
# define CUDNN_MINOR 6
# define CUDNN_PATCHLEVEL 5
--
# define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
# include "driver_types.h"
```
__[GCC]__
```
(ml_env) XXXX@XXXX:~$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.5.0-3ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
```
__[Anaconda]__
__・Python__
```
(ml_env) XXXX@XXXX:~$ python3 -V
Python 3.7.9
```
__・Installed packages__
```
(ml_env) XXXX@XXXX:~$ conda list
# packages in environment at /home/XXXX/anaconda3/envs/ml_env:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
absl-py 0.10.0 pypi_0 pypi
astunparse 1.6.3 pypi_0 pypi
ca-certificates 2020.6.20 hecda079_0 conda-forge
cachetools 4.1.1 pypi_0 pypi
certifi 2020.6.20 py37hc8dfbb8_0 conda-forge
chardet 3.0.4 pypi_0 pypi
gast 0.3.3 pypi_0 pypi
google-auth 1.21.1 pypi_0 pypi
google-auth-oauthlib 0.4.1 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.32.0 pypi_0 pypi
h5py 2.10.0 nompi_py37h90cd8ad_104 conda-forge
hdf5 1.10.6 nompi_h3c11f04_101 conda-forge
idna 2.10 pypi_0 pypi
importlib-metadata 1.7.0 pypi_0 pypi
keras-applications 1.0.8 py_1 conda-forge
keras-preprocessing 1.1.0 py_0 conda-forge
ld_impl_linux-64 2.33.1 h53a641e_7
libblas 3.8.0 17_openblas conda-forge
libcblas 3.8.0 17_openblas conda-forge
libedit 3.1.20191231 h14c3975_1
libffi 3.3 he6710b0_2
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.5.0 hdf63c60_16 conda-forge
liblapack 3.8.0 17_openblas conda-forge
libopenblas 0.3.10 pthreads_hb3c22a3_4 conda-forge
libstdcxx-ng 9.1.0 hdf63c60_0
markdown 3.2.2 pypi_0 pypi
mock 4.0.2 py_0
ncurses 6.2 he6710b0_1
numpy 1.18.0 pypi_0 pypi
oauthlib 3.1.0 pypi_0 pypi
openssl 1.1.1g h516909a_1 conda-forge
opt-einsum 3.3.0 pypi_0 pypi
pip 20.2.2 py37_0
protobuf 3.13.0 pypi_0 pypi
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
python 3.7.9 h7579374_0
python_abi 3.7 1_cp37m conda-forge
readline 8.0 h7b6447c_0
requests 2.24.0 pypi_0 pypi
requests-oauthlib 1.3.0 pypi_0 pypi
rsa 4.6 pypi_0 pypi
scipy 1.4.1 pypi_0 pypi
setuptools 49.6.0 py37_0
six 1.15.0 py_0
sqlite 3.33.0 h62c20be_0
tensorboard 2.2.2 pypi_0 pypi
tensorboard-plugin-wit 1.7.0 pypi_0 pypi
tensorflow 2.2.0 pypi_0 pypi
tensorflow-estimator 2.2.0 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
tk 8.6.10 hbc83047_0
urllib3 1.25.10 pypi_0 pypi
werkzeug 1.0.1 pypi_0 pypi
wheel 0.35.1 py_0
wrapt 1.12.1 pypi_0 pypi
xz 5.2.5 h7b6447c_0
zipp 3.1.0 pypi_0 pypi
zlib 1.2.11 h7b6447c_3
```
__[Bazel]__
```
(ml_env) XXXX@XXXX:~$ ./bin/bazel version
Build label: 2.0.0
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Dec 19 12:30:18 2019 (1576758618)
Build timestamp: 1576758618
Build timestamp as int: 1576758618
```
That's all.