These are my notes on the various problems I ran into while trying to run "FPGAでDeep Learningしてみる - きゅうりを選果する" (sorting cucumbers with deep learning on an FPGA) on AWS.
Agenda
- Setting up the GPU environment
- Installing Python, Theano, Lasagne and Pylearn2
- Installing the training data and BNN-PYNQ, and running the training
- Copying the trained parameters to the PYNQ and converting them to binary format
- Running inference on the PYNQ
Note: replace the scp destinations, SSH key names and so on below with the ones for your own environment.
#1. Setting up the GPU environment
AWS provides a Deep Learning AMI with Python, Theano, NVIDIA drivers, CUDA and cuDNN preinstalled, but when I tried to run cucumber9.py (described below) on it, it always failed with the error shown here no matter what I tried, so I decided to start from a plain Ubuntu image and install everything myself, beginning with the NVIDIA driver.
ubuntu@ip-172-31-26-1:~/BNN-PYNQ/bnn/src/training$ python2.7 cucumber9.py
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release. Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29
Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 7104)
/home/ubuntu/.local/lib/python2.7/site-packages/theano/sandbox/cuda/__init__.py:620: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.1.
warnings.warn(warn)
Traceback (most recent call last):
File "cucumber9.py", line 43, in <module>
import lasagne
File "/home/ubuntu/.local/lib/python2.7/site-packages/lasagne/__init__.py", line 12, in <module>
import theano
File "/home/ubuntu/.local/lib/python2.7/site-packages/theano/__init__.py", line 115, in <module>
theano.sandbox.cuda.tests.test_driver.test_nvidia_driver1()
File "/home/ubuntu/.local/lib/python2.7/site-packages/theano/sandbox/cuda/tests/test_driver.py", line 41, in test_nvidia_driver1
raise Exception("The nvidia driver version installed with this OS "
Exception: The nvidia driver version installed with this OS does not give good results for reduction.Installing the nvidia driver available on the same download page as the cuda package will fix the problem: http://developer.nvidia.com/cuda-downloads
From Instances in the AWS EC2 console, click Launch Instance and choose Ubuntu Server 16.04 LTS (HVM), SSD Volume Type from Quick Start.
For the Instance Type, pick p2.xlarge under GPU Compute.
(GPU instances are not available by default: open Limits in the left-hand column of the EC2 console, submit a Request limit increase for p2.xlarge, and get the limit raised to 1 or more.)
In Add Storage, increase the root volume to 16 GiB.
(If you leave it at 8 GiB, the disk fills up partway through the Python installation.)
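If you prefer to script the launch rather than click through the console, the same setup can be done with boto3. The sketch below is only illustrative: the region, AMI ID and key pair name are placeholders for your own environment, and /dev/sda1 is assumed to be the root device of the Ubuntu 16.04 AMI.
# Minimal sketch: launch a p2.xlarge with a 16 GiB root volume via boto3.
# ami-xxxxxxxx and my-key are placeholders; substitute your own values.
import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')
resp = ec2.run_instances(
    ImageId='ami-xxxxxxxx',        # Ubuntu Server 16.04 LTS (HVM), SSD Volume Type
    InstanceType='p2.xlarge',
    KeyName='my-key',
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[{
        'DeviceName': '/dev/sda1',                       # root volume
        'Ebs': {'VolumeSize': 16, 'VolumeType': 'gp2'},  # 16 GiB instead of the default 8 GiB
    }],
)
print(resp['Instances'][0]['InstanceId'])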
After launching the instance and connecting over SSH, I basically followed "FPGAでDeep Learningしてみる - きゅうりを選果する" to install the NVIDIA drivers, CUDA and cuDNN; for the driver alone, though, the version listed there still produced driver-related errors, so I installed both the driver and CUDA from the CUDA Toolkit 8.0 - Feb 2017 runfile as shown below.
The installer asks for gcc and make, so install them first:
sudo apt-get update
sudo apt-get install gcc make
wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda_8.0.61_375.26_linux-run
sudo sh cuda_8.0.61_375.26_linux-run
Accept the default install options.
Also apply the BLAS library patch:
wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/patches/2/cuda_8.0.61.2_linux-run
sudo sh cuda_8.0.61.2_linux-run
For cuDNN, as in "FPGAでDeep Learningしてみる - きゅうりを選果する", create an account on the NVIDIA site, pick Download cuDNN v5.1 (Jan 20, 2017), for CUDA 8.0 from the cuDNN Archive, and download libcudnn5_5.1.10-1+cuda8.0_amd64.deb and libcudnn5-dev_5.1.10-1+cuda8.0_amd64.deb to your local machine.
Then transfer them to the AWS instance and install them.
From the local machine:
scp -i my-key.pem libcudnn5_5.1.10-1+cuda8.0_amd64.deb ubuntu@ec2-52-87-164-28.compute-1.amazonaws.com:
scp -i my-key.pem libcudnn5-dev_5.1.10-1+cuda8.0_amd64.deb ubuntu@ec2-52-87-164-28.compute-1.amazonaws.com:
On the AWS instance:
sudo dpkg -i libcudnn5_5.1.10-1+cuda8.0_amd64.deb libcudnn5-dev_5.1.10-1+cuda8.0_amd64.deb
Set up the paths, then reboot so the newly installed driver and libraries take effect:
sudo sh -c "echo 'CUDA_HOME=/usr/local/cuda' >> /etc/profile.d/cuda.sh"
sudo sh -c "echo 'export LD_LIBRARY_PATH=\${LD_LIBRARY_PATH}:\${CUDA_HOME}/lib64' >> /etc/profile.d/cuda.sh"
sudo sh -c "echo 'export LIBRARY_PATH=\${LIBRARY_PATH}:\${CUDA_HOME}/lib64' >> /etc/profile.d/cuda.sh"
sudo sh -c "echo 'export C_INCLUDE_PATH=\${C_INCLUDE_PATH}:\${CUDA_HOME}/include' >> /etc/profile.d/cuda.sh"
sudo sh -c "echo 'export CXX_INCLUDE_PATH=\${CXX_INCLUDE_PATH}:\${CUDA_HOME}/include' >> /etc/profile.d/cuda.sh"
sudo sh -c "echo 'export PATH=\${PATH}:\${CUDA_HOME}/bin' >> /etc/profile.d/cuda.sh"
sudo reboot
Reconnect over SSH and run nvidia-smi; the driver version is now 375.26:
ubuntu@ip-172-31-89-44:~$ nvidia-smi
Sun Oct 7 11:30:52 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26 Driver Version: 375.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 0000:00:1E.0 Off | 0 |
| N/A 46C P0 72W / 149W | 0MiB / 11439MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
#2. Installing Python, Theano, Lasagne and Pylearn2
Install Python.
Running python cucumber9.py later complains that g++ is missing, so install g++ as well:
sudo apt-get install git g++ openssl libssl-dev libbz2-dev libreadline-dev libsqlite3-dev
git clone https://github.com/yyuu/pyenv.git ~/.pyenv
Configure pyenv by adding the following to .bashrc with vi:
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
Set the Python version to 2.7:
source .bashrc
env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 2.7.13
pyenv global 2.7.13
From here on, things mostly follow "FPGAでDeep Learningしてみる - きゅうりを選果する".
Install Theano:
pip install --user git+https://github.com/Theano/Theano.git@rel-0.9.0beta1
Install Lasagne, then write the Theano settings to ~/.theanorc (a quick GPU sanity check follows after the config):
pip install --user https://github.com/Lasagne/Lasagne/archive/master.zip
echo "[global]" >> ~/.theanorc
echo "floatX = float32" >> ~/.theanorc
echo "device = gpu" >> ~/.theanorc
echo "openmp = True" >> ~/.theanorc
echo "openmp_elemwise_minsize = 200000" >> ~/.theanorc
echo "" >> ~/.theanorc
echo "[nvcc]" >> ~/.theanorc
echo "fastmath = True" >> ~/.theanorc
echo "" >> ~/.theanorc
echo "[blas]" >> ~/.theanorc
echo "ldflags = -lopenblas" >> ~/.theanorc
Install Pylearn2:
git clone https://github.com/lisa-lab/pylearn2
cd pylearn2/
Running setup.py fails with the error below,
Traceback (most recent call last):
File "setup.py", line 8, in <module>
from theano.compat.six.moves import input
ImportError: No module named theano.compat.six.moves
so open setup.py in vi and change the offending import as follows.
Before:
from theano.compat.six.moves import input
After:
from six.moves import input
Set up pylearn2:
python setup.py develop --user
An error like the following appears partway through, but it can be ignored:
In file included from ext/_yaml.c:565:0:
ext/_yaml.h:2:18: fatal error: yaml.h: No such file or directory
compilation terminated.
Error compiling module, falling back to pure Python
#3. Installing the training data and BNN-PYNQ, and running the training
Copy the cucumber data (a quick check of the extracted batches follows below):
cd
git clone https://github.com/workpiles/CUCUMBER-9.git
cd CUCUMBER-9/prototype_1/
tar -zxvf cucumber-9-python.tar.gz
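The archive extracts to CIFAR-10-style pickle batches with 'data', 'labels' and 'filenames' keys, the same layout the PYNQ notebook in section 5 reads (three consecutive data rows per image, one per colour plane). Here is a quick sketch to confirm the extracted data looks sane, assuming that layout; test_batch is the file the notebook uses later, and the 0-8 label range is my assumption based on the 9 classes.
# Peek at an extracted cucumber batch (run inside CUCUMBER-9/prototype_1/).
from __future__ import print_function
import cPickle

with open('test_batch', 'rb') as fo:
    batch = cPickle.load(fo)

print(batch.keys())                    # expect 'data', 'labels', 'filenames'
print(len(batch['labels']), 'images')  # samples in this batch
print(len(batch['data'][0]))           # 1024 values per channel row (32x32)
print(sorted(set(batch['labels'])))    # should cover the 9 cucumber grades (presumably 0-8)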
Also grab the BNN-PYNQ sources:
cd
git clone https://github.com/Xilinx/BNN-PYNQ.git
cd BNN-PYNQ/bnn/src/training/
Take cucumber9.py from "FPGAでDeep Learningしてみる - きゅうりを選果する" and run it:
python cucumber9.py
ubuntu@ip-172-31-89-44:~/BNN-PYNQ/bnn/src/training$ python cucumber9.py
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release. Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29
Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5110)
batch_size = 50
alpha = 0.1
epsilon = 0.0001
W_LR_scale = Glorot
num_epochs = 500
LR_start = 0.001
LR_fin = 3e-07
LR_decay = 0.983907435305
save_path = cucumber9_parameters.npz
train_set_size = 2475
shuffle_parts = 1
Loading CUCUMBER9 dataset...
Building the CNN...
/home/ubuntu/.local/lib/python2.7/site-packages/theano/tensor/basic.py:2144: UserWarning: theano.tensor.round() changed its default from `half_away_from_zero` to `half_to_even` to have the same default as NumPy. Use the Theano flag `warn.round=False` to disable this warning.
"theano.tensor.round() changed its default from"
W_LR_scale = 20.049938
H = 1
W_LR_scale = 27.712812
H = 1
W_LR_scale = 33.941124
H = 1
W_LR_scale = 39.191837
H = 1
W_LR_scale = 48.0
H = 1
W_LR_scale = 55.425625
H = 1
W_LR_scale = 22.627417
H = 1
W_LR_scale = 26.127892
H = 1
W_LR_scale = 18.63688
H = 1
Training...
Epoch 1 of 500 took 5.78331184387s
LR: 0.001
training loss: 1.485121870527462
validation loss: 2.055072214868334
validation error rate: 61.11111177338494%
best epoch: 1
best validation error rate: 61.11111177338494%
test loss: 2.055072214868334
test error rate: 61.11111177338494%
…
Epoch 500 of 500 took 5.06151795387s
LR: 3.04906731299e-07
training loss: 0.0023622200804645534
validation loss: 0.15342154022720125
validation error rate: 15.333333197567198%
best epoch: 373
best validation error rate: 11.777777804268732%
test loss: 0.12793712980217403
test error rate: 11.777777804268732%
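Training writes the learned parameters to cucumber9_parameters.npz (the save_path shown in the log above). A quick way to peek at what was saved, assuming the standard NumPy .npz container:
# Inspect the saved parameter archive.
from __future__ import print_function
import numpy as np

params = np.load('cucumber9_parameters.npz')
print(len(params.files), 'arrays saved')
for name in params.files[:5]:          # show just the first few entries
    print(name, params[name].shape, params[name].dtype)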
Running cucumber9-gen-binary-weights.py (which I wrote based on "FPGAでDeep Learningしてみる - きゅうりを選果する") on the same instance fails as shown below, so I decided to do the conversion to binary format on the PYNQ instead.
ubuntu@ip-172-31-89-44:~/BNN-PYNQ/bnn/src/training$ python cucumber9-gen-binary-weights.py cucumber9_parameters.npz
Using peCount = 16 simdCount = 3 for engine 0
Traceback (most recent call last):
File "cucumber9-gen-binary-weights.py", line 63, in <module>
(w,t) = rHW.readConvBNComplex(usePopCount=False)
TypeError: readConvBNComplex() takes at least 7 arguments (2 given)
#4. Copying the trained parameters to the PYNQ and converting them to binary format
scp cucumber9-gen-binary-weights.py and the cucumber9_parameters.npz produced on AWS to the PYNQ.
Copy cucumber9_parameters.npz to the local machine:
scp -i my-key.pem ubuntu@ec2-52-87-164-28.compute-1.amazonaws.com:~/BNN-PYNQ/bnn/src/training/cucumber9_parameters.npz ./
Copy cucumber9-gen-binary-weights.py and cucumber9_parameters.npz to the PYNQ:
scp cucumber9-gen-binary-weights.py cucumber9_parameters.npz xilinx@192.168.0.9:
Run cucumber9-gen-binary-weights.py on the PYNQ:
sudo cp cucumber9-gen-binary-weights.py /opt/python3.6/lib/python3.6/site-packages/bnn/src/training/
python /opt/python3.6/lib/python3.6/site-packages/bnn/src/training/cucumber9-gen-binary-weights.py cucumber9_parameters.npz
xilinx@pynq:~$ python /opt/python3.6/lib/python3.6/site-packages/bnn/src/training/cucumber9-gen-binary-weights.py cucumber9_parameters.npz
Using peCount = 16 simdCount = 3 for engine 0
Extracting conv-BN complex, OFM=64 IFM=3 k=3
Layer 0: 64 x 27
WMem = 36 TMem = 4
Using peCount = 32 simdCount = 32 for engine 1
Extracting conv-BN complex, OFM=64 IFM=64 k=3
Layer 1: 64 x 576
WMem = 36 TMem = 2
Using peCount = 16 simdCount = 32 for engine 2
Extracting conv-BN complex, OFM=128 IFM=64 k=3
Layer 2: 128 x 576
WMem = 144 TMem = 8
Using peCount = 16 simdCount = 32 for engine 3
Extracting conv-BN complex, OFM=128 IFM=128 k=3
Layer 3: 128 x 1152
WMem = 288 TMem = 8
Using peCount = 4 simdCount = 32 for engine 4
Extracting conv-BN complex, OFM=256 IFM=128 k=3
Layer 4: 256 x 1152
WMem = 2304 TMem = 64
Using peCount = 1 simdCount = 32 for engine 5
Extracting conv-BN complex, OFM=256 IFM=256 k=3
Layer 5: 256 x 2304
WMem = 18432 TMem = 256
Using peCount = 1 simdCount = 4 for engine 6
Extracting FCBN complex, ins = 256 outs = 512
Interleaving 256 channels in fully connected layer...
Layer 6: 512 x 256
WMem = 32768 TMem = 512
Using peCount = 1 simdCount = 8 for engine 7
Extracting FCBN complex, ins = 512 outs = 512
Layer 7: 512 x 512
WMem = 32768 TMem = 512
Using peCount = 4 simdCount = 1 for engine 8
Extracting FCBN complex, ins = 512 outs = 9
Layer 8: 12 x 512
WMem = 1536 TMem = 3
Copy the parameters converted to binary format into place:
sudo mkdir /opt/python3.6/lib/python3.6/site-packages/bnn/params/cucumber9
sudo cp binparam-cnv-pynq/* /opt/python3.6/lib/python3.6/site-packages/bnn/params/cucumber9/
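At this point it is worth confirming, still on the PYNQ, that the bnn package can see the new parameter set; this uses the same call that appears in the notebook in the next section.
import bnn
# 'cucumber9' should now be listed among the available CNV parameter sets
print(bnn.available_params(bnn.NETWORK_CNV))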
#5. Running inference on the PYNQ
Cloning the cucumber data with git fails with the certificate error below, so tell git to skip certificate verification:
xilinx@pynq:~$ git clone https://github.com/workpiles/CUCUMBER-9.git
Cloning into 'CUCUMBER-9'...
fatal: unable to access 'https://github.com/workpiles/CUCUMBER-9.git/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
export GIT_SSL_NO_VERIFY=1
Copy the training (inference) data:
git clone https://github.com/workpiles/CUCUMBER-9.git
cd CUCUMBER-9/prototype_1/
tar -zxvf cucumber-9-python.tar.gz
In Jupyter Notebook, duplicate and rename Cifar10.ipynb, edit it into the Cucumber9.ipynb shown below, and run it.
In "4. Launching BNN in hardware", the FPGA appears to classify correctly in about 0.0016 seconds.
In "5. Launching BNN in software", the CPU takes about 0.834 seconds, so the FPGA is roughly 500 times faster (0.834 / 0.0016 ≈ 520).
In[1]:
import bnn
print(bnn.available_params(bnn.NETWORK_CNV))
classifier = bnn.CnvClassifier('cucumber9')
In[2]:
print(classifier.bnn.classes)
In[3]:
import _pickle, random
from PIL import Image
import numpy as np
def unpickle(file):
fo = open(file, 'rb')
dict = _pickle.load(fo, encoding='latin-1')
fo.close()
return dict
def image_from_cucumber(index, dic):
name = dic['filenames'][index]
r = dic['data'][index*3]
g = dic['data'][index*3+1]
b = dic['data'][index*3+2]
label = dic['labels'][index]
data = np.array([r, g, b]).T.reshape(32, 32, 3)
img = Image.fromarray(data, 'RGB')
img.save(name)
return name, label
dic = unpickle('/home/xilinx/CUCUMBER-9/prototype_1/test_batch')
filename, label = image_from_cucumber(random.randint(0, len(dic['filenames']) - 1), dic)  # randint is inclusive at both ends
im = Image.open('/home/xilinx/jupyter_notebooks/bnn/'+filename)
print(classifier.class_name(label))
im
In[4]:
class_out=classifier.classify_image(im)
print("Class number: {0}".format(class_out))
print("Class name: {0}".format(classifier.class_name(class_out)))
print("Correct Class number: {0}".format(label))
print("Correct Class name: {0}".format(classifier.class_name(label)))
In[5]:
sw_class = bnn.CnvClassifier('cucumber9', bnn.RUNTIME_SW)
class_out = sw_class.classify_image(im)
print("Class number: {0}".format(class_out))
print("Class name: {0}".format(classifier.class_name(class_out)))
print("Correct Class number: {0}".format(label))
print("Correct Class name: {0}".format(classifier.class_name(label)))
In[6]:
from IPython.display import display
im.thumbnail((64, 64), Image.ANTIALIAS)
display(im)
class_detail = classifier.classify_details(im)
print("classes: {0}".format(class_detail))
In[7]:
%matplotlib inline
import matplotlib.pyplot as plt
x_pos = np.arange(len(class_detail))
fig, ax = plt.subplots()
ax.bar(x_pos, (class_detail/255)-1)
ax.set_xticklabels(classifier.bnn.classes, rotation='vertical')
ax.set_xticks(x_pos)
plt.show()
In[8]:
from pynq import Xlnk
xlnk = Xlnk()
xlnk.xlnk_reset()