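Running the Movidius NCS from Python on a Raspberry Pi 3 #Python

Introduction

I bought a Movidius, the USB accelerator from Intel that makes deep learning usable on small, lightweight devices. It was first sold only on-site at CVPR17 and is now also available from RS. In this post I run the bundled sample and take a look at the Python library.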

What you can do

Convert a Caffe model so that it runs on the Movidius
Control the Movidius from a RasPi via Python
Run inference in 97 msec per image for AlexNet and 113 msec per image for GoogleNet

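Environment

Movidius NCS (available from RS Components, ¥9,900)
Ubuntu 16.04 (here running in Docker on Mac OS X)
Raspberry Pi 3, Raspbian Jessie (only Ubuntu was supported at first, but Raspbian is now supported as well)

The RasPi only runs the converted model, so only the required API is installed on it.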

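Steps

Converting a Caffe model to run on the Movidius

Setting up the environment

Here I bring up a Docker environment on the Mac, install the SDK into it, and convert the model there. I follow this article. First, create the Docker environment in a directory of your choice.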

mkdir Docker && cd Docker
git clone https://github.com/peisuke/MovidiusNCS-setup.git
cd MovidiusNCS-setup
docker build -t movidius .

... The build takes a while.
Once the build has finished, start the container. Mount a shared folder so that the converted model file can be exchanged with the Mac.

docker run -it --rm -v [Mac-side directory to share with the Docker environment]:/home/ubuntu/data movidius:latest /bin/bash

Converting the Caffe model

Download the Caffe model to convert. For AlexNet:

mkdir -p data/AlexNet && cd data/AlexNet
wget http://dl.caffe.berkeleyvision.org/bvlc_alexnet.caffemodel
wget https://raw.githubusercontent.com/BVLC/caffe/master/models/bvlc_alexnet/deploy.prototxt

The sample processes a single image at a time, so change the batch size of the network from 10 to 1.

data/AlexNet/deploy.prototxt

input_param { shape: { dim: 1 dim: 3 dim: 224 dim: 224 } }
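If you would rather not edit the file by hand, the same change can be scripted. The following is only a minimal sketch: the helper name `set_batch_size` is hypothetical, and it assumes the batch size is the first `dim:` value inside the input `shape` block, as in the line above.

```python
import re


def set_batch_size(prototxt_text, batch=1):
    """Return the prototxt text with the first input `dim:` value set to `batch`.

    Assumption: the batch size is the first `dim:` inside the `shape { ... }`
    block of the input layer. Only that first occurrence is rewritten.
    """
    pattern = r"(shape\s*:?\s*\{\s*dim:\s*)\d+"
    replacement = r"\g<1>{}".format(batch)
    return re.sub(pattern, replacement, prototxt_text, count=1)


if __name__ == "__main__":
    original = "input_param { shape: { dim: 10 dim: 3 dim: 224 dim: 224 } }"
    print(set_batch_size(original, batch=1))
```

To apply it to the real file, read `deploy.prototxt` into a string, pass it through the helper, and write the result back.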

Now convert the Caffe model. Adding the -s 12 option makes execution roughly 3 to 5 times faster, as the timing comparison at the end of this post shows.

cd data/AlexNet
# -s 12 is optional; leave it out to compile without the speed-up
python3 ../../bin/mvNCCompile.pyc ./deploy.prototxt -s 12 \
    -w ./bvlc_alexnet.caffemodel -o graph
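When converting several models, or comparing builds with and without `-s 12`, it can help to assemble the command in Python. This is a sketch only: `build_compile_command` and its parameter names are hypothetical, and it uses just the flags that appear in the command above (`-s`, `-w`, `-o`).

```python
def build_compile_command(prototxt, weights, output="graph", s_option=None,
                          compiler="../../bin/mvNCCompile.pyc"):
    """Build the argument list for the mvNCCompile call shown above.

    s_option: value passed to -s (e.g. 12); None leaves the flag out.
    """
    cmd = ["python3", compiler, prototxt]
    if s_option is not None:
        cmd += ["-s", str(s_option)]
    cmd += ["-w", weights, "-o", output]
    return cmd


if __name__ == "__main__":
    # Print both variants; pass a list to subprocess.run(cmd, check=True) to run it.
    for s in (None, 12):
        print(" ".join(build_compile_command("./deploy.prototxt",
                                             "./bvlc_alexnet.caffemodel",
                                             s_option=s)))
```

Building a list rather than a single string avoids shell quoting problems when the command is handed to `subprocess.run`.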

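Running fast from Python on the RasPi

From here on, the work is done on the RasPi.
I skip the general Raspbian installation procedure. The RasPi only runs the converted model, so here I install only what is needed, such as the API.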

wget https://ncs-forum-uploads.s3.amazonaws.com/ncsdk/MvNC_SDK_01_07_07/MvNC_SDK_1.07.07.tgz
tar xvf MvNC_SDK_1.07.07.tgz
tar xvf MvNC_API-1.07.07.tgz
cd ncapi/redist/pi_jessie
sudo dpkg -i *

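The other file you get when extracting MvNC_SDK_1.07.07.tgz, MvNC_Toolkit-1.07.06.tgz, does not seem to be needed for execution (it is used only for model conversion).

Copy the graph file converted earlier to the RasPi, for example with scp.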

scp ./graph pi@***.***.**.**:~/***/ncapi/networks/AlexNet/

I try the following Python sample.
ncapi/py_examples/classification_example.py

Run as is, however, it fails with an error because some files are missing.

$ python3 classification_example.py 2                           
Found stale device, resetting
Device 0 Address: 1.4 - VID/PID 03e7:2150
Starting wait for connect with 2000ms timeout
Found Address: 1.4 - VID/PID 03e7:2150
Found EP 0x81 : max packet size is 512 bytes
Found EP 0x01 : max packet size is 512 bytes
Found and opened device
Performing bulk write of 825136 bytes...
Successfully sent 825136 bytes of data in 47.187813 ms (16.676149 MB/s)
Boot successful, device address 1.4
Found Address: 1.4 - VID/PID 040e:f63b
done
Booted 1.4 -> VSC
Traceback (most recent call last):
  File "classification_example.py", line 52, in <module>
    ilsvrc_mean = numpy.load('../mean/ilsvrc12/ilsvrc_2012_mean.npy').mean(1).mean(1) #loading the mean file
  File "/usr/local/lib/python3.4/dist-packages/numpy/lib/npyio.py", line 370, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '../mean/ilsvrc12/ilsvrc_2012_mean.npy'

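Running the setup of the Toolkit that was skipped earlier would download the required ILSVRC files (labels and so on), but since I skipped it, I download only the files that are needed and place them at the locations named in the errors.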

# The /raw/ URLs make wget fetch the files themselves rather than the GitHub HTML pages
wget https://github.com/BVLC/caffe/raw/master/python/caffe/imagenet/ilsvrc_2012_mean.npy
wget https://github.com/HoldenCaulfieldRye/caffe/raw/master/data/ilsvrc12/synset_words.txt
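Since the sample boots the device before it notices a missing file, it can save time to check the paths first. The sketch below is an assumption-laden convenience: `missing_files` is a hypothetical helper, and the only path listed is the one taken from the traceback above; add any other files the sample asks for.

```python
import os


def missing_files(paths, base_dir="."):
    """Return the entries of `paths` that do not exist as files under `base_dir`."""
    return [p for p in paths if not os.path.isfile(os.path.join(base_dir, p))]


if __name__ == "__main__":
    # Path copied from the FileNotFoundError above, relative to py_examples/.
    required = ["../mean/ilsvrc12/ilsvrc_2012_mean.npy"]
    for path in missing_files(required):
        print("missing:", path)
```

Run it from `ncapi/py_examples` so that the relative paths resolve the same way as in `classification_example.py`.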

Running

On success, the output looks like this.

$ python3 classification_example.py 1
Device 0 Address: 1.4 - VID/PID 03e7:2150
Starting wait for connect with 2000ms timeout
Found Address: 1.4 - VID/PID 03e7:2150
Found EP 0x81 : max packet size is 512 bytes
Found EP 0x01 : max packet size is 512 bytes
Found and opened device
Performing bulk write of 825136 bytes...
Successfully sent 825136 bytes of data in 47.039351 ms (16.728781 MB/s)
Boot successful, device address 1.4
Found Address: 1.4 - VID/PID 040e:f63b
done
Booted 1.4 -> VSC

------- predictions --------
prediction 1 is n02123045 tabby, tabby cat
prediction 2 is n02124075 Egyptian cat
prediction 3 is n02127052 lynx, catamount
prediction 4 is n02123394 Persian cat
prediction 5 is n02971356 carton
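The prediction lines above pair the highest-scoring outputs with ILSVRC labels. A minimal sketch of that step follows, assuming the network output is a sequence of scores and that `labels` holds one line of synset_words.txt per class in the same order. `top_k` is a hypothetical helper, not the sample's own code.

```python
def top_k(scores, labels, k=5):
    """Return the k (label, score) pairs with the highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [(labels[i], float(scores[i])) for i in order]


if __name__ == "__main__":
    # Toy scores and labels for illustration only, not real network output.
    labels = ["n02123045 tabby, tabby cat", "n02124075 Egyptian cat", "n02971356 carton"]
    scores = [0.6, 0.3, 0.1]
    for rank, (label, score) in enumerate(top_k(scores, labels, k=2), start=1):
        print("prediction {} is {}".format(rank, label))
```

Because it only indexes into `scores`, the same function also accepts a NumPy array such as the `output` returned by `graph.GetResult()`.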

Python Module

Let's look inside the Python sample.
The part that deals with loading and using the network file is the following.

import numpy
import cv2
from mvnc import mvncapi as mvnc

mvnc.SetGlobalOption(mvnc.GlobalOption.LOGLEVEL, 2)
devices = mvnc.EnumerateDevices()  # list the connected Movidius devices
if len(devices) == 0:
    print('No devices found')
    quit()
device = mvnc.Device(devices[0])
device.OpenDevice()
opt = device.GetDeviceOption(mvnc.DeviceOption.OPTIMISATIONLIST)

network_blob = '../networks/AlexNet/graph'  # path to the converted model file
with open(network_blob, mode='rb') as f:
    blob = f.read()
graph = device.AllocateGraph(blob)  # load the converted model onto the Movidius
graph.SetGraphOption(mvnc.GraphOption.ITERATIONS, 1)
iterations = graph.GetGraphOption(mvnc.GraphOption.ITERATIONS)

img = cv2.imread('***.jpg')  # image preprocessing is omitted in this excerpt
graph.LoadTensor(img.astype(numpy.float16), 'user object')  # store the image data as input
output, userobj = graph.GetResult()  # the forward pass is computed here
graph.DeallocateGraph()  # clean up
device.CloseDevice()
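To time the forward pass in the code above, one option is to wrap the `LoadTensor` / `GetResult` pair in a small timing helper. This is a sketch under assumptions: `time_per_call` is a hypothetical name, and it is not necessarily how the figures in the next section were measured.

```python
import time


def time_per_call(fn, repeats=10):
    """Call fn() `repeats` times and return the average wall-clock seconds per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    # Stand-in workload; on the RasPi, replace it with a function that calls
    # graph.LoadTensor(...) followed by graph.GetResult(), before DeallocateGraph().
    def workload():
        sum(range(10000))

    print("{:.6f} s per call".format(time_per_call(workload)))
```

Averaging over several calls smooths out one-off costs such as the first transfer to the device.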

Processing time comparison (forward pass)

The per-image processing times for the same networks, on a PC and on the Movidius with and without the option, are as follows.

Environment	AlexNet (224x224 RGB)	GoogleNet (227x227 RGB)
MacBookPro (CPU 2.7GHz Core i5)	0.091s	0.315s
Pi3 + Movidius (without -s 12)	0.287s	0.574s
Pi3 + Movidius (with -s 12)	0.097s	0.113s
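The ratios behind these numbers are easy to recompute. The short sketch below uses only the times from the table; the dictionary keys and the `speedup` helper are hypothetical names chosen for illustration.

```python
# Seconds per image, copied from the table above.
TIMES = {
    "AlexNet": {"core_i5": 0.091, "ncs": 0.287, "ncs_s12": 0.097},
    "GoogleNet": {"core_i5": 0.315, "ncs": 0.574, "ncs_s12": 0.113},
}


def speedup(slow_seconds, fast_seconds):
    """Return how many times faster the second measurement is than the first."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    for name, t in TIMES.items():
        print("{}: -s 12 gives x{:.2f}, vs Core i5 x{:.2f}".format(
            name,
            speedup(t["ncs"], t["ncs_s12"]),
            speedup(t["core_i5"], t["ncs_s12"]),
        ))
```

A value below 1 in the second column means the Core i5 was faster for that network.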

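It is not as fast as a GPU, but with -s 12 it is roughly on par with the Core i5 for AlexNet and about 2.8 times faster for GoogleNet.
Now that serious deep learning workloads can finally run on a RasPi, I expect to see it put to many different uses.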