Chainer 4.0 + iDeep Is Impressive

Posted at 2018-06-16, last updated at 2018-12-02

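I ran a quick performance measurement of Chainer 4.0 + iDeep and the results were impressive, so here are my notes.

While I was at it, I ran the comparison on an f1-micro instance from the Always Free tier of Google Cloud Platform (GCP), because I wanted to confirm that the speedup also shows up on a cheap instance.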

References:
"Making pip automatically install numpy and scipy linked against MKL"
"Speed up Chainer inference on the CPU with iDeep"

Installation

The steps below were done on Ubuntu 17.10.

Download "Intel Math Kernel Library" from https://software.intel.com/en-us/mkl
Uninstall the existing chainer and numpy: $ sudo pip3 uninstall chainer numpy
Install the packages the MKL installer needs: $ sudo apt install -qq cpio g++
$ tar xvzf downloads/l_mkl_2018.3.222.tgz && cd l_mkl_2018.3.222 && sudo ./install.sh (follow the on-screen prompts to install)
$ export LD_LIBRARY_PATH="/opt/intel/mkl/lib/intel64:/opt/intel/lib/intel64:$LD_LIBRARY_PATH" (add this to ~/.bashrc or similar if needed)
Create ~/.numpy-site.cfg and ~/.config/pip/pip.conf, following "Making pip automatically install numpy and scipy linked against MKL"
$ pip3 install ideep4py (numpy is compiled from source at this point, so this takes a while)
$ pip3 install chainer

That completes the installation.
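As a quick sanity check after these steps, the following minimal sketch reports whether NumPy's build configuration mentions MKL and whether Chainer can see iDeep. The helper names `numpy_uses_mkl` and `ideep_available` are made up for illustration. Searching the `numpy.show_config()` output for "mkl" is only a heuristic, not an official check.

```python
import contextlib
import io


def numpy_uses_mkl():
    """Return True/False if NumPy's build config mentions MKL, or None if NumPy is missing."""
    try:
        import numpy as np
    except ImportError:
        return None
    buf = io.StringIO()
    # show_config() prints to stdout, so capture the text and search it.
    with contextlib.redirect_stdout(buf):
        np.show_config()
    return 'mkl' in buf.getvalue().lower()


def ideep_available():
    """Return True/False as reported by Chainer, or None if Chainer is missing."""
    try:
        from chainer.backends import intel64
    except ImportError:
        return None
    return intel64.is_ideep_available()


if __name__ == '__main__':
    print('NumPy build mentions MKL:', numpy_uses_mkl())
    print('iDeep available to Chainer:', ideep_available())
```

If either value comes back False, the LD_LIBRARY_PATH setting and the two config files are the first things to re-check.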

Evaluation

I compared speeds using the method from "Speed up Chainer inference on the CPU with iDeep".

$ git clone https://github.com/terasakisatoshi/chainer-caltech-101.git
$ cd chainer-caltech-101
$ wget http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz
$ tar xvzf 101_ObjectCategories.tar.gz
$ python3 reshape.py --source_dir 101_ObjectCategories --target_dir reshaped
$ python3 create_dataset.py reshaped
$ python3 compute_mean.py train.txt
$ python3 train.py train.txt test.txt -a googlenet -E 50 -g 0 -j 8
$ pip3 install imageio

Training takes a long time, so I have put a trained model at the URL below.

$ wget https://gist.github.com/ikeyasu/5e834ba10527b4eec22c16857ded45e7/raw/fa5edcfd5b9b6a1d8f50fed9dbfadada52853794/model_epoch_46.npz
$ mkdir result
$ mv model_epoch_46.npz result/
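Before running inference, it can be useful to confirm that the downloaded snapshot is a readable NumPy archive. The sketch below lists the arrays stored in an .npz file together with their shapes. The helper name `list_npz_params` is made up for illustration, and the sketch assumes the snapshot is a plain .npz archive, as the file extension suggests.

```python
import numpy as np


def list_npz_params(path):
    """Return a {array name: shape} dict for the arrays stored in an .npz file."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


if __name__ == '__main__':
    import os
    path = 'result/model_epoch_46.npz'  # location used in the commands above
    if os.path.exists(path):
        for name, shape in sorted(list_npz_params(path).items()):
            print(name, shape)
```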

The following command runs inference. I use it to compare run times.

$ time python3 predict.py

Without iDeep (MKL)

real 1m59.391s
user 1m55.688s
sys 0m0.926s

This takes about 2 minutes.

With iDeep (MKL)

real 0m19.911s
user 0m12.775s
sys 0m0.529s

This time it finished in about 20 seconds.
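To put a number on the difference, the short sketch below converts the two "real" values reported by `time` into seconds and takes their ratio, which comes out at roughly 6x. The helper name `parse_time` is made up for illustration.

```python
import re


def parse_time(text):
    """Convert a `time` field such as '1m59.391s' into seconds."""
    match = re.fullmatch(r'(\d+)m([\d.]+)s', text)
    if match is None:
        raise ValueError('unexpected time format: %r' % text)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Wall-clock ("real") times from the two runs above.
without_ideep = parse_time('1m59.391s')
with_ideep = parse_time('0m19.911s')
speedup = without_ideep / with_ideep

print('speedup: %.1fx' % speedup)
```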

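How to use iDeep in Chainer

As described at https://docs.chainer.org/en/latest/tips.html#how-do-i-accelerate-my-model-using-ideep-on-intel-cpu, using iDeep takes two things. First, enable the use_ideep configuration. chainer.using_config is a context manager, so run the forward computation inside

with chainer.using_config('use_ideep', 'auto'):

Second, call

model.to_intel64()

on the loaded model.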
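Put together, the two steps might look like the minimal sketch below. The `L.Linear` layer is a hypothetical stand-in for the loaded GoogLeNet model, and `predict_with_ideep` is a name made up for illustration. The sketch guards `to_intel64()` with `is_ideep_available()` so that it also runs on machines without iDeep, and it returns None when Chainer itself is not installed.

```python
import numpy as np

try:
    import chainer
    import chainer.links as L
    from chainer.backends import intel64
except ImportError:
    chainer = None  # Chainer is not installed; the sketch then does nothing


def predict_with_ideep(batch):
    """Run one forward pass of a stand-in model, using iDeep when it is available."""
    if chainer is None:
        return None
    # Hypothetical stand-in for the model loaded from the snapshot.
    model = L.Linear(batch.shape[1], 10)
    # Convert the parameters to iDeep arrays only if iDeep is actually installed.
    if intel64.is_ideep_available():
        model.to_intel64()
    # Inference mode, no backprop graph, and iDeep enabled where possible.
    with chainer.using_config('train', False), \
            chainer.using_config('enable_backprop', False), \
            chainer.using_config('use_ideep', 'auto'):
        y = model(batch)
    return y.array


if __name__ == '__main__':
    out = predict_with_ideep(np.zeros((2, 4), dtype=np.float32))
    print(None if out is None else out.shape)
```

With 'auto', Chainer uses iDeep where it can and otherwise falls back to the plain NumPy path, so the same code works with or without iDeep.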

use_ideep can also be set through an environment variable:

export CHAINER_USE_IDEEP="auto"

This has the same effect as setting the configuration in code.
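The sketch below shows one way to check that the environment variable was picked up. It assumes that Chainer reads CHAINER_USE_IDEEP once, when the package is first imported, so the variable has to be set before that import. Setting it from Python here stands in for the `export` line above.

```python
import os

# Stand-in for `export CHAINER_USE_IDEEP="auto"`; must run before Chainer is first imported.
os.environ['CHAINER_USE_IDEEP'] = 'auto'

try:
    import chainer
    mode = chainer.config.use_ideep
except ImportError:
    mode = None  # Chainer is not installed

print('use_ideep =', mode)
```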

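Summary

With Chainer 4.0 + iDeep, inference on Intel CPUs can be sped up with little effort.
Even on an underpowered instance such as GCP's f1-micro, the effect was dramatic: from about 2 minutes down to about 20 seconds.

Also, unlike with a GPU, there is no data transfer overhead, which I think makes it very easy to use.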
