Chainer
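Chainer is a library developed by Preferred Networks for implementing neural-network algorithms.
It is easy to install and the samples run right out of the box, which I really appreciate.
It supports CUDA and cuDNN (GPU programming libraries provided by NVIDIA),
so training can run fast.
I have only just installed it, so I will look into the internals and how to write code for it later.
After all the trouble I went through to get CUDA installed,
I realized I have no data I actually want to train on, so it feels a bit wasted.
For now, to see how much faster things get and to confirm that it really runs on the GPU,
I compared execution times.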
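Training speed showdown: CPU vs GPU
For this test I used the sample that trains on MNIST, a handwritten-digit recognition dataset.
Once Chainer is installed, the source is under chainer/examples/mnist/.
Even if you have not downloaded the MNIST dataset beforehand, the sample downloads it for you and then runs (awesome).
PC specs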
OS | Ubuntu 14.04 |
CPU | Core i7-6700K @ 4.00GHz 4.01GHz |
Memory | 16GB |
GPU | GeForce GTX 980 Ti |
Running on the CPU
First, let's run it on the CPU.
Just to be safe, the dataset had already been downloaded, so the run only needs to load it.
python chainer/examples/mnist/train_mnist.py
Running the command above in a terminal starts the training.
The output looks like this:
GPU: -1
# unit: 1000
# Minibatch-size: 100
# epoch: 20
Network type: simple
load MNIST dataset
epoch 1
graph generated
train mean loss=0.193684607968, accuracy=0.941816669144, throughput=382.497507814 images/sec
test mean loss=0.0897071482369, accuracy=0.971200006008
epoch 2
train mean loss=0.0741340038716, accuracy=0.977033343315, throughput=406.475950487 images/sec
test mean loss=0.0778348091705, accuracy=0.973800006509
epoch 3
train mean loss=0.0493809905124, accuracy=0.984500011106, throughput=416.919462236 images/sec
test mean loss=0.0983875677761, accuracy=0.970500003695
epoch 4
train mean loss=0.0371512368962, accuracy=0.988050009112, throughput=424.445908227 images/sec
test mean loss=0.0727728618858, accuracy=0.980100007057
epoch 5
train mean loss=0.026791135513, accuracy=0.99151667426, throughput=430.047949144 images/sec
test mean loss=0.0807308329478, accuracy=0.978300008774
epoch 6
train mean loss=0.0232141814014, accuracy=0.992383340001, throughput=434.615325625 images/sec
test mean loss=0.0789066048964, accuracy=0.978400005102
epoch 7
train mean loss=0.0213763673576, accuracy=0.993050006032, throughput=439.462914098 images/sec
test mean loss=0.0991408845171, accuracy=0.975400009751
epoch 8
train mean loss=0.0181176779406, accuracy=0.993850005468, throughput=443.003842377 images/sec
test mean loss=0.0786388919427, accuracy=0.979700005651
epoch 9
train mean loss=0.0155198848167, accuracy=0.994983338018, throughput=446.55789451 images/sec
test mean loss=0.0727195013624, accuracy=0.982300007343
epoch 10
train mean loss=0.0136605950887, accuracy=0.996000003715, throughput=447.0434427 images/sec
test mean loss=0.0977177753858, accuracy=0.980200006366
epoch 11
train mean loss=0.0158822959805, accuracy=0.995200004379, throughput=450.588657483 images/sec
test mean loss=0.0977549223528, accuracy=0.978200004697
epoch 12
train mean loss=0.0141575096284, accuracy=0.995650004148, throughput=451.742022117 images/sec
test mean loss=0.102174005379, accuracy=0.977900006771
epoch 13
train mean loss=0.011739959806, accuracy=0.996000003815, throughput=454.173978105 images/sec
test mean loss=0.0732116527992, accuracy=0.982700006366
epoch 14
train mean loss=0.0103860100758, accuracy=0.996800002952, throughput=453.137354408 images/sec
test mean loss=0.105503096388, accuracy=0.979400007725
epoch 15
train mean loss=0.0114992221524, accuracy=0.996383336782, throughput=456.086147842 images/sec
test mean loss=0.0908791080594, accuracy=0.982300008535
epoch 16
train mean loss=0.0102321072909, accuracy=0.997166669369, throughput=455.044407309 images/sec
test mean loss=0.0982518416016, accuracy=0.981600006819
epoch 17
train mean loss=0.00967611400099, accuracy=0.996700003147, throughput=454.125450447 images/sec
test mean loss=0.116737326634, accuracy=0.979400006533
epoch 18
train mean loss=0.0112101602861, accuracy=0.996600003242, throughput=454.676580709 images/sec
test mean loss=0.109036157257, accuracy=0.979900007844
epoch 19
train mean loss=0.0100171123226, accuracy=0.997566668987, throughput=454.506987354 images/sec
test mean loss=0.128417006888, accuracy=0.979100006223
epoch 20
train mean loss=0.00690145226964, accuracy=0.997883335352, throughput=454.166769316 images/sec
test mean loss=0.148765663427, accuracy=0.980600007176
save the model
save the optimizer
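The `GPU: -1` line at the top of this log is how the sample reports that no GPU was selected. As a rough sketch of how a script can expose such a switch (this is not the actual `train_mnist.py` source; `build_parser` and `describe_device` are hypothetical names used only for illustration), a negative device ID can stand for the CPU:

```python
import argparse


def build_parser():
    # Hypothetical parser for illustration; -1 is assumed to mean "run on the CPU".
    parser = argparse.ArgumentParser(description="device selection sketch")
    parser.add_argument("--gpu", "-g", type=int, default=-1,
                        help="GPU device ID (negative value means CPU)")
    return parser


def describe_device(gpu_id):
    # Turn the numeric ID into a human-readable label.
    return "CPU" if gpu_id < 0 else "GPU {}".format(gpu_id)


# Parse an explicit argument list instead of sys.argv so the sketch is easy to try.
args = build_parser().parse_args(["-g", "0"])
print("GPU: {}".format(args.gpu))
print("running on", describe_device(args.gpu))
```

Passing an empty list instead of `["-g", "0"]` falls back to the default of -1, which corresponds to the CPU run shown above.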
Running on the GPU
Next, let's run it with the GPU turned on.
Running the command below in a terminal makes the training use the GPU.
python chainer/examples/mnist/train_mnist.py -g 0
The output looks like this:
GPU: 0
# unit: 1000
# Minibatch-size: 100
# epoch: 20
Network type: simple
load MNIST dataset
/home/===/.local/lib/python2.7/site-packages/chainer/cuda.py:85: UserWarning: cuDNN is not enabled.
Please reinstall chainer after you install cudnn
(see https://github.com/pfnet/chainer#installation).
'cuDNN is not enabled.\n'
epoch 1
graph generated
train mean loss=0.195293062939, accuracy=0.941783336749, throughput=12157.8257794 images/sec
test mean loss=0.099062378899, accuracy=0.968000005484
epoch 2
train mean loss=0.0731572592833, accuracy=0.977750009596, throughput=34844.6214764 images/sec
test mean loss=0.0714594736902, accuracy=0.978200006485
epoch 3
train mean loss=0.0480061460981, accuracy=0.984883343677, throughput=34896.321176 images/sec
test mean loss=0.0802571357274, accuracy=0.975300004482
epoch 4
train mean loss=0.0354296163357, accuracy=0.988533341686, throughput=34755.0474305 images/sec
test mean loss=0.0762832925502, accuracy=0.97810000658
epoch 5
train mean loss=0.0294489985892, accuracy=0.99051667432, throughput=34728.8745577 images/sec
test mean loss=0.087090963278, accuracy=0.973800008297
epoch 6
train mean loss=0.0256013906075, accuracy=0.991333340704, throughput=34881.2254504 images/sec
test mean loss=0.0878038381525, accuracy=0.976900004148
epoch 7
train mean loss=0.0190409717603, accuracy=0.993816672166, throughput=34850.2719866 images/sec
test mean loss=0.0915725244294, accuracy=0.977500004172
epoch 8
train mean loss=0.0171634630352, accuracy=0.994500005047, throughput=34788.4766754 images/sec
test mean loss=0.110596073772, accuracy=0.975500004292
epoch 9
train mean loss=0.0170808044669, accuracy=0.9945500048, throughput=34524.4872325 images/sec
test mean loss=0.0786540051977, accuracy=0.981700006723
epoch 10
train mean loss=0.0165474765735, accuracy=0.994733338257, throughput=34791.1795738 images/sec
test mean loss=0.108565880053, accuracy=0.97650000751
epoch 11
train mean loss=0.0152545220839, accuracy=0.99521667103, throughput=34624.981013 images/sec
test mean loss=0.100296210904, accuracy=0.977500011325
epoch 12
train mean loss=0.0115985524833, accuracy=0.996383336782, throughput=34589.1693057 images/sec
test mean loss=0.0895931552666, accuracy=0.981100007892
epoch 13
train mean loss=0.0109584858765, accuracy=0.99655000329, throughput=34633.3723763 images/sec
test mean loss=0.0907685719846, accuracy=0.982000006437
epoch 14
train mean loss=0.0128967459446, accuracy=0.996000003815, throughput=34653.0539405 images/sec
test mean loss=0.108828655625, accuracy=0.980200007558
epoch 15
train mean loss=0.0112410126412, accuracy=0.996466669937, throughput=34728.9320689 images/sec
test mean loss=0.0861825023301, accuracy=0.982900007367
epoch 16
train mean loss=0.0048500731786, accuracy=0.998566668034, throughput=34807.3672744 images/sec
test mean loss=0.120737487848, accuracy=0.980800008774
epoch 17
train mean loss=0.0155691574255, accuracy=0.995716670652, throughput=34771.2591689 images/sec
test mean loss=0.105860878132, accuracy=0.980900008678
epoch 18
train mean loss=0.0109717246999, accuracy=0.996816669703, throughput=34872.5492669 images/sec
test mean loss=0.0907865288936, accuracy=0.982700006366
epoch 19
train mean loss=0.00805343958089, accuracy=0.997816668749, throughput=34720.4943407 images/sec
test mean loss=0.102527958901, accuracy=0.982100006342
epoch 20
train mean loss=0.00828038509314, accuracy=0.997533335686, throughput=34750.9824616 images/sec
test mean loss=0.123670719845, accuracy=0.980000007153
save the model
save the optimizer
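Fast! Even just watching it run, it feels incredibly fast!!
Getting CUDA installed was all I could manage, so cuDNN is not installed. That is why the warning shows up.
I expect installing it would make things even faster.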
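To check from Python whether CUDA and cuDNN are actually being picked up, a small probe like the one below may help. It is only a sketch: it assumes a Chainer version whose `chainer.cuda` module exposes the `available` and `cudnn_enabled` flags, and `probe_gpu_support` is a hypothetical helper name. A missing flag or a missing Chainer install is reported as `False` rather than raising.

```python
def probe_gpu_support():
    """Report whether Chainer, CUDA and cuDNN appear to be usable."""
    try:
        from chainer import cuda  # assumed module layout
    except Exception:
        # Chainer is not installed (or failed to import) in this environment.
        return {"chainer": False, "cuda": False, "cudnn": False}
    return {
        "chainer": True,
        # getattr with a default keeps the probe safe if a flag is absent.
        "cuda": bool(getattr(cuda, "available", False)),
        "cudnn": bool(getattr(cuda, "cudnn_enabled", False)),
    }


print(probe_gpu_support())
```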
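Results
The training speed is printed for each epoch. An epoch is the unit of training for a neural network;
in this sample, one full pass over the training data is 1 epoch.
The number that describes training speed is shown as "throughput".
Its unit is images/sec, the number of training images processed per second, so a higher value means faster training.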
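As a rough illustration of these two terms, the sketch below walks through one epoch in minibatches and reports images/sec. The 60000 images and the minibatch size of 100 come from the sample's settings; `run_epoch` and `dummy_train_step` are hypothetical stand-ins rather than Chainer code, and the snippet assumes Python 3 for `time.perf_counter`.

```python
import time


def run_epoch(n_images, batch_size, train_step):
    """Process every image once (one epoch) and return (batches, images/sec)."""
    start = time.perf_counter()
    batches = 0
    for begin in range(0, n_images, batch_size):
        end = min(begin + batch_size, n_images)
        train_step(begin, end)  # placeholder for the real parameter update
        batches += 1
    # Guard against a zero reading from a very coarse timer.
    elapsed = max(time.perf_counter() - start, 1e-12)
    return batches, n_images / elapsed


def dummy_train_step(begin, end):
    # Stand-in workload so the sketch runs without any training library.
    sum(range(begin, end))


batches, throughput = run_epoch(60000, 100, dummy_train_step)
print("minibatches per epoch:", batches)
print("throughput: {:.1f} images/sec".format(throughput))
```

The throughput printed here only measures the dummy workload, so it says nothing about real training speed; the point is just how the figure is derived.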
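Now let's take the average over the 20 epochs for the CPU-only run and the GPU run (only epoch 1
has a lower throughput for some reason, but I don't know enough yet to explain why).
The MNIST sample trains on 60000 images, so I also converted the throughput into seconds per epoch.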
run | throughput avg [images/sec] | time per epoch [sec] |
---|---|---|
CPU | 440.4659276154 | 136.219390056 |
GPU | 33618.52462778 | 1.784730314 |
S-so fast... It is about 76 times faster.
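The averages in the table can be reproduced from the throughput values in the two logs. The sketch below takes a plain arithmetic mean (an assumption about how the averages were obtained, although it does match the table) and divides 60000 images by it to get seconds per epoch; `summarize` is a hypothetical helper name.

```python
# Per-epoch throughput (images/sec) copied from the CPU and GPU logs above.
cpu = [382.497507814, 406.475950487, 416.919462236, 424.445908227, 430.047949144,
       434.615325625, 439.462914098, 443.003842377, 446.55789451, 447.0434427,
       450.588657483, 451.742022117, 454.173978105, 453.137354408, 456.086147842,
       455.044407309, 454.125450447, 454.676580709, 454.506987354, 454.166769316]
gpu = [12157.8257794, 34844.6214764, 34896.321176, 34755.0474305, 34728.8745577,
       34881.2254504, 34850.2719866, 34788.4766754, 34524.4872325, 34791.1795738,
       34624.981013, 34589.1693057, 34633.3723763, 34653.0539405, 34728.9320689,
       34807.3672744, 34771.2591689, 34872.5492669, 34720.4943407, 34750.9824616]

N_IMAGES = 60000  # training images per epoch in the MNIST sample


def summarize(throughputs, n_images=N_IMAGES):
    """Return (mean throughput in images/sec, seconds per epoch)."""
    mean = sum(throughputs) / len(throughputs)
    return mean, n_images / mean


cpu_avg, cpu_sec = summarize(cpu)
gpu_avg, gpu_sec = summarize(gpu)
speedup = gpu_avg / cpu_avg

print("CPU: {:.4f} images/sec, {:.4f} sec/epoch".format(cpu_avg, cpu_sec))
print("GPU: {:.4f} images/sec, {:.4f} sec/epoch".format(gpu_avg, gpu_sec))
print("speedup: {:.1f}x".format(speedup))
```

Because the slow first GPU epoch is included in the mean, the steady-state speedup from epoch 2 onward is somewhat higher than this average suggests.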
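Wrapping up
This time, in my joy at getting everything installed, I just did a rough speed comparison as a sanity check.
Once I have some data I want to train on, I plan to start tinkering with Chainer bit by bit.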
#Addendum: tried it with cuDNN installed
Thinking it might get a little faster, I installed cuDNN and ran it again.
run | throughput avg [images/sec] | time per epoch [sec] |
---|---|---|
CPU | 440.4659276154 | 136.219390056 |
GPU | 33618.52462778 | 1.784730314 |
cuDNN | 35423.65693952 | 1.693783341 |
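For a sense of scale, the relative gains can be worked out from the table with a little arithmetic. The snippet below uses only the averaged throughputs listed above; the variable names are illustrative.

```python
# Average throughputs (images/sec) taken from the table above.
throughput = {
    "CPU": 440.4659276154,
    "GPU": 33618.52462778,
    "cuDNN": 35423.65693952,
}

baseline = throughput["CPU"]
for name, value in throughput.items():
    print("{:>5}: {:.1f}x the CPU throughput".format(name, value / baseline))

# Extra gain from cuDNN relative to the plain GPU run, in percent.
cudnn_gain = (throughput["cuDNN"] / throughput["GPU"] - 1.0) * 100.0
print("cuDNN vs GPU: +{:.1f}%".format(cudnn_gain))
```

With these numbers cuDNN adds roughly 5% on top of the plain GPU run, so most of the speedup comes from moving to the GPU in the first place.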