
Celebrating my Chainer + CUDA 7.5 install: a training speed comparison

Posted at 2016-05-24

Chainer

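Chainer is a library developed by Preferred Networks for implementing neural-network algorithms.
It is easy to install, and its samples just run out of the box, which makes it a very welcome library.
Because it supports CUDA and cuDNN (GPU programming libraries provided by NVIDIA),
it can train models quickly.
I have only installed it so far, so I will dig into its internals and how to write code with it later.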

I went through a lot of trouble to install CUDA,
but I have no data of my own to train on, so it feels a bit like wasted effort.

For now, to see how much faster it gets and to confirm that it really runs on the GPU,
I compared execution times.

Training speed showdown: CPU vs GPU

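This time I used the sample that trains on MNIST, a handwritten-digit dataset for character recognition.
Once Chainer is installed, the source can be found in chainer/examples/mnist/.
Even if you have not downloaded the MNIST dataset yet, the script downloads it for you and then runs (impressive).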
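The "download once, then just load" behavior described above follows a common caching pattern. The sketch below is a hypothetical illustration of that pattern only, not the code in chainer/examples/mnist/; the function name `load_or_download`, the cache file name, and the `fetch` callable are all assumptions, and the fetch is faked so nothing is actually downloaded.

```python
import os
import pickle
import tempfile


def load_or_download(cache_path, fetch):
    """Return cached data if present; otherwise fetch it once and cache it.

    `fetch` is a hypothetical zero-argument callable standing in for the
    real download step (the actual example script may work differently).
    """
    if os.path.exists(cache_path):
        # Cache hit: only load from disk, no network access needed.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Cache miss: download once and store the result for later runs.
    data = fetch()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data


if __name__ == "__main__":
    calls = []

    def fake_fetch():
        # Stand-in for downloading MNIST; records how often it is called.
        calls.append(1)
        return {"x": [0, 1, 2], "y": [7, 2, 1]}

    path = os.path.join(tempfile.mkdtemp(), "mnist_cache.pkl")
    first = load_or_download(path, fake_fetch)
    second = load_or_download(path, fake_fetch)
    print(first == second, len(calls))  # True 1
```

The second call reads the cached file instead of fetching again, which matches the situation in the CPU run below, where the dataset had already been downloaded and only needed to be loaded.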

PC specs

OS: Ubuntu 14.04
CPU: Core i7-6700K @ 4.00GHz (4.01GHz)
Memory: 16GB
GPU: GeForce GTX 980 Ti

CPU run

First, I ran it on the CPU.
To be safe, the dataset had already been downloaded beforehand, so the run only needs to load it.

python chainer/examples/mnist/train_mnist.py

Running the command above in a terminal starts training.
The output is as follows.

GPU: -1
# unit: 1000
# Minibatch-size: 100
# epoch: 20
Network type: simple

load MNIST dataset
epoch 1
graph generated
train mean loss=0.193684607968, accuracy=0.941816669144, throughput=382.497507814 images/sec
test  mean loss=0.0897071482369, accuracy=0.971200006008
epoch 2
train mean loss=0.0741340038716, accuracy=0.977033343315, throughput=406.475950487 images/sec
test  mean loss=0.0778348091705, accuracy=0.973800006509
epoch 3
train mean loss=0.0493809905124, accuracy=0.984500011106, throughput=416.919462236 images/sec
test  mean loss=0.0983875677761, accuracy=0.970500003695
epoch 4
train mean loss=0.0371512368962, accuracy=0.988050009112, throughput=424.445908227 images/sec
test  mean loss=0.0727728618858, accuracy=0.980100007057
epoch 5
train mean loss=0.026791135513, accuracy=0.99151667426, throughput=430.047949144 images/sec
test  mean loss=0.0807308329478, accuracy=0.978300008774
epoch 6
train mean loss=0.0232141814014, accuracy=0.992383340001, throughput=434.615325625 images/sec
test  mean loss=0.0789066048964, accuracy=0.978400005102
epoch 7
train mean loss=0.0213763673576, accuracy=0.993050006032, throughput=439.462914098 images/sec
test  mean loss=0.0991408845171, accuracy=0.975400009751
epoch 8
train mean loss=0.0181176779406, accuracy=0.993850005468, throughput=443.003842377 images/sec
test  mean loss=0.0786388919427, accuracy=0.979700005651
epoch 9
train mean loss=0.0155198848167, accuracy=0.994983338018, throughput=446.55789451 images/sec
test  mean loss=0.0727195013624, accuracy=0.982300007343
epoch 10
train mean loss=0.0136605950887, accuracy=0.996000003715, throughput=447.0434427 images/sec
test  mean loss=0.0977177753858, accuracy=0.980200006366
epoch 11
train mean loss=0.0158822959805, accuracy=0.995200004379, throughput=450.588657483 images/sec
test  mean loss=0.0977549223528, accuracy=0.978200004697
epoch 12
train mean loss=0.0141575096284, accuracy=0.995650004148, throughput=451.742022117 images/sec
test  mean loss=0.102174005379, accuracy=0.977900006771
epoch 13
train mean loss=0.011739959806, accuracy=0.996000003815, throughput=454.173978105 images/sec
test  mean loss=0.0732116527992, accuracy=0.982700006366
epoch 14
train mean loss=0.0103860100758, accuracy=0.996800002952, throughput=453.137354408 images/sec
test  mean loss=0.105503096388, accuracy=0.979400007725
epoch 15
train mean loss=0.0114992221524, accuracy=0.996383336782, throughput=456.086147842 images/sec
test  mean loss=0.0908791080594, accuracy=0.982300008535
epoch 16
train mean loss=0.0102321072909, accuracy=0.997166669369, throughput=455.044407309 images/sec
test  mean loss=0.0982518416016, accuracy=0.981600006819
epoch 17
train mean loss=0.00967611400099, accuracy=0.996700003147, throughput=454.125450447 images/sec
test  mean loss=0.116737326634, accuracy=0.979400006533
epoch 18
train mean loss=0.0112101602861, accuracy=0.996600003242, throughput=454.676580709 images/sec
test  mean loss=0.109036157257, accuracy=0.979900007844
epoch 19
train mean loss=0.0100171123226, accuracy=0.997566668987, throughput=454.506987354 images/sec
test  mean loss=0.128417006888, accuracy=0.979100006223
epoch 20
train mean loss=0.00690145226964, accuracy=0.997883335352, throughput=454.166769316 images/sec
test  mean loss=0.148765663427, accuracy=0.980600007176
save the model
save the optimizer

GPU run

Next, I ran it with the GPU turned on.
Running the following command in a terminal makes it train on the GPU.

python chainer/examples/mnist/train_mnist.py -g 0
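Both runs print a `GPU:` line at the top of the log (`-1` for the CPU run, `0` for this one). Below is a minimal sketch of how a `-g` option like this is typically parsed with Python's standard `argparse`, assuming that a negative ID means "use the CPU", which is consistent with `GPU: -1` in the CPU log. The function names are hypothetical and this is not the example script's actual code.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical re-creation of a GPU-selection flag; the real
    # train_mnist.py may define its options differently.
    parser = argparse.ArgumentParser(description="MNIST training (sketch)")
    parser.add_argument("--gpu", "-g", type=int, default=-1,
                        help="GPU ID (a negative value means CPU)")
    return parser.parse_args(argv)


def device_name(gpu_id):
    # Negative IDs select the CPU; 0, 1, ... select a CUDA device.
    return "CPU" if gpu_id < 0 else "GPU %d" % gpu_id


if __name__ == "__main__":
    print(device_name(parse_args([]).gpu))           # CPU
    print(device_name(parse_args(["-g", "0"]).gpu))  # GPU 0
```

Defaulting to `-1` means that running the script without any flag, as in the CPU run, never touches the GPU.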

The output is as follows.

GPU: 0
# unit: 1000
# Minibatch-size: 100
# epoch: 20
Network type: simple

load MNIST dataset
/home/===/.local/lib/python2.7/site-packages/chainer/cuda.py:85: UserWarning: cuDNN is not enabled.
Please reinstall chainer after you install cudnn
(see https://github.com/pfnet/chainer#installation).
  'cuDNN is not enabled.\n'
epoch 1
graph generated
train mean loss=0.195293062939, accuracy=0.941783336749, throughput=12157.8257794 images/sec
test  mean loss=0.099062378899, accuracy=0.968000005484
epoch 2
train mean loss=0.0731572592833, accuracy=0.977750009596, throughput=34844.6214764 images/sec
test  mean loss=0.0714594736902, accuracy=0.978200006485
epoch 3
train mean loss=0.0480061460981, accuracy=0.984883343677, throughput=34896.321176 images/sec
test  mean loss=0.0802571357274, accuracy=0.975300004482
epoch 4
train mean loss=0.0354296163357, accuracy=0.988533341686, throughput=34755.0474305 images/sec
test  mean loss=0.0762832925502, accuracy=0.97810000658
epoch 5
train mean loss=0.0294489985892, accuracy=0.99051667432, throughput=34728.8745577 images/sec
test  mean loss=0.087090963278, accuracy=0.973800008297
epoch 6
train mean loss=0.0256013906075, accuracy=0.991333340704, throughput=34881.2254504 images/sec
test  mean loss=0.0878038381525, accuracy=0.976900004148
epoch 7
train mean loss=0.0190409717603, accuracy=0.993816672166, throughput=34850.2719866 images/sec
test  mean loss=0.0915725244294, accuracy=0.977500004172
epoch 8
train mean loss=0.0171634630352, accuracy=0.994500005047, throughput=34788.4766754 images/sec
test  mean loss=0.110596073772, accuracy=0.975500004292
epoch 9
train mean loss=0.0170808044669, accuracy=0.9945500048, throughput=34524.4872325 images/sec
test  mean loss=0.0786540051977, accuracy=0.981700006723
epoch 10
train mean loss=0.0165474765735, accuracy=0.994733338257, throughput=34791.1795738 images/sec
test  mean loss=0.108565880053, accuracy=0.97650000751
epoch 11
train mean loss=0.0152545220839, accuracy=0.99521667103, throughput=34624.981013 images/sec
test  mean loss=0.100296210904, accuracy=0.977500011325
epoch 12
train mean loss=0.0115985524833, accuracy=0.996383336782, throughput=34589.1693057 images/sec
test  mean loss=0.0895931552666, accuracy=0.981100007892
epoch 13
train mean loss=0.0109584858765, accuracy=0.99655000329, throughput=34633.3723763 images/sec
test  mean loss=0.0907685719846, accuracy=0.982000006437
epoch 14
train mean loss=0.0128967459446, accuracy=0.996000003815, throughput=34653.0539405 images/sec
test  mean loss=0.108828655625, accuracy=0.980200007558
epoch 15
train mean loss=0.0112410126412, accuracy=0.996466669937, throughput=34728.9320689 images/sec
test  mean loss=0.0861825023301, accuracy=0.982900007367
epoch 16
train mean loss=0.0048500731786, accuracy=0.998566668034, throughput=34807.3672744 images/sec
test  mean loss=0.120737487848, accuracy=0.980800008774
epoch 17
train mean loss=0.0155691574255, accuracy=0.995716670652, throughput=34771.2591689 images/sec
test  mean loss=0.105860878132, accuracy=0.980900008678
epoch 18
train mean loss=0.0109717246999, accuracy=0.996816669703, throughput=34872.5492669 images/sec
test  mean loss=0.0907865288936, accuracy=0.982700006366
epoch 19
train mean loss=0.00805343958089, accuracy=0.997816668749, throughput=34720.4943407 images/sec
test  mean loss=0.102527958901, accuracy=0.982100006342
epoch 20
train mean loss=0.00828038509314, accuracy=0.997533335686, throughput=34750.9824616 images/sec
test  mean loss=0.123670719845, accuracy=0.980000007153
save the model
save the optimizer

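Fast! Even just watching it, the difference is huge!!
Installing CUDA was all I could manage, so cuDNN is not installed, which is why the warning appears.
I expect it to get even faster once cuDNN is installed.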

Results

Training speed is printed for each epoch. An epoch is the unit of neural-network training;
in this sample, one epoch means one full pass over the training data.
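With 60,000 training images and the minibatch size of 100 shown in the log header, one epoch consists of 60,000 / 100 = 600 minibatch updates. Here is a minimal plain-Python sketch of one epoch's worth of shuffled minibatch indices. The function name and the per-epoch shuffle are assumptions for illustration, not Chainer's API.

```python
import random


def minibatch_indices(n_samples, batch_size, seed=0):
    """Yield index lists that together cover every sample exactly once.

    One full pass over these batches is one epoch. Shuffling the order
    each epoch is an assumption about how the sample iterates.
    """
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]


if __name__ == "__main__":
    # 60,000 training images, minibatch size 100 (from the log header).
    batches = list(minibatch_indices(60000, 100))
    print(len(batches))  # 600 minibatch updates per epoch
```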

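The figure that reflects training speed is shown as "throughput".
Its unit is images/sec, i.e. the number of training images processed per second, so a higher value means faster training.
Below, I average the throughput over the 20 epochs for the CPU-only run and for the GPU run (for some reason the throughput is noticeably lower in epoch 1 only; I don't yet know enough to explain why).
The mnist sample trains on 60,000 images per epoch, so I also converted the averages into seconds per epoch.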
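The averaging and the conversion to seconds can be reproduced with a few lines of standard-library Python. This sketch assumes the table below uses the arithmetic mean of the 20 per-epoch throughputs and that one epoch is 60,000 images; the helper names `throughputs` and `summarize` are hypothetical.

```python
import re

N_TRAIN = 60000  # MNIST training images processed per epoch

# Matches the "throughput=... images/sec" field of a "train" log line.
THROUGHPUT_RE = re.compile(r"throughput=([0-9.]+) images/sec")


def throughputs(log_text):
    """Extract every throughput value (images/sec) from a training log."""
    return [float(v) for v in THROUGHPUT_RE.findall(log_text)]


def summarize(values, n_images=N_TRAIN):
    """Return (mean throughput, seconds per epoch at that mean throughput)."""
    mean = sum(values) / len(values)
    return mean, n_images / mean


# Per-epoch throughputs of the CPU run (epochs 1-20), copied from its log.
CPU = [
    382.497507814, 406.475950487, 416.919462236, 424.445908227,
    430.047949144, 434.615325625, 439.462914098, 443.003842377,
    446.55789451, 447.0434427, 450.588657483, 451.742022117,
    454.173978105, 453.137354408, 456.086147842, 455.044407309,
    454.125450447, 454.676580709, 454.506987354, 454.166769316,
]

if __name__ == "__main__":
    line = ("train mean loss=0.193684607968, accuracy=0.941816669144, "
            "throughput=382.497507814 images/sec")
    print(throughputs(line))  # [382.497507814]
    mean, sec = summarize(CPU)
    print("%.2f images/sec, %.2f sec/epoch" % (mean, sec))
    # 440.47 images/sec, 136.22 sec/epoch
```

Applying `summarize` to the GPU run's 20 values reproduces the GPU row in the same way. Dividing 60,000 by the mean throughput gives the epoch time at the average speed; averaging the 20 per-epoch times instead would weight the slow first epoch more heavily and come out somewhat higher, most noticeably for the GPU run.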

| Run | Mean throughput [images/sec] | Time per epoch [sec] |
|---|---|---|
| CPU | 440.4659276154 | 136.219390056 |
| GPU | 33618.52462778 | 1.784730314 |

W-wow, that's fast... it is roughly 76 times faster.

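Closing thoughts

This time, in my excitement at getting everything installed, I just ran a rough speed comparison while checking that it worked.
When I have data I want to train on, I plan to get familiar with Chainer little by little.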

Addendum: trying it with cuDNN installed

I thought it might get a bit faster, so I installed cuDNN and ran it again.

| Run | Mean throughput [images/sec] | Time per epoch [sec] |
|---|---|---|
| CPU | 440.4659276154 | 136.219390056 |
| GPU | 33618.52462778 | 1.784730314 |
| cuDNN | 35423.65693952 | 1.693783341 |
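The addendum's table does not spell out the speedups. A short sketch that derives them from the mean throughputs in the table, assuming "faster" is measured simply as the ratio of throughputs; the dictionary and function names are hypothetical.

```python
# Mean throughputs (images/sec) from the table above.
MEAN_THROUGHPUT = {
    "CPU": 440.4659276154,
    "GPU": 33618.52462778,
    "cuDNN": 35423.65693952,
}


def speedup(fast, slow):
    """How many times higher `fast`'s throughput is than `slow`'s."""
    return MEAN_THROUGHPUT[fast] / MEAN_THROUGHPUT[slow]


if __name__ == "__main__":
    for name in ("GPU", "cuDNN"):
        print("%s vs CPU: %.1fx" % (name, speedup(name, "CPU")))
    print("cuDNN vs GPU: %.2fx" % speedup("cuDNN", "GPU"))
    # GPU vs CPU: 76.3x
    # cuDNN vs CPU: 80.4x
    # cuDNN vs GPU: 1.05x
```

By this measure cuDNN adds roughly 5% on top of plain CUDA for this small network, while both GPU runs are about 76 to 80 times faster than the CPU.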