
Trying multi-GPU training with Chainer

Posted at 2019-04-09 (last updated 2019-04-09)

What I want to do

My PC happens to have two GPUs, so I wanted to try deep learning on both of them at once!
So I gave it a go with Chainer.

Environment

The environment I ran everything on is as follows.

OS: Windows 10 Pro
CPU: Intel Xeon E3-1240v3 3.40GHz
Main memory: 24GB (4GBx2, 8GBx2)
GPU 0: NVIDIA GeForce GTX 1050 Ti 4GB
GPU 1: NVIDIA Quadro K2000 2GB
Disk: Samsung SSD 860 EVO 500GB
python=3.6.8
chainer=5.3.0 (installed with conda)
cupy-cuda100=5.4.0 (installed with pip)

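I bought a used PC that came with an OS license and a GPU (the K2000) for about 32,000 yen, then added a monitor (under 10,000 yen), a GTX 1050 Ti (under 20,000 yen), an SSD (under 10,000 yen) and 16GB of memory (under 10,000 yen), for a total of about 80,000 yen.
For some reason conda would only install cupy 4.1.0, so I uninstalled it and reinstalled CuPy with pip.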

Running it

I cloned the v5 branch of Chainer from GitHub and tried the example in /examples/mnist/.

> git clone -b v5 https://github.com/chainer/chainer.git
> cd chainer/examples/mnist

CPU: Intel Xeon E3-1240v3 3.40GHz

> python train_mnist.py
GPU: -1
# unit: 1000
# Minibatch-size: 100
# epoch: 20

Downloading from http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz...
Downloading from http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz...
Downloading from http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz...
Downloading from http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz...
epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.188287    0.0964475             0.942783       0.9711                    48.1507
2           0.0740424   0.092968              0.976533       0.9704                    84.4993
3           0.0471156   0.0895041             0.9851         0.9737                    120.279
4           0.0368677   0.0782165             0.988          0.9766                    154.746
5           0.0272672   0.0831235             0.991333       0.977                     188.724
6           0.02559     0.0846559             0.99175        0.9786                    224.771
7           0.0204407   0.0845285             0.99305        0.9796                    258.854
8           0.0201815   0.0869562             0.9936         0.978                     293.897
9           0.016543    0.0926294             0.994783       0.9777                    328.331
10          0.0141709   0.0804844             0.995367       0.9821                    362.436
11          0.0118971   0.080963              0.99615        0.9823                    396.654
12          0.012708    0.084998              0.9962         0.9809                    430.591
13          0.0135935   0.0925437             0.995783       0.9803                    467.608
14          0.0108333   0.098339              0.996567       0.9804                    503.071
15          0.0098921   0.123784              0.996867       0.9795                    537.966
16          0.0143277   0.0788444             0.9964         0.9831                    572.556
17          0.00864772  0.090874              0.997117       0.984                     604.493
18          0.011913    0.0743239             0.996917       0.9843                    635.828
19          0.00436262  0.107345              0.99865        0.983                     667.157
20          0.00707226  0.0903141             0.998217       0.9837                    698.653

GPU 0: NVIDIA GeForce GTX 1050 Ti 4GB

> python train_mnist.py --gpu 0
GPU: 0
# unit: 1000
# Minibatch-size: 100
# epoch: 20

epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.189976    0.0892177             0.942833       0.9735                    12.0189
2           0.0752845   0.0832106             0.976366       0.9741                    16.1882
3           0.0471824   0.0665943             0.984682       0.98                      19.9812
4           0.034498    0.0805068             0.989182       0.977                     23.9403
5           0.0285627   0.0817467             0.990881       0.9791                    27.7564
6           0.0246776   0.0762155             0.991915       0.9808                    31.5498
7           0.0209579   0.0818291             0.993148       0.9803                    35.3837
8           0.0173544   0.0905734             0.994382       0.977                     39.4301
9           0.0153213   0.0809816             0.995099       0.9824                    43.419
10          0.0141218   0.081298              0.995582       0.9817                    47.2647
11          0.0122369   0.0851226             0.995982       0.9811                    51.1495
12          0.0152793   0.0824623             0.995365       0.9828                    55.0064
13          0.0127705   0.087196              0.996265       0.9811                    58.7889
14          0.01036     0.0989774             0.996733       0.9814                    62.6396
15          0.00783721  0.0930669             0.997566       0.9805                    66.3864
16          0.0120825   0.0977217             0.996332       0.9799                    70.1693
17          0.00943458  0.121491              0.997449       0.9799                    74.0379
18          0.0117653   0.0909474             0.996516       0.9825                    77.9659
19          0.00977425  0.0968582             0.997549       0.9829                    81.7787
20          0.0117059   0.113914              0.996666       0.9794                    85.7096

GPU 1: NVIDIA Quadro K2000 2GB

> python train_mnist.py --gpu 1
GPU: 1
# unit: 1000
# Minibatch-size: 100
# epoch: 20

epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.19171     0.100066              0.94255        0.9696                    11.4576
2           0.0754682   0.0875764             0.976649       0.9738                    16.7115
3           0.0471999   0.0760614             0.984499       0.9785                    21.9246
4           0.0335111   0.0781458             0.989282       0.9759                    27.1607
5           0.0279633   0.0822519             0.990799       0.9778                    32.3802
6           0.025783    0.0711645             0.991965       0.9816                    37.601
7           0.0176336   0.0900763             0.994332       0.9792                    42.7692
8           0.0175025   0.0811717             0.994532       0.9809                    47.9583
9           0.0163443   0.104123              0.994765       0.9786                    53.2008
10          0.016325    0.0901095             0.994698       0.9805                    58.4468
11          0.0148685   0.0930068             0.995565       0.9808                    63.6781
12          0.0125574   0.0930329             0.996132       0.9812                    68.9363
13          0.0129305   0.095227              0.995966       0.9823                    74.184
14          0.0088038   0.118182              0.996949       0.9803                    79.4221
15          0.013182    0.101077              0.995799       0.9811                    84.649
16          0.0113774   0.090629              0.996616       0.9816                    89.8784
17          0.00734026  0.0970724             0.997716       0.9832                    95.1036
18          0.0116805   0.0985688             0.996632       0.9832                    100.348
19          0.00937403  0.102181              0.997116       0.9816                    105.602
20          0.00310589  0.116704              0.99895        0.9825                    110.866

Multi-GPU (NVIDIA GeForce GTX 1050 Ti 4GB + NVIDIA Quadro K2000 2GB)

> python train_mnist_data_parallel.py
GPU: 0, 1
# unit: 1000
# Minibatch-size: 400
# epoch: 20

epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.279714    0.120431              0.919166       0.9613                    6.57181
2           0.0852208   0.0769835             0.974567       0.9749                    10.1082
3           0.0517447   0.0801815             0.983967       0.9737                    13.6456
4           0.03252     0.0904261             0.989734       0.9727                    17.1756
5           0.0225429   0.0669389             0.993167       0.98                      20.7171
6           0.0181462   0.0622369             0.9942         0.9822                    24.2495
7           0.0133678   0.0835941             0.995733       0.9765                    27.785
8           0.014366    0.0858466             0.9953         0.978                     31.3237
9           0.00786593  0.0903438             0.997534       0.9795                    34.8555
10          0.0089393   0.0749918             0.9968         0.9821                    38.3886
11          0.0072351   0.0788488             0.997867       0.9823                    41.9232
12          0.00581059  0.0780663             0.9983         0.9826                    45.4561
13          0.0120998   0.0941829             0.995334       0.9785                    48.9994
14          0.0116896   0.0939382             0.995967       0.9792                    52.5245
15          0.00721718  0.0831597             0.997733       0.9822                    56.0741
16          0.00422235  0.0846977             0.9986         0.9832                    59.6151
17          0.00593313  0.0860614             0.998133       0.9827                    63.1654
18          0.00420095  0.0804558             0.998767       0.9836                    66.7144
19          0.00655401  0.0929561             0.997733       0.982                     70.2485
20          0.00847227  0.0957024             0.997434       0.9821                    73.7951
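The per-epoch numbers in these logs can also be checked with a few lines of code. Below is a minimal sketch, assuming only the whitespace-separated table layout printed by the example's PrintReport extension; parse_log, best_epoch and steady_state_epoch_time are hypothetical helpers written for this post, not Chainer APIs, and the rows are a short excerpt copied from the data-parallel log above.

```python
# Minimal sketch: parse the whitespace-separated table printed by the
# example's PrintReport extension. parse_log, best_epoch and
# steady_state_epoch_time are hypothetical helpers, not Chainer APIs.

def parse_log(text):
    """Turn the printed report table into a list of {column: value} dicts."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        row = {key: float(value) for key, value in zip(header, ln.split())}
        row["epoch"] = int(row["epoch"])
        rows.append(row)
    return rows


def best_epoch(rows, key="validation/main/loss"):
    """Epoch with the lowest validation loss; later epochs suggest overfitting."""
    return min(rows, key=lambda r: r[key])["epoch"]


def steady_state_epoch_time(rows):
    """Average seconds per epoch after the first (which presumably includes warm-up)."""
    first, last = rows[0], rows[-1]
    return (last["elapsed_time"] - first["elapsed_time"]) / (last["epoch"] - first["epoch"])


# A few rows copied from the data-parallel (2 GPU) log above
LOG = """
epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.279714    0.120431              0.919166       0.9613                    6.57181
5           0.0225429   0.0669389             0.993167       0.98                      20.7171
6           0.0181462   0.0622369             0.9942         0.9822                    24.2495
7           0.0133678   0.0835941             0.995733       0.9765                    27.785
20          0.00847227  0.0957024             0.997434       0.9821                    73.7951
"""

rows = parse_log(LOG)
print("best epoch:", best_epoch(rows))
print("seconds per epoch after epoch 1: %.2f" % steady_state_epoch_time(rows))
```

On this excerpt it reports epoch 6 as the best epoch (which is also the minimum over the full 20-epoch log) and about 3.54 s per epoch after the first, versus 6.57 s for epoch 1, which presumably includes one-off startup costs.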

Summary of results

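The model is a three-layer MLP and it overfits (validation/main/loss starts rising after only a few epochs), but using both GPUs cut the training time by about 14% compared with GPU 0 alone. Note that the data-parallel run used a minibatch size of 400 rather than 100, so the two runs are not a strictly like-for-like comparison.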
If I could use the two cards' memory as a combined 6GB, wouldn't that put this setup on par with a GTX 1060?

Device   elapsed_time after 20 epochs (s)
CPU      698.653
GPU 0    85.710
GPU 1    110.866
2GPUs    73.795
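The "about 14%" figure follows directly from this table. A minimal sketch of the arithmetic; ELAPSED simply copies the numbers above, and speedup_vs and reduction_vs are made-up helper names for this example.

```python
# elapsed_time (seconds) after 20 epochs, copied from the summary table
ELAPSED = {
    "CPU": 698.653,
    "GPU 0": 85.710,
    "GPU 1": 110.866,
    "2GPUs": 73.795,
}


def speedup_vs(baseline, other, table=ELAPSED):
    """How many times faster `other` finished than `baseline`."""
    return table[baseline] / table[other]


def reduction_vs(baseline, other, table=ELAPSED):
    """Fraction of wall-clock time saved by `other` relative to `baseline`."""
    return 1.0 - table[other] / table[baseline]


for name in ("GPU 0", "GPU 1", "2GPUs"):
    print("%s: %.1fx faster than CPU" % (name, speedup_vs("CPU", name)))
print("2GPUs vs GPU 0: %.1f%% less time" % (100 * reduction_vs("GPU 0", "2GPUs")))
```

This gives roughly 8.2x (GPU 0), 6.3x (GPU 1) and 9.5x (both GPUs) over the CPU, and a 13.9% reduction for 2GPUs versus GPU 0.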

How to specify multiple GPUs

As shown below, switching the updater from StandardUpdater to ParallelUpdater and listing the GPUs with devices= seems to be all it takes to go multi-GPU. That's easy!

Single GPU: train_mnist.py

    updater = training.updaters.StandardUpdater(
        train_iter, optimizer, device=args.gpu)

Multi-GPU: train_mnist_data_parallel.py

    updater = training.updaters.ParallelUpdater(
        train_iter,
        optimizer,
        # The device of the name 'main' is used as a "master", while others are
        # used as slaves. Names other than 'main' are arbitrary.
        devices={'main': args.gpu0, 'second': args.gpu1},
    )
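To make the idea behind ParallelUpdater more concrete, here is a plain-Python sketch of one data-parallel training step on a toy one-parameter model. It is an illustration under assumptions, not Chainer's code: my reading of the v5 source is that ParallelUpdater splits each minibatch across the devices given in devices=, adds the other replicas' gradients into the 'main' model, updates it with the optimizer and copies the parameters back, and the sketch does the same. mse_grad and data_parallel_step are hypothetical names.

```python
# Conceptual sketch of one data-parallel SGD step in plain Python.
# Assumption: it mirrors the idea behind ParallelUpdater (split the batch,
# compute gradients per replica, sum them on 'main', update, copy back);
# it is NOT Chainer's implementation. Toy model: y = w * x, MSE loss.

def mse_grad(w, batch):
    """d/dw of mean((w*x - y)**2) over one sub-batch of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_step(params, batch, lr):
    """params maps device name -> that replica's w; 'main' is the master."""
    names = list(params)
    n = len(names)
    # Interleaved split: the i-th device gets batch[i::n]
    shards = {name: batch[i::n] for i, name in enumerate(names)}
    # Each replica computes a gradient on its own shard (a plain loop here;
    # on real hardware each replica would live on its own GPU)
    grads = {name: mse_grad(params[name], shards[name]) for name in names}
    # Gradients are summed on the master replica, which is then updated
    new_w = params["main"] - lr * sum(grads.values())
    # Copy the updated master parameters back to every replica
    return {name: new_w for name in names}


# Toy data with true w = 3; two replicas named as in the example script
batch = [(float(x), 3.0 * x) for x in range(1, 9)]
params = {"main": 0.0, "second": 0.0}
for _ in range(50):
    params = data_parallel_step(params, batch, lr=0.005)
print(params)
```

Because the gradients are summed rather than averaged, two equal shards give twice the full-batch mean gradient, so the step behaves as if the learning rate were doubled. With the minibatch size of 400 shown in the multi-GPU log and two devices, each GPU would process 200 samples per iteration. Every device also keeps a full copy of the model, so this kind of data parallelism raises throughput but does not pool the two cards' memory into one 6GB device.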
