Training Time Evaluation [Deep Learning with TensorFlow 11]

Last updated at 2017-11-15, posted at 2016-05-30


Introduction

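So far, using the MNIST dataset, I have extended the model step by step from logistic regression to a convolutional neural network and evaluated its performance along the way, but I have not paid any particular attention to training time.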

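In the previous article I wrote:

"There are quite a few parameters that have to be chosen by hand, so for trial and error, the faster the processing, the better."

Yet I had not actually evaluated training time myself, so let's do that now.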

Evaluation Environment

All I have at hand is an old MacBook Pro, so I use Amazon EC2.

The instance types evaluated are:

Type	GPUs	vCPU	Mem(GiB)
c4.2xlarge	-	8	15
g2.2xlarge	1	8	15
g2.8xlarge	4	32	60

The script run for the evaluation is
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/image/cifar10/cifar10_train.py
and, for multiple GPUs,
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/image/cifar10/cifar10_multi_gpu_train.py
is used instead.
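Both scripts print a log line every 10 steps with two throughput figures. As far as I understand the example code, those figures are derived roughly as in the following sketch. The function name `throughput` and the default batch size of 128 are assumptions for illustration, not code copied from the repository.

```python
# Hypothetical helper mirroring how the CIFAR-10 example scripts appear to
# turn the wall-clock time of one training step into the two logged numbers.
# BATCH_SIZE = 128 is assumed to be the example's default batch size.

BATCH_SIZE = 128


def throughput(duration, num_gpus=1, batch_size=BATCH_SIZE):
    """Return (examples_per_sec, sec_per_batch) for one training step.

    duration: wall-clock seconds spent in one training step.
    In the multi-GPU script one step runs one batch on every GPU, so a step
    covers batch_size * num_gpus examples, and the logged sec/batch is the
    step time divided by the number of GPUs.
    """
    examples_per_step = batch_size * num_gpus
    examples_per_sec = examples_per_step / duration
    sec_per_batch = duration / num_gpus
    return examples_per_sec, sec_per_batch


if __name__ == "__main__":
    # One device, 0.478 s per step: about 268 examples/sec.
    print(throughput(0.478))
    # Four GPUs, 0.36 s per step: corresponds to 0.09 sec/batch.
    print(throughput(0.36, num_gpus=4))
```

Under this reading, sec/batch on a multi-GPU machine is a per-GPU-batch figure, not the time of one full step.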

Training Time Measurement

c4.2xlarge

2016-05-30 05:09:20.010463: step 200, loss = 4.32 (267.5 examples/sec; 0.478 sec/batch)
2016-05-30 05:09:25.299022: step 210, loss = 4.31 (270.9 examples/sec; 0.472 sec/batch)
2016-05-30 05:09:30.089110: step 220, loss = 4.30 (261.6 examples/sec; 0.489 sec/batch)
2016-05-30 05:09:34.912969: step 230, loss = 4.28 (263.1 examples/sec; 0.487 sec/batch)
2016-05-30 05:09:39.770633: step 240, loss = 4.26 (258.6 examples/sec; 0.495 sec/batch)
2016-05-30 05:09:44.644984: step 250, loss = 4.24 (264.3 examples/sec; 0.484 sec/batch)
2016-05-30 05:09:49.479784: step 260, loss = 4.23 (266.1 examples/sec; 0.481 sec/batch)
2016-05-30 05:09:54.287899: step 270, loss = 4.21 (265.0 examples/sec; 0.483 sec/batch)
2016-05-30 05:09:59.165484: step 280, loss = 4.20 (261.2 examples/sec; 0.490 sec/batch)
2016-05-30 05:10:04.025648: step 290, loss = 4.18 (263.1 examples/sec; 0.486 sec/batch)

g2.2xlarge

2016-05-30 05:21:55.004440: step 200, loss = 4.33 (527.9 examples/sec; 0.242 sec/batch)
2016-05-30 05:21:57.512823: step 210, loss = 4.31 (535.3 examples/sec; 0.239 sec/batch)
2016-05-30 05:21:59.821739: step 220, loss = 4.30 (547.2 examples/sec; 0.234 sec/batch)
2016-05-30 05:22:02.215690: step 230, loss = 4.29 (595.0 examples/sec; 0.215 sec/batch)
2016-05-30 05:22:04.603561: step 240, loss = 4.27 (521.9 examples/sec; 0.245 sec/batch)
2016-05-30 05:22:06.887362: step 250, loss = 4.25 (541.2 examples/sec; 0.237 sec/batch)
2016-05-30 05:22:09.200980: step 260, loss = 4.24 (540.7 examples/sec; 0.237 sec/batch)
2016-05-30 05:22:11.491492: step 270, loss = 4.22 (556.0 examples/sec; 0.230 sec/batch)
2016-05-30 05:22:13.807076: step 280, loss = 4.21 (525.3 examples/sec; 0.244 sec/batch)
2016-05-30 05:22:16.096672: step 290, loss = 4.19 (586.2 examples/sec; 0.218 sec/batch)

g2.8xlarge

2016-05-30 05:47:42.305082: step 200, loss = 4.32 (1328.2 examples/sec; 0.096 sec/batch)
2016-05-30 05:47:46.316455: step 210, loss = 4.31 (1331.8 examples/sec; 0.096 sec/batch)
2016-05-30 05:47:50.058900: step 220, loss = 4.29 (1388.5 examples/sec; 0.092 sec/batch)
2016-05-30 05:47:53.623737: step 230, loss = 4.27 (1416.1 examples/sec; 0.090 sec/batch)
2016-05-30 05:47:57.277992: step 240, loss = 4.26 (1589.8 examples/sec; 0.081 sec/batch)
2016-05-30 05:48:00.921081: step 250, loss = 4.24 (1449.4 examples/sec; 0.088 sec/batch)
2016-05-30 05:48:04.558838: step 260, loss = 4.23 (1424.1 examples/sec; 0.090 sec/batch)
2016-05-30 05:48:08.242038: step 270, loss = 4.21 (1231.7 examples/sec; 0.104 sec/batch)
2016-05-30 05:48:11.875426: step 280, loss = 4.20 (1525.9 examples/sec; 0.084 sec/batch)
2016-05-30 05:48:15.530232: step 290, loss = 4.18 (1509.4 examples/sec; 0.085 sec/batch)
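To reduce the logs above to single numbers, a small parser like the following can average the logged sec/batch column and also compute the wall-clock time per step from the timestamps. The regular expression simply assumes the exact line format shown above, and `summarize` is a hypothetical helper, not part of TensorFlow.

```python
import re
from datetime import datetime
from statistics import mean

# Assumes the log format shown above, e.g.
# 2016-05-30 05:09:20.010463: step 200, loss = 4.32 (267.5 examples/sec; 0.478 sec/batch)
LOG_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): step (?P<step>\d+), "
    r"loss = (?P<loss>[\d.]+) \((?P<eps>[\d.]+) examples/sec; "
    r"(?P<spb>[\d.]+) sec/batch\)"
)


def summarize(log_text):
    """Return the mean logged sec/batch and the wall-clock seconds per step."""
    records = []
    for line in log_text.splitlines():
        m = LOG_RE.search(line)
        if m is None:
            continue  # skip blank or unrelated lines
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        records.append((ts, int(m.group("step")), float(m.group("spb"))))
    if len(records) < 2:
        raise ValueError("need at least two log lines")
    (t0, s0, _), (t1, s1, _) = records[0], records[-1]
    return {
        "mean_sec_per_batch": mean(r[2] for r in records),
        # Elapsed time between the first and last line, divided by the steps.
        "wall_sec_per_step": (t1 - t0).total_seconds() / (s1 - s0),
    }


if __name__ == "__main__":
    sample = """
2016-05-30 05:09:20.010463: step 200, loss = 4.32 (267.5 examples/sec; 0.478 sec/batch)
2016-05-30 05:09:25.299022: step 210, loss = 4.31 (270.9 examples/sec; 0.472 sec/batch)
"""
    print(summarize(sample))
```

Applied to the full g2.8xlarge excerpt, the wall-clock time per step comes out at about 0.37 s, roughly four times the logged 0.09 sec/batch, which fits each step running one batch on each of the 4 GPUs.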
2016-05-30 05:48:19.162920: step 300, loss = 4.17 (1609.2 examples/sec; 0.080 sec/batch)

Evaluation

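With MNIST, g2.2xlarge was about 7 times faster than c4.2xlarge (although I did not measure it precisely), but with CIFAR10 it is only about twice as fast. I wonder why. Unsurprisingly, g2.8xlarge is the fastest.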
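The "about twice as fast" figure can be checked against the logs. The values below are the means of the sec/batch column in the excerpts above (10 lines each for c4.2xlarge and g2.2xlarge, 11 for g2.8xlarge), averaged by hand; the dictionary and function names are just for illustration.

```python
# Mean logged sec/batch per instance, averaged from the log excerpts above.
MEAN_SEC_PER_BATCH = {
    "c4.2xlarge": 0.4845,
    "g2.2xlarge": 0.2341,
    "g2.8xlarge": 0.0896,
}


def speedups(times, baseline="c4.2xlarge"):
    """Speedup of each instance relative to the baseline (higher is faster)."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}


if __name__ == "__main__":
    for name, s in speedups(MEAN_SEC_PER_BATCH).items():
        print(f"{name}: {s:.2f}x")
```

This gives about 2.07x for g2.2xlarge and about 5.41x for g2.8xlarge relative to c4.2xlarge, so four GPUs deliver somewhat less than four times the single-GPU throughput.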

The file
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/image/cifar10/cifar10_multi_gpu_train.py
states the following:

System        | Step Time (sec/batch)  |     Accuracy
--------------------------------------------------------------------
1 Tesla K20m  | 0.35-0.60              | ~86% at 60K steps  (5 hours)
1 Tesla K40m  | 0.25-0.35              | ~86% at 100K steps (4 hours)
2 Tesla K20m  | 0.13-0.20              | ~84% at 30K steps  (2.5 hours)
3 Tesla K20m  | 0.13-0.18              | ~84% at 30K steps
4 Tesla K20m  | ~0.10                  | ~84% at 30K steps

so running about 60K steps looks about right.

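(I stopped the runs partway this time, but) if training ran for 60K batches, the training time estimated from the logged sec/batch figures would be roughly as follows.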

Type	Estimated training time
c4.2xlarge	8 hours
g2.2xlarge	4 hours
g2.8xlarge	1.5 hours
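The table is simple arithmetic: rounded sec/batch from the logs times 60,000 batches. A minimal sketch (the names are illustrative only):

```python
TARGET_BATCHES = 60_000

# Rounded sec/batch figures read off the logs above. On g2.8xlarge the logged
# sec/batch is per GPU batch, so 60K batches there correspond to roughly 15K
# steps of the multi-GPU script with 4 GPUs.
SEC_PER_BATCH = {"c4.2xlarge": 0.48, "g2.2xlarge": 0.24, "g2.8xlarge": 0.09}


def training_hours(sec_per_batch, batches=TARGET_BATCHES):
    """Estimated wall-clock hours needed to process `batches` batches."""
    return sec_per_batch * batches / 3600


if __name__ == "__main__":
    for name, spb in SEC_PER_BATCH.items():
        print(f"{name}: {training_hours(spb):.1f} hours")
```

This prints 8.0, 4.0 and 1.5 hours, matching the table. It ignores startup time and any extra work done outside the timed training step, so real runs will take somewhat longer.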

On-demand prices are as follows (spot instances are roughly 75% cheaper):

Type	Price
c4.2xlarge	$0.419 / hour
g2.2xlarge	$0.65 / hour
g2.8xlarge	$2.6 / hour

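So g2.2xlarge gives the best cost performance. Still, waiting is wasted time, so once things have been validated to some extent, switching to g2.8xlarge seems like a good choice.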
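The cost comparison behind that conclusion can be sketched as cost per 60K-batch run = estimated hours × hourly price. The hours and prices are the two tables above; the flat 75% spot discount is only the rough figure from the text, since real spot prices fluctuate.

```python
HOURS = {"c4.2xlarge": 8.0, "g2.2xlarge": 4.0, "g2.8xlarge": 1.5}
ON_DEMAND_USD_PER_HOUR = {"c4.2xlarge": 0.419, "g2.2xlarge": 0.65, "g2.8xlarge": 2.6}
SPOT_DISCOUNT = 0.75  # rough assumption; actual spot prices vary over time


def run_cost(name, spot=False):
    """Estimated USD cost of one 60K-batch training run on instance `name`."""
    cost = HOURS[name] * ON_DEMAND_USD_PER_HOUR[name]
    return cost * (1 - SPOT_DISCOUNT) if spot else cost


if __name__ == "__main__":
    for name in HOURS:
        print(f"{name}: ${run_cost(name):.2f} on-demand, "
              f"${run_cost(name, spot=True):.2f} spot")
    print("cheapest per run:", min(HOURS, key=run_cost))
```

On-demand, this comes to about $3.35, $2.60 and $3.90 per run, so g2.2xlarge is cheapest, while g2.8xlarge costs about $1.30 more than g2.2xlarge but finishes in well under half the time.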

In the Tokyo region, GPU instances apparently cost about 1.5 times as much...

==
Now that P2 instances are available, I wrote a follow-up: Deep Neural Networks with TensorFlow (13): Training Time Evaluation, Part 2.
