
How to reduce the impact of batch size

Posted at 2024-05-12

Changing the batch size changes the loss curve

GPUs with different VRAM sizes force you to use different batch sizes, and changing only the batch size can significantly change the training results.

Scaling the learning rate

Increase the learning rate by the same factor by which you increase the batch size:

new_lr = base_lr * (new_batch_size / base_batch_size)
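As a minimal sketch, the linear scaling rule above can be wrapped in a small helper (function name and the example base values of `base_lr=0.1` at batch size 256 are illustrative, not from the original):

```python
def scaled_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Linear scaling rule: scale the learning rate by the batch-size ratio."""
    return base_lr * (new_batch_size / base_batch_size)

# Example: base_lr = 0.1 tuned at batch size 256, now training at batch size 1024.
# The batch size grew 4x, so the learning rate also grows 4x.
print(scaled_lr(0.1, 256, 1024))  # 0.4
```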


Using the linear scaling rule together with a learning-rate warmup, training scales up to batch_size = 8K without increasing the error. If the batch size is too large, accuracy drops due to over-smoothing.
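The warmup mentioned above can be sketched as a gradual ramp: start from the base learning rate and increase linearly to the scaled rate over the first few epochs (the 5-epoch window and the function shape here are an assumption following the gradual-warmup scheme in the referenced paper; any later decay schedule is omitted):

```python
def lr_at_epoch(epoch: int, base_lr: float, scaled_lr: float,
                warmup_epochs: int = 5) -> float:
    """Gradual warmup: ramp linearly from base_lr to scaled_lr over
    the first warmup_epochs, then hold scaled_lr."""
    if epoch < warmup_epochs:
        return base_lr + (scaled_lr - base_lr) * epoch / warmup_epochs
    return scaled_lr

# Example: warming up from 0.1 to the scaled rate 0.4 over 5 epochs.
for e in range(7):
    print(e, lr_at_epoch(e, base_lr=0.1, scaled_lr=0.4))
```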

Reference

Goyal et al., "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"
