
[Today's Abstract] Understanding Batch Normalization [Paper, DeepL Translation]

Posted at 2020-04-25

This article is more or less a memo for myself.
Most of it is DeepL output.
I would appreciate it if you pointed out any mistakes.

Abstract

Translation

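Batch normalization (BN) is a method for normalizing the activations of the intermediate layers of a deep neural network. Because it can improve accuracy and speed up training, it has become a popular method in deep learning. Despite this great success, however, there is still almost no consensus on the exact reasons and mechanisms behind it. In this paper, we take a step toward a deeper understanding of BN through an empirical approach. We ran several experiments and showed that BN mainly makes it possible to train with larger learning rates, and that this is what causes faster convergence and better generalization. For networks without BN, we demonstrated that large gradient updates can make the loss diverge and the activations grow uncontrollably with network depth, which limits the usable learning rates. BN avoids this problem by constantly correcting the activations to zero mean and unit standard deviation, which allows larger gradient steps, yields faster convergence, and may help avoid sharp local minima. We also show various ways in which the gradients and activations of unnormalized deep networks are ill-behaved. We contrast our results with recent findings in random matrix theory, shedding new light on classical initialization schemes and their consequences.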

Original

Batch normalization (BN) is a technique to normalize activations in intermediate layers of deep neural networks. Its tendency to improve accuracy and speed up training have established BN as a favorite technique in deep learning. Yet, despite its enormous success, there remains little consensus on the exact reason and mechanism behind these improvements. In this paper we take a step towards a better understanding of BN, following an empirical approach. We conduct several experiments, and show that BN primarily enables training with larger learning rates, which is the cause for faster convergence and better generalization. For networks without BN we demonstrate how large gradient updates can result in diverging loss and activations growing uncontrollably with network depth, which limits possible learning rates. BN avoids this problem by constantly correcting activations to be zero-mean and of unit standard deviation, which enables larger gradient steps, yields faster convergence and may help bypass sharp local minima. We further show various ways in which gradients and activations of deep unnormalized networks are ill-behaved. We contrast our results against recent findings in random matrix theory, shedding new light on classical initialization schemes and their consequences.
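As a memo to myself, here is a minimal sketch of the "correcting activations to be zero-mean and of unit standard deviation" step mentioned in the abstract. It is not the paper's code. The function name `batch_norm` and the parameters `gamma`, `beta`, and `eps` are my own illustrative assumptions, and it only covers the training-time normalization over a batch (no running statistics, no learned parameters).

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature (column) of `batch` over the batch dimension.

    `batch` is a list of samples, each a list of feature activations.
    `gamma`, `beta`, and `eps` are hypothetical names for the scale, shift,
    and numerical-stability constant; they do not come from the abstract.
    """
    n = len(batch)
    num_features = len(batch[0])
    # Per-feature mean and (biased) variance computed over the batch.
    means = [sum(row[j] for row in batch) / n for j in range(num_features)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n
        for j in range(num_features)
    ]
    # Shift to zero mean, scale to unit standard deviation, then apply
    # the optional scale and shift.
    return [
        [
            gamma * (row[j] - means[j]) / math.sqrt(variances[j] + eps) + beta
            for j in range(num_features)
        ]
        for row in batch
    ]


# Two features on very different scales end up on the same scale.
activations = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0], [4.0, 800.0]]
normalized = batch_norm(activations)
```

With the default `gamma=1.0` and `beta=0.0`, each column of `normalized` should have a mean of roughly zero and a standard deviation of roughly one, regardless of the scale of the input column.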
