
[Today's Abstract] Understanding Batch Normalization [DeepL Translation of the Paper]

This article is essentially a personal memo.
Most of it is a DeepL translation.
I would appreciate it if you could point out any mistakes.

Translation source
Understanding Batch Normalization

Abstract

Translation

Batch normalization (BN) is a technique for normalizing the activations of intermediate layers in deep neural networks. Because it can improve accuracy and speed up training, it has become a popular technique in deep learning. Despite its great success, however, there is still little consensus on the exact reasons and mechanisms behind it. In this paper, we take a step toward a better understanding of BN through an empirical approach. We conduct several experiments and show that BN primarily enables training with larger learning rates, which is the cause of faster convergence and better generalization. For networks without BN, we demonstrate that large gradient updates lead to a diverging loss and to activations that grow uncontrollably with network depth, which limits the learning rates that can be used. BN avoids this problem by constantly correcting the activations to be zero-mean and of unit standard deviation, which enables larger gradient steps, yields faster convergence, and may help bypass sharp local minima. We further show various ways in which the gradients and activations of unnormalized deep networks are ill-behaved. We contrast our results with recent findings in random matrix theory, shedding new light on classical initialization schemes and their consequences.

Original text

Batch normalization (BN) is a technique to normalize activations in intermediate layers of deep neural networks. Its tendency to improve accuracy and speed up training have established BN as a favorite technique in deep learning. Yet, despite its enormous success, there remains little consensus on the exact reason and mechanism behind these improvements. In this paper we take a step towards a better understanding of BN, following an empirical approach. We conduct several experiments, and show that BN primarily enables training with larger learning rates, which is the cause for faster convergence and better generalization. For networks without BN we demonstrate how large gradient updates can result in diverging loss and activations growing uncontrollably with network depth, which limits possible learning rates. BN avoids this problem by constantly correcting activations to be zero-mean and of unit standard deviation, which enables larger gradient steps, yields faster convergence and may help bypass sharp local minima. We further show various ways in which gradients and activations of deep unnormalized networks are ill-behaved. We contrast our results against recent findings in random matrix theory, shedding new light on classical initialization schemes and their consequences.
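To make the normalization step concrete, here is a minimal NumPy sketch of the batch-normalization transform the abstract describes: each feature of a mini-batch is corrected to zero mean and unit standard deviation. The learnable scale gamma and shift beta follow the standard BN formulation and are an assumption on my part, since the abstract itself only mentions the zero-mean / unit-standard-deviation correction.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature of a mini-batch to zero mean and unit
    standard deviation, then apply a learnable scale and shift.

    x:     (batch_size, num_features) activations of an intermediate layer
    gamma: (num_features,) learnable scale (standard BN; assumed here)
    beta:  (num_features,) learnable shift (standard BN; assumed here)
    """
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero-mean, unit-std activations
    return gamma * x_hat + beta

# Toy usage: a badly scaled mini-batch is pulled back to mean ~0, std ~1.
x = np.random.randn(32, 4) * 10.0 + 5.0
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0), y.std(axis=0))
```

Because the normalized activations stay on a fixed scale regardless of depth, larger gradient steps do not cause the activations to blow up, which is the mechanism the paper points to for allowing larger learning rates.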
