
[Today's Abstract] Neural Tangent Kernel: Convergence and Generalization in Neural Networks [Paper, DeepL Translation]

Posted at 2020-03-21

Once a day (that is the goal, at least), I read the abstract of a paper with the help of DeepL translation.

This article is more or less a memo for myself.
Most of it comes straight from DeepL.
If you spot any mistakes, I would be glad if you pointed them out.

Source
Neural Tangent Kernel: Convergence and Generalization in Neural Networks

Abstract

Translation

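At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit, which ties them to kernel methods. We prove that the evolution of an ANN during training can also be described by a kernel: during gradient descent on the parameters of an ANN, the network function $f_θ$ (which maps input vectors to output vectors) follows the kernel gradient of the functional cost (which, unlike the parameter cost, is convex) with respect to a new kernel, the Neural Tangent Kernel (NTK). This kernel is central to describing the generalization properties of ANNs. The NTK is random at initialization and changes during training, but in the infinite-width limit it converges to an explicit limiting kernel and stays constant during training. This makes it possible to study the training of ANNs in function space rather than in parameter space. Convergence of training can then be related to the positive-definiteness of the limiting NTK. We prove the positive-definiteness of the limiting NTK when the data is supported on the sphere and the non-linearity is non-polynomial.
Next, we focus on the least-squares regression setting and show that, in the infinite-width limit, the network function $f_θ$ follows a linear differential equation during training. Convergence is fastest along the largest kernel principal components of the input data with respect to the NTK, which suggests a theoretical motivation for early stopping.
Finally, we study the NTK numerically, observe its behavior for wide networks, and compare it with the infinite-width limit.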
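
As a reading aid, the quantities the abstract refers to can be written out roughly as follows. This is only a sketch for scalar-output networks trained by gradient flow; the training points $x_1, \dots, x_N$, the targets $y_i$, the cost normalization $C(f) = \frac{1}{2}\sum_i (f(x_i) - y_i)^2$, and the symbols $K$, $\lambda_k$, $v_k$ are notation assumed here, not taken from the abstract.

$$
\Theta_\theta(x, x') = \sum_{p=1}^{P} \partial_{\theta_p} f_\theta(x)\, \partial_{\theta_p} f_\theta(x')
$$

$$
\partial_t f_t(x) = -\sum_{i=1}^{N} \Theta_\theta(x, x_i)\,\bigl(f_t(x_i) - y_i\bigr)
$$

If the kernel stays constant during training (the infinite-width case), then with $K_{ij} = \Theta(x_i, x_j)$ and eigenpairs $(\lambda_k, v_k)$ of $K$, the outputs on the training points satisfy

$$
f_t(X) = y + e^{-tK}\,\bigl(f_0(X) - y\bigr), \qquad v_k^\top \bigl(f_t(X) - y\bigr) = e^{-\lambda_k t}\, v_k^\top \bigl(f_0(X) - y\bigr),
$$

so the residual shrinks fastest along the eigenvectors with the largest eigenvalues.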

Original

At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit, thus connecting them to kernel methods. We prove that the evolution of an ANN during training can also be described by a kernel: during gradient descent on the parameters of an ANN, the network function $f_θ$ (which maps input vectors to output vectors) follows the kernel gradient of the functional cost (which is convex, in contrast to the parameter cost) w.r.t. a new kernel: the Neural Tangent Kernel (NTK). This kernel is central to describe the generalization features of ANNs. While the NTK is random at initialization and varies during training, in the infinite-width limit it converges to an explicit limiting kernel and it stays constant during training. This makes it possible to study the training of ANNs in function space instead of parameter space. Convergence of the training can then be related to the positive-definiteness of the limiting NTK. We prove the positive-definiteness of the limiting NTK when the data is supported on the sphere and the non-linearity is non-polynomial.
We then focus on the setting of least-squares regression and show that in the infinite-width limit, the network function $f_θ$ follows a linear differential equation during training. The convergence is fastest along the largest kernel principal components of the input data with respect to the NTK, hence suggesting a theoretical motivation for early stopping.
Finally we study the NTK numerically, observe its behavior for wide networks, and compare it to the infinite-width limit.
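
To make the abstract a little more concrete for myself, here is a minimal NumPy sketch of an empirical (finite-width) NTK and of the linear dynamics that follow if the kernel is held fixed. Everything in it is an assumption for illustration: the two-layer tanh network, its 1/sqrt(width) output scaling, the unnormalized least-squares cost, and the helper names `init_params`, `forward`, `jacobian`, `empirical_ntk`, and `kernel_flow` are mine, not the paper's setup or code.

```python
import numpy as np

def init_params(rng, d_in, width):
    """Random parameters for the toy network f(x) = a . tanh(W x + b) / sqrt(width)."""
    return {
        "W": rng.standard_normal((width, d_in)),
        "b": rng.standard_normal(width),
        "a": rng.standard_normal(width),
    }

def forward(params, X):
    """Scalar network output for each row of X; returns shape (n,)."""
    H = np.tanh(X @ params["W"].T + params["b"])       # (n, width)
    return H @ params["a"] / np.sqrt(H.shape[1])

def jacobian(params, X):
    """Gradient of each output w.r.t. all parameters (W, b, a); shape (n, P)."""
    n = X.shape[0]
    width = params["a"].shape[0]
    H = np.tanh(X @ params["W"].T + params["b"])       # (n, width)
    dZ = (1.0 - H**2) * params["a"]                    # a_j * tanh'(z_j), also df/db_j
    dW = dZ[:, :, None] * X[:, None, :]                # (n, width, d_in)
    J = np.concatenate([dW.reshape(n, -1), dZ, H], axis=1)
    return J / np.sqrt(width)

def empirical_ntk(params, X):
    """Gram matrix K[i, j] = grad f(x_i) . grad f(x_j) at the given parameters."""
    J = jacobian(params, X)
    return J @ J.T

def kernel_flow(K, f0, y, t):
    """Training-point outputs at time t under df/dt = -K (f - y) with K held fixed."""
    lam, V = np.linalg.eigh(K)                         # eigenvalues in ascending order
    coeff = V.T @ (f0 - y)                             # initial residual in the eigenbasis
    return y + V @ (np.exp(-lam * t) * coeff)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)          # put the inputs on the unit sphere
y = np.sin(3.0 * X[:, 0])                              # arbitrary toy targets
params = init_params(rng, d_in=3, width=512)

K = empirical_ntk(params, X)
f0 = forward(params, X)
for t in (0.0, 1.0, 10.0):
    residual = np.linalg.norm(kernel_flow(K, f0, y, t) - y)
    print(f"t={t:5.1f}  residual norm={residual:.4f}")
```

The printed residual norm should not increase with `t`, and the component of the residual along each eigenvector of `K` decays at a rate set by its eigenvalue, which is the mechanism behind the early-stopping remark. A real finite-width network would also see `K` drift during training; holding it fixed is the infinite-width idealization.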
