Abstract

Translation

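Sparse neural networks have been shown to be more parameter- and compute-efficient than dense networks and, in some cases, can be used to reduce wall-clock inference time. Much research has addressed training dense networks to obtain sparse networks for inference. This limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper, we introduce a method for training sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy compared with existing dense-to-sparse training methods. Our method updates the network topology during training using parameter magnitudes and infrequent gradient computations. We show that this method requires fewer floating-point operations (FLOPs) than prior methods to reach a given level of accuracy. Importantly, by adjusting the topology, training can start from any initialization, not only "lucky" ones. We demonstrate state-of-the-art sparse training results using ResNet-50, MobileNet v1, and MobileNet v2 on the ImageNet-2012 dataset, WideResNets on the CIFAR-10 dataset, and RNNs on the WikiText-103 dataset. Finally, we offer some insight into why changing the topology during optimization can overcome local minima that arise when the topology stays static.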

Original

Sparse neural networks have been shown to be more parameter and compute efficient compared to dense networks and in some cases are used to decrease wall clock inference times. There is a large body of work on training dense networks to yield sparse networks for inference. This limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. Our method updates the topology of the network during training by using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. Importantly, by adjusting the topology it can start from any initialization - not just "lucky" ones. We demonstrate state-of-the-art sparse training results with ResNet-50, MobileNet v1 and MobileNet v2 on the ImageNet-2012 dataset, WideResNets on the CIFAR-10 dataset and RNNs on the WikiText-103 dataset. Finally, we provide some insights into why allowing the topology to change during the optimization can overcome local minima encountered when the topology remains static.
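
The abstract says the topology is updated from parameter magnitudes and infrequent gradient calculations, but it does not spell out the rule. The following is a minimal NumPy sketch of one plausible reading, not the authors' implementation: it assumes that an update drops the active weights with the smallest magnitude and activates the same number of inactive connections with the largest gradient magnitude, so the parameter count stays fixed. The function name `update_topology`, the `fraction` parameter, and the choice to start new connections at zero are illustrative assumptions.

```python
import numpy as np


def update_topology(weights, mask, grads, fraction):
    """Hypothetical drop/grow step that keeps the number of active weights fixed.

    weights, grads: float arrays of the same shape.
    mask: boolean array, True where a connection is active.
    fraction: share of the active connections to replace (assumed parameter).
    """
    w = weights.ravel().copy()
    m = mask.ravel().astype(bool)  # astype returns a copy
    g = grads.ravel()

    active = np.flatnonzero(m)
    inactive = np.flatnonzero(~m)
    k = min(int(fraction * active.size), inactive.size)
    if k > 0:
        # Drop: the k active connections with the smallest |weight|.
        drop = active[np.argsort(np.abs(w[active]))[:k]]
        # Grow: the k previously inactive connections with the largest |gradient|.
        grow = inactive[np.argsort(-np.abs(g[inactive]))[:k]]
        m[drop] = False
        w[drop] = 0.0
        m[grow] = True
        w[grow] = 0.0  # assumption: new connections start at zero
    return w.reshape(weights.shape), m.reshape(mask.shape)


rng = np.random.default_rng(0)
mask = rng.random((4, 5)) < 0.3
weights = rng.normal(size=(4, 5)) * mask
grads = rng.normal(size=(4, 5))  # stand-in for an occasionally computed dense gradient

new_weights, new_mask = update_topology(weights, mask, grads, fraction=0.3)
print(int(mask.sum()), int(new_mask.sum()))  # active count before and after
```

In this sketch the dense gradient is needed only when the topology is updated, which is one way to read "infrequent gradient calculations"; between updates only the masked weights would be trained.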

Rigging the Lottery: Making All Tickets Winners [Abstract] [Paper, DeepL Translation]
