
You don't need to worry about gradients when masking in a neural network

Posted at 2017-08-23 (last updated 2017-08-23)

Question

When a neural network handles matrices of variable length, the data is sometimes masked, as shown in the figure.
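
A minimal NumPy sketch of one common way to set this up, assuming the sequences are zero-padded to the longest length: the hypothetical helper `pad_and_mask` stacks variable-length sequences into a single array and returns a boolean mask that is `True` at real positions and `False` at padding.

```python
import numpy as np

def pad_and_mask(seqs, feature_dim):
    """Zero-pad variable-length sequences to (batch, max_len, feature_dim)
    and return a boolean mask that is True at real (non-padded) positions."""
    max_len = max(len(s) for s in seqs)
    batch = np.zeros((len(seqs), max_len, feature_dim), dtype=np.float32)
    mask = np.zeros((len(seqs), max_len), dtype=bool)
    for i, s in enumerate(seqs):
        batch[i, :len(s)] = s    # copy the real steps; the rest stays zero
        mask[i, :len(s)] = True  # mark the real steps
    return batch, mask

# Two hypothetical sequences of length 3 and 1, each step a 2-dim vector.
seqs = [np.ones((3, 2), dtype=np.float32), np.full((1, 2), 2.0, dtype=np.float32)]
batch, mask = pad_and_mask(seqs, feature_dim=2)
print(batch.shape)  # (2, 3, 2)
print(mask)         # True at real positions, False at padding
```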

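I expect the gradients of $W$ and $x$ to behave as though the masked positions did not exist at all.
However, the contribution of each $x_{i, j}$ to the gradient of $W$ should be 0 at masked positions and some nonzero value elsewhere, so I wondered whether taking an average would "dilute" the gradient of $W$ in proportion to the masked fraction.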

Conclusion

The gradient is not affected, because the contributions to it are added together, not averaged.
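
The conclusion can be written out for the setup used in the Chainer check, assuming one output per row: rows $x_i$ of $x$, outputs $y_i = x_i W$, per-row loss $\ell_i = y_i^2$ (squared error against zero), and a mask $m_i \in \{0, 1\}$ that is 0 for masked rows. The symbols $m_i$ and $\ell_i$ are introduced here for illustration.

```latex
\begin{aligned}
L &= \sum_{i=1}^{N} m_i \, \ell_i, \qquad \ell_i = y_i^2, \qquad y_i = x_i W, \qquad m_i \in \{0, 1\} \\
\frac{\partial L}{\partial W} &= \sum_{i=1}^{N} m_i \, \frac{\partial \ell_i}{\partial W}
  = \sum_{i \,:\, m_i = 1} 2 \, y_i \, x_i^{\top}
\end{aligned}
```

Masked rows contribute exactly zero and no $1/N$ factor appears, so the result equals the gradient computed after deleting those rows. A "dilution" would only appear if the loss itself were divided by the total number of rows $N$, masked rows included.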

Verification

I didn't set out to verify this; I realized it while experimenting.
Below is a check using Chainer.

>>> import chainer
>>> import chainer.functions as F
>>> import numpy as np

>>> x_orig = np.random.random((3, 2)).astype(np.float32)  # 3 rows, 2 features
>>> W = chainer.Variable(np.random.random((2, 1)).astype(np.float32))
>>> x = chainer.Variable(x_orig)
>>> print(x)
variable([[ 0.45719743  0.99944282]
          [ 0.6675607   0.33180818]
          [ 0.37650901  0.48325235]])

>>> y = F.matmul(x, W)
>>> print(y)
variable([[ 0.26743633]
          [ 0.2829439 ]
          [ 0.18782631]])

>>> loss = F.squared_error(y, np.zeros(y.shape, dtype=np.float32))
>>> mask = np.array([[True], [True], [False]])  # mask out the third row
>>> loss_masked = F.where(mask, loss, np.zeros(loss.shape, dtype=loss.dtype))
>>> print(loss_masked)
variable([[ 0.07152219]
          [ 0.08005726]
          [ 0.        ]])

>>> loss_tot = F.sum(loss_masked)
>>> loss_tot.grad = np.array(1.0, dtype=np.float32)
>>> loss_tot.backward(retain_grad=True)  # keep grads of intermediate variables such as loss
>>> print(loss.grad)
[[ 1.]
 [ 1.]
 [ 0.]]

>>> print(W.grad)
[[ 0.62230688]
 [ 0.72234082]]

Next, instead of masking the third row, remove it entirely.

>>> W.cleargrad()  # reset the gradient accumulated above
>>> x = chainer.Variable(x_orig[:2, :])  # keep only the first two rows
>>> print(x)
variable([[ 0.45719743  0.99944282]
          [ 0.6675607   0.33180818]])

>>> y = F.matmul(x, W)
>>> loss = F.squared_error(y, np.zeros(y.shape, dtype=np.float32))
>>> print(loss)
variable([[ 0.07152219]
          [ 0.08005726]])

>>> loss_tot = F.sum(loss)
>>> loss_tot.grad = np.array(1.0, dtype=np.float32)
>>> loss_tot.backward(retain_grad=True)
>>> print(loss.grad)
[[ 1.]
 [ 1.]]

>>> print(W.grad)
[[ 0.62230688]
 [ 0.72234082]]

As you can see, the gradient of W is unchanged.
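
The same comparison can be reproduced without Chainer. This is a minimal NumPy sketch, not the original code: the hypothetical helper `grad_W` hand-derives the gradient of the masked, summed squared-error loss, and the script checks that masking the third row and deleting it give the same gradient for random inputs.

```python
import numpy as np

def grad_W(x, W, mask):
    """Gradient of L = sum_i mask_i * (x_i @ W)**2 with respect to W.

    Hand-derived for this specific loss (squared error against zero, as in
    the Chainer check); it is not a general autograd routine.
    """
    y = x @ W                        # shape (N, 1)
    dL_dy = 2.0 * y * mask[:, None]  # masked rows get zero upstream gradient
    return x.T @ dL_dy               # shape (D, 1): a sum over rows, not a mean

rng = np.random.default_rng(0)
x = rng.random((3, 2))
W = rng.random((2, 1))

g_masked = grad_W(x, W, np.array([1.0, 1.0, 0.0]))  # mask the third row
g_dropped = grad_W(x[:2], W, np.array([1.0, 1.0]))  # delete the third row instead
print(np.allclose(g_masked, g_dropped))  # True
```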

Discussion

In hindsight this was entirely obvious: hardly any of the processing in a neural network uses averages.
Still, it may be worth being a little careful with something like batch normalization, which computes a mean and variance over the batch.
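
A minimal NumPy sketch of why batch statistics need that care, assuming padded rows are zero-filled: a per-feature mean over the whole batch counts the padding, while one restricted to valid rows does not. `masked_mean_var` is a hypothetical helper, not a Chainer function.

```python
import numpy as np

def masked_mean_var(h, mask):
    """Per-feature mean and (biased) variance over valid rows only.

    h: (N, D) activations; mask: (N,) bool, True for real rows.
    """
    valid = h[mask]  # boolean indexing keeps only the real rows
    return valid.mean(axis=0), valid.var(axis=0)

# Hypothetical activations: the last row is zero-filled padding.
h = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [0.0, 0.0]])
mask = np.array([True, True, False])

naive_mean = h.mean(axis=0)  # per-feature means 4/3 and 2: padding pulls them down
masked_mean, masked_var = masked_mean_var(h, mask)  # means 2 and 3, variances 1 and 1
print(naive_mean)
print(masked_mean)
```

Normalizing with `masked_mean` and `masked_var` instead of the naive statistics keeps padding rows from shifting the result.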
