[PyTorch] Computing the gradient of a loss function

Last updated at 2022-12-04, posted at 2022-11-12

Introduction

I looked into how to compute the gradient that is used when training a neural network, so here are my notes.

\theta \leftarrow \theta - \eta \nabla_\theta L(\theta)

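Normally the optimizer takes care of this update, so we never apply the learning rate ourselves, nor do we pay attention to how many dimensions the parameters have. The aim of this post is to take a look at exactly that.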
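As a minimal sketch of what such an update does, here is gradient descent written by hand on a one-parameter toy loss. The loss function, learning rate, and starting value are all made up for illustration and are not part of the tutorial model. Note that the step goes against the gradient, which is why the gradient term is subtracted.

```python
# Toy loss L(theta) = (theta - 3)^2, whose minimum is at theta = 3.
def loss(theta: float) -> float:
    return (theta - 3.0) ** 2


def grad(theta: float) -> float:
    # dL/dtheta, derived by hand
    return 2.0 * (theta - 3.0)


eta = 0.1    # learning rate (arbitrary choice)
theta = 0.0  # initial parameter (arbitrary choice)

for _ in range(100):
    # theta <- theta - eta * dL/dtheta
    theta = theta - eta * grad(theta)

print(theta, loss(theta))
```

After enough steps theta settles close to 3, the minimizer of the toy loss. An optimizer applies the same kind of step to every parameter of the network at once.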

Computing the model's gradient vector

The network model follows the PyTorch tutorial.

Defining the model and inspecting its parameters

net.py

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square, you can specify with a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1) # flatten all dimensions except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

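This is how a model is defined. Given the dimensions, Linear and Conv2d create layers that hold their own weight parameters internally.

To enumerate the model's parameters, you can use parameters() or named_parameters(), which are methods of nn.Module (called as net.parameters() here).
nn.Parameter is described in the PyTorch documentation. It is a subclass of Tensor, and when it is assigned as an attribute of an nn.Module it is automatically registered as one of the module's parameters. It has data and requires_grad, and the number of elements can be obtained with numel().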

from net import Net  # the model defined in net.py above

net = Net()
for p in net.parameters():
    print(f'size={p.size()} {p.requires_grad}')
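To illustrate the automatic registration of nn.Parameter, here is a small sketch. The `Scale` module is a hypothetical toy class made up for this example, not something from the tutorial. It holds one tensor wrapped in nn.Parameter and one plain tensor attribute.

```python
import torch
import torch.nn as nn


class Scale(nn.Module):
    """Hypothetical toy module: y = w * x + offset."""

    def __init__(self):
        super().__init__()
        # Wrapped in nn.Parameter: registered, so parameters() returns it
        self.w = nn.Parameter(torch.ones(3))
        # Plain tensor attribute: not registered as a parameter
        self.offset = torch.zeros(3)

    def forward(self, x):
        return self.w * x + self.offset


m = Scale()
names = [n for n, _ in m.named_parameters()]
print(names)
print(isinstance(m.w, torch.Tensor), m.w.requires_grad, m.w.numel())
```

Only `w` shows up in named_parameters(). The plain tensor `offset` is ignored, so an optimizer built from m.parameters() would never update it.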

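You can also get the parameters together with the name of each module. The order returned by named_parameters() appeared to be the order in which the layers were defined (at least in what I ran).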

for n, p in net.named_parameters():
    print(f'name={n} size={list(p.size())} ({p.numel()}) {p.requires_grad}')

Output:

name=conv1.weight size=[6, 1, 5, 5] (150) True
name=conv1.bias size=[6] (6) True
name=conv2.weight size=[16, 6, 5, 5] (2400) True
name=conv2.bias size=[16] (16) True
name=fc1.weight size=[120, 400] (48000) True
name=fc1.bias size=[120] (120) True
name=fc2.weight size=[84, 120] (10080) True
name=fc2.bias size=[84] (84) True
name=fc3.weight size=[10, 84] (840) True
name=fc3.bias size=[10] (10) True
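The element counts in parentheses can be checked by hand from the layer definitions. The sketch below uses no PyTorch at all. The helper names `conv2d_params` and `linear_params` are made up for this example.

```python
def conv2d_params(in_ch: int, out_ch: int, k: int) -> int:
    # weight is out_ch x in_ch x k x k, plus one bias per output channel
    return out_ch * in_ch * k * k + out_ch


def linear_params(in_f: int, out_f: int) -> int:
    # weight is out_f x in_f, plus one bias per output feature
    return out_f * in_f + out_f


counts = {
    "conv1": conv2d_params(1, 6, 5),
    "conv2": conv2d_params(6, 16, 5),
    "fc1": linear_params(16 * 5 * 5, 120),
    "fc2": linear_params(120, 84),
    "fc3": linear_params(84, 10),
}
total = sum(counts.values())
print(counts)
print(total)  # 61706
```

Most of the parameters sit in fc1, the first fully connected layer.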

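Computing the model output

Feeding an input into the model gives an output. The input size is assumed to be 32 x 32, with MNIST in mind (MNIST images are 28 x 28, so they would need to be resized to 32 x 32).
Feeding in dummy data produces a 10-dimensional output. I don't fully understand this part yet, but
the input should be a 4D Tensor of nSamples x nChannels x Height x Width. This seems to be the input specification of Conv2d, the first layer that receives the input.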

x0 = torch.randn(1, 1, 32, 32)
y = net(x0)
print(y)  # y.shape is torch.Size([1, 10])

The output looks like this.

tensor([[ 0.0110, 0.0724, -0.0437, -0.0916, -0.0550, 0.0797, 0.1916, -0.0253,
         -0.0806, 0.1785]], grad_fn=<AddmmBackward0>)
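The 32 x 32 input size and the `16 * 5 * 5` in fc1 are connected. As a rough sketch, the side length of the feature map can be traced through the two conv + pool stages with plain arithmetic. It assumes the defaults used in net.py (stride 1 and no padding for Conv2d, pooling stride equal to the window). The helper names are made up for this example.

```python
def conv_out(size: int, kernel: int) -> int:
    # Conv2d with stride 1 and no padding
    return size - kernel + 1


def pool_out(size: int, window: int) -> int:
    # max_pool2d with stride equal to the window size, rounding down
    return size // window


def feature_map_side(side: int) -> int:
    side = pool_out(conv_out(side, 5), 2)  # conv1 + pooling
    side = pool_out(conv_out(side, 5), 2)  # conv2 + pooling
    return side


print(feature_map_side(32))  # 5
print(feature_map_side(28))  # 4
```

A 32 x 32 input ends up as 16 channels of 5 x 5, which matches the 400 inputs of fc1. A raw 28 x 28 image would give 16 * 4 * 4 = 256 values and would not fit fc1 as defined.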

Loss function

The difference between this prediction and the true target value is computed with a loss function.
Here we use the built-in mean squared error.

target = torch.randn(10)  # torch.Size([10])
target = target.view(1, -1)  # torch.Size([1, 10])

criterion = nn.MSELoss()
loss = criterion(y, target)  # y is the network output computed above
print(loss)

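Here the view function changes the size of the Tensor from torch.Size([10]) to torch.Size([1, 10]). This gives the target the same shape as the output of y = net(x), which is what the loss function expects.

Output:

tensor(0.9695, grad_fn=<MseLossBackward0>)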
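To make clear what nn.MSELoss computes, here is a small sketch that compares it with the mean of squared differences written out by hand. The random `pred` tensor merely stands in for the network output, and the seed is arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
pred = torch.randn(1, 10)             # stands in for y = net(x0)
target = torch.randn(10).view(1, -1)  # reshape [10] -> [1, 10]

loss = nn.MSELoss()(pred, target)
# The default reduction averages the squared error over all elements
manual = ((pred - target) ** 2).mean()

print(target.shape)
print(torch.allclose(loss, manual))
```

With matching shapes the two values agree. If the shapes differed, MSELoss would broadcast and emit a warning, which is why the reshape with view is worth doing.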

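Computing the gradient

How to extract the gradient values directly and turn them into a vector

Loss and backpropagation

To compute the output of a neural network, a forward pass is run through the network, which computes the values of the hidden layers. From those hidden-layer values and the loss value computed above, the derivatives can be obtained. This is what backpropagation computes.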
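As a sketch of how the hidden value and the loss combine into derivatives, the example below applies the chain rule by hand to a made-up network with two scalar weights and compares the result with autograd. All numbers are arbitrary.

```python
import torch

# Toy network: h = relu(w1 * x), y = w2 * h, loss = (y - t)^2
x, t = torch.tensor(2.0), torch.tensor(1.0)
w1 = torch.tensor(0.5, requires_grad=True)
w2 = torch.tensor(-1.5, requires_grad=True)

h = torch.relu(w1 * x)  # forward pass: hidden value
y = w2 * h              # forward pass: output
loss = (y - t) ** 2
loss.backward()         # backpropagation via autograd

# The same derivatives by hand, using the values from the forward pass
with torch.no_grad():
    dL_dy = 2 * (y - t)
    manual_w2 = dL_dy * h                     # dL/dw2 = dL/dy * h
    relu_slope = 1.0 if h.item() > 0 else 0.0
    manual_w1 = dL_dy * w2 * relu_slope * x   # dL/dw1 = dL/dy * w2 * relu' * x

print(w1.grad, manual_w1)
print(w2.grad, manual_w2)
```

Both derivatives reuse `h` and `y` from the forward pass, which is the reason the forward values have to be kept around until backward() runs.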

We can confirm that the computation can be traced backward starting from the loss. The original model is

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d -> flatten -> linear -> relu -> linear -> relu -> linear -> MSELoss -> loss

and running

print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # first input of Linear: AccumulateGrad for its bias, not ReLU

prints

<MseLossBackward0 object at 0x7f7c50a149d0>
<AddmmBackward0 object at 0x7f7c50a149a0>
<AccumulateGrad object at 0x7f7c50a149a0>

The gradient is computed by this backward pass. Before computing the gradient, clear the variables that hold the gradient values.
The backpropagation itself is run by backward().

print('clear gradient variables.')
net.zero_grad()     # zeroes the gradient buffers of all parameters
print(f'before backward: conv1.bias.grad={net.conv1.bias.grad}')
loss.backward()
print(f'after backward: conv1.bias.grad={net.conv1.bias.grad}')

Output:

clear gradient variables.
before backward: conv1.bias.grad=None
after backward: conv1.bias.grad=tensor([ 0.0155, 0.0101, -0.0022, -0.0421, 0.0010, -0.0052])
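The reason for clearing first is that backward() adds to any gradient that is already stored. Here is a minimal sketch of that behaviour on a small stand-in layer (not the tutorial network), with an arbitrary seed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)  # small stand-in model
x = torch.randn(4, 3)

layer(x).sum().backward()
first = layer.weight.grad.clone()

layer(x).sum().backward()  # no zero_grad() in between
accumulated = layer.weight.grad.clone()

layer.zero_grad()          # clear the stored gradients
layer(x).sum().backward()
fresh = layer.weight.grad.clone()

print(torch.allclose(accumulated, 2 * first))
print(torch.allclose(fresh, first))
```

Without the reset the second call doubles the stored gradient. After zero_grad() the result matches a single backward pass again.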

After this, if you want to carry out the parameter update

\vec{\theta} \leftarrow \vec{\theta} - \eta \cdot \nabla_{\theta} L(\theta)

directly by hand, you can do it as follows.

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
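For comparison, here is a sketch of the same kind of manual update written with torch.no_grad(), checked against torch.optim.SGD without momentum. The small Linear model and the random data are stand-ins chosen for this example.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
model_a = nn.Linear(4, 2)         # small stand-in model
model_b = copy.deepcopy(model_a)  # identical starting weights
x, target = torch.randn(8, 4), torch.randn(8, 2)
lr = 0.01

# (a) manual update: theta <- theta - lr * grad
nn.MSELoss()(model_a(x), target).backward()
with torch.no_grad():  # keep the update itself out of the autograd graph
    for p in model_a.parameters():
        p -= lr * p.grad

# (b) the same step taken by the optimizer
opt = torch.optim.SGD(model_b.parameters(), lr=lr)
opt.zero_grad()
nn.MSELoss()(model_b(x), target).backward()
opt.step()

same = all(
    torch.allclose(pa, pb)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(same)
```

Plain SGD is nothing more than this subtraction. Momentum or weight decay would add extra terms that the manual loop does not have.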

Storing the gradient

Concatenate the gradients of all parameters into a single vector.

grad_vec = []
for n, p in net.named_parameters():
    # flatten each parameter's gradient to 1D before concatenating
    grad_vec.append(p.grad.data.view(-1))
grad_vec = torch.cat(grad_vec)
print(grad_vec)
print(grad_vec.shape)

Output:

tensor([ 0.0004, -0.0112, 0.0313, ..., 0.0365, -0.1341, -0.2248])
torch.Size([61706])

That is quite a lot of parameters.
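As an alternative sketch, torch.autograd.grad returns the gradients directly, without filling in the .grad attribute of each parameter. The small Sequential model below is a stand-in for illustration, not the tutorial network.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in model with 4*3 + 3 + 3*2 + 2 = 23 parameters
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
x, target = torch.randn(5, 4), torch.randn(5, 2)
loss = nn.MSELoss()(model(x), target)

params = list(model.parameters())
# One gradient tensor per parameter; p.grad is left untouched
grads = torch.autograd.grad(loss, params)
grad_vec = torch.cat([g.reshape(-1) for g in grads])

n_params = sum(p.numel() for p in params)
print(grad_vec.shape, n_params)
print(grad_vec.norm())
```

The length of the vector always equals the total number of parameters, so a quantity such as the gradient norm can be computed from it in one call.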

Summary

For now, I am able to directly extract the gradient values that a neural network computes on training data. Now, what shall I use them for?
(2022/11/12)

Addendum

Fixed the part that flattens the multi-dimensional tensors with data.view(-1) (2022/12/04)
