It wasn't until we broke up that I realized I loved Chainer's Linear.

Last updated at 2020-05-12. Posted at 2019-12-20.

Update (2020/05/12)
https://github.com/pfnet/pytorch-pfn-extras
What this article wishes for is now possible by importing the package above.
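
For reference, here is a minimal sketch of what deferred input-size inference looks like in practice. Note the assumption: this uses torch.nn.LazyLinear, which ships with PyTorch itself in newer releases (1.8 or later, as far as I know), not the pytorch-pfn-extras package linked above, whose API is not shown here. The batch size and feature count are arbitrary.

```python
import torch
import torch.nn as nn

# Only the output size is given; the input size is left undecided for now.
fc1 = nn.LazyLinear(120)

# The first forward pass infers in_features from the last axis of the input.
x = torch.zeros(4, 576)  # arbitrary batch of 4 vectors with 576 features
y = fc1(x)

print(fc1.in_features, tuple(fc1.weight.shape), tuple(y.shape))
```

As with any lazily initialized layer, it seems safest to run one forward pass before handing the parameters to an optimizer.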

===

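When the end of major updates for Chainer was announced, I figured the inevitable had finally come. To prepare for the future I decided to get my hands on PyTorch sooner rather than later, and started re-implementing in PyTorch the models I had built myself in Chainer.

And right away, I ran into an error...

The offending code is this: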

self.fc1 = nn.Linear(None, 120)

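That's right, you can't pass None!!!!!
That's right, you have to specify the input size explicitly!!!

Wh... what a pain...
After a convolutional layer, I have to either work out the size by hand or run the model through a debugger once, don't I...
And if I hard-code it, the model can't accept variable image sizes.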
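
To make the "work it out by hand" option concrete, here is a rough sketch of the arithmetic. It assumes a hypothetical stack of two 3x3 convolutions (stride 1, no padding), each followed by 2x2 max pooling, with 16 output channels at the end; the helper names conv_out and flat_features are made up for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def flat_features(height, width, channels=16):
    """Flattened size after conv(3x3) -> pool(2x2) -> conv(3x3) -> pool(2x2)."""
    # (kernel, stride) of each layer in the assumed stack, in order
    for kernel, stride in [(3, 1), (2, 2), (3, 1), (2, 2)]:
        height = conv_out(height, kernel, stride)
        width = conv_out(width, kernel, stride)
    return channels * height * width


print(flat_features(32, 32))
print(flat_features(64, 64))
```

The result changes with the image size, which is exactly why a hard-coded in_features ties the model to a single input resolution.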

Thinking I must be misunderstanding something, I went and looked at the source.
https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear

Here is an excerpt of the source.

    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

    def forward(self, input):
        return F.linear(input, self.weight, self.bias)

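As you can see, the shape of self.weight is fixed at construction time.
The forward pass simply assumes that self.weight has already been initialized... Ugh.

Chainer, by the way, looks like this:
https://github.com/chainer/chainer/blob/master/chainer/links/connection/linear.py

Here is an excerpt of the source.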


    def forward(
            self,
            x: variable.Variable,
            n_batch_axes: int = 1
    ) -> variable.Variable:
        """Applies the linear layer.
        Args:
            x (~chainer.Variable): Batch of input vectors.
            n_batch_axes (int): The number of batch axes. The default is 1. The
                input variable is reshaped into
                (:math:`{\\rm n\\_batch\\_axes} + 1`)-dimensional tensor.
                This should be greater than 0.
        Returns:
            ~chainer.Variable: Output of the linear layer.
        """
        if self.W.array is None:
            in_size = utils.size_of_shape(x.shape[n_batch_axes:])
            self._initialize_params(in_size)
        return linear.linear(x, self.W, self.b, n_batch_axes=n_batch_axes)

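If self.W.array has not been initialized by the time of the forward pass, the layer infers the input size from x and initializes the parameters right there. How kind. World, this is Define by Run! (Also, the docstring is really thorough.)

And so it was only after breaking up with Chainer that I realized I loved Chainer's Linear...

By the way, someone had already raised this inconvenience in an issue.
https://github.com/pytorch/pytorch/issues/30880
They seem to hold up MXNet as the example to follow, but the point they are making is the same.
I'll quietly watch how it goes.

Bonus

As a last resort, I tried initializing nn.Linear in the forward pass.
Below is the source from the PyTorch tutorial, partially modified.
Specifically, self.fc1 is initialized in the forward pass.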


import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = None
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
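        if self.fc1 is None:
            # Created on the first forward pass. Move it to the input's device,
            # because a layer created here is not covered by an earlier model.to(device).
            self.fc1 = nn.Linear(self.num_flat_features(x), 120).to(x.device)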

        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

Hmm, this feels kind of gross.
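
One practical catch with this workaround: until the first forward pass, fc1 does not exist, so parameters() does not include it, and an optimizer built too early would never update it. The sketch below illustrates this with LazyFcNet, a hypothetical trimmed-down variant of the Net above (one convolution, smaller layers); the 8x8 dummy input is an arbitrary choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LazyFcNet(nn.Module):
    """Hypothetical, trimmed-down variant of Net with the same lazy fc1 trick."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.fc1 = None  # created on the first forward pass
        self.fc2 = nn.Linear(120, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = torch.flatten(x, 1)  # keep the batch axis, flatten the rest
        if self.fc1 is None:
            self.fc1 = nn.Linear(x.shape[1], 120).to(x.device)
        return self.fc2(F.relu(self.fc1(x)))


net = LazyFcNet()
n_before = len(list(net.parameters()))  # conv1 and fc2 only

# One dummy forward pass creates fc1 and registers it as a submodule.
with torch.no_grad():
    net(torch.zeros(1, 1, 8, 8))
n_after = len(list(net.parameters()))  # now includes fc1's weight and bias

# Build the optimizer only after fc1 exists, so that it sees every parameter.
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
print(n_before, n_after, net.fc1.in_features)
```

Running a dummy batch before creating the optimizer is a small price, but it is one more thing to remember, which does not make the workaround feel any less gross.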

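Bonus 2

By the way, when connecting the convolutional part to the fully connected layers,
PyTorch seems to make heavy use of Adaptive Average Pooling.

A concrete example:
https://github.com/pytorch/vision/blob/d2c763e14efe57e4bf3ebf916ec243ce8ce3315c/torchvision/models/vgg.py#L29

If I remember correctly, you specify the output size and the layer picks the pooling window and stride to match. What I don't know is whether this layer is there so that the model doesn't have to care much about the input size, or whether it is there to improve accuracy.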
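
A quick way to see the effect on input size is to feed feature maps of different sizes through nn.AdaptiveAvgPool2d and compare the output shapes. This is only a sketch: the channel count of 8 and the spatial sizes are arbitrary, and it says nothing about the accuracy question.

```python
import torch
import torch.nn as nn

# Ask for a fixed 7x7 output regardless of the incoming spatial size.
pool = nn.AdaptiveAvgPool2d((7, 7))

shapes = []
for side in (7, 10, 13):
    feature_map = torch.zeros(1, 8, side, side)  # (batch, channels, H, W)
    shapes.append(tuple(pool(feature_map).shape))

# Every output flattens to the same number of features per sample.
flat_size = 8 * 7 * 7
fc = nn.Linear(flat_size, 120)
print(shapes, flat_size)
```

Since the pooled output always has the same shape, the in_features of the following nn.Linear can be hard-coded without pinning the model to one image size.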
