
Summer Vacation Is Over: Let's Check Out the New Features in Chainer v1.13.0

Posted at 2016-08-17

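It was the last day of summer vacation.
Having picked up a book that takes 0.3 seconds to buy once you reach the register, I boarded the Rinkai Line and happened to read a tweet from ChainerJP.
Apparently Chainer has reached v1.13.0.
That means the features below were added in the two weeks since the previous release.
That is an update pace on par with Godzilla.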

Input-channel initialization is now automatic in Linear and Convolution2D

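Passing in_size=None defers parameter initialization until the first time __call__ is invoked, at which point the input size is taken from the data.
This is very convenient during trial and error.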
https://github.com/pfnet/chainer/blob/master/chainer/links/connection/linear.py

import chainer
import chainer.functions as F
import chainer.links as L


class MnistModel(chainer.Chain):
    def __init__(self):
        super(MnistModel, self).__init__(
            # in_size=None: the input size is inferred on the first call
            l1=L.Linear(out_size=100, in_size=None),
            l2=L.Linear(out_size=100, in_size=None),
            l3=L.Linear(out_size=10, in_size=None),
        )

    def __call__(self, x):
        h = F.relu(self.l1(x))
        h = F.relu(self.l2(h))
        return self.l3(h)

It would be even nicer if Linear's __init__ changed from

__init__(self, in_size, out_size, wscale=1, (omitted))

to

__init__(self, out_size, in_size=None, wscale=1, (omitted))

but it presumably stays as it is to remain consistent with links other than Linear and Convolution2D.
Chainer is kind to humans.
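
The deferred initialization described above can be pictured with a toy layer. This is only a sketch in plain Python, not Chainer's actual implementation: the class name LazyLinear, the helper _initialize_params, and the Gaussian initializer are all assumptions made for illustration.

```python
import random


class LazyLinear(object):
    """Toy fully connected layer whose input size is inferred lazily."""

    def __init__(self, out_size, in_size=None):
        self.out_size = out_size
        self.W = None  # weight matrix, created once in_size is known
        self.b = [0.0] * out_size
        if in_size is not None:
            self._initialize_params(in_size)

    def _initialize_params(self, in_size):
        # Small random weights; the real initializer is not reproduced here.
        self.W = [[random.gauss(0.0, 0.01) for _ in range(in_size)]
                  for _ in range(self.out_size)]

    def __call__(self, x):
        # x is a batch: a list of rows, each row a list of floats.
        if self.W is None:
            self._initialize_params(len(x[0]))
        return [[sum(w * v for w, v in zip(w_row, row)) + b
                 for w_row, b in zip(self.W, self.b)]
                for row in x]


layer = LazyLinear(out_size=3)
assert layer.W is None             # nothing is allocated yet
y = layer([[1.0, 2.0, 3.0, 4.0]])  # the first call fixes in_size = 4
```

The point of the design is that only the output size has to be decided up front, so layers can be reordered or resized without recomputing every input size by hand.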

CRF1d added to links

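This is a linear-chain conditional random field (CRF) layer, which learns the conditional probability of a label sequence given the inputs; it is used for tasks such as sequence labeling.
( https://github.com/pfnet/chainer/issues/1020 )

The paper ( http://arxiv.org/pdf/1508.01991v1.pdf ) reports accuracy improvements on part-of-speech tagging, chunking, and named entity recognition.

As for the implementation, it appears to behave like a loss function that has parameters.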

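For a sequence labeling problem with batch size N, sequence length L, and K classes,
it works when the inputs are given as y = [Variable of shape (N, K)] * L and t = [Variable of shape (N,)] * L, that is, as lists of Variables.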
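
To make the "loss function with parameters" idea concrete, here is a sketch of the negative log-likelihood of a linear-chain CRF in plain Python, using the same list-of-steps layout as above. It is not Chainer's code: the function names crf_nll and logsumexp are hypothetical, the transition matrix cost plays the role of the learned parameter, and averaging over the batch is an assumption.

```python
import math


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_nll(cost, xs, ts):
    """Mean negative log-likelihood of a linear-chain CRF.

    cost: K x K transition scores, cost[i][j] for label i -> label j
    xs:   list of length L; xs[l][n][k] is the score of label k
    ts:   list of length L; ts[l][n] is the gold label
    """
    n_batch = len(xs[0])
    n_label = len(cost)
    total = 0.0
    for n in range(n_batch):
        # Forward algorithm: alpha[k] is the log-sum of the scores of
        # all label paths that end in label k at the current step.
        alpha = list(xs[0][n])
        for x in xs[1:]:
            alpha = [
                logsumexp([alpha[i] + cost[i][j] for i in range(n_label)])
                + x[n][j]
                for j in range(n_label)
            ]
        log_z = logsumexp(alpha)
        # Score of the gold label path.
        score = xs[0][n][ts[0][n]]
        for l in range(1, len(xs)):
            score += cost[ts[l - 1][n]][ts[l][n]] + xs[l][n][ts[l][n]]
        total += log_z - score
    return total / n_batch


# N = 2 sequences, L = 3 steps, K = 2 labels, every score set to zero.
K, L_steps, N = 2, 3, 2
cost = [[0.0] * K for _ in range(K)]
xs = [[[0.0] * K for _ in range(N)] for _ in range(L_steps)]
ts = [[0] * N for _ in range(L_steps)]
loss = crf_nll(cost, xs, ts)
```

With all scores at zero every one of the K^L label paths is equally likely, so the loss is L * log(K); raising the score of the gold labels lowers it.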

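The bonus section at the end contains a Kumamoto-dialect translation program that uses CRF1d.
I hope it serves as a useful reference.
(Explanation of Kumamoto-dialect translation: http://qiita.com/_329_/items/1d5de7b8213b112a3df7 )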

separate added to functions

It performs the inverse operation of chainer.functions.stack.
http://docs.chainer.org/en/latest/_modules/chainer/functions/array/separate.html
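
The stack/separate relationship can be sketched on nested lists. The two functions below are plain-Python stand-ins written for this article, limited to 1-D inputs and axes 0 and 1; they are not the Chainer functions, which operate on Variables.

```python
def stack(xs, axis=0):
    """Join equal-length 1-D lists along a new axis (0 or 1)."""
    if axis == 0:
        return [list(x) for x in xs]
    if axis == 1:
        return [list(col) for col in zip(*xs)]
    raise ValueError("this sketch only supports axis 0 or 1")


def separate(matrix, axis=0):
    """Split a 2-D nested list along the given axis into 1-D lists."""
    if axis == 0:
        return tuple(list(row) for row in matrix)
    if axis == 1:
        return tuple(list(col) for col in zip(*matrix))
    raise ValueError("this sketch only supports axis 0 or 1")


xs = ([1, 2, 3], [4, 5, 6])
stacked = stack(xs, axis=1)           # shape (3, 2)
restored = separate(stacked, axis=1)  # back to two lists of length 3
```

Separating along the same axis that was used for stacking gives back the original pieces, which is what "inverse" means here.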

Stateful Peephole LSTM added

Peephole LSTM, a variant of LSTM, was added.
For an explanation of peephole LSTM itself, see "わかるLSTM ～最近の動向と共に" (Understanding LSTM, along with recent trends)
( http://qiita.com/t_Signull/items/21b82be280b46f467d1b ). DRY, DRY.
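
For readers who just want the gist, the sketch below runs one step of a single-unit peephole LSTM on scalars. It follows the textbook formulation, in which the input and forget gates look at the previous cell state and the output gate looks at the updated one; whether Chainer's link matches it term for term is not checked here, and every parameter name is made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def peephole_lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit peephole LSTM (all values are scalars)."""
    # Input and forget gates peek at the previous cell state c_prev.
    i = sigmoid(p["w_xi"] * x + p["w_hi"] * h_prev
                + p["w_ci"] * c_prev + p["b_i"])
    f = sigmoid(p["w_xf"] * x + p["w_hf"] * h_prev
                + p["w_cf"] * c_prev + p["b_f"])
    # Candidate cell input (no peephole connection).
    a = math.tanh(p["w_xa"] * x + p["w_ha"] * h_prev + p["b_a"])
    c = f * c_prev + i * a
    # The output gate peeks at the freshly updated cell state c.
    o = sigmoid(p["w_xo"] * x + p["w_ho"] * h_prev
                + p["w_co"] * c + p["b_o"])
    h = o * math.tanh(c)
    return h, c


names = ("w_xi w_hi w_ci b_i w_xf w_hf w_cf b_f "
         "w_xa w_ha b_a w_xo w_ho w_co b_o").split()
params = dict.fromkeys(names, 0.0)
h, c = peephole_lstm_step(x=1.0, h_prev=0.0, c_prev=1.0, p=params)
```

Setting the three peephole weights (w_ci, w_cf, w_co) to zero reduces this to an ordinary LSTM step, which makes the difference between the two easy to see.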

CuPy additions

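fill_diagonal, a function equivalent to numpy.fill_diagonal, was added.
In addition, the performance of cupy.core.core.scan was improved.
There is something to learn here. ( https://github.com/pfnet/chainer/pull/1367/files )
Judging from https://github.com/pfnet/chainer/pull/1367 , it may have become nearly 10 times faster.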
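
As a reminder of what fill_diagonal does, here is a plain-Python sketch of the 2-D case on a nested list. It mirrors the default, non-wrapping behavior of numpy.fill_diagonal and modifies its argument in place; it is an illustration only, not the CuPy implementation.

```python
def fill_diagonal(a, val):
    """Write val onto the main diagonal of a 2-D nested list, in place."""
    for i in range(min(len(a), len(a[0]))):
        a[i][i] = val


a = [[0] * 3 for _ in range(3)]
fill_diagonal(a, 1)  # a is now the 3 x 3 identity matrix
```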

Bug fixes

Bugs were fixed in the following items.

cupy.ndarray.view (dtype)
get_cifar10 , get_cifar100
ImageDataset
LogReport (on Windows)
ParallelUpdater (for dict mini-batch)
SerialIterator (the data was not shuffled when the dataset size was not divisible by the batch size)

I think the get_cifar10 and get_cifar100 fixes matter most to people who are about to start using Chainer.
Calling chainer.datasets.get_cifar10 gives you the data wrapped in a TupleDataset.
The other changes should be important fixes for those who already use Chainer regularly.
Let's all update.
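
For newcomers, the sketch below shows the idea behind a TupleDataset and behind iterating over it in shuffled mini-batches. Both pieces are simplified stand-ins with made-up names (TupleDatasetSketch, shuffled_batches), not Chainer's classes; in particular, ending the epoch with a shorter final batch is an assumption of this sketch and not necessarily what SerialIterator does.

```python
import random


class TupleDatasetSketch(object):
    """Pairs up several equal-length sequences, example by example."""

    def __init__(self, *columns):
        if not columns:
            raise ValueError("at least one sequence is required")
        if any(len(col) != len(columns[0]) for col in columns):
            raise ValueError("all sequences must have the same length")
        self._columns = columns

    def __getitem__(self, index):
        # One example is a tuple such as (image, label).
        return tuple(col[index] for col in self._columns)

    def __len__(self):
        return len(self._columns[0])


def shuffled_batches(dataset, batch_size, rng):
    """Yield one epoch of mini-batches in a freshly shuffled order."""
    order = list(range(len(dataset)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [dataset[i] for i in order[start:start + batch_size]]


images = [10, 11, 12, 13, 14]  # stand-ins for image arrays
labels = [0, 1, 0, 1, 0]
dataset = TupleDatasetSketch(images, labels)
batches = list(shuffled_batches(dataset, 2, random.Random(0)))
```

Five examples with a batch size of two is exactly the "not divisible" situation mentioned in the list above: the order still has to be reshuffled even though the last batch is short.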

Miscellaneous improvements

Chain.zerograds was improved, and copies were eliminated in several places that use astype.
Reading https://github.com/pfnet/chainer/pull/1444/files may help you improve the performance of your own code.
There is something to learn here.

That should conclude the walkthrough of all the new features and changes in Chainer v1.13.0.

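I would like to express my gratitude to everyone who worked on Chainer's development in this heat.
Otsukare-__summer__ deshita (a pun on "otsukare-sama deshita", "thank you for your hard work").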

Bonus

Kumamoto-dialect translation program using CRF1d

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import chainer

import chainer.functions as F
import chainer.links as L
from chainer import optimizers
import numpy as np

def make_vocab_dict(vocab):
    id2word = {}
    word2id = {}
    for id, word in enumerate(vocab):
        id2word[id] = word
        word2id[word] = id
    return id2word, word2id


class Seq2Seq(chainer.Chain):
    dropout_ratio = 0.5

    def __init__(self, input_vocab, output_vocab, feature_num, hidden_num):
        """
        :param input_vocab: array of input  vocab
        :param output_vocab: array of output  vocab
        :param feature_num: size of feature layer
        :param hidden_num: size of hidden layer
        :return:
        """
        self.id2word_input, self.word2id_input = make_vocab_dict(input_vocab)
        self.id2word_output, self.word2id_output = make_vocab_dict(output_vocab)
        self.input_vocab_size = len(self.word2id_input)
        self.output_vocab_size = len(self.word2id_output)

        super(Seq2Seq, self).__init__(
                # encoder
                word_vec=L.EmbedID(self.input_vocab_size, feature_num),
                input_vec=L.LSTM(feature_num, hidden_num),

                # connect layer
                context_lstm=L.LSTM(hidden_num, self.output_vocab_size),

                # decoder
                output_lstm=L.LSTM(self.output_vocab_size, self.output_vocab_size),
                out_word=L.Linear(self.output_vocab_size, self.output_vocab_size),

                # CRF layer: registered as a child link so that its
                # transition-cost parameter is updated by the optimizer
                crf=L.CRF1d(n_label=self.output_vocab_size),
        )

    def encode(self, src_text):
        """

        :param src_text: input text embed id ex.) [ 1, 0 ,14 ,5 ]
        :return: context vector
        """
        for word in src_text:
            word = chainer.Variable(np.array([[word]], dtype=np.int32))
            embed_vector = F.tanh(self.word_vec(word))
            input_feature = self.input_vec(embed_vector)
            context = self.context_lstm(input_feature)

        return context

    def decode(self, context):
        """
        :param context: context vector made `encode` function
        :return: decoded embed vector ( Batchsize , ClassNum)
        """

        return self.out_word(context)

    def initialize(self):
        """
        Reset the state of every LSTM layer.
        """
        self.input_vec.reset_state()
        self.context_lstm.reset_state()
        self.output_lstm.reset_state()

    def generate(self, start_word_id, sentence_limit):
        """
        Generate a sentence.
        :param start_word_id:
        :param sentence_limit:
        :return:
        """

        context = self.encode([start_word_id])
        sentence = ""

        for _ in range(sentence_limit):
            context = self.decode(context)
            word = self.id2word_output[np.argmax(context.data)]
            if word == "<eos>":
                break
            sentence = sentence + word + " "
        return sentence

    def __call__(self, input_seq, teacher_seq):
        """
        Compute the CRF loss for one training pair.

        :param input_seq: list of input word ids
        :param teacher_seq: list of Variable, one label per output position
        :return: CRF1d loss
        """

        context = self.encode(input_seq)

        # make a list of Variable, one prediction per output position
        predict_seq = []
        for _ in range(len(teacher_seq)):
            context = self.decode(context)
            predict_seq.append(context)

        loss = self.crf(predict_seq, teacher_seq)

        return loss

if __name__ == "__main__":

    # source: "When the angel's voice echoes at twilight,
    #          I shall wait before the sacred spring"
    input_vocab = ["<start>", u"黄昏に", u"天使の声", u"響く時，", u"聖なる泉の前にて", u"待つ", "<eos>"]
    # target: "I'll be waiting in front of the fountain at 5 o'clock"
    output_vocab = [u"5時に", u"噴水の前で", u"待ってます", "<eos>"]

    model = Seq2Seq(input_vocab, output_vocab, feature_num=4, hidden_num=10)

    # feed the input sentence in reverse order
    input_ids = [model.word2id_input[word] for word in reversed(input_vocab)]

    # list of Variable, one gold label per output position
    teacher = []
    for i in range(len(output_vocab)):
        t = chainer.Variable(np.array([i], dtype=np.int32))
        teacher.append(t)

    optimizer = optimizers.RMSprop()
    optimizer.setup(model)

    for i in range(10):
        # init
        model.initialize()

        # calc loss
        loss = model(input_ids, teacher)

        # update
        model.zerograds()
        loss.backward()
        loss.unchain_backward()
        optimizer.update()

        # check progress
        start = model.word2id_input["<start>"]
        sentence = model.generate(start, 7)

        # single-argument print() calls behave the same on Python 2 and 3
        print(u"input : " + u"".join(input_vocab[1:6]))
        print(u"{}) -> {}".format(i, sentence))
        print("")
