
Digging into machine learning tools by Nikkei xTECH Business AI ③ Advent Calendar 2019

The Chainer-PyTorch Alliance vs. TensorFlow (Keras)

Posted at 2019-12-17, last updated at 2019-12-17

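Introduction

The end of Chainer development and a migration to PyTorch were announced recently.
According to the Keras documentation, as of the end of 2018 TensorFlow (together with Keras) held the largest share, followed by PyTorch, Caffe, and so on.
The era of rival warlords is settling down, and as mergers and realignments proceed in the deep learning world,
a showdown between the two giants, PyTorch and TensorFlow (Keras), may be about to begin.

One reason given for Chainer choosing PyTorch was that their design philosophies are close.
Sad as the news is, as a token of gratitude to Chainer, which has served me well, I will compare Chainer with the other major player, TensorFlow (Keras), using MNIST.
I won't get into the tactless question of which one is better or worse.
It is enough for each of us to have our own favorite framework at heart.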

Chainer

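Chainer offers good abstractions and an intuitive notation, and that clarity keeps the barrier to implementation very low.
I think it is a framework very well suited to beginners taking their first steps into implementing deep learning.

PyTorch is also said to have been influenced by that design philosophy.
PyTorch grew out of Chainer's ideas, and at the end of 2019 the two become one. That's rather lovely, isn't it?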

TensorFlow

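TensorFlow is said to have started as an internal tool at Google.
As the name suggests, you define a computation graph and tensors flow through it as the computation proceeds.
If you write raw TensorFlow code (in the 1.x style) without going through Keras, that philosophy shows in the workflow: define nodes, assemble them, and finally compute with Session.run().
I think TensorFlow's defining feature is its versatility: the computation graph lets you express arbitrary computations.
Keras is a wrapper that abstracts TensorFlow and makes it easier to use.
It has also been incorporated into TensorFlow itself as its front end, tf.keras.
The richness of its extensions and libraries also reflects its momentum.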
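
To make the "define first, run later" idea concrete, here is a minimal pure-Python sketch. It is only an illustration of the workflow: Node, placeholder, add, mul and run are hypothetical names invented for this example, not TensorFlow APIs.

```python
# Hypothetical mini "define-and-run" graph; none of these names are TensorFlow APIs.
class Node:
    """A graph node that stores an operation and its inputs but computes nothing yet."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op          # callable applied to the evaluated inputs
        self.inputs = inputs  # upstream nodes
        self.name = name      # set only for placeholders


def placeholder(name):
    # A placeholder has no value until one is fed at run time.
    return Node(op=None, name=name)


def add(a, b):
    return Node(lambda x, y: x + y, (a, b))


def mul(a, b):
    return Node(lambda x, y: x * y, (a, b))


def run(node, feed):
    """Evaluate `node` recursively, the role Session.run() plays in the text above."""
    if node.op is None:
        return feed[node.name]
    return node.op(*(run(i, feed) for i in node.inputs))


# Step 1: define the graph. Nothing is computed here.
x = placeholder("x")
w = placeholder("w")
b = placeholder("b")
y = add(mul(x, w), b)

# Step 2: run it with concrete values.
result = run(y, {"x": 3.0, "w": 2.0, "b": 1.0})
print(result)  # 7.0
```

Chainer and PyTorch instead build the graph while the ordinary Python code executes, which is why their model code reads like plain function calls.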

Comparison

Now let's look at each framework through MNIST.

Downloading the data

The download procedure is almost the same in both.

Chainer

Just import chainer and call get_mnist() from chainer.datasets.

import chainer
train, test = chainer.datasets.get_mnist()

The data is then downloaded like this.

Downloading from http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz...
Downloading from http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz...
Downloading from http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz...
Downloading from http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz...

Keras

Usage is almost the same as with Chainer.

Just call load_data() from tf.keras.datasets.mnist.

import tensorflow as tf
train, test = tf.keras.datasets.mnist.load_data()

MNIST is downloaded in the same way.

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step

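The downloaded data

Even though both download the same MNIST data, the format in which it arrives differs slightly.
The difference is mainly in how the arrays are arranged. A small detail, but worth knowing.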

Chainer

First, let's check the type of the data.

>> type(train)
chainer.datasets.tuple_dataset.TupleDataset

It turns out to be a chainer.datasets.tuple_dataset.TupleDataset. Let's dig further into what that is.

>> len(train)
60000
>> type(train[0])
tuple
>> type(train[0][0])
numpy.ndarray
>> type(train[0][1])
numpy.int32

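We can see that it is a dataset of 60,000 tuples, each pairing an image with its label.
Schematically, train is laid out as (image 0, label 0), (image 1, label 1), ..., (image 59999, label 59999).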
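
The pairing can be mimicked in a few lines of pure Python. This is only a sketch of the idea: PairDataset is a hypothetical stand-in written for this example, not Chainer's TupleDataset, and the toy data uses 4-pixel "images" instead of 784-pixel ones.

```python
class PairDataset:
    """Hypothetical stand-in for the pairing idea: index i returns (image_i, label_i)."""

    def __init__(self, images, labels):
        if len(images) != len(labels):
            raise ValueError("images and labels must have the same length")
        self.images = images
        self.labels = labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, i):
        return (self.images[i], self.labels[i])


# Toy data: three "images" of 4 pixels each.
images = [[0.0, 0.5, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
labels = [5, 0, 4]
toy = PairDataset(images, labels)

print(len(toy))      # 3
print(type(toy[0]))  # <class 'tuple'>
print(toy[0][1])     # 5
```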

The all-important image data has the following format.
The 28×28 = 784 pixels are laid out in a single row.

>> train[0][0].shape
(784,)

The contents of an image look like this. Note that the values fall between 0 and 1.

>> train[0][0]
array([0.        , 0.        , 0.        , 0.        , 0.        ,
    0.        , 0.        , 0.        , 0.        , 0.        ,
    ...
    0.        , 0.11764707, 0.14117648, 0.36862746, 0.6039216 ,
    0.6666667 , 0.9921569 , 0.9921569 , 0.9921569 , 0.9921569
    ...
    0.        , 0.        , 0.        , 0.        , 0.        ,
    0.        , 0.        , 0.        , 0.        ], dtype=float32)

Keras

Check the types in the same way.

>> type(train)
tuple
>> len(train)
2

It appears to be a plain tuple.
Judging from its length, it is probably a pair of the image array and the label array.
Let's verify.

>> len(train[0])
60000
>> type(train[0])
numpy.ndarray
>> train[0].shape
(60000, 28, 28)
>> train[1].shape
(60000,)
>> type(train[1][0])
numpy.uint8

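Schematically, train is the pair (images of shape (60000, 28, 28), labels of shape (60000,)).

The contents of an image look like this.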

>> train[0][0]
array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
        0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
        ...
        [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   3,
        18,  18,  18, 126, 136, 175,  26, 166, 255, 247, 127,   0,   0,
        0,   0],
        ...
        0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
        0,   0]], dtype=uint8)

With Keras the values are not between 0 and 1, so we need to divide by 255.0 to scale them into that range.

train = (train[0] / 255.0, train[1])
test = (test[0] / 255.0, test[1])
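
To see how the two layouts relate, the following NumPy sketch converts Keras-style arrays into Chainer-style (flat image, label) pairs. It assumes only the shapes and dtypes observed above; the toy array and the variable names are made up for the example, and no framework is needed to run it.

```python
import numpy as np

# Toy stand-in for the Keras-style arrays: 3 images of 28x28 uint8 pixels.
images_u8 = (np.arange(3 * 28 * 28) % 256).astype(np.uint8).reshape(3, 28, 28)
labels_u8 = np.array([5, 0, 4], dtype=np.uint8)

# Scale to [0, 1] as float32 and flatten each image to 784 values.
flat = (images_u8.astype(np.float32) / 255.0).reshape(len(images_u8), -1)

# Pair each image with its label, as in the Chainer-style dataset.
pairs = [(img, np.int32(lbl)) for img, lbl in zip(flat, labels_u8)]

print(flat.shape)         # (3, 784)
print(pairs[0][0].dtype)  # float32
```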

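Defining the network

Chainer

The network is defined as a class.
Fully connected layers are expressed with L.Linear and kept as attributes such as self.l1.
The ReLU activation is expressed as F.relu(self.l1(x)).
A characteristic of Chainer is that the connection and the activation function are written separately.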

import chainer.links as L
import chainer.functions as F

class Net(chainer.Chain):

    def __init__(self, n_in=784, n_hidden=100, n_out=10):
        super().__init__()
        with self.init_scope():
            self.l1 = L.Linear(n_in, n_hidden)
            self.l2 = L.Linear(n_hidden, n_hidden)
            self.l3 = L.Linear(n_hidden, n_out)

    def forward(self, x):
        h = F.relu(self.l1(x))
        h = F.relu(self.l2(h))
        h = self.l3(h)

        return h
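
What this forward pass computes can be sketched in plain NumPy. The sketch assumes randomly initialised weights stored as (inputs, outputs) matrices for brevity; relu, linear and forward here are illustrative helpers, not Chainer functions.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def linear(x, W, b):
    # Fully connected layer: one row of x per sample.
    return x @ W + b


rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 784, 100, 10

# Randomly initialised parameters; a trained network would hold learned values.
W1, b1 = rng.normal(0, 0.01, (n_in, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(0, 0.01, (n_hidden, n_hidden)), np.zeros(n_hidden)
W3, b3 = rng.normal(0, 0.01, (n_hidden, n_out)), np.zeros(n_out)


def forward(x):
    h = relu(linear(x, W1, b1))
    h = relu(linear(h, W2, b2))
    return linear(h, W3, b3)  # raw class scores, no activation on the last layer


x = rng.random((5, n_in))  # a batch of 5 flattened images
scores = forward(x)
print(scores.shape)  # (5, 10)
```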

Instantiate it and wrap it in L.Classifier to make the model.
In fact, if you specify nothing, L.Classifier uses softmax_cross_entropy as the loss function.

model = L.Classifier(Net())

Keras

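Keras lets you write everything very concisely.
Its characteristic is that the connection type and the activation function go together on one line, so you write with each layer in mind.
Note that the model below has a single 100-unit hidden layer, whereas the Chainer network above has two.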

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(100, activation=tf.nn.relu),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
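
One difference is easy to miss: the Chainer network returns raw scores and leaves the softmax to the loss inside L.Classifier, while the Keras model ends in a softmax layer and is later trained with sparse_categorical_crossentropy. The NumPy sketch below suggests that the two arrangements give the same loss value. The functions are my own re-implementations for illustration, not the library code.

```python
import numpy as np


def softmax(scores):
    # Subtract the row maximum for numerical stability.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def loss_on_scores(scores, labels):
    """Softmax and cross-entropy in one step (the Chainer-style arrangement)."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def loss_on_probabilities(probs, labels):
    """Cross-entropy on softmax outputs with integer labels (the Keras-style arrangement)."""
    return -np.log(probs[np.arange(len(labels)), labels]).mean()


scores = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
labels = np.array([0, 1])

a = loss_on_scores(scores, labels)
b = loss_on_probabilities(softmax(scores), labels)
print(np.isclose(a, b))  # True
```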

Optimizer

Here we use momentum SGD.

Chainer

Various optimizers are provided in chainer.optimizers.
Create one by passing the hyperparameters as arguments.
The resulting optimizer is attached to the model created earlier with setup().

from chainer import optimizers

optimizer = optimizers.MomentumSGD(lr=0.01, momentum=0.9)
optimizer.setup(model)

Keras

Almost the same as Chainer.
The optimizer is attached to the model later, at compile time.

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
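
Both optimizers take the same two hyperparameters. As a rough illustration of what they control, here is the commonly described momentum SGD update applied to a single scalar parameter. It is a sketch under that assumption, with a hypothetical function name, and not the code of either library.

```python
def momentum_sgd_step(w, v, grad, lr=0.01, momentum=0.9):
    """One momentum SGD update for a single scalar parameter."""
    v = momentum * v - lr * grad  # velocity: decayed history plus the new step
    w = w + v
    return w, v


# Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(500):
    grad = 2.0 * (w - 3.0)
    w, v = momentum_sgd_step(w, v, grad)

print(round(w, 3))  # 3.0
```

With momentum=0.0 the rule reduces to plain SGD; a larger momentum lets past gradients keep pushing the parameter in the same direction.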

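Training

Each framework provides a variety of tools.

Chainer

With Chainer, using the convenient SerialIterator and Trainer is probably the most intuitive and easy-to-follow approach.
First, create the SerialIterators
and load them with the data from earlier.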

from chainer.iterators import SerialIterator

batchsize = 10

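# Training: keep repeating over epochs and shuffle the order.
train_iterator = SerialIterator(train, batchsize, repeat=True, shuffle=True)
# Evaluation: a single pass in fixed order.
test_iterator  = SerialIterator(test, batchsize, repeat=False, shuffle=False)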
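
The roles of the batch size and the two boolean flags (repeat and shuffle) can be sketched with a small generator. This is a simplified illustration: iterate_minibatches is a hypothetical helper, not the SerialIterator API, and details such as how the final short batch of an epoch is handled may differ from Chainer.

```python
import random


def iterate_minibatches(dataset, batch_size, repeat=True, shuffle=True, seed=0):
    """Yield lists of examples; hypothetical helper, not the SerialIterator API."""
    rng = random.Random(seed)
    order = list(range(len(dataset)))
    while True:
        if shuffle:
            rng.shuffle(order)  # new order for every pass over the data
        for start in range(0, len(order), batch_size):
            yield [dataset[i] for i in order[start:start + batch_size]]
        if not repeat:
            return  # one pass only, as wanted for evaluation


data = list(range(25))

# Evaluation style: one ordered pass.
eval_batches = list(iterate_minibatches(data, 10, repeat=False, shuffle=False))
print([len(b) for b in eval_batches])  # [10, 10, 5]

# Training style: endless shuffled batches; take the first three.
train_batches = iterate_minibatches(data, 10, repeat=True, shuffle=True)
first_epoch = [x for _ in range(3) for x in next(train_batches)]
print(sorted(first_epoch) == data)  # True
```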

Pass the training iterator to an Updater, which governs how the network weights are updated.

from chainer import training
from chainer.training import extensions
updater = training.StandardUpdater(train_iterator, optimizer)

Now the Trainer comes in.
It takes the updater we just created.
Results are saved in the result directory.

max_epoch = 25
trainer = training.Trainer(updater, (max_epoch, 'epoch'), out='result')

Pass in the validation dataset.

trainer.extend(extensions.Evaluator(test_iterator, model))

This setting records the loss.

trainer.extend(extensions.LogReport())

Print the loss and accuracy.

trainer.extend(extensions.PrintReport(['epoch', 'main/loss', 'main/accuracy', 'validation/main/loss', 'validation/main/accuracy', 'elapsed_time']))

If you also add the following, graphs of the training progress are saved as PNG files.

trainer.extend(extensions.PlotReport(['main/loss', 'validation/main/loss'], x_key='epoch', file_name='loss.png', marker=""))
trainer.extend(extensions.PlotReport(['main/accuracy', 'validation/main/accuracy'], x_key='epoch', file_name='accuracy.png', marker=""))
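
How the pieces fit together (an updater that performs one step, a stop condition counted in epochs, and extensions that run alongside training) can be pictured with the following pure-Python sketch. MiniTrainer is a hypothetical toy written for this article; it is not how Chainer implements Trainer, and it simply calls every extension once per epoch.

```python
class MiniTrainer:
    """Hypothetical sketch of the Trainer idea; not Chainer's implementation."""

    def __init__(self, update_fn, max_epoch, iters_per_epoch):
        self.update_fn = update_fn              # performs one minibatch update
        self.max_epoch = max_epoch              # stop condition, counted in epochs
        self.iters_per_epoch = iters_per_epoch
        self.extensions = []
        self.epoch = 0

    def extend(self, extension):
        self.extensions.append(extension)

    def run(self):
        while self.epoch < self.max_epoch:
            for _ in range(self.iters_per_epoch):
                self.update_fn()
            self.epoch += 1
            for extension in self.extensions:   # e.g. evaluate, log, plot
                extension(self)


updates = []
epochs_seen = []
mini = MiniTrainer(lambda: updates.append(1), max_epoch=3, iters_per_epoch=4)
mini.extend(lambda t: epochs_seen.append(t.epoch))
mini.run()
print(len(updates), epochs_seen)  # 12 [1, 2, 3]
```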

Start training!

>> trainer.run()
epoch       main/loss   main/accuracy  validation/main/loss  validation/main/accuracy  elapsed_time
1           0.110192    0.965714       0.118806              0.966056                  26.6098
2           0.085265    0.973643       0.16923               0.953444                  54.2031
3           0.0698936   0.977643       0.107316              0.970056                  81.3108
4           0.0554769   0.982143       0.113628              0.968389                  108.159
5           0.0494781   0.983738       0.106652              0.971389                  134.941
6           0.0414912   0.986595       0.119612              0.969                     162.266
7           0.0348387   0.98881        0.10666               0.973944                  188.789
...
24          0.0149409   0.995738       0.172438              0.971611                  648.015
25          0.01749     0.994762       0.193056              0.970722                  675.749

Display the graphs in Jupyter.

from IPython.display import Image
Image('result/loss.png')

Image('result/accuracy.png')

Keras

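Pass in the optimizer created earlier.
The cross-entropy loss function is specified here.

model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              # Labels are integer class IDs, so use the sparse accuracy metric;
              # the name sets the key under which it appears in the training history.
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy(name='categorical_accuracy')])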
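
With integer labels such as the MNIST labels here, accuracy amounts to comparing the arg-max of the predicted probabilities with the label. A minimal NumPy sketch of that comparison follows; sparse_accuracy is a hypothetical helper for illustration, not a Keras function.

```python
import numpy as np


def sparse_accuracy(probs, labels):
    """Fraction of rows whose highest-probability class equals the integer label."""
    return float((probs.argmax(axis=1) == labels).mean())


probs = np.array([[0.1, 0.7, 0.2],   # predicted class 1
                  [0.8, 0.1, 0.1],   # predicted class 0
                  [0.3, 0.3, 0.4]])  # predicted class 2
labels = np.array([1, 0, 1])

print(sparse_accuracy(probs, labels))  # 2 of 3 predictions are correct
```

A metric that expects one-hot labels would instead take the arg-max of the label vector, which is why the label format and the metric need to match.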

Start training!

>> result = model.fit(train[0], train[1], batch_size=10, epochs=25)
Train on 60000 samples
Epoch 1/25
60000/60000 [==============================] - 8s 134us/sample - loss: 0.2221 - acc: 0.9329
Epoch 2/25
60000/60000 [==============================] - 8s 136us/sample - loss: 0.1072 - acc: 0.9669
Epoch 3/25
60000/60000 [==============================] - 8s 129us/sample - loss: 0.0766 - acc: 0.9764
Epoch 4/25
60000/60000 [==============================] - 8s 128us/sample - loss: 0.0609 - acc: 0.9808
Epoch 5/25
60000/60000 [==============================] - 8s 128us/sample - loss: 0.0499 - acc: 0.9845
Epoch 6/25
60000/60000 [==============================] - 8s 133us/sample - loss: 0.0406 - acc: 0.9865
Epoch 7/25
...
Epoch 25/25
60000/60000 [==============================] - 7s 125us/sample - loss: 5.4506e-04 - acc: 1.0000

Let's look at the stored results.

>> result.history.keys()
dict_keys(['loss', 'categorical_accuracy'])

Plot them.

import matplotlib.pyplot as plt
%matplotlib inline

epochs = range(1, 25 + 1)
plt.plot(epochs, result.history['loss'], label="loss")
plt.plot(epochs, result.history['categorical_accuracy'], label="accuracy")
plt.xlabel("epoch")
plt.legend()  # show the labels set above
plt.show()

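Conclusion

TensorFlow has also reached version 2.0 and has become considerably easier to use, absorbing Keras and introducing Datasets, among other changes.
Every framework is different, and every one is good in its own way.
I hope you all find the framework that is best for you.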
