※ The following is a report written for my own study.
※ It no doubt contains many mistakes; treat it as a snapshot of my current level of understanding.
※ For study purposes, I quote from websites and books that I found very helpful.
http://ai999.careers/rabbit/
Keras documentation on recurrent layers
https://keras.io/ja/layers/recurrent/#simplernn
RNN/GRU/LSTM, etc.
Prototype RNN (binary addition)
Add a_int[0] and b_int[0] to get d_int[0]
⇒ Perform this addition in binary
a_bin[0], b_bin[0], d_bin[0]
(array([0, 0, 1, 1, 0, 0, 0, 0], dtype=uint8),
array([0, 0, 1, 0, 1, 1, 0, 0], dtype=uint8),
array([0, 0, 0, 0, 0, 0, 1, 0], dtype=uint8))
Digit carrying depends on what happened at earlier bit positions, i.e. the problem has a time-series structure that retains past information, which is why it is used here as a subject for RNNs.
Since the numbers are 8 bits wide, the sequence consists of 8 elements (one time step per bit position).
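As a minimal sketch of how such training data could be generated (this is my reconstruction, not the original code; the LSB-first bit order is inferred from the arrays above, where 12 + 52 = 64):
import numpy as np

binary_dim = 8
largest = 2 ** (binary_dim - 1)  # keep a, b < 128 so a + b fits in 8 bits

def to_bits(x, dim=binary_dim):
    # LSB-first bit vector, e.g. 12 -> [0, 0, 1, 1, 0, 0, 0, 0]
    return np.array([(x >> i) & 1 for i in range(dim)], dtype=np.uint8)

a_int = np.random.randint(largest, size=10000)  # 10,000 samples, per the training logs
b_int = np.random.randint(largest, size=10000)
d_int = a_int + b_int
a_bin = np.array([to_bits(v) for v in a_int])   # shape (10000, 8)
b_bin = np.array([to_bits(v) for v in b_int])
d_bin = np.array([to_bits(v) for v in d_int])
x_train = np.stack([a_bin, b_bin], axis=-1)     # shape (10000, 8, 2) -> input_shape=[8, 2]
y_train = d_bin[..., np.newaxis]                # shape (10000, 8, 1), one output bit per step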
Model
Layer (type) Output Shape Param #
=================================================================
simple_rnn_1 (SimpleRNN) (None, 8, 16) 304
_________________________________________________________________
dense_1 (Dense) (None, 8, 1) 17
=================================================================
Total params: 321
Trainable params: 321
Non-trainable params: 0
_________________________________________________________________
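A sketch of the code that would produce the summary above (my reconstruction; activation='relu' and the SGD optimizer for the base run are assumptions, the former based on the relu-vs-sigmoid comparison below). The parameter counts check out: the SimpleRNN has 16*(16+2+1) = 304 weights, the Dense layer 16+1 = 17.
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

model = Sequential()
model.add(SimpleRNN(units=16,
                    return_sequences=True,  # one output per time step (bit position)
                    input_shape=[8, 2],     # 8 bit positions, 2 inputs (a and b)
                    go_backwards=False,
                    activation='relu'))     # assumption: relu, per the comparison below
model.add(Dense(1, activation='sigmoid'))   # per-step output bit
model.compile(loss='mean_squared_error',    # loss taken from the Adam section below
              optimizer='sgd',              # assumption: SGD for the base run
              metrics=['accuracy'])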
Execution results
Epoch 5/5
10000/10000 [==============================] - 13s 1ms/step - loss: 2.7492e-04 - acc: 1.0000
Test loss: 0.0002324703154786204
Test accuracy: 1.0
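The test numbers would come from calls along these lines (a sketch; the variable names x_train, y_train, x_test, y_test are hypothetical):
model.fit(x_train, y_train, epochs=5)             # 10,000 training samples, per the logs
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])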
Changing various settings
Change the number of output units to 128
model.add(SimpleRNN(units=128,          # changed from 16; other arguments as in the base model
                    return_sequences=True,
                    input_shape=[8, 2],
                    go_backwards=False,
                    activation='relu'))
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn_3 (SimpleRNN) (None, 8, 128) 16768
_________________________________________________________________
dense_3 (Dense) (None, 8, 1) 129
=================================================================
Total params: 16,897
Trainable params: 16,897
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
10000/10000 [==============================] - 15s 2ms/step - loss: 0.0725 - acc: 0.9250
Epoch 2/5
10000/10000 [==============================] - 15s 1ms/step - loss: 0.0020 - acc: 1.0000
Epoch 3/5
10000/10000 [==============================] - 15s 2ms/step - loss: 7.3124e-04 - acc: 1.0000
Epoch 4/5
10000/10000 [==============================] - 14s 1ms/step - loss: 4.2671e-04 - acc: 1.0000
Epoch 5/5
10000/10000 [==============================] - 14s 1ms/step - loss: 2.9521e-04 - acc: 1.0000
Test loss: 0.000255679790335713
Test accuracy: 1.0
⇒ Training takes longer, but 100% accuracy is achieved.
Change the RNN activation function to sigmoid
model.add(SimpleRNN(units=16,
                    return_sequences=True,
                    input_shape=[8, 2],
                    go_backwards=False,
                    activation='sigmoid'))
_________________________________________________________________
Epoch 1/5
10000/10000 [==============================] - 14s 1ms/step - loss: 0.2495 - acc: 0.5205
Epoch 2/5
10000/10000 [==============================] - 13s 1ms/step - loss: 0.2470 - acc: 0.5541
Epoch 3/5
10000/10000 [==============================] - 13s 1ms/step - loss: 0.2417 - acc: 0.6195
Epoch 4/5
10000/10000 [==============================] - 14s 1ms/step - loss: 0.2046 - acc: 0.7451
Epoch 5/5
10000/10000 [==============================] - 13s 1ms/step - loss: 0.0639 - acc: 0.9714
Test loss: 0.015347118608386043
Test accuracy: 1.0
⇒ For this particular problem, sigmoid yields lower accuracy than relu.
Change the RNN activation function to tanh
model.add(SimpleRNN(units=16,
                    return_sequences=True,
                    input_shape=[8, 2],
                    go_backwards=False,
                    activation='tanh'))
Layer (type) Output Shape Param #
=================================================================
simple_rnn_5 (SimpleRNN) (None, 8, 16) 304
_________________________________________________________________
dense_5 (Dense) (None, 8, 1) 17
=================================================================
Total params: 321
Trainable params: 321
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
10000/10000 [==============================] - 14s 1ms/step - loss: 0.0813 - acc: 0.8843
Epoch 2/5
10000/10000 [==============================] - 13s 1ms/step - loss: 0.0010 - acc: 1.0000
Epoch 3/5
10000/10000 [==============================] - 14s 1ms/step - loss: 4.4771e-04 - acc: 1.0000
Epoch 4/5
10000/10000 [==============================] - 14s 1ms/step - loss: 2.8390e-04 - acc: 1.0000
Epoch 5/5
10000/10000 [==============================] - 14s 1ms/step - loss: 2.0629e-04 - acc: 1.0000
Test loss: 0.00018070489439529262
Test accuracy: 1.0
⇒ For this particular problem, tanh yields higher accuracy than sigmoid.
Change the optimizer to Adam
model.compile(loss='mean_squared_error', optimizer='Adam', metrics=['accuracy'])
# model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn_6 (SimpleRNN) (None, 8, 16) 304
_________________________________________________________________
dense_6 (Dense) (None, 8, 1) 17
=================================================================
Total params: 321
Trainable params: 321
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
10000/10000 [==============================] - 15s 2ms/step - loss: 0.1627 - acc: 0.7796
Epoch 2/5
10000/10000 [==============================] - 14s 1ms/step - loss: 0.0050 - acc: 1.0000
Epoch 3/5
10000/10000 [==============================] - 15s 1ms/step - loss: 1.8203e-04 - acc: 1.0000
Epoch 4/5
10000/10000 [==============================] - 15s 2ms/step - loss: 1.1885e-05 - acc: 1.0000
Epoch 5/5
3010/10000 [========>.....................] - ETA: 10s - loss: 1.7200e-06 - acc: 1.0000
⇒ Reached 100% accuracy by the 2nd epoch.
Set dropout=0.5
model.add(SimpleRNN(units=16,
                    return_sequences=True,
                    input_shape=[8, 2],
                    go_backwards=False,
                    activation='relu',
                    dropout=0.5))
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn_7 (SimpleRNN) (None, 8, 16) 304
_________________________________________________________________
dense_7 (Dense) (None, 8, 1) 17
=================================================================
Total params: 321
Trainable params: 321
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
10000/10000 [==============================] - 15s 2ms/step - loss: 0.2333 - acc: 0.5856
Epoch 2/5
10000/10000 [==============================] - 16s 2ms/step - loss: 0.2190 - acc: 0.6150
Epoch 3/5
10000/10000 [==============================] - 16s 2ms/step - loss: 0.2096 - acc: 0.6336
Epoch 4/5
10000/10000 [==============================] - 17s 2ms/step - loss: 0.2042 - acc: 0.6350
Epoch 5/5
10000/10000 [==============================] - 16s 2ms/step - loss: 0.2003 - acc: 0.6379
Test loss: 0.11175846526793616
Test accuracy: 0.8735873587477957
⇒ Learning still progresses, but accuracy improves only slowly. Dropout raises generalization performance, but at the cost of slower training convergence. Adding recurrent_dropout=0.3 shows the same tendency (see the sketch below).
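A sketch of the variant with recurrent_dropout added (my reconstruction; only the dropout arguments differ from the base model). dropout masks the layer inputs, while recurrent_dropout masks the recurrent state transitions:
model.add(SimpleRNN(units=16,
                    return_sequences=True,
                    input_shape=[8, 2],
                    go_backwards=False,
                    activation='relu',
                    dropout=0.5,             # dropout on the inputs
                    recurrent_dropout=0.3))  # dropout on the recurrent connections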
Adding unroll, which is effective for short sequences
https://keras.io/ja/layers/recurrent/
unroll: Boolean (default: False). If True, the network will be unrolled; otherwise a symbolic loop will be used. Unrolling tends to be more memory-intensive, but it can speed up an RNN. Unrolling is only suitable for short sequences.
model.add(SimpleRNN(units=16,
                    return_sequences=True,
                    input_shape=[8, 2],
                    go_backwards=False,
                    activation='relu',
                    # dropout=0.5,
                    # recurrent_dropout=0.3,
                    unroll=True))
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn_8 (SimpleRNN) (None, 8, 16) 304
_________________________________________________________________
dense_8 (Dense) (None, 8, 1) 17
=================================================================
Total params: 321
Trainable params: 321
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
10000/10000 [==============================] - 12s 1ms/step - loss: 0.1197 - acc: 0.8792
Epoch 2/5
10000/10000 [==============================] - 12s 1ms/step - loss: 0.0051 - acc: 1.0000
Epoch 3/5
10000/10000 [==============================] - 11s 1ms/step - loss: 2.5406e-04 - acc: 1.0000
Epoch 4/5
10000/10000 [==============================] - 11s 1ms/step - loss: 1.7620e-05 - acc: 1.0000
Epoch 5/5
10000/10000 [==============================] - 12s 1ms/step - loss: 1.4201e-06 - acc: 1.0000
Test loss: 3.270467229372976e-07
Test accuracy: 1.0
⇒ Training is fast, and the accuracy is high as well.
GRU/LSTM
model.add(GRU(units=16,
return_sequences=True,
input_shape=[8, 2],
))
model.add(LSTM(units=16,
return_sequences=True,
input_shape=[8, 2],
))
⇒ Each architecture has tasks it handles well and tasks it does not, so Keras, which lets you swap one in with a trivial code change and rerun the experiment, is a very effective tool (see the sketch below).
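As a minimal sketch of how interchangeable the recurrent layers are (build_model is a hypothetical helper of mine, not from the original notes):
from keras.models import Sequential
from keras.layers import SimpleRNN, GRU, LSTM, Dense

def build_model(cell):
    # The cell class is the only thing that changes between experiments.
    model = Sequential()
    model.add(cell(units=16, return_sequences=True, input_shape=[8, 2]))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mean_squared_error', optimizer='adam',
                  metrics=['accuracy'])
    return model

for cell in (SimpleRNN, GRU, LSTM):
    model = build_model(cell)
    # train and evaluate each variant with the same data and settings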