第9章: RNN, CNN
この章でもGoogle Colaboratoryを使用します。
問題51でGoogleドライブに保存したデータを使用するので、Googleドライブのフォルダをマウントしておきます。
from google.colab import drive
drive.mount('/content/drive')
%cd "/content/drive/My Drive/NL100/data/"
80. ID番号への変換
問題51で構築した学習データ中の単語にユニークなID番号を付与したい.学習データ中で最も頻出する単語に1,2番目に頻出する単語に2,……といった方法で,学習データ中で2回以上出現する単語にID番号を付与せよ.そして,与えられた単語列に対して,ID番号の列を返す関数を実装せよ.ただし,出現頻度が2回未満の単語のID番号はすべて0とせよ.
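本文のコードに入る前に、この採番ルールそのものを最小のコードで確認しておきます(本文のデータは使わず、仮の単語列で動作を見るだけのスケッチです)。
from collections import Counter

def build_word2id(words):
    # 出現頻度の降順に並べ、2回以上出現する単語にだけ1から連番を振る
    counts = Counter(words)
    word2id = {}
    next_id = 1
    for word, count in counts.most_common():
        word2id[word] = next_id if count >= 2 else 0
        if count >= 2:
            next_id += 1
    return word2id

def to_ids(words, word2id):
    # 未知語・低頻度語はすべてID 0に落とす
    return [word2id.get(w, 0) for w in words]

words = "to be or not to be".split()
w2id = build_word2id(words)
print(w2id)                                   # {'to': 1, 'be': 2, 'or': 0, 'not': 0}
print(to_ids("to do or die".split(), w2id))   # [1, 0, 0, 0]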
まず、単語のID採番をして、単語からIDを取得できる辞書を作成します。
単語の出現頻度をカウントするのに、scikit-learnのCountVectorizerを使用します。
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
ds1 = pd.read_csv("train.feature.csv", sep='\t')
ds2 = pd.read_csv("valid.feature.csv", sep='\t')
ds3 = pd.read_csv("test.feature.csv", sep='\t')
df = pd.concat([ds1,ds2,ds3],axis=0)
vectorizer = CountVectorizer()
bag = vectorizer.fit_transform(df['TITLE'])
#特徴量ベクトルに変換(出現頻度)
vector = bag.toarray()
print(vector[0:3])
#単語の出現頻度を確認
df_w = pd.DataFrame({"WORD":[],"COUNT":[]})
vec = []
for i in range(len(vector[0,:])): # 出現単語の種類の数だけループ
    sumvec = 0
    for j in vector[:,i]: # 記事の数だけループ
        sumvec += j # 単語の出現回数を加算
    vec.append(sumvec)
# 出現単語とそのカウントを取得しながらループし、DataFrameに登録
for word, count in zip(vectorizer.get_feature_names(), vec):
    series = pd.Series([word, count], index=["WORD","COUNT"])
    df_w = df_w.append(series, ignore_index=True)
# 単語を出現数でソート
df_w = df_w.sort_values(by="COUNT", ascending=False)
# 1から連番でIDを振る
df_w["ID"]=np.arange(1,(1+len(df_w)))
# 出現回数が2より小さければIDを0にする
for index, row in df_w.iterrows():
    if df_w.at[index, 'COUNT'] < 2:
        df_w.at[index, 'ID'] = 0
df_w.head()
# 単語からIDを引ける辞書を作成
dictWord={}
for index, row in df_w.iterrows():
    dictWord[df_w.at[index,"WORD"]] = df_w.at[index,"ID"]
<出力>
WORD COUNT ID
12310 to 3589.0 1
6007 in 2405.0 2
12159 the 1981.0 3
8305 of 1813.0 4
4750 for 1685.0 5
次に、訓練データ・検証データ・テストデータに対し、タイトル文字列の単語ID表現を追加します。
import pandas as pd
import numpy as np
# 文章を単語ID表現に変換する関数
def ConvWordId(strLine):
    retlist = []
    seq = ""
    lines = strLine.split(' ')
    for s in lines:
        if s in dictWord:
            retlist.append(str(dictWord[s]))
        else:
            retlist.append("0")
    if len(retlist) > 0:
        seq = ','.join(retlist)
    print(seq)
    return seq
# 訓練データのタイトル文字列から単語ID表現に変換し、データセットに追加
TITLE_ID=[]
print(len(ds1))
for index, row in ds1.iterrows():
    strW = ConvWordId(ds1.at[index,"TITLE"])
    TITLE_ID.append(strW)
ds1["TITLE_ID"] = TITLE_ID
# 検証データのタイトル文字列から単語ID表現に変換し、データセットに追加
TITLE_ID=[]
print(len(ds2))
for index, row in ds2.iterrows():
    strW = ConvWordId(ds2.at[index,"TITLE"])
    #print(strW)
    TITLE_ID.append(strW)
ds2["TITLE_ID"] = TITLE_ID
# テストデータのタイトル文字列から単語ID表現に変換し、データセットに追加
TITLE_ID=[]
print(len(ds3))
for index, row in ds3.iterrows():
    strW = ConvWordId(ds3.at[index,"TITLE"])
    #print(strW)
    TITLE_ID.append(strW)
ds3["TITLE_ID"] = TITLE_ID
作成したデータの確認
ds1.loc[range(3),["TITLE","TITLE_ID"]]
<出力>
TITLE TITLE_ID
0 refile update 0 european car sales up for sixt... 249,9,0,187,380,48,16,5,6633,61,7,2634,3645
1 amazon plans to fight ftc over mobile app purc... 180,231,1,327,1268,26,236,614,2451
2 kids still get codeine in emergency rooms desp... 397,155,177,0,2,2617,0,395,4926,1888,1836,0,152,0
81. RNNによる予測
ID番号で表現された単語列 $x = (x_1, x_2, \dots, x_T)$ がある.ただし,$T$ は単語列の長さ,$x_t \in \mathbb{R}^V$ は単語のID番号のone-hot表記である($V$ は単語の総数である).再帰型ニューラルネットワーク(RNN: Recurrent Neural Network)を用い,単語列 $x$ からカテゴリ $y$ を予測するモデルとして,次式を実装せよ.
$$\overrightarrow{h}_0 = 0,\quad \overrightarrow{h}_t = \overrightarrow{\mathrm{RNN}}(\mathrm{emb}(x_t), \overrightarrow{h}_{t-1}),\quad y = \mathrm{softmax}(W^{(yh)} \overrightarrow{h}_T + b^{(y)})$$
ただし,$\mathrm{emb}(x) \in \mathbb{R}^{d_w}$ は単語埋め込み(単語のone-hot表記から単語ベクトルに変換する関数),$\overrightarrow{h}_t \in \mathbb{R}^{d_h}$ は時刻 $t$ の隠れ状態ベクトル,$\overrightarrow{\mathrm{RNN}}(x, h)$ は入力 $x$ と前時刻の隠れ状態 $h$ から次状態を計算するRNNユニット,$W^{(yh)} \in \mathbb{R}^{L \times d_h}$ は隠れ状態ベクトルからカテゴリを予測するための行列,$b^{(y)} \in \mathbb{R}^L$ はバイアス項である($d_w, d_h, L$ はそれぞれ,単語埋め込みの次元数,隠れ状態ベクトルの次元数,ラベル数である).RNNユニット $\overrightarrow{\mathrm{RNN}}(x, h)$ には様々な構成が考えられるが,典型例として次式が挙げられる.
$$\overrightarrow{\mathrm{RNN}}(x, h) = g(W^{(hx)} x + W^{(hh)} h + b^{(h)})$$
ただし,$W^{(hx)} \in \mathbb{R}^{d_h \times d_w}$,$W^{(hh)} \in \mathbb{R}^{d_h \times d_h}$,$b^{(h)} \in \mathbb{R}^{d_h}$ はRNNユニットのパラメータ,$g$ は活性化関数(例えば $\tanh$ やReLUなど)である.
なお,この問題ではパラメータの学習を行わず,ランダムに初期化されたパラメータで $y$ を計算するだけでよい.次元数などのハイパーパラメータは,$d_w = 300$, $d_h = 50$ など,適当な値に設定せよ(以降の問題でも同様である).
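なお、Kerasで組む前に、上式の再帰計算をnumpyでそのまま書くと次のようになります(ランダムに初期化したパラメータでの動作確認用スケッチで、$d_w=300$, $d_h=50$、語彙数 $V$ は仮の値です)。
import numpy as np

dw, dh, L = 300, 50, 4           # 埋め込み次元・隠れ次元・ラベル数
V = 10000                        # 仮の語彙数
rng = np.random.default_rng(0)

emb  = rng.normal(scale=0.1, size=(V, dw))   # emb(x): IDから単語ベクトルへの表
W_hx = rng.normal(scale=0.1, size=(dh, dw))
W_hh = rng.normal(scale=0.1, size=(dh, dh))
b_h  = np.zeros(dh)
W_yh = rng.normal(scale=0.1, size=(L, dh))
b_y  = np.zeros(L)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def forward(word_ids):
    h = np.zeros(dh)                          # h_0 = 0
    for x in word_ids:                        # h_t = g(W_hx emb(x_t) + W_hh h_{t-1} + b_h)
        h = np.tanh(W_hx @ emb[x] + W_hh @ h + b_h)
    return softmax(W_yh @ h + b_y)            # y = softmax(W_yh h_T + b_y)

print(forward([1, 5, 42]))  # 学習前なので、ほぼ均等な4カテゴリの確率分布になるはず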
まず、データの前処理を行います。
from keras.utils.np_utils import to_categorical
# リストのサイズを揃える関数
def padding(sequence, max_len, pad_int):
    if len(sequence) > max_len:
        ret = sequence[0:max_len]
    else:
        ret = sequence.copy()
    for i in range(max_len):
        if len(sequence) <= i:
            ret.append(pad_int)
    return ret
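作成したpadding関数の動作確認と、参考までにKeras標準のpad_sequencesで同じ「後ろ詰め・後ろ切り捨て」を行う例です(本文では自作関数の方を使います)。
from keras.preprocessing.sequence import pad_sequences

print(padding([1, 2, 3], 5, 0))            # -> [1, 2, 3, 0, 0]
print(padding([1, 2, 3, 4, 5, 6], 5, 0))   # -> [1, 2, 3, 4, 5]
# pad_sequencesで同等の処理(後ろにパディングし、後ろを切り捨てる)
print(pad_sequences([[1, 2, 3]], maxlen=5, padding='post', truncating='post', value=0))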
# 訓練データの単語系列の系列長を揃える
maxlen = 40
sequences = []
for seq in ds1["TITLE_ID"].values: # データフレームから1行ずつ単語ID列を取り出す
    sequence = list(map(int, seq.split(","))) # 単語ID列のリスト化
    print(sequence)
    sequences.append(padding(sequence, maxlen, 0)) # 単語IDリストの長さを40にそろえる
trainX = np.array(sequences)
# 検証データの単語系列の系列長を揃える
sequences = []
for seq in ds2["TITLE_ID"].values:
    sequence = list(map(int, seq.split(",")))
    print(sequence)
    sequences.append(padding(sequence, maxlen, 0))
validX = np.array(sequences)
# テストデータの単語系列の系列長を揃える
sequences = []
for seq in ds3["TITLE_ID"].values:
    sequence = list(map(int, seq.split(",")))
    print(sequence)
    sequences.append(padding(sequence, maxlen, 0))
testX = np.array(sequences)
# ラベルベクトルの作成
category_dict = {'b': 0, 't': 1, 'e':2, 'm':3}
trainY = ds1['CATEGORY'].map(lambda x: category_dict[x]).values
validY = ds2['CATEGORY'].map(lambda x: category_dict[x]).values
testY = ds3['CATEGORY'].map(lambda x: category_dict[x]).values
trainY = to_categorical(trainY)
validY = to_categorical(validY)
testY = to_categorical(testY)
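to_categoricalによるone-hot化の挙動は次の通りです(確認用の小さな例です)。
print(to_categorical([0, 2, 1, 3]))
# [[1. 0. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 1. 0. 0.]
#  [0. 0. 0. 1.]]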
次に、学習モデルの作成です。
from keras.layers import SimpleRNN
from keras.models import Sequential
from keras.layers import Activation, Dense, Dropout
from keras.layers.embeddings import Embedding
n_hidden = 50
n_in=1
n_out = 4
epochs = 5
batch_size = 10
VOCAB_SIZE = np.max(df_w["ID"])+1 # 辞書のID数 + パディングID
EMB_SIZE = 300
PADDING_IDX = np.max(df_w["ID"])
model = Sequential()
model.add(Embedding(VOCAB_SIZE, EMB_SIZE, input_length=maxlen, mask_zero=True))
model.add(SimpleRNN(n_hidden, input_shape=(maxlen, n_in), kernel_initializer='random_normal'))
model.add(Activation("tanh"))
model.add(Dropout(0.8))
model.add(Dense(n_out, kernel_initializer='random_normal'))
model.add(Activation("softmax"))
model.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
output = model.predict(testX)
for pred, correct in zip(output, testY):
    print("predict:{} correct:{}".format(pred, correct))
<出力>
訓練前なので、予測結果は大体均等な割合になっています。
predict:[0.25140637 0.24426733 0.24648695 0.25783938] correct:[0. 0. 1. 0.]
predict:[0.24791518 0.26165664 0.24359217 0.24683607] correct:[0. 0. 1. 0.]
predict:[0.25174165 0.25658995 0.24985325 0.24181521] correct:[1. 0. 0. 0.]
predict:[0.24825716 0.25279102 0.25099677 0.24795501] correct:[1. 0. 0. 0.]
predict:[0.2441823 0.25148126 0.25326496 0.25107148] correct:[0. 0. 1. 0.]
・・・(省略)・・・
82. 確率的勾配降下法による学習
確率的勾配降下法(SGD: Stochastic Gradient Descent)を用いて,問題81で構築したモデルを学習せよ.訓練データ上の損失と正解率,評価データ上の損失と正解率を表示しながらモデルを学習し,適当な基準(例えば10エポックなど)で終了させよ.
結果の可視化にTensorBoardを使用するので、そのためのおまじないをします。
from __future__ import absolute_import, division, print_function, unicode_literals
try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
except Exception:
    pass
# Load the TensorBoard notebook extension
%load_ext tensorboard
先ほどのモデルで最適化ロジックをSGDに変更して訓練します。
from keras.layers import SimpleRNN
from keras.models import Sequential
from keras.layers import Activation, Dense, Dropout
from keras.layers.embeddings import Embedding
import keras
f_log = 'log' # ログ用フォルダ
n_hidden = 50
n_in=1
n_out = 4
epochs = 10
batch_size = 10
VOCAB_SIZE = np.max(df_w["ID"])+1 # 辞書のID数 + パディングID
EMB_SIZE = 300
PADDING_IDX = np.max(df_w["ID"])
model = Sequential()
model.add(Embedding(VOCAB_SIZE, EMB_SIZE, input_length=maxlen, mask_zero=True))
model.add(SimpleRNN(n_hidden, input_shape=(maxlen, n_in), kernel_initializer='random_normal'))
model.add(Activation("tanh"))
model.add(Dropout(0.8))
model.add(Dense(n_out, kernel_initializer='random_normal'))
model.add(Activation("softmax"))
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
tb_cb = keras.callbacks.TensorBoard(log_dir=f_log, histogram_freq=1)
model.fit(trainX, trainY, batch_size=batch_size, epochs=epochs, callbacks=[tb_cb], validation_data=(validX, validY))
<出力>
Epoch 1/10
2/1069 [..............................] - ETA: 2:09 - loss: 1.3785 - accuracy: 0.3500WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0218s vs `on_train_batch_end` time: 0.2197s). Check your callbacks.
1069/1069 [==============================] - 16s 15ms/step - loss: 1.1879 - accuracy: 0.4318 - val_loss: 1.1420 - val_accuracy: 0.4716
Epoch 2/10
1069/1069 [==============================] - 16s 15ms/step - loss: 1.0259 - accuracy: 0.6118 - val_loss: 0.8500 - val_accuracy: 0.7021
Epoch 3/10
1069/1069 [==============================] - 16s 15ms/step - loss: 0.8197 - accuracy: 0.7253 - val_loss: 0.6730 - val_accuracy: 0.7612
Epoch 4/10
1069/1069 [==============================] - 16s 15ms/step - loss: 0.6821 - accuracy: 0.7629 - val_loss: 0.6151 - val_accuracy: 0.7740
Epoch 5/10
1069/1069 [==============================] - 16s 15ms/step - loss: 0.5828 - accuracy: 0.7869 - val_loss: 1.0985 - val_accuracy: 0.6999
Epoch 6/10
1069/1069 [==============================] - 16s 15ms/step - loss: 0.5072 - accuracy: 0.8162 - val_loss: 0.5451 - val_accuracy: 0.7964
Epoch 7/10
1069/1069 [==============================] - 16s 15ms/step - loss: 0.4638 - accuracy: 0.8282 - val_loss: 0.4979 - val_accuracy: 0.8249
Epoch 8/10
1069/1069 [==============================] - 16s 15ms/step - loss: 0.4110 - accuracy: 0.8531 - val_loss: 0.4707 - val_accuracy: 0.8301
Epoch 9/10
1069/1069 [==============================] - 16s 15ms/step - loss: 0.3799 - accuracy: 0.8601 - val_loss: 0.5409 - val_accuracy: 0.8054
Epoch 10/10
1069/1069 [==============================] - 17s 16ms/step - loss: 0.3459 - accuracy: 0.8685 - val_loss: 0.4307 - val_accuracy: 0.8458
<tensorflow.python.keras.callbacks.History at 0x7fade37ea828>
訓練データの正解率は右肩上がりで、エポック数を重ねればさらに精度は向上すると思われます。
検証データの正解率は4エポック目までは訓練データを上回っていますが、5エポック目で一旦落ち込んだ後は、再び右肩上がりに訓練データの正解率へ追随しています。
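なお、エポック数を固定する代わりに、検証データの損失を監視して自動で打ち切るEarlyStoppingコールバックを使う方法もあります(以下は設定の一例のスケッチです)。
from keras.callbacks import EarlyStopping

# 検証損失が3エポック連続で改善しなければ打ち切り、最良時点の重みに戻す
es_cb = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
# model.fit(trainX, trainY, batch_size=batch_size, epochs=50,
#           callbacks=[tb_cb, es_cb], validation_data=(validX, validY))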
83. ミニバッチ化・GPU上での学習
問題82のコードを改変し,B事例ごとに損失・勾配を計算して学習を行えるようにせよ(Bの値は適当に選べ).また,GPU上で学習を実行せよ.
バッチサイズを32に設定し、CPU・GPU・TPUのそれぞれで実行して、実行速度を比較してみました。
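ランタイムにGPUが割り当てられているかどうかは、次のように確認できます(TensorFlow 2.x系を想定した確認方法です)。
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))  # GPUランタイムなら1件以上表示される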
from keras.layers import SimpleRNN
from keras.models import Sequential
from keras.layers import Activation, Dense, Dropout
from keras.layers.embeddings import Embedding
import matplotlib.pyplot as plt
import keras
import time
# 時間計測のためのクラスを追加
class TimeHistory(keras.callbacks.Callback):
    def __init__(self):
        self.times = []
    def on_epoch_begin(self, batch, logs={}):
        self.Epoch_time_start = time.time()
    def on_epoch_end(self, batch, logs={}):
        self.times.append(time.time() - self.Epoch_time_start)
f_log = 'log' # ログ用フォルダ
n_hidden = 50
n_in=1
n_out = 4
epochs = 10
batch_size = 32
VOCAB_SIZE = np.max(df_w["ID"])+1 # 辞書のID数 + パディングID
EMB_SIZE = 300
PADDING_IDX = np.max(df_w["ID"])
model = Sequential()
model.add(Embedding(VOCAB_SIZE, EMB_SIZE, input_length=maxlen, mask_zero=True))
model.add(SimpleRNN(n_hidden, input_shape=(maxlen, n_in), kernel_initializer='random_normal'))
model.add(Activation("tanh"))
model.add(Dropout(0.8))
model.add(Dense(n_out, kernel_initializer='random_normal'))
model.add(Activation("softmax"))
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
tb_cb = keras.callbacks.TensorBoard(log_dir=f_log, histogram_freq=1)
time_callback = TimeHistory()
model.fit(trainX, trainY, batch_size=batch_size, epochs=epochs, callbacks=[tb_cb, time_callback], validation_data=(validX, validY))
plt.plot( time_callback.times, label="batch size:%d" % batch_size, ls="-", marker="o")
plt.ylabel("time")
plt.xlabel("epoch")
plt.legend(loc="best")
plt.show()
かなり予想外の結果となりました。Colabの混み具合のせいでしょうか?
一般論としては、SimpleRNNのような時刻方向に逐次的な計算は、バッチサイズが小さいとGPUの並列性を活かしにくい、という事情もあるのかもしれません。
84. 単語ベクトルの導入
事前学習済みの単語ベクトル(例えば,Google Newsデータセット(約1,000億単語)での学習済み単語ベクトル)で単語埋め込みemb(x)を初期化し,学習せよ.
問題60と同様にGoogle Newsのデータセットを読み込みます。
from gensim.models import KeyedVectors
# Google NewsデータセットからWord2Vecデータを読み込み
emb = KeyedVectors.load_word2vec_format('/content/drive/My Drive/NL100/data/GoogleNews-vectors-negative300.bin.gz', binary=True)
# 重みの行列を0で初期化
embedding_matrix = np.zeros((max(list(df_w["ID"])) + 1, emb.vector_size), dtype="float32")
# 重み行列の単語ID行に、その単語の単語ベクトルをセット
for key, idx in dictWord.items():
    try:
        embedding_matrix[idx] = emb[key]
    except KeyError:
        pass
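どの程度の単語が学習済みベクトルで初期化できたかは、次のように確認できます(gensim 3.x系のKeyedVectorsを想定したスケッチです)。
# 学習済みベクトルが見つかった語彙の割合(カバレッジ)を確認
hits = sum(1 for w in dictWord if w in emb.vocab)
print("coverage: {}/{} = {:.1%}".format(hits, len(dictWord), hits / len(dictWord)))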
f_log = 'log' # ログ用フォルダ
model = Sequential()
model.add(Embedding(VOCAB_SIZE, emb.vector_size, input_length=maxlen, weights=[embedding_matrix], mask_zero=True, trainable=False)) # 学習済みの単語ベクトルで初期化した重みが更新されないよう、trainable=Falseを指定
model.add(SimpleRNN(n_hidden, input_shape=(maxlen, n_in), kernel_initializer='random_normal'))
model.add(Activation("tanh"))
model.add(Dropout(0.8))
model.add(Dense(n_out, kernel_initializer='random_normal'))
model.add(Activation("softmax"))
model.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
tb_cb = keras.callbacks.TensorBoard(log_dir=f_log, histogram_freq=1)
model.fit(trainX, trainY, batch_size=batch_size, epochs=epochs, callbacks=[tb_cb], validation_data=(validX, validY))
model.evaluate(testX, testY, batch_size=10, verbose=1)
<出力>
Epoch 1/10
2/334 [..............................] - ETA: 43s - loss: 1.4101 - accuracy: 0.2031WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0325s vs `on_train_batch_end` time: 0.2271s). Check your callbacks.
334/334 [==============================] - 7s 21ms/step - loss: 0.7323 - accuracy: 0.7419 - val_loss: 0.4513 - val_accuracy: 0.8488
Epoch 2/10
334/334 [==============================] - 6s 19ms/step - loss: 0.4892 - accuracy: 0.8365 - val_loss: 0.3820 - val_accuracy: 0.8743
Epoch 3/10
334/334 [==============================] - 6s 19ms/step - loss: 0.4235 - accuracy: 0.8642 - val_loss: 0.3776 - val_accuracy: 0.8795
Epoch 4/10
334/334 [==============================] - 6s 19ms/step - loss: 0.3904 - accuracy: 0.8769 - val_loss: 0.3265 - val_accuracy: 0.8885
Epoch 5/10
334/334 [==============================] - 6s 19ms/step - loss: 0.3778 - accuracy: 0.8842 - val_loss: 0.3286 - val_accuracy: 0.8817
Epoch 6/10
334/334 [==============================] - 6s 19ms/step - loss: 0.3537 - accuracy: 0.8922 - val_loss: 0.3290 - val_accuracy: 0.8922
Epoch 7/10
334/334 [==============================] - 7s 20ms/step - loss: 0.3467 - accuracy: 0.8935 - val_loss: 0.3057 - val_accuracy: 0.8930
Epoch 8/10
334/334 [==============================] - 6s 19ms/step - loss: 0.3274 - accuracy: 0.9005 - val_loss: 0.3308 - val_accuracy: 0.8915
Epoch 9/10
334/334 [==============================] - 6s 18ms/step - loss: 0.3189 - accuracy: 0.9054 - val_loss: 0.3290 - val_accuracy: 0.8885
Epoch 10/10
334/334 [==============================] - 6s 18ms/step - loss: 0.3217 - accuracy: 0.9003 - val_loss: 0.3284 - val_accuracy: 0.8937
134/134 [==============================] - 1s 4ms/step - loss: 0.4002 - accuracy: 0.8705
[0.4002329111099243, 0.8705089688301086]
TensorBoardで確認
%tensorboard --logdir log
85. 双方向RNN・多層化
順方向と逆方向のRNNの両方を用いて入力テキストをエンコードし,モデルを学習せよ.
$$\overleftarrow{h}_{T+1} = 0,\quad \overleftarrow{h}_t = \overleftarrow{\mathrm{RNN}}(\mathrm{emb}(x_t), \overleftarrow{h}_{t+1}),\quad y = \mathrm{softmax}(W^{(yh)} [\overrightarrow{h}_T; \overleftarrow{h}_1] + b^{(y)})$$
ただし,$\overrightarrow{h}_t \in \mathbb{R}^{d_h}$,$\overleftarrow{h}_t \in \mathbb{R}^{d_h}$ はそれぞれ,順方向および逆方向のRNNで求めた時刻 $t$ の隠れ状態ベクトル,$\overleftarrow{\mathrm{RNN}}(x, h)$ は入力 $x$ と次時刻の隠れ状態 $h$ から前状態を計算するRNNユニット,$W^{(yh)} \in \mathbb{R}^{L \times 2d_h}$ は隠れ状態ベクトルからカテゴリを予測するための行列,$b^{(y)} \in \mathbb{R}^L$ はバイアス項である.また,$[a; b]$ はベクトル $a$ と $b$ の連結を表す.
さらに,双方向RNNを多層化して実験せよ.
from keras.layers.wrappers import Bidirectional
f_log = "log"
model = Sequential()
model.add(Embedding(VOCAB_SIZE, EMB_SIZE, input_length=maxlen, mask_zero=True))
model.add(Bidirectional(SimpleRNN(n_hidden, kernel_initializer='random_normal', return_sequences=True), input_shape=(maxlen, n_in))) # 全時刻の隠れ状態を2層目に渡すためreturn_sequences=Trueを指定
model.add(Bidirectional(SimpleRNN(n_hidden)))
model.add(Dropout(0.8))
model.add(Dense(n_out, kernel_initializer='random_normal'))
model.add(Activation("softmax"))
model.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
tb_cb = keras.callbacks.TensorBoard(log_dir=f_log, histogram_freq=1)
model.fit(trainX, trainY, batch_size=batch_size, epochs=epochs, callbacks=[tb_cb], validation_data=(validX, validY))
model.evaluate(testX, testY, batch_size=10, verbose=1)
<出力>
Epoch 1/10
2/334 [..............................] - ETA: 2:53 - loss: 1.3686 - accuracy: 0.2969WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.1410s vs `on_train_batch_end` time: 0.9037s). Check your callbacks.
334/334 [==============================] - 32s 96ms/step - loss: 0.6255 - accuracy: 0.7714 - val_loss: 0.3496 - val_accuracy: 0.8795
Epoch 2/10
334/334 [==============================] - 30s 90ms/step - loss: 0.2346 - accuracy: 0.9226 - val_loss: 0.2962 - val_accuracy: 0.9109
Epoch 3/10
334/334 [==============================] - 30s 90ms/step - loss: 0.0992 - accuracy: 0.9719 - val_loss: 0.3149 - val_accuracy: 0.9109
Epoch 4/10
334/334 [==============================] - 30s 89ms/step - loss: 0.0519 - accuracy: 0.9847 - val_loss: 0.4051 - val_accuracy: 0.9072
Epoch 5/10
334/334 [==============================] - 30s 90ms/step - loss: 0.0310 - accuracy: 0.9913 - val_loss: 0.3926 - val_accuracy: 0.9229
Epoch 6/10
334/334 [==============================] - 30s 91ms/step - loss: 0.0305 - accuracy: 0.9917 - val_loss: 0.4067 - val_accuracy: 0.9139
Epoch 7/10
334/334 [==============================] - 30s 91ms/step - loss: 0.0267 - accuracy: 0.9927 - val_loss: 0.4711 - val_accuracy: 0.9109
Epoch 8/10
334/334 [==============================] - 30s 91ms/step - loss: 0.0253 - accuracy: 0.9934 - val_loss: 0.5540 - val_accuracy: 0.9072
Epoch 9/10
334/334 [==============================] - 30s 91ms/step - loss: 0.0253 - accuracy: 0.9929 - val_loss: 0.5130 - val_accuracy: 0.9087
Epoch 10/10
334/334 [==============================] - 30s 90ms/step - loss: 0.0284 - accuracy: 0.9922 - val_loss: 0.4822 - val_accuracy: 0.9177
134/134 [==============================] - 1s 11ms/step - loss: 0.6410 - accuracy: 0.8907
[0.6410461068153381, 0.8907185792922974]
訓練データの正解率は99%を超えましたが、検証データの正解率は90%前後で頭打ちになっており、やや過学習気味です。
%tensorboard --logdir log
86. 畳み込みニューラルネットワーク (CNN)
ID番号で表現された単語列 $x = (x_1, x_2, \dots, x_T)$ がある.ただし,$T$ は単語列の長さ,$x_t \in \mathbb{R}^V$ は単語のID番号のone-hot表記である($V$ は単語の総数である).畳み込みニューラルネットワーク(CNN: Convolutional Neural Network)を用い,単語列 $x$ からカテゴリ $y$ を予測するモデルを実装せよ.
ただし,畳み込みニューラルネットワークの構成は以下の通りとする.
単語埋め込みの次元数: $d_w$
畳み込みのフィルターのサイズ: 3 トークン
畳み込みのストライド: 1 トークン
畳み込みのパディング: あり
畳み込み演算後の各時刻のベクトルの次元数: $d_h$
畳み込み演算後に最大値プーリング(max pooling)を適用し,入力文を $d_h$ 次元の隠れベクトルで表現
すなわち,時刻 $t$ の特徴ベクトル $p_t \in \mathbb{R}^{d_h}$ は次式で表される.
$$p_t = g(W^{(px)} [\mathrm{emb}(x_{t-1}); \mathrm{emb}(x_t); \mathrm{emb}(x_{t+1})] + b^{(p)})$$
ただし,$W^{(px)} \in \mathbb{R}^{d_h \times 3d_w}$,$b^{(p)} \in \mathbb{R}^{d_h}$ はCNNのパラメータ,$g$ は活性化関数(例えば $\tanh$ やReLUなど),$[a; b; c]$ はベクトル $a, b, c$ の連結である.なお,行列 $W^{(px)}$ の列数が $3d_w$ になるのは,3個のトークンの単語埋め込みを連結したものに対して,線形変換を行うためである.
最大値プーリングでは,特徴ベクトルの次元毎に全時刻における最大値を取り,入力文書の特徴ベクトル $c \in \mathbb{R}^{d_h}$ を求める.$c[i]$ でベクトル $c$ の $i$ 番目の次元の値を表すことにすると,最大値プーリングは次式で表される.
$$c[i] = \max_{1 \le t \le T} p_t[i]$$
最後に,入力文書の特徴ベクトル $c$ に行列 $W^{(yc)} \in \mathbb{R}^{L \times d_h}$ とバイアス項 $b^{(y)} \in \mathbb{R}^L$ による線形変換とソフトマックス関数を適用し,カテゴリ $y$ を予測する.
$$y = \mathrm{softmax}(W^{(yc)} c + b^{(y)})$$
なお,この問題ではモデルの学習を行わず,ランダムに初期化された重み行列でyを計算するだけでよい.
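Kerasで組む前に、上式の畳み込みと最大値プーリングをnumpyでそのまま書くと次のようになります(ランダムに初期化したパラメータでの動作確認用スケッチで、埋め込み行列や系列長は仮置きです)。
import numpy as np

dw, dh, L, T = 300, 50, 4, 10                # 埋め込み次元・隠れ次元・ラベル数・系列長(仮)
rng = np.random.default_rng(0)

E    = rng.normal(scale=0.1, size=(T, dw))   # 各時刻の単語埋め込みemb(x_t)を仮置き
W_px = rng.normal(scale=0.1, size=(dh, 3 * dw))
b_p  = np.zeros(dh)

# パディングあり: 両端に零ベクトルを足してから、3トークン分の埋め込みを連結する
Epad = np.vstack([np.zeros(dw), E, np.zeros(dw)])
P = np.array([np.tanh(W_px @ np.concatenate([Epad[t], Epad[t + 1], Epad[t + 2]]) + b_p)
              for t in range(T)])            # p_t を並べた (T, dh) 行列
c = P.max(axis=0)                            # 最大値プーリング: c[i] = max_t p_t[i]

W_yc = rng.normal(scale=0.1, size=(L, dh))
b_y  = np.zeros(L)
y = np.exp(W_yc @ c + b_y); y /= y.sum()     # y = softmax(W_yc c + b_y)
print(c.shape, y)                            # -> (50,) とほぼ均等な4カテゴリの確率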
from keras.layers import Conv1D, MaxPooling1D,Flatten
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers import Activation, Dense, Dropout
f_log = 'log' # ログ用フォルダ
n_out = 4
batch_size = 32
VOCAB_SIZE = np.max(df_w["ID"])+1 # 辞書のID数 + パディングID
EMB_SIZE = 300
PADDING_IDX = np.max(df_w["ID"])
epochs=20
model = Sequential()
model.add(Embedding(VOCAB_SIZE, EMB_SIZE, input_length=maxlen, mask_zero=True))
print(model.output_shape)
model.add(Conv1D(64, 3, strides=1, padding="same" ))
print(model.output_shape)
model.add(Activation("relu"))
model.add(MaxPooling1D(pool_size=2))
print(model.output_shape)
model.add(Flatten())
print(model.output_shape)
model.add(Dense(n_out, kernel_initializer='random_normal'))
model.add(Activation("softmax"))
model.compile(optimizer="sgd",
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.evaluate(testX, testY)
<出力>
(None, 40, 300)
(None, 40, 64)
(None, 20, 64)
(None, 1280)
42/42 [==============================] - 0s 5ms/step - loss: 1.4079 - accuracy: 0.0726
[1.4079458713531494, 0.07260479032993317]
87. 確率的勾配降下法によるCNNの学習
確率的勾配降下法(SGD: Stochastic Gradient Descent)を用いて,問題86で構築したモデルを学習せよ.訓練データ上の損失と正解率,評価データ上の損失と正解率を表示しながらモデルを学習し,適当な基準(例えば10エポックなど)で終了させよ.
from keras.layers import Conv1D, MaxPooling1D,Flatten
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers import Activation, Dense, Dropout
from keras.optimizers import Adam
import keras
f_log = 'log' # ログ用フォルダ
n_out = 4
batch_size = 32
VOCAB_SIZE = np.max(df_w["ID"])+1 # 辞書のID数 + パディングID
EMB_SIZE = 300
PADDING_IDX = np.max(df_w["ID"])
epochs=20
model = Sequential()
model.add(Embedding(VOCAB_SIZE, EMB_SIZE, input_length=maxlen, mask_zero=True))
print(model.output_shape)
model.add(Conv1D(64, 3, strides=1, padding="same" ))
print(model.output_shape)
model.add(Activation("relu"))
model.add(MaxPooling1D(pool_size=2))
print(model.output_shape)
model.add(Flatten())
print(model.output_shape)
model.add(Dense(n_out, kernel_initializer='random_normal'))
model.add(Activation("softmax"))
model.compile(optimizer="sgd",
              loss='categorical_crossentropy',
              metrics=['accuracy'])
tb_cb = keras.callbacks.TensorBoard(log_dir=f_log, histogram_freq=1)
model.fit(trainX, trainY, batch_size=batch_size, epochs=epochs, callbacks=[tb_cb],validation_data=(validX, validY))
model.evaluate(testX, testY)
%tensorboard --logdir log
<出力>
(None, 40, 300)
(None, 40, 64)
(None, 20, 64)
(None, 1280)
Epoch 1/20
2/334 [..............................] - ETA: 13s - loss: 1.3658 - accuracy: 0.4844WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0180s vs `on_train_batch_end` time: 0.0647s). Check your callbacks.
334/334 [==============================] - 6s 18ms/step - loss: 1.1737 - accuracy: 0.4613 - val_loss: 1.1490 - val_accuracy: 0.5172
Epoch 2/20
334/334 [==============================] - 6s 18ms/step - loss: 1.1400 - accuracy: 0.5368 - val_loss: 1.1309 - val_accuracy: 0.5299
Epoch 3/20
334/334 [==============================] - 6s 18ms/step - loss: 1.1182 - accuracy: 0.5534 - val_loss: 1.1109 - val_accuracy: 0.5352
Epoch 4/20
334/334 [==============================] - 6s 18ms/step - loss: 1.0962 - accuracy: 0.5607 - val_loss: 1.0920 - val_accuracy: 0.5546
Epoch 5/20
334/334 [==============================] - 6s 18ms/step - loss: 1.0782 - accuracy: 0.5734 - val_loss: 1.0771 - val_accuracy: 0.5704
Epoch 6/20
334/334 [==============================] - 6s 18ms/step - loss: 1.0635 - accuracy: 0.5802 - val_loss: 1.0644 - val_accuracy: 0.5689
Epoch 7/20
334/334 [==============================] - 6s 18ms/step - loss: 1.0511 - accuracy: 0.5840 - val_loss: 1.0626 - val_accuracy: 0.5636
Epoch 8/20
334/334 [==============================] - 6s 18ms/step - loss: 1.0399 - accuracy: 0.5880 - val_loss: 1.0556 - val_accuracy: 0.5704
Epoch 9/20
334/334 [==============================] - 6s 18ms/step - loss: 1.0269 - accuracy: 0.5951 - val_loss: 1.0385 - val_accuracy: 0.5973
Epoch 10/20
334/334 [==============================] - 6s 18ms/step - loss: 1.0133 - accuracy: 0.6051 - val_loss: 1.0247 - val_accuracy: 0.5951
Epoch 11/20
334/334 [==============================] - 6s 18ms/step - loss: 0.9982 - accuracy: 0.6161 - val_loss: 1.0025 - val_accuracy: 0.6078
Epoch 12/20
334/334 [==============================] - 6s 18ms/step - loss: 0.9808 - accuracy: 0.6285 - val_loss: 0.9884 - val_accuracy: 0.6235
Epoch 13/20
334/334 [==============================] - 6s 18ms/step - loss: 0.9631 - accuracy: 0.6384 - val_loss: 0.9686 - val_accuracy: 0.6355
Epoch 14/20
334/334 [==============================] - 6s 18ms/step - loss: 0.9419 - accuracy: 0.6523 - val_loss: 0.9459 - val_accuracy: 0.6534
Epoch 15/20
334/334 [==============================] - 6s 18ms/step - loss: 0.9207 - accuracy: 0.6664 - val_loss: 0.9218 - val_accuracy: 0.6707
Epoch 16/20
334/334 [==============================] - 6s 18ms/step - loss: 0.8967 - accuracy: 0.6829 - val_loss: 0.8987 - val_accuracy: 0.6856
Epoch 17/20
334/334 [==============================] - 6s 18ms/step - loss: 0.8729 - accuracy: 0.6945 - val_loss: 0.8738 - val_accuracy: 0.7028
Epoch 18/20
334/334 [==============================] - 6s 18ms/step - loss: 0.8468 - accuracy: 0.7086 - val_loss: 0.8463 - val_accuracy: 0.7081
Epoch 19/20
334/334 [==============================] - 6s 18ms/step - loss: 0.8210 - accuracy: 0.7183 - val_loss: 0.8223 - val_accuracy: 0.7186
Epoch 20/20
334/334 [==============================] - 6s 18ms/step - loss: 0.7954 - accuracy: 0.7298 - val_loss: 0.7979 - val_accuracy: 0.7313
42/42 [==============================] - 0s 5ms/step - loss: 0.8316 - accuracy: 0.7163
[0.8315699100494385, 0.716317355632782]
88. パラメータチューニング
問題85や問題87のコードを改変し,ニューラルネットワークの形状やハイパーパラメータを調整しながら,高性能なカテゴリ分類器を構築せよ.
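チューニングの進め方の一例として、学習率とドロップアウト率を総当たりで試す簡単なグリッドサーチのスケッチを示します(問題87のCNNを仮のbuild_model関数にまとめた、あくまで一例です。VOCAB_SIZE, EMB_SIZE, maxlen, n_outや各データは上で定義済みのものを使う想定で、エポック数は短めにしています)。
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dropout, Dense
from keras.layers.embeddings import Embedding
from keras.optimizers import Adam

def build_model(lr, dropout):
    # 問題87のCNNをベースに、学習率とドロップアウト率だけを引数化した仮の関数
    m = Sequential()
    m.add(Embedding(VOCAB_SIZE, EMB_SIZE, input_length=maxlen, mask_zero=True))
    m.add(Conv1D(64, 3, strides=1, padding="same", activation="relu"))
    m.add(MaxPooling1D(pool_size=2))
    m.add(Flatten())
    m.add(Dropout(dropout))
    m.add(Dense(n_out, activation="softmax"))
    m.compile(optimizer=Adam(lr=lr), loss='categorical_crossentropy', metrics=['accuracy'])
    return m

best = (None, 0.0)
for lr in [1e-2, 1e-3, 1e-4]:
    for dropout in [0.2, 0.5, 0.8]:
        m = build_model(lr, dropout)
        m.fit(trainX, trainY, batch_size=32, epochs=5, verbose=0,
              validation_data=(validX, validY))
        _, acc = m.evaluate(validX, validY, verbose=0)
        if acc > best[1]:
            best = ((lr, dropout), acc)
print("best (lr, dropout), val_acc:", best)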
from keras.layers import Conv1D, MaxPooling1D,Flatten
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers import Activation, Dense, Dropout
from keras.optimizers import Adam
import keras
f_log = 'log' # ログ用フォルダ
n_out = 4
batch_size = 32
VOCAB_SIZE = np.max(df_w["ID"])+1 # 辞書のID数 + パディングID
EMB_SIZE = 300
PADDING_IDX = np.max(df_w["ID"])
epochs=20
model = Sequential()
model.add(Embedding(VOCAB_SIZE, EMB_SIZE, input_length=maxlen, mask_zero=True))
print(model.output_shape)
model.add(Conv1D(64, 2, strides=1, padding="same" ))
print(model.output_shape)
model.add(Activation("relu"))
model.add(Conv1D(32, 3, strides=1, padding="same" ))
print(model.output_shape)
model.add(Activation("relu"))
model.add(Conv1D(32, 4, strides=1, padding="same" ))
print(model.output_shape)
model.add(Activation("relu"))
model.add(MaxPooling1D(pool_size=2))
print(model.output_shape)
model.add(Flatten())
print(model.output_shape)
model.add(Dropout(0.8))
model.add(Dense(n_out, kernel_initializer='random_normal'))
model.add(Activation("softmax"))
model.compile(optimizer=Adam(lr=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
tb_cb = keras.callbacks.TensorBoard(log_dir=f_log, histogram_freq=1)
print(trainX.shape)
print(trainY.shape)
model.fit(trainX, trainY, batch_size=batch_size, epochs=epochs, callbacks=[tb_cb],validation_data=(validX, validY))
model.evaluate(testX, testY, batch_size=10, verbose=1)
%tensorboard --logdir log
<出力>
(None, 40, 300)
(None, 40, 64)
(None, 40, 32)
(None, 40, 32)
(None, 20, 32)
(None, 640)
(10684, 40)
(10684, 4)
Epoch 1/20
2/334 [..............................] - ETA: 28s - loss: 1.3841 - accuracy: 0.2656WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0626s vs `on_train_batch_end` time: 0.1076s). Check your callbacks.
334/334 [==============================] - 17s 52ms/step - loss: 0.8266 - accuracy: 0.6557 - val_loss: 0.4379 - val_accuracy: 0.8166
Epoch 2/20
334/334 [==============================] - 17s 51ms/step - loss: 0.3830 - accuracy: 0.8472 - val_loss: 0.3544 - val_accuracy: 0.8638
Epoch 3/20
334/334 [==============================] - 17s 51ms/step - loss: 0.2550 - accuracy: 0.8929 - val_loss: 0.3613 - val_accuracy: 0.8668
Epoch 4/20
334/334 [==============================] - 17s 51ms/step - loss: 0.1915 - accuracy: 0.9188 - val_loss: 0.3839 - val_accuracy: 0.8915
Epoch 5/20
334/334 [==============================] - 17s 51ms/step - loss: 0.1236 - accuracy: 0.9571 - val_loss: 0.3696 - val_accuracy: 0.9124
Epoch 6/20
334/334 [==============================] - 17s 51ms/step - loss: 0.0809 - accuracy: 0.9741 - val_loss: 0.3686 - val_accuracy: 0.9199
Epoch 7/20
334/334 [==============================] - 18s 53ms/step - loss: 0.0706 - accuracy: 0.9787 - val_loss: 0.3676 - val_accuracy: 0.9251
Epoch 8/20
334/334 [==============================] - 17s 51ms/step - loss: 0.0422 - accuracy: 0.9858 - val_loss: 0.4602 - val_accuracy: 0.9237
Epoch 9/20
334/334 [==============================] - 17s 51ms/step - loss: 0.0372 - accuracy: 0.9890 - val_loss: 0.3974 - val_accuracy: 0.9222
Epoch 10/20
334/334 [==============================] - 18s 53ms/step - loss: 0.0323 - accuracy: 0.9903 - val_loss: 0.5205 - val_accuracy: 0.9177
Epoch 11/20
334/334 [==============================] - 17s 51ms/step - loss: 0.0325 - accuracy: 0.9904 - val_loss: 0.5027 - val_accuracy: 0.9237
Epoch 12/20
334/334 [==============================] - 17s 52ms/step - loss: 0.0227 - accuracy: 0.9924 - val_loss: 0.5795 - val_accuracy: 0.9177
Epoch 13/20
334/334 [==============================] - 17s 51ms/step - loss: 0.0244 - accuracy: 0.9924 - val_loss: 0.7259 - val_accuracy: 0.9139
Epoch 14/20
334/334 [==============================] - 18s 54ms/step - loss: 0.0201 - accuracy: 0.9941 - val_loss: 0.7249 - val_accuracy: 0.9154
Epoch 15/20
334/334 [==============================] - 17s 52ms/step - loss: 0.0181 - accuracy: 0.9948 - val_loss: 0.7823 - val_accuracy: 0.9124
Epoch 16/20
334/334 [==============================] - 17s 50ms/step - loss: 0.0148 - accuracy: 0.9951 - val_loss: 0.7605 - val_accuracy: 0.9124
Epoch 17/20
334/334 [==============================] - 17s 51ms/step - loss: 0.0194 - accuracy: 0.9940 - val_loss: 0.8505 - val_accuracy: 0.9079
Epoch 18/20
334/334 [==============================] - 17s 50ms/step - loss: 0.0209 - accuracy: 0.9940 - val_loss: 0.8888 - val_accuracy: 0.9087
Epoch 19/20
334/334 [==============================] - 17s 50ms/step - loss: 0.0175 - accuracy: 0.9949 - val_loss: 0.6597 - val_accuracy: 0.9184
Epoch 20/20
334/334 [==============================] - 17s 49ms/step - loss: 0.0190 - accuracy: 0.9941 - val_loss: 0.7623 - val_accuracy: 0.9177
134/134 [==============================] - 0s 2ms/step - loss: 1.1281 - accuracy: 0.8937
Reusing TensorBoard on port 6006 (pid 982), started 0:13:52 ago. (Use '!kill 982' to kill it.)
89. 事前学習済み言語モデルからの転移学習
事前学習済み言語モデル(例えばBERTなど)を出発点として,ニュース記事見出しをカテゴリに分類するモデルを構築せよ.
BERTのWebページに様々なサイズのモデルが公開されていますが、
選択を誤ると痛い目を見ます。
私が最初にダウンロードしたファイルは
wwm_cased_L-24_H-1024_A-16.zip
でした。
これはチェックポイントファイルが4Gバイトを超えており、Google Driveの容量を圧迫し、ディスク容量を購入する羽目になりました。
学習もものすごく時間がかかります。学習データの量がそれほど大きいわけでもないですが、1エポックに3時間半以上かかり、Google Colaboratoryの実行時間12時間制限に引っかかってしまいます。
少ないエポック数では精度も全然あがりません。
その後、以下のファイルをダウンロードして上記のファイルと置換し、無事に学習させることができました。
まず、解凍したファイルをGoogle Driveにアップロードします。
Google Driveを /content/drive にマウントし、以下のフォルダにアップロードしました。
/content/drive/My Drive/NL100/data/bert/bert-wiki
--bert_config.json
--bert_model.ckpt.data-00000-of-00001
--bert_model.ckpt.index
--vocab.txt
カレントフォルダを移動しておきます。
%cd "/content/drive/My Drive/NL100/data/bert"
つぎに、BERTをKerasで動かすために必要なライブラリをインストールします。
BERTは、「Bidirectional Encoder Representations from Transformers(Transformerを活用した双方向的エンコード表現)」の略称ですが、その名の通り、BERTを動かすにはTransformerが必要です。
!pip install transformers
!pip install keras_bert
翻訳等の自然言語処理では、Attentionを用いたエンコーダ・デコーダ形式のRNNやCNNが主流でしたが、Transformerは、RNNやCNNを用いず、Attentionのみを用いたモデルです。
TransformerはBERTだけでなく、最近の自然言語処理で話題のXLNet、GPT-2のベースにもなっています。
Transformerに関してはこちらのサイトに詳しく書かれています。
深層学習界の大前提Transformerの論文解説!
BERTは、Transformerの双方向的な訓練を適用し、文脈と文の流れから見た言語の意味をより深くつかめるモデルになっています。
(参考)
BERT解説:自然言語処理のための最先端言語モデル
また、BERTのモデル作成方法に関しては以下を参考にしています。
BERT(Keras BERT)を使用した文章分類を学習から予測まで紹介!
実際に、Google DriveにアップロードしたBERTの定義ファイルとチェックポイントファイルから、Kerasのモデルを作成します。
import sys
sys.path.append('modules')
from keras_bert import load_trained_model_from_checkpoint
# BERTのロード
config_path = '/content/drive/My Drive/NL100/data/bert/bert-wiki/bert_config.json'
checkpoint_path = '/content/drive/My Drive/NL100/data/bert/bert-wiki/bert_model.ckpt'
# 訓練データ・テストデータの最大のトークン数
SEQ_LEN = 20
BATCH_SIZE = 16
# BERTの配列の次元
BERT_DIM = 512
# Adamの学習率
LR = 1e-4
# 学習回数
EPOCH = 20
bert = load_trained_model_from_checkpoint(config_path, checkpoint_path, training=True, trainable=True, seq_len=SEQ_LEN)
bert.summary()
<出力>
Model: "functional_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
Input-Token (InputLayer) [(None, 20)] 0
__________________________________________________________________________________________________
Input-Segment (InputLayer) [(None, 20)] 0
__________________________________________________________________________________________________
Embedding-Token (TokenEmbedding [(None, 20, 512), (3 15627264 Input-Token[0][0]
__________________________________________________________________________________________________
Embedding-Segment (Embedding) (None, 20, 512) 1024 Input-Segment[0][0]
__________________________________________________________________________________________________
Embedding-Token-Segment (Add) (None, 20, 512) 0 Embedding-Token[0][0]
Embedding-Segment[0][0]
__________________________________________________________________________________________________
Embedding-Position (PositionEmb (None, 20, 512) 10240 Embedding-Token-Segment[0][0]
__________________________________________________________________________________________________
Embedding-Dropout (Dropout) (None, 20, 512) 0 Embedding-Position[0][0]
__________________________________________________________________________________________________
Embedding-Norm (LayerNormalizat (None, 20, 512) 1024 Embedding-Dropout[0][0]
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, None, 512) 1050624 Embedding-Norm[0][0]
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, None, 512) 0 Encoder-1-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 20, 512) 0 Embedding-Norm[0][0]
Encoder-1-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 20, 512) 1024 Encoder-1-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-1-FeedForward (FeedForw (None, 20, 512) 2099712 Encoder-1-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-1-FeedForward-Dropout ( (None, 20, 512) 0 Encoder-1-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-1-FeedForward-Add (Add) (None, 20, 512) 0 Encoder-1-MultiHeadSelfAttention-
Encoder-1-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-1-FeedForward-Norm (Lay (None, 20, 512) 1024 Encoder-1-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, None, 512) 1050624 Encoder-1-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, None, 512) 0 Encoder-2-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, 20, 512) 0 Encoder-1-FeedForward-Norm[0][0]
Encoder-2-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, 20, 512) 1024 Encoder-2-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-2-FeedForward (FeedForw (None, 20, 512) 2099712 Encoder-2-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-2-FeedForward-Dropout ( (None, 20, 512) 0 Encoder-2-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-2-FeedForward-Add (Add) (None, 20, 512) 0 Encoder-2-MultiHeadSelfAttention-
Encoder-2-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-2-FeedForward-Norm (Lay (None, 20, 512) 1024 Encoder-2-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, None, 512) 1050624 Encoder-2-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, None, 512) 0 Encoder-3-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, 20, 512) 0 Encoder-2-FeedForward-Norm[0][0]
Encoder-3-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, 20, 512) 1024 Encoder-3-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-3-FeedForward (FeedForw (None, 20, 512) 2099712 Encoder-3-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-3-FeedForward-Dropout ( (None, 20, 512) 0 Encoder-3-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-3-FeedForward-Add (Add) (None, 20, 512) 0 Encoder-3-MultiHeadSelfAttention-
Encoder-3-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-3-FeedForward-Norm (Lay (None, 20, 512) 1024 Encoder-3-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, None, 512) 1050624 Encoder-3-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, None, 512) 0 Encoder-4-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, 20, 512) 0 Encoder-3-FeedForward-Norm[0][0]
Encoder-4-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, 20, 512) 1024 Encoder-4-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-4-FeedForward (FeedForw (None, 20, 512) 2099712 Encoder-4-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-4-FeedForward-Dropout ( (None, 20, 512) 0 Encoder-4-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-4-FeedForward-Add (Add) (None, 20, 512) 0 Encoder-4-MultiHeadSelfAttention-
Encoder-4-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-4-FeedForward-Norm (Lay (None, 20, 512) 1024 Encoder-4-FeedForward-Add[0][0]
__________________________________________________________________________________________________
MLM-Dense (Dense) (None, 20, 512) 262656 Encoder-4-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
MLM-Norm (LayerNormalization) (None, 20, 512) 1024 MLM-Dense[0][0]
__________________________________________________________________________________________________
Extract (Extract) (None, 512) 0 Encoder-4-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
MLM-Sim (EmbeddingSimilarity) (None, 20, 30522) 30522 MLM-Norm[0][0]
Embedding-Token[0][1]
__________________________________________________________________________________________________
Input-Masked (InputLayer) [(None, 20)] 0
__________________________________________________________________________________________________
NSP-Dense (Dense) (None, 512) 262656 Extract[0][0]
__________________________________________________________________________________________________
MLM (Masked) (None, 20, 30522) 0 MLM-Sim[0][0]
Input-Masked[0][0]
__________________________________________________________________________________________________
NSP (Dense) (None, 2) 1026 NSP-Dense[0][0]
==================================================================================================
Total params: 28,806,972
Trainable params: 28,806,972
Non-trainable params: 0
__________________________________________________________________________________________________
つぎに入力データの準備です。
問題81で作成したデータを使用します。
ただ、問題81では最大トークン数を40に設定していましたが、そのままのサイズでBERTの巨大モデルに投入したらメモリオーバーでクラッシュしたので、先頭20トークンで切り詰めて使用することにしました。(モデルを小さくしたので、ひょっとしたら40のままでもクラッシュしないかもしれませんが……)
import pandas as pd
import numpy as np
df = pd.DataFrame(testX)
df = df.iloc[:,0:20]
testX = df.values
df = pd.DataFrame(trainX)
df = df.iloc[:,0:20]
trainX = df.values
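なお、DataFrameを経由しなくても、numpyのスライスだけで同じ切り詰めができます(同等の別解です)。
# numpy配列のまま先頭20トークンに切り詰める同等の書き方
trainX = trainX[:, :20]
testX = testX[:, :20]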
クラス分類したいので、モデルの最後にSoftmax関数を使用した出力層を追加します。
# データをまとめた関数
def _load_labeldata():
    maxlen = 20
    train_features = trainX
    test_features = testX
    train_segments = np.zeros((len(train_features), maxlen), dtype=np.float32)
    test_segments = np.zeros((len(test_features), maxlen), dtype=np.float32)
    # 出力クラス数
    class_count = 4
    print(f'Trainデータ数: {len(trainX[:,0])}, Testデータ数: {len(testX[:,0])}, ラベル数: {class_count}')
    return {
        'class_count': class_count,
        'train_labels': trainY,
        'test_labels': testY,
        'train_features': trainX,
        'train_segments': train_segments,
        'test_features': testX,
        'test_segments': test_segments,
        'input_len': maxlen
    }
from keras.layers import Dense, Dropout, LSTM, Bidirectional, Flatten, GlobalMaxPooling1D
from keras_bert.layers import MaskedGlobalMaxPool1D
from keras import Input, Model
from keras_bert import AdamWarmup, calc_train_steps
# モデル構築の関数
def _create_model(input_shape, class_count):
    decay_steps, warmup_steps = calc_train_steps(
        input_shape[0],
        batch_size=BATCH_SIZE,
        epochs=EPOCH,
    )
    bert_last = bert.get_layer(name='NSP-Dense').output
    x1 = bert_last
    output_tensor = Dense(class_count, activation='softmax')(x1)
    # Trainableの場合は、Input Masked Layerが3番目の入力になりますが、
    # FineTuning時には必要無いので1, 2番目の入力だけ使用します。
    # Trainableでなければkeras-bertのModel.inputそのままで問題ありません。
    model = Model([bert.input[0], bert.input[1]], output_tensor)
    model.compile(loss='categorical_crossentropy',
                  optimizer=AdamWarmup(decay_steps=decay_steps, warmup_steps=warmup_steps, lr=LR),
                  metrics=['mae', 'mse', 'acc'])
    return model
# データロードとモデルの準備
from keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard
data = _load_labeldata()
model_filename = 'models/knbc_finetuning.model'
model = _create_model(data['train_features'].shape, data['class_count'])
model.summary()
<出力>
Trainデータ数: 10684, Testデータ数: 1336, ラベル数: 4
Model: "functional_3"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
Input-Token (InputLayer) [(None, 20)] 0
__________________________________________________________________________________________________
Input-Segment (InputLayer) [(None, 20)] 0
__________________________________________________________________________________________________
Embedding-Token (TokenEmbedding [(None, 20, 512), (3 15627264 Input-Token[0][0]
__________________________________________________________________________________________________
Embedding-Segment (Embedding) (None, 20, 512) 1024 Input-Segment[0][0]
__________________________________________________________________________________________________
Embedding-Token-Segment (Add) (None, 20, 512) 0 Embedding-Token[0][0]
Embedding-Segment[0][0]
__________________________________________________________________________________________________
Embedding-Position (PositionEmb (None, 20, 512) 10240 Embedding-Token-Segment[0][0]
__________________________________________________________________________________________________
Embedding-Dropout (Dropout) (None, 20, 512) 0 Embedding-Position[0][0]
__________________________________________________________________________________________________
Embedding-Norm (LayerNormalizat (None, 20, 512) 1024 Embedding-Dropout[0][0]
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, None, 512) 1050624 Embedding-Norm[0][0]
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, None, 512) 0 Encoder-1-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 20, 512) 0 Embedding-Norm[0][0]
Encoder-1-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 20, 512) 1024 Encoder-1-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-1-FeedForward (FeedForw (None, 20, 512) 2099712 Encoder-1-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-1-FeedForward-Dropout ( (None, 20, 512) 0 Encoder-1-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-1-FeedForward-Add (Add) (None, 20, 512) 0 Encoder-1-MultiHeadSelfAttention-
Encoder-1-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-1-FeedForward-Norm (Lay (None, 20, 512) 1024 Encoder-1-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, None, 512) 1050624 Encoder-1-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, None, 512) 0 Encoder-2-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, 20, 512) 0 Encoder-1-FeedForward-Norm[0][0]
Encoder-2-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, 20, 512) 1024 Encoder-2-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-2-FeedForward (FeedForw (None, 20, 512) 2099712 Encoder-2-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-2-FeedForward-Dropout ( (None, 20, 512) 0 Encoder-2-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-2-FeedForward-Add (Add) (None, 20, 512) 0 Encoder-2-MultiHeadSelfAttention-
Encoder-2-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-2-FeedForward-Norm (Lay (None, 20, 512) 1024 Encoder-2-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, None, 512) 1050624 Encoder-2-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, None, 512) 0 Encoder-3-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, 20, 512) 0 Encoder-2-FeedForward-Norm[0][0]
Encoder-3-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, 20, 512) 1024 Encoder-3-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-3-FeedForward (FeedForw (None, 20, 512) 2099712 Encoder-3-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-3-FeedForward-Dropout ( (None, 20, 512) 0 Encoder-3-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-3-FeedForward-Add (Add) (None, 20, 512) 0 Encoder-3-MultiHeadSelfAttention-
Encoder-3-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-3-FeedForward-Norm (Lay (None, 20, 512) 1024 Encoder-3-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, None, 512) 1050624 Encoder-3-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, None, 512) 0 Encoder-4-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, 20, 512) 0 Encoder-3-FeedForward-Norm[0][0]
Encoder-4-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, 20, 512) 1024 Encoder-4-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-4-FeedForward (FeedForw (None, 20, 512) 2099712 Encoder-4-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-4-FeedForward-Dropout ( (None, 20, 512) 0 Encoder-4-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-4-FeedForward-Add (Add) (None, 20, 512) 0 Encoder-4-MultiHeadSelfAttention-
Encoder-4-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-4-FeedForward-Norm (Lay (None, 20, 512) 1024 Encoder-4-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Extract (Extract) (None, 512) 0 Encoder-4-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
NSP-Dense (Dense) (None, 512) 262656 Extract[0][0]
__________________________________________________________________________________________________
dense (Dense) (None, 4) 2052 NSP-Dense[0][0]
==================================================================================================
Total params: 28,513,796
Trainable params: 28,513,796
Non-trainable params: 0
__________________________________________________________________________________________________
モデルの最後にDense層(dense)が追加されているのがわかります。
モデルを作成したら、さっそく訓練です。
history = model.fit([data['train_features'], data['train_segments']],
                    data['train_labels'],
                    epochs=EPOCH,
                    batch_size=BATCH_SIZE,
                    validation_data=([data['test_features'], data['test_segments']], data['test_labels']),
                    shuffle=False,
                    verbose=1,
                    callbacks=[
                        ModelCheckpoint(monitor='val_acc', mode='max', filepath=model_filename, save_best_only=True)
                    ])
<出力>
Epoch 1/20
668/668 [==============================] - ETA: 0s - loss: 0.9379 - mae: 0.2463 - mse: 0.1241 - acc: 0.6203WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/tracking.py:111: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/tracking.py:111: Layer.updates (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: models/knbc_finetuning.model/assets
668/668 [==============================] - 591s 885ms/step - loss: 0.9379 - mae: 0.2463 - mse: 0.1241 - acc: 0.6203 - val_loss: 0.7782 - val_mae: 0.1610 - val_mse: 0.0962 - val_acc: 0.7328
Epoch 2/20
668/668 [==============================] - ETA: 0s - loss: 0.4534 - mae: 0.1124 - mse: 0.0574 - acc: 0.8394INFO:tensorflow:Assets written to: models/knbc_finetuning.model/assets
668/668 [==============================] - 604s 905ms/step - loss: 0.4534 - mae: 0.1124 - mse: 0.0574 - acc: 0.8394 - val_loss: 0.5117 - val_mae: 0.1017 - val_mse: 0.0610 - val_acc: 0.8346
Epoch 3/20
668/668 [==============================] - ETA: 0s - loss: 0.2189 - mae: 0.0529 - mse: 0.0273 - acc: 0.9283INFO:tensorflow:Assets written to: models/knbc_finetuning.model/assets
668/668 [==============================] - 602s 901ms/step - loss: 0.2189 - mae: 0.0529 - mse: 0.0273 - acc: 0.9283 - val_loss: 0.4273 - val_mae: 0.0715 - val_mse: 0.0499 - val_acc: 0.8728
Epoch 4/20
668/668 [==============================] - 589s 882ms/step - loss: 0.1130 - mae: 0.0271 - mse: 0.0141 - acc: 0.9639 - val_loss: 0.5599 - val_mae: 0.0697 - val_mse: 0.0543 - val_acc: 0.8713
Epoch 5/20
668/668 [==============================] - ETA: 0s - loss: 0.0551 - mae: 0.0134 - mse: 0.0069 - acc: 0.9824INFO:tensorflow:Assets written to: models/knbc_finetuning.model/assets
668/668 [==============================] - 596s 893ms/step - loss: 0.0551 - mae: 0.0134 - mse: 0.0069 - acc: 0.9824 - val_loss: 0.5576 - val_mae: 0.0590 - val_mse: 0.0495 - val_acc: 0.8870
Epoch 6/20
668/668 [==============================] - 599s 896ms/step - loss: 0.0354 - mae: 0.0082 - mse: 0.0043 - acc: 0.9890 - val_loss: 0.6096 - val_mae: 0.0604 - val_mse: 0.0516 - val_acc: 0.8832
Epoch 7/20
668/668 [==============================] - 602s 901ms/step - loss: 0.0327 - mae: 0.0074 - mse: 0.0041 - acc: 0.9894 - val_loss: 0.6174 - val_mae: 0.0613 - val_mse: 0.0527 - val_acc: 0.8765
Epoch 8/20
668/668 [==============================] - ETA: 0s - loss: 0.0199 - mae: 0.0048 - mse: 0.0026 - acc: 0.9934INFO:tensorflow:Assets written to: models/knbc_finetuning.model/assets
668/668 [==============================] - 621s 929ms/step - loss: 0.0199 - mae: 0.0048 - mse: 0.0026 - acc: 0.9934 - val_loss: 0.6860 - val_mae: 0.0561 - val_mse: 0.0502 - val_acc: 0.8907
Epoch 9/20
668/668 [==============================] - ETA: 0s - loss: 0.0250 - mae: 0.0056 - mse: 0.0032 - acc: 0.9920INFO:tensorflow:Assets written to: models/knbc_finetuning.model/assets
668/668 [==============================] - 645s 966ms/step - loss: 0.0250 - mae: 0.0056 - mse: 0.0032 - acc: 0.9920 - val_loss: 0.5956 - val_mae: 0.0550 - val_mse: 0.0474 - val_acc: 0.8952
Epoch 10/20
668/668 [==============================] - 633s 948ms/step - loss: 0.0120 - mae: 0.0028 - mse: 0.0016 - acc: 0.9957 - val_loss: 0.6431 - val_mae: 0.0555 - val_mse: 0.0482 - val_acc: 0.8892
Epoch 11/20
668/668 [==============================] - 599s 897ms/step - loss: 0.0093 - mae: 0.0023 - mse: 0.0013 - acc: 0.9965 - val_loss: 0.7258 - val_mae: 0.0594 - val_mse: 0.0523 - val_acc: 0.8840
Epoch 12/20
668/668 [==============================] - ETA: 0s - loss: 0.0093 - mae: 0.0024 - mse: 0.0014 - acc: 0.9962INFO:tensorflow:Assets written to: models/knbc_finetuning.model/assets
668/668 [==============================] - 608s 911ms/step - loss: 0.0093 - mae: 0.0024 - mse: 0.0014 - acc: 0.9962 - val_loss: 0.6894 - val_mae: 0.0518 - val_mse: 0.0462 - val_acc: 0.8990
Epoch 13/20
668/668 [==============================] - 601s 900ms/step - loss: 0.0073 - mae: 0.0016 - mse: 8.3860e-04 - acc: 0.9975 - val_loss: 0.7164 - val_mae: 0.0514 - val_mse: 0.0468 - val_acc: 0.8967
Epoch 14/20
668/668 [==============================] - 605s 905ms/step - loss: 0.0044 - mae: 0.0011 - mse: 6.2132e-04 - acc: 0.9982 - val_loss: 0.7567 - val_mae: 0.0512 - val_mse: 0.0469 - val_acc: 0.8990
Epoch 15/20
668/668 [==============================] - ETA: 0s - loss: 0.0033 - mae: 0.0010 - mse: 5.7569e-04 - acc: 0.9978INFO:tensorflow:Assets written to: models/knbc_finetuning.model/assets
668/668 [==============================] - 612s 916ms/step - loss: 0.0033 - mae: 0.0010 - mse: 5.7569e-04 - acc: 0.9978 - val_loss: 0.8021 - val_mae: 0.0506 - val_mse: 0.0468 - val_acc: 0.9012
Epoch 16/20
668/668 [==============================] - ETA: 0s - loss: 0.0036 - mae: 9.9245e-04 - mse: 5.5182e-04 - acc: 0.9984INFO:tensorflow:Assets written to: models/knbc_finetuning.model/assets
668/668 [==============================] - 613s 918ms/step - loss: 0.0036 - mae: 9.9245e-04 - mse: 5.5182e-04 - acc: 0.9984 - val_loss: 0.7621 - val_mae: 0.0483 - val_mse: 0.0445 - val_acc: 0.9042
Epoch 17/20
668/668 [==============================] - 618s 925ms/step - loss: 0.0032 - mae: 7.8040e-04 - mse: 4.1805e-04 - acc: 0.9984 - val_loss: 0.8081 - val_mae: 0.0499 - val_mse: 0.0462 - val_acc: 0.9004
Epoch 18/20
668/668 [==============================] - 621s 930ms/step - loss: 0.0033 - mae: 8.8343e-04 - mse: 4.9967e-04 - acc: 0.9983 - val_loss: 0.8117 - val_mae: 0.0484 - val_mse: 0.0449 - val_acc: 0.9042
Epoch 19/20
668/668 [==============================] - 619s 927ms/step - loss: 0.0015 - mae: 5.1728e-04 - mse: 2.6630e-04 - acc: 0.9985 - val_loss: 0.8114 - val_mae: 0.0479 - val_mse: 0.0446 - val_acc: 0.9034
Epoch 20/20
668/668 [==============================] - 607s 909ms/step - loss: 0.0014 - mae: 5.0146e-04 - mse: 2.3981e-04 - acc: 0.9992 - val_loss: 0.8189 - val_mae: 0.0486 - val_mse: 0.0456 - val_acc: 0.9042
グラフ出力
import matplotlib.pyplot as plt
%matplotlib inline
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(len(acc))
# accの表示
plt.plot(epochs, acc, 'r' ,label = 'training acc')
plt.plot(epochs, val_acc, 'b' , label= 'validation acc')
plt.title('Training and Validation acc')
plt.legend()
plt.figure()
# lossの表示
plt.plot(epochs, loss, 'r' ,label = 'training loss')
plt.plot(epochs, val_loss, 'b' , label= 'validation loss')
plt.title('Training and Validation loss')
plt.legend()
plt.show()
20エポックの学習で、テストデータの正解率が90%をわずかに上回りました。
しかし、問題88の結果と同等のレベルにとどまりました。
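最後に、学習済みモデルでの予測の確認例です(先頭5件を予測して正解ラベルと見比べるだけの簡単なスケッチです)。
# 先頭5件のタイトルを予測し、正解ラベルのインデックスと比較する
pred = model.predict([data['test_features'][:5], data['test_segments'][:5]])
print(pred.argmax(axis=1))                       # 予測カテゴリ
print(data['test_labels'][:5].argmax(axis=1))    # 正解カテゴリ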