Rabbit Challenge #13: Deep Learning Day 3

Last updated at 2021-12-26, posted at 2021-12-26

CNN structure diagram

The output becomes 3×3.

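Section 1: The concept of recurrent neural networks

RNN overview

An RNN is a neural network that can handle time-series data.
- Time-series data is data observed at regular intervals in temporal order, whose observations are statistically dependent on one another.
- Audio data is a concrete example.
The RNN equations are as follows.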

u^t = W_{(in)}x^t+Wz^{t-1}+b \\
z^t = f(W_{(in)}x^t+Wz^{t-1}+b) \\
v^t = W_{(out)}z^t + c \\
y^t = g(W_{(out)}z^t + c)

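$W$ is the weight applied from the previous hidden layer to the current hidden layer.

The other weights in the question are:
- $W_{(in)}$, the weight applied to the input when computing the current hidden layer
- $W_{(out)}$, the weight applied to the hidden layer when computing the output
A defining feature of an RNN is its recurrent structure: it holds the initial state and the state at the previous time t-1, and recursively computes the state at the next time t from them.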
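The recurrence in the equations above can be sketched in a few lines of NumPy. This is only an illustration: the layer sizes, the random weights, and the choice of tanh for $f$ and sigmoid for $g$ are assumptions, and the column-vector convention follows the equations rather than the row-vector convention used in the exercise code later on.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_forward(xs, W_in, W, W_out, b, c, f=np.tanh, g=sigmoid):
    """Run the RNN equations over a sequence xs of shape (T, input_dim)."""
    z = np.zeros(W.shape[0])          # z^0: initial hidden state
    zs, ys = [], []
    for x in xs:
        u = W_in @ x + W @ z + b      # u^t = W_(in) x^t + W z^(t-1) + b
        z = f(u)                      # z^t = f(u^t)
        v = W_out @ z + c             # v^t = W_(out) z^t + c
        ys.append(g(v))               # y^t = g(v^t)
        zs.append(z)
    return np.array(zs), np.array(ys)

# Assumed toy sizes and random weights, for illustration only
rng = np.random.default_rng(0)
T, input_dim, hidden_dim, output_dim = 5, 2, 4, 1
xs = rng.normal(size=(T, input_dim))
W_in = rng.normal(size=(hidden_dim, input_dim))
W = rng.normal(size=(hidden_dim, hidden_dim))
W_out = rng.normal(size=(output_dim, hidden_dim))
b = np.zeros(hidden_dim)
c = np.zeros(output_dim)

zs, ys = rnn_forward(xs, W_in, W, W_out, b, c)
print(zs.shape, ys.shape)   # (5, 4) (5, 1)
```

The only state carried from one step to the next is `z`, which is what makes the structure recurrent.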

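BPTT

BPTT (backpropagation through time) is one method for adjusting the parameters of an RNN.
- It is a form of error backpropagation.

Review of error backpropagation

By working backwards from the computed result (the error) to obtain the derivatives, each derivative can be calculated without redundant recursive computation.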

z = t^2, \quad t = x + y \\
\frac{dz}{dx} = \frac{dz}{dt} \cdot \frac{dt}{dx} \\
= 2t \cdot 1 \\
= 2(x+y)
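The chain-rule result can be checked numerically. The sketch below assumes $z = t^2$ and $t = x + y$, which is what the intermediate results $2t$ and $2(x+y)$ imply, and compares the analytic derivative with a central finite difference. The sample point and step size are arbitrary choices.

```python
def z_of(x, y):
    t = x + y
    return t ** 2

def dz_dx_chain(x, y):
    t = x + y
    dz_dt = 2 * t      # derivative of t^2 with respect to t
    dt_dx = 1.0        # derivative of x + y with respect to x
    return dz_dt * dt_dx

x, y, h = 1.5, -0.5, 1e-6
numeric = (z_of(x + h, y) - z_of(x - h, y)) / (2 * h)   # central difference
analytic = dz_dx_chain(x, y)                            # 2 * (x + y) = 2.0
print(analytic, numeric)
```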

Mathematical description of BPTT

Part 1
Part 2

y_1 = g(W_{(out)}z_1+c)\\
z_1 = f(Wz_0+W_{(in)}x_1+b)

Part 3
Part 4

Implementation exercise

import numpy as np
from common import functions
import matplotlib.pyplot as plt

# def d_tanh(x):



# Prepare the data
# Number of binary digits
binary_dim = 8
# Maximum value + 1
largest_number = pow(2, binary_dim)
# Binary representations of every number below largest_number
binary = np.unpackbits(np.array([range(largest_number)],dtype=np.uint8).T,axis=1)

input_layer_size = 2
hidden_layer_size = 16
output_layer_size = 1

weight_init_std = 1
learning_rate = 0.1

iters_num = 10000
plot_interval = 100

# Weight initialization (biases omitted for simplicity)
W_in = weight_init_std * np.random.randn(input_layer_size, hidden_layer_size)
W_out = weight_init_std * np.random.randn(hidden_layer_size, output_layer_size)
W = weight_init_std * np.random.randn(hidden_layer_size, hidden_layer_size)

# Xavier


# He



# Gradients
W_in_grad = np.zeros_like(W_in)
W_out_grad = np.zeros_like(W_out)
W_grad = np.zeros_like(W)

u = np.zeros((hidden_layer_size, binary_dim + 1))
z = np.zeros((hidden_layer_size, binary_dim + 1))
y = np.zeros((output_layer_size, binary_dim))

delta_out = np.zeros((output_layer_size, binary_dim))
delta = np.zeros((hidden_layer_size, binary_dim + 1))

all_losses = []

for i in range(iters_num):
    
    # Initialize A and B (a + b = d)
    # Integer division: np.random.randint expects an integer upper bound
    a_int = np.random.randint(largest_number // 2)
    a_bin = binary[a_int] # binary encoding
    b_int = np.random.randint(largest_number // 2)
    b_bin = binary[b_int] # binary encoding
    
    # Ground-truth data
    d_int = a_int + b_int
    d_bin = binary[d_int]
    
    # Output bits
    out_bin = np.zeros_like(d_bin)
    
    # Error over the whole time series
    all_loss = 0    
    
    # Loop over time steps
    for t in range(binary_dim):
        # Input values
        X = np.array([a_bin[ - t - 1], b_bin[ - t - 1]]).reshape(1, -1)
        # Ground truth at time t
        dd = np.array([d_bin[binary_dim - t - 1]])
        
        u[:,t+1] = np.dot(X, W_in) + np.dot(z[:,t].reshape(1, -1), W)
        z[:,t+1] = functions.sigmoid(u[:,t+1])

        y[:,t] = functions.sigmoid(np.dot(z[:,t+1].reshape(1, -1), W_out))


        # Error
        loss = functions.mean_squared_error(dd, y[:,t])
        
        delta_out[:,t] = functions.d_mean_squared_error(dd, y[:,t]) * functions.d_sigmoid(y[:,t])        
        
        all_loss += loss

        out_bin[binary_dim - t - 1] = np.round(y[:,t])
    
    
    for t in range(binary_dim)[::-1]:
        X = np.array([a_bin[-t-1],b_bin[-t-1]]).reshape(1, -1)        

        delta[:,t] = (np.dot(delta[:,t+1].T, W.T) + np.dot(delta_out[:,t].T, W_out.T)) * functions.d_sigmoid(u[:,t+1])

        # Accumulate gradients
        W_out_grad += np.dot(z[:,t+1].reshape(-1,1), delta_out[:,t].reshape(-1,1))
        W_grad += np.dot(z[:,t].reshape(-1,1), delta[:,t].reshape(1,-1))
        W_in_grad += np.dot(X.T, delta[:,t].reshape(1,-1))
    
    # Apply gradients
    W_in -= learning_rate * W_in_grad
    W_out -= learning_rate * W_out_grad
    W -= learning_rate * W_grad
    
    W_in_grad *= 0
    W_out_grad *= 0
    W_grad *= 0
    

    if(i % plot_interval == 0):
        all_losses.append(all_loss)        
        print("iters:" + str(i))
        print("Loss:" + str(all_loss))
        print("Pred:" + str(out_bin))
        print("True:" + str(d_bin))
        out_int = 0
        for index,x in enumerate(reversed(out_bin)):
            out_int += x * pow(2, index)
        print(str(a_int) + " + " + str(b_int) + " = " + str(out_int))
        print("------------")

lists = range(0, iters_num, plot_interval)
plt.plot(lists, all_losses, label="loss")
plt.show()


# iters:0
# Loss:1.1488339613748393
# Pred:[0 0 0 0 0 0 0 0]
# True:[1 0 1 0 0 1 1 0]
# 74 + 92 = 0
# ------------
# iters:100
# Loss:0.9039920941909935
# Pred:[1 0 0 0 0 1 0 0]
# True:[1 0 0 1 0 0 0 1]
# 90 + 55 = 132
# ------------
# iters:200
# Loss:0.905193644777182
# Pred:[0 0 0 0 0 1 0 1]
# True:[0 1 1 0 0 1 0 1]
# 45 + 56 = 5
# ------------
# iters:300
# Loss:0.9142826509456912
# Pred:[1 1 1 1 1 1 1 1]
# True:[1 0 1 1 0 0 0 1]
# 73 + 104 = 255
# ------------
# iters:400
# Loss:1.0324640618413718
# Pred:[1 1 1 0 0 1 0 0]
# True:[0 1 1 1 0 0 0 1]
# 88 + 25 = 228
# ------------
# iters:500
# Loss:0.8931327732122861
# Pred:[1 0 1 0 1 0 0 0]
# True:[1 0 1 0 0 0 1 0]
# 85 + 77 = 168
# ------------
# iters:600
# Loss:0.9868891307058315
# Pred:[1 0 1 0 0 1 1 1]
# True:[0 1 0 0 0 0 1 1]
# 16 + 51 = 167
# ------------
# iters:700
# Loss:1.0235137652815283
# Pred:[0 1 1 1 1 1 1 1]
# True:[0 1 0 0 1 1 0 1]
# 15 + 62 = 127
# ------------
# iters:800
# Loss:0.9543834397349825
# Pred:[1 0 0 0 0 0 0 0]
# True:[1 0 1 1 0 0 0 0]
# 79 + 97 = 128
# ------------
# iters:900
# Loss:0.872164050816704
# Pred:[0 1 1 1 0 1 0 0]
# True:[0 1 1 1 0 0 1 0]
# 34 + 80 = 116
# ------------
# iters:1000
# Loss:0.7650616278924502
# Pred:[0 0 0 0 0 0 0 0]
# True:[1 0 0 0 0 0 0 0]
# 127 + 1 = 0
# ------------
# iters:1100
# Loss:1.0078531273209432
# Pred:[1 1 1 1 1 0 1 1]
# True:[1 1 0 0 0 1 1 1]
# 105 + 94 = 251
# ------------
# iters:1200
# Loss:1.0639358950377042
# Pred:[0 0 1 0 0 0 0 0]
# True:[0 1 0 0 1 1 0 1]
# 27 + 50 = 32
# ------------
# iters:1300
# Loss:1.0393180637913888
# Pred:[0 1 0 0 0 0 1 1]
# True:[0 1 1 0 1 1 0 0]
# 49 + 59 = 67
# ------------
# iters:1400
# Loss:0.9011110362194974
# Pred:[1 1 1 1 1 1 0 0]
# True:[1 1 1 1 1 0 0 0]
# 126 + 122 = 252
# ------------
# iters:1500
# Loss:1.1465555114792538
# Pred:[0 1 0 1 1 0 1 1]
# True:[1 0 1 0 0 0 1 1]
# 54 + 109 = 91
# ------------
# iters:1600
# Loss:1.266525809504998
# Pred:[1 1 1 1 1 1 1 1]
# True:[1 0 1 0 0 1 0 0]
# 95 + 69 = 255
# ------------
# iters:1700
# Loss:1.0281788790359414
# Pred:[1 1 1 1 1 0 1 1]
# True:[1 0 0 1 1 0 1 0]
# 93 + 61 = 251
# ------------
# iters:1800
# Loss:0.9534461057423635
# Pred:[0 1 1 1 1 0 1 1]
# True:[0 1 0 0 0 0 1 1]
# 36 + 31 = 123
# ------------
# iters:1900
# Loss:0.9175609909812852
# Pred:[1 0 0 1 1 1 1 1]
# True:[1 0 1 0 0 1 1 1]
# 77 + 90 = 159
# ------------
# iters:2000
# Loss:0.8354289907577811
# Pred:[0 1 1 1 1 1 0 0]
# True:[0 1 1 0 1 1 1 0]
# 104 + 6 = 124
# ------------
# iters:2100
# Loss:0.9605301202348999
# Pred:[0 1 1 1 0 1 0 1]
# True:[1 0 0 0 0 1 0 1]
# 122 + 11 = 117
# ------------
# iters:2200
# Loss:0.8242415961706964
# Pred:[0 1 0 0 1 0 1 1]
# True:[1 0 0 0 1 0 1 1]
# 33 + 106 = 75
# ------------
# iters:2300
# Loss:1.2850768001352875
# Pred:[0 1 1 1 1 0 1 1]
# True:[1 0 0 0 0 0 1 1]
# 30 + 101 = 123
# ------------
# iters:2400
# Loss:1.0536915648832617
# Pred:[1 1 0 1 0 0 1 1]
# True:[1 0 1 0 0 0 1 1]
# 41 + 122 = 211
# ------------
# iters:2500
# Loss:1.465043181315218
# Pred:[1 1 1 1 1 1 0 1]
# True:[1 0 0 0 0 0 0 1]
# 125 + 4 = 253
# ------------
# iters:2600
# Loss:0.6059802568484904
# Pred:[0 0 1 0 0 1 0 0]
# True:[0 1 0 0 1 1 0 0]
# 20 + 56 = 36
# ------------
# iters:2700
# Loss:0.5216417115004406
# Pred:[1 0 1 1 1 1 1 1]
# True:[1 0 1 1 1 1 1 1]
# 109 + 82 = 191
# ------------
# iters:2800
# Loss:0.4962122556852809
# Pred:[1 0 0 1 1 0 1 0]
# True:[1 1 0 1 1 0 1 0]
# 104 + 114 = 154
# ------------
# iters:2900
# Loss:0.463867536356606
# Pred:[0 1 1 0 1 0 1 1]
# True:[0 1 1 0 1 0 1 1]
# 37 + 70 = 107
# ------------
# iters:3000
# Loss:0.5402602454746994
# Pred:[0 0 1 1 1 0 1 1]
# True:[0 0 1 0 0 0 1 1]
# 20 + 15 = 59
# ------------
# iters:3100
# Loss:0.9026775195235532
# Pred:[0 1 1 1 1 1 0 0]
# True:[0 1 1 0 0 0 1 0]
# 83 + 15 = 124
# ------------
# iters:3200
# Loss:0.38484499766628066
# Pred:[0 1 0 1 0 0 1 0]
# True:[0 1 0 1 0 0 1 0]
# 37 + 45 = 82
# ------------
# iters:3300
# Loss:0.5834155226225781
# Pred:[1 0 0 1 1 1 0 0]
# True:[1 0 1 1 1 0 0 0]
# 86 + 98 = 156
# ------------
# iters:3400
# Loss:1.17889474494824
# Pred:[1 1 1 1 0 0 0 1]
# True:[1 0 0 0 1 1 0 1]
# 95 + 46 = 241
# ------------
# iters:3500
# Loss:0.48251036357000865
# Pred:[0 1 1 1 1 0 1 1]
# True:[0 0 1 1 1 0 1 1]
# 37 + 22 = 123
# ------------
# iters:3600
# Loss:0.6498998293617873
# Pred:[1 0 1 1 1 0 0 1]
# True:[1 0 1 1 1 1 0 1]
# 70 + 119 = 185
# ------------
# iters:3700
# Loss:0.15242075543827768
# Pred:[0 0 1 0 0 1 0 0]
# True:[0 0 1 0 0 1 0 0]
# 16 + 20 = 36
# ------------
# iters:3800
# Loss:0.35093066487196384
# Pred:[1 0 1 1 0 0 0 1]
# True:[1 0 1 1 0 0 0 1]
# 108 + 69 = 177
# ------------
# iters:3900
# Loss:0.5355963275788471
# Pred:[0 1 1 1 0 1 1 1]
# True:[0 1 0 0 0 1 1 1]
# 31 + 40 = 119
# ------------
# iters:4000
# Loss:0.9023372113492066
# Pred:[1 0 0 0 0 0 0 1]
# True:[1 1 0 1 1 0 0 1]
# 123 + 94 = 129
# ------------
# iters:4100
# Loss:0.4409425508514817
# Pred:[1 1 0 0 0 0 1 1]
# True:[1 0 0 0 0 0 1 1]
# 34 + 97 = 195
# ------------
# iters:4200
# Loss:0.9278789176143879
# Pred:[1 1 0 0 1 1 1 0]
# True:[1 0 0 1 0 0 0 0]
# 25 + 119 = 206
# ------------
# iters:4300
# Loss:0.3965493643581658
# Pred:[1 0 0 1 0 1 1 1]
# True:[1 0 0 1 0 1 1 1]
# 118 + 33 = 151
# ------------
# iters:4400
# Loss:0.365792577573951
# Pred:[0 1 0 0 1 0 0 0]
# True:[0 1 0 0 1 0 0 0]
# 60 + 12 = 72
# ------------
# iters:4500
# Loss:0.388239877553089
# Pred:[1 0 0 1 1 1 1 0]
# True:[1 1 0 1 1 1 1 0]
# 123 + 99 = 158
# ------------
# iters:4600
# Loss:0.15990661571358508
# Pred:[0 1 1 1 0 0 1 1]
# True:[0 1 1 1 0 0 1 1]
# 35 + 80 = 115
# ------------
# iters:4700
# Loss:0.4802054579176366
# Pred:[0 1 0 0 0 0 0 1]
# True:[0 1 0 1 0 0 0 1]
# 50 + 31 = 65
# ------------
# iters:4800
# Loss:0.1304623583401485
# Pred:[0 0 0 1 0 0 0 1]
# True:[0 0 0 1 0 0 0 1]
# 5 + 12 = 17
# ------------
# iters:4900
# Loss:0.07883464426855903
# Pred:[0 1 1 1 1 1 0 1]
# True:[0 1 1 1 1 1 0 1]
# 114 + 11 = 125
# ------------
# iters:5000
# Loss:0.04428302080535708
# Pred:[0 1 1 0 0 1 1 1]
# True:[0 1 1 0 0 1 1 1]
# 36 + 67 = 103
# ------------
# iters:5100
# Loss:0.06766175303844331
# Pred:[1 0 0 0 1 1 1 0]
# True:[1 0 0 0 1 1 1 0]
# 60 + 82 = 142
# ------------
# iters:5200
# Loss:0.13716292368212088
# Pred:[1 0 1 0 0 0 1 1]
# True:[1 0 1 0 0 0 1 1]
# 78 + 85 = 163
# ------------
# iters:5300
# Loss:0.2885939809012163
# Pred:[1 0 0 0 0 0 1 0]
# True:[1 0 0 1 0 0 1 0]
# 51 + 95 = 130
# ------------
# iters:5400
# Loss:0.058341714765702594
# Pred:[1 0 0 0 0 0 1 0]
# True:[1 0 0 0 0 0 1 0]
# 32 + 98 = 130
# ------------
# iters:5500
# Loss:0.05953813730876062
# Pred:[1 0 0 1 0 1 1 0]
# True:[1 0 0 1 0 1 1 0]
# 118 + 32 = 150
# ------------
# iters:5600
# Loss:0.09662181715647031
# Pred:[1 0 1 0 1 1 1 1]
# True:[1 0 1 0 1 1 1 1]
# 52 + 123 = 175
# ------------
# iters:5700
# Loss:0.04135143424740656
# Pred:[0 1 0 0 0 0 1 0]
# True:[0 1 0 0 0 0 1 0]
# 41 + 25 = 66
# ------------
# iters:5800
# Loss:0.01654406659183811
# Pred:[0 1 1 0 0 1 1 0]
# True:[0 1 1 0 0 1 1 0]
# 100 + 2 = 102
# ------------
# iters:5900
# Loss:0.10917668717471508
# Pred:[0 1 0 1 0 0 0 0]
# True:[0 1 0 1 0 0 0 0]
# 29 + 51 = 80
# ------------
# iters:6000
# Loss:0.03330483908103671
# Pred:[0 1 1 1 0 0 0 1]
# True:[0 1 1 1 0 0 0 1]
# 100 + 13 = 113
# ------------
# iters:6100
# Loss:0.1068743668654802
# Pred:[0 1 1 0 1 0 1 0]
# True:[0 1 1 0 1 0 1 0]
# 59 + 47 = 106
# ------------
# iters:6200
# Loss:0.03916087080496731
# Pred:[0 1 1 0 1 1 1 1]
# True:[0 1 1 0 1 1 1 1]
# 30 + 81 = 111
# ------------
# iters:6300
# Loss:0.058427304707599666
# Pred:[1 1 1 0 1 0 1 0]
# True:[1 1 1 0 1 0 1 0]
# 119 + 115 = 234
# ------------
# iters:6400
# Loss:0.06504187871168517
# Pred:[1 0 0 0 0 0 0 0]
# True:[1 0 0 0 0 0 0 0]
# 114 + 14 = 128
# ------------
# iters:6500
# Loss:0.0057876799450448935
# Pred:[1 0 1 1 0 1 1 0]
# True:[1 0 1 1 0 1 1 0]
# 113 + 69 = 182
# ------------
# iters:6600
# Loss:0.039048253795518774
# Pred:[0 1 0 0 1 1 0 1]
# True:[0 1 0 0 1 1 0 1]
# 11 + 66 = 77
# ------------
# iters:6700
# Loss:0.020094824860589177
# Pred:[0 1 0 0 1 1 1 1]
# True:[0 1 0 0 1 1 1 1]
# 52 + 27 = 79
# ------------
# iters:6800
# Loss:0.024360067783454467
# Pred:[1 0 1 0 0 1 0 1]
# True:[1 0 1 0 0 1 0 1]
# 114 + 51 = 165
# ------------
# iters:6900
# Loss:0.04386532228355007
# Pred:[0 1 0 1 0 1 0 1]
# True:[0 1 0 1 0 1 0 1]
# 59 + 26 = 85
# ------------
# iters:7000
# Loss:0.03093567575659112
# Pred:[1 0 0 0 0 0 1 0]
# True:[1 0 0 0 0 0 1 0]
# 30 + 100 = 130
# ------------
# iters:7100
# Loss:0.04598003394967047
# Pred:[1 0 0 1 1 0 0 0]
# True:[1 0 0 1 1 0 0 0]
# 27 + 125 = 152
# ------------
# iters:7200
# Loss:0.008578192775326359
# Pred:[1 0 1 1 1 0 0 0]
# True:[1 0 1 1 1 0 0 0]
# 120 + 64 = 184
# ------------
# iters:7300
# Loss:0.024969047045157424
# Pred:[1 0 1 1 0 0 0 1]
# True:[1 0 1 1 0 0 0 1]
# 116 + 61 = 177
# ------------
# iters:7400
# Loss:0.012012403497561304
# Pred:[0 1 0 1 0 0 0 0]
# True:[0 1 0 1 0 0 0 0]
# 68 + 12 = 80
# ------------
# iters:7500
# Loss:0.009211275010850705
# Pred:[1 1 0 1 1 1 0 0]
# True:[1 1 0 1 1 1 0 0]
# 96 + 124 = 220
# ------------
# iters:7600
# Loss:0.022270448611346864
# Pred:[1 0 0 0 1 0 0 0]
# True:[1 0 0 0 1 0 0 0]
# 30 + 106 = 136
# ------------
# iters:7700
# Loss:0.021494394291221658
# Pred:[0 1 0 1 0 0 1 1]
# True:[0 1 0 1 0 0 1 1]
# 39 + 44 = 83
# ------------
# iters:7800
# Loss:0.01300937579170816
# Pred:[1 0 0 0 0 1 0 1]
# True:[1 0 0 0 0 1 0 1]
# 89 + 44 = 133
# ------------
# iters:7900
# Loss:0.017958966717361156
# Pred:[1 1 0 1 1 0 1 1]
# True:[1 1 0 1 1 0 1 1]
# 106 + 113 = 219
# ------------
# iters:8000
# Loss:0.007165176028318373
# Pred:[0 1 1 0 0 1 1 0]
# True:[0 1 1 0 0 1 1 0]
# 32 + 70 = 102
# ------------
# iters:8100
# Loss:0.008221692479168694
# Pred:[0 1 1 0 1 1 0 0]
# True:[0 1 1 0 1 1 0 0]
# 92 + 16 = 108
# ------------
# iters:8200
# Loss:0.007257866447870363
# Pred:[0 1 1 1 0 1 1 1]
# True:[0 1 1 1 0 1 1 1]
# 100 + 19 = 119
# ------------
# iters:8300
# Loss:0.01618787598250365
# Pred:[1 0 0 1 0 1 0 0]
# True:[1 0 0 1 0 1 0 0]
# 119 + 29 = 148
# ------------
# iters:8400
# Loss:0.011743798857904079
# Pred:[0 1 1 0 0 1 0 1]
# True:[0 1 1 0 0 1 0 1]
# 63 + 38 = 101
# ------------
# iters:8500
# Loss:0.010756655305433448
# Pred:[1 0 1 1 1 0 1 0]
# True:[1 0 1 1 1 0 1 0]
# 123 + 63 = 186
# ------------
# iters:8600
# Loss:0.010853098917513009
# Pred:[0 1 1 0 0 0 1 1]
# True:[0 1 1 0 0 0 1 1]
# 15 + 84 = 99
# ------------
# iters:8700
# Loss:0.0022410902934356203
# Pred:[0 1 0 1 1 1 0 0]
# True:[0 1 0 1 1 1 0 0]
# 91 + 1 = 92
# ------------
# iters:8800
# Loss:0.004894560701902507
# Pred:[0 1 0 0 0 1 0 1]
# True:[0 1 0 0 0 1 0 1]
# 32 + 37 = 69
# ------------
# iters:8900
# Loss:0.006628202449284809
# Pred:[1 1 0 1 0 0 0 0]
# True:[1 1 0 1 0 0 0 0]
# 112 + 96 = 208
# ------------
# iters:9000
# Loss:0.005621380736963992
# Pred:[1 0 1 1 0 1 1 1]
# True:[1 0 1 1 0 1 1 1]
# 92 + 91 = 183
# ------------
# iters:9100
# Loss:0.005968740836425797
# Pred:[1 1 0 0 1 0 0 1]
# True:[1 1 0 0 1 0 0 1]
# 103 + 98 = 201
# ------------
# iters:9200
# Loss:0.0008327347004810689
# Pred:[0 0 0 0 1 1 1 0]
# True:[0 0 0 0 1 1 1 0]
# 5 + 9 = 14
# ------------
# iters:9300
# Loss:0.007711302241492535
# Pred:[0 1 0 0 0 1 1 1]
# True:[0 1 0 0 0 1 1 1]
# 19 + 52 = 71
# ------------
# iters:9400
# Loss:0.005473230166881731
# Pred:[0 1 1 0 0 0 0 1]
# True:[0 1 1 0 0 0 0 1]
# 28 + 69 = 97
# ------------
# iters:9500
# Loss:0.00440018899493548
# Pred:[1 0 0 1 0 0 0 1]
# True:[1 0 0 1 0 0 0 1]
# 73 + 72 = 145
# ------------
# iters:9600
# Loss:0.005170268909604959
# Pred:[0 1 1 1 0 0 1 1]
# True:[0 1 1 1 0 0 1 1]
# 51 + 64 = 115
# ------------
# iters:9700
# Loss:0.006527051641671278
# Pred:[1 0 1 1 0 0 1 1]
# True:[1 0 1 1 0 0 1 1]
# 59 + 120 = 179
# ------------
# iters:9800
# Loss:0.004266653225570331
# Pred:[1 0 1 0 0 0 0 0]
# True:[1 0 1 0 0 0 0 0]
# 73 + 87 = 160
# ------------
# iters:9900
# Loss:0.0041113067655168265
# Pred:[1 0 0 0 1 1 0 1]
# True:[1 0 0 0 1 1 0 1]
# 33 + 108 = 141
# ------------

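The log shows that training converges.
Incidentally, when the same code is run with ReLU as the activation function, training does not converge.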

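Section 2: LSTM

Problems with RNNs
- The further back in time the gradient is propagated, the more it vanishes
  - Learning long time series is difficult
Solution
- Apart from the earlier remedies for vanishing gradients, LSTM solves the problem by changing the network structure itself

Review of the vanishing gradient problem

What is the vanishing gradient problem?
- As backpropagation proceeds toward the lower layers, the gradient becomes smaller and smaller, so gradient-descent updates barely change the lower-layer parameters and training no longer converges to the optimum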

Activation function: the sigmoid function

Formula

f(u) = \frac{1}{1+e^{-u}}

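A function that varies smoothly between 0 and 1. Whereas a step function can only express 0 or 1, the sigmoid can convey the strength of a signal, which helped trigger the spread of neural networks.
Problem
- For inputs of large magnitude the output barely changes, which can cause the vanishing gradient problem

Answer: (2) 0.25, the maximum of the sigmoid derivative, reached at u = 0

Differentiating the sigmoid function gives the following.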

f'(u) = (1 - f(u))*f(u)
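A short numerical check of the derivative formula, as a sketch: it evaluates $f'(u)$ on a grid (the grid range is an arbitrary choice) and confirms that the value at $u = 0$ is 0.25 and that no other point exceeds it. The last line shows why this matters for deep or long-unrolled networks: a product of ten such factors is already below $10^{-6}$.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def d_sigmoid(u):
    s = sigmoid(u)
    return (1.0 - s) * s              # f'(u) = (1 - f(u)) * f(u)

us = np.linspace(-10.0, 10.0, 2001)   # grid around u = 0
grads = d_sigmoid(us)

print(d_sigmoid(0.0))                 # 0.25
print(us[np.argmax(grads)])           # location of the maximum on the grid
print(0.25 ** 10)                     # upper bound on a product of 10 sigmoid derivatives
```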

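Exploding gradients

The gradient grows exponentially each time it is backpropagated through a layer.

Answer: (1)
- When the norm of the gradient exceeds the threshold, the gradient is rescaled so that its norm equals the threshold
- The clipped gradient is computed as gradient × (threshold / gradient norm), that is, gradient * rate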
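The clipping rule described above can be written as a small function. This is a minimal sketch using the L2 norm; the function name, the zero-norm guard, and the example values are assumptions made for illustration.

```python
import numpy as np

def gradient_clipping(grad, threshold):
    """Rescale grad so that its L2 norm does not exceed threshold."""
    norm = np.linalg.norm(grad)
    rate = threshold / norm if norm > 0 else 1.0
    if rate < 1:                     # the norm is above the threshold
        return grad * rate           # gradient * (threshold / gradient norm)
    return grad                      # small gradients are left untouched

g = np.array([3.0, 4.0])             # norm 5
clipped = gradient_clipping(g, threshold=1.0)
print(clipped, np.linalg.norm(clipped))
```

The direction of the gradient is preserved; only its length is capped.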

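LSTM overview

CEC

The CEC (constant error carousel) keeps the error inside the cell to prevent the gradient from vanishing.
- Both vanishing and exploding gradients are avoided if the gradient is 1

Problem with the CEC

The weight on the input data is uniform regardless of time dependence
- In other words, the learning capability of a neural network is lost

Input gate and output gate

Role of the input and output gates

Adding input and output gates makes the weighting of the values fed to each gate adjustable through the weight matrices $W$ and $U$
- This solves the problem with the CEC

Forget gate

Answer: (3)
- The new cell state is the computed cell input multiplied by the input gate, plus the cell state one step earlier multiplied by the forget gate.
- That is, input_gate * a + forget_gate * c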
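To place the cell-state update in context, here is a sketch of one full LSTM step. The names `a`, `c`, `input_gate` and `forget_gate` follow the text; stacking the four blocks into single matrices `W`, `U` and a bias `b`, the tanh and sigmoid choices, and the toy sizes are assumptions of this illustration, not the course's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,), stacked as [a, i, f, o]."""
    H = h_prev.shape[0]
    s = W @ x + U @ h_prev + b
    a = np.tanh(s[0:H])                    # computed input to the cell
    input_gate = sigmoid(s[H:2 * H])
    forget_gate = sigmoid(s[2 * H:3 * H])
    output_gate = sigmoid(s[3 * H:4 * H])
    c_next = input_gate * a + forget_gate * c_prev   # cell-state update from the text
    h_next = output_gate * np.tanh(c_next)
    return h_next, c_next

rng = np.random.default_rng(0)
D, H = 2, 3
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
print(h.shape, c.shape)   # (3,) (3,)
```

When the forget gate is close to 1 and the input gate close to 0, `c_next` is almost equal to `c_prev`, which is the CEC behaviour of carrying information forward unchanged.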

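Problem with the LSTM block

Current state of the LSTM
- The CEC stores all past information
Problem
- When past information is no longer needed, it cannot be deleted and stays stored
What is needed is a way to forget information at the moment it is no longer required
- The forget gate was introduced for exactly this purpose

Forget gate
- Deletes information that is no longer needed

Peephole connections

Problem with the LSTM

We want to propagate the past information stored in the CEC to other nodes at an arbitrary time, or forget it at an arbitrary time
However, the value of the CEC itself has no influence on gate control

What is a peephole connection?

A structure that lets the value of the CEC itself be propagated to the gates through a weight matrix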

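Section 3: GRU

LSTM has many parameters, which makes its computational cost high
- GRU addresses this
Because the conventional LSTM had so many parameters, its computational load was large. GRU is a structure that cuts the number of parameters substantially while aiming for equal or better accuracy
- The advantage of GRU is its low computational cost

GRU overview

Problem with LSTM
- Many parameters and a high computational cost
Problem with the CEC
- The weight on the input data is uniform regardless of time dependence
  - In other words, there is no learning capability

LSTM has many parameters and a high computational cost
GRU has few parameters and a low computational cost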
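The difference in parameter count can be made concrete with a rough calculation. The sketch below assumes the common textbook formulations (no peephole connections, one bias vector per block): an LSTM has four weight blocks and a GRU has three. Library implementations may count slightly differently, for example by using two bias vectors per block.

```python
def lstm_param_count(input_dim, hidden_dim):
    # 4 blocks (cell input, input gate, forget gate, output gate),
    # each with input weights, recurrent weights and a bias.
    return 4 * (hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim)

def gru_param_count(input_dim, hidden_dim):
    # 3 blocks (reset gate, update gate, candidate state).
    return 3 * (hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim)

D, H = 100, 30                  # assumed sizes, for illustration only
print(lstm_param_count(D, H))   # 15720
print(gru_param_count(D, H))    # 11790
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size.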

Implementation exercise

import tensorflow as tf
import numpy as np
import re
import glob
import collections
import random
import pickle
import time
import datetime
import os

# Change the logging level (this script uses the TensorFlow 1.x API)
tf.logging.set_verbosity(tf.logging.ERROR)

class Corpus:
    def __init__(self):
        self.unknown_word_symbol = "<???>" # Words that appear rarely are treated as unknown words
        self.unknown_word_threshold = 3 # Occurrence count at or below which a word is treated as unknown
        self.corpus_file = "./corpus/**/*.txt"
        self.corpus_encoding = "utf-8"
        self.dictionary_filename = "./data_for_predict/word_dict.dic"
        self.chunk_size = 5
        self.load_dict()

        words = []
        for filename in glob.glob(self.corpus_file, recursive=True):
            with open(filename, "r", encoding=self.corpus_encoding) as f:

                # word breaking
                text = f.read()
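                # Lowercase everything and convert newlines to spaces
                text = text.lower().replace("\n", " ")
                # Remove every character outside the allowed set
                text = re.sub(r"[^a-z '\-]", "", text)
                # Collapse runs of spaces into a single space
                text = re.sub(r"[ ]+", " ", text)

                # Preprocessing: ignore words that start with '-'
                # (+= keeps the words from every corpus file, not only the last one read)
                words += [ word for word in text.split() if not word.startswith("-")]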


        self.data_n = len(words) - self.chunk_size
        self.data = self.seq_to_matrix(words)

    def prepare_data(self):
        """
        Prepare the training and test data.
        data_n = (total number of words in the text data) - chunk_size
        input: (data_n, chunk_size, vocabulary_size)
        output: (data_n, vocabulary_size)
        """

        # Prepare the input and output tensors
        all_input = np.zeros([self.chunk_size, self.vocabulary_size, self.data_n])
        all_output = np.zeros([self.vocabulary_size, self.data_n])

        # Fill the prepared tensors with the one-hot representation of the corpus (self.data)
        # Words i through (i + chunk_size - 1) form one input
        # The corresponding output is word (i + chunk_size)
        for i in range(self.data_n):
            all_output[:, i] = self.data[:, i + self.chunk_size] # one-hot vector of word (i + chunk_size)
            for j in range(self.chunk_size):
                all_input[j, :, i] = self.data[:, i + self.chunk_size - j - 1]

        # Transpose to match the data layout used later
        all_input = all_input.transpose([2, 0, 1])
        all_output = all_output.transpose()

        # Split into training data : test data = 4 : 1
        training_num = ( self.data_n * 4 ) // 5
        return all_input[:training_num], all_output[:training_num], all_input[training_num:], all_output[training_num:]


    def build_dict(self):
        # Count word occurrences over the whole corpus
        counter = collections.Counter()
        for filename in glob.glob(self.corpus_file, recursive=True):
            with open(filename, "r", encoding=self.corpus_encoding) as f:

                # word breaking
                text = f.read()
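                # Lowercase everything and convert newlines to spaces
                text = text.lower().replace("\n", " ")
                # Remove every character outside the allowed set
                text = re.sub(r"[^a-z '\-]", "", text)
                # Collapse runs of spaces into a single space
                text = re.sub(r"[ ]+", " ", text)

                # Preprocessing: ignore words that start with '-'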
                words = [word for word in text.split() if not word.startswith("-")]

                counter.update(words)

        # Merge low-frequency words into a single symbol
        word_id = 0
        dictionary = {}
        for word, count in counter.items():
            if count <= self.unknown_word_threshold:
                continue

            dictionary[word] = word_id
            word_id += 1
        dictionary[self.unknown_word_symbol] = word_id

        print("Vocabulary size:", len(dictionary))

        # Save the dictionary with pickle
        with open(self.dictionary_filename, "wb") as f:
            pickle.dump(dictionary, f)
            print("Dictionary is saved to", self.dictionary_filename)

        self.dictionary = dictionary

        print(self.dictionary)

    def load_dict(self):
        with open(self.dictionary_filename, "rb") as f:
            self.dictionary = pickle.load(f)
            self.vocabulary_size = len(self.dictionary)
            self.input_layer_size = len(self.dictionary)
            self.output_layer_size = len(self.dictionary)
            print("Vocabulary size: ", self.input_layer_size)

    def get_word_id(self, word):
        # print(word)
        # print(self.dictionary)
        # print(self.unknown_word_symbol)
        # print(self.dictionary[self.unknown_word_symbol])
        # print(self.dictionary.get(word, self.dictionary[self.unknown_word_symbol]))
        return self.dictionary.get(word, self.dictionary[self.unknown_word_symbol])

    # Convert an input word to a one-hot vector
    def to_one_hot(self, word):
        index = self.get_word_id(word)
        data = np.zeros(self.vocabulary_size)
        data[index] = 1
        return data

    def seq_to_matrix(self, seq):
        print(seq)
        data = np.array([self.to_one_hot(word) for word in seq]) # (data_n, vocabulary_size)
        return data.transpose() # (vocabulary_size, data_n)

class Language:
    """
    input layer: self.vocabulary_size
    hidden layer: rnn_size = 30
    output layer: self.vocabulary_size
    """

    def __init__(self):
        self.corpus = Corpus()
        self.dictionary = self.corpus.dictionary
        self.vocabulary_size = len(self.dictionary) # number of words
        self.input_layer_size = self.vocabulary_size # input layer size
        self.hidden_layer_size = 30 # number of RNN units in the hidden layer
        self.output_layer_size = self.vocabulary_size # output layer size
        self.batch_size = 128 # batch size
        self.chunk_size = 5 # number of unrolled steps: c_0, c_1, ..., c_(chunk_size - 1) are input and the probability of word c_(chunk_size) is output
        self.learning_rate = 0.005 # learning rate
        self.epochs = 1000 # number of training epochs
        self.forget_bias = 1.0 # forget-gate bias for the LSTM
        self.model_filename = "./data_for_predict/predict_model.ckpt"
        self.unknown_word_symbol = self.corpus.unknown_word_symbol

    def inference(self, input_data, initial_state):
        """
        :param input_data: tensor of shape (batch_size, chunk_size, vocabulary_size)
        :param initial_state: matrix of shape (batch_size, hidden_layer_size)
        :return:
        """
        # Initialize the weights and biases
        hidden_w = tf.Variable(tf.truncated_normal([self.input_layer_size, self.hidden_layer_size], stddev=0.01))
        hidden_b = tf.Variable(tf.ones([self.hidden_layer_size]))
        output_w = tf.Variable(tf.truncated_normal([self.hidden_layer_size, self.output_layer_size], stddev=0.01))
        output_b = tf.Variable(tf.ones([self.output_layer_size]))

        # BasicLSTMCell and BasicRNNCell take a list of chunk_size tensors, each of shape (batch_size, hidden_layer_size).
        # At this point the input is a 3-D tensor of shape (batch_size, chunk_size, input_layer_size),
        # so tf.transpose, tf.reshape and tf.split are used to rearrange it.

        input_data = tf.transpose(input_data, [1, 0, 2]) # transpose: (chunk_size, batch_size, vocabulary_size)
        input_data = tf.reshape(input_data, [-1, self.input_layer_size]) # reshape: (chunk_size * batch_size, input_layer_size)
        input_data = tf.matmul(input_data, hidden_w) + hidden_b # apply weights W and bias B: (chunk_size * batch_size, hidden_layer_size)
        input_data = tf.split(input_data, self.chunk_size, 0) # split into a list: chunk_size * (batch_size, hidden_layer_size)

        # Define the RNN cell. Besides the basic RNN cell, LSTM and GRU cells can also be used.
        cell = tf.nn.rnn_cell.BasicRNNCell(self.hidden_layer_size)
        outputs, states = tf.nn.static_rnn(cell, input_data, initial_state=initial_state)
        
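        # Finally, apply the weights and bias that connect the hidden layer to the output layer
        # The result is ultimately passed through softmax and interpreted as probabilities.
        # The softmax itself is applied outside this function.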
        output = tf.matmul(outputs[-1], output_w) + output_b

        return output

    def loss(self, logits, labels):
        cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))

        return cost

    def training(self, cost):
        # Adam is used as the optimizer here.
        # Replacing AdamOptimizer lets you choose another optimizer such as Adagrad or Adadelta
        optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(cost)

        return optimizer

    def train(self):
        # Set up placeholders and other variables
        input_data = tf.placeholder("float", [None, self.chunk_size, self.input_layer_size])
        actual_labels = tf.placeholder("float", [None, self.output_layer_size])
        initial_state = tf.placeholder("float", [None, self.hidden_layer_size])

        prediction = self.inference(input_data, initial_state)
        cost = self.loss(prediction, actual_labels)
        optimizer = self.training(cost)
        correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(actual_labels, 1))
        accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

        # Add the cross entropy to the summary so it can be visualized in TensorBoard
        tf.summary.scalar("Cross entropy: ", cost)
        summary = tf.summary.merge_all()

        # Prepare the training and test data
        # corpus = Corpus()
        trX, trY, teX, teY = self.corpus.prepare_data()
        training_num = trX.shape[0]

        # Directory for saving logs
        timestamp = time.time()
        dirname = datetime.datetime.fromtimestamp(timestamp).strftime("%Y%m%d%H%M%S")

        # Run the actual training from here
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            summary_writer = tf.summary.FileWriter("./log/" + dirname, sess.graph)

            # Loop over epochs
            for epoch in range(self.epochs):
                step = 0
                epoch_loss = 0
                epoch_acc = 0

                # Train on the training data one batch at a time (= run the optimizer)
                # Also accumulate the loss and the training accuracy for each epoch
                while (step + 1) * self.batch_size < training_num:
                    start_idx = step * self.batch_size
                    end_idx = (step + 1) * self.batch_size

                    batch_xs = trX[start_idx:end_idx, :, :]
                    batch_ys = trY[start_idx:end_idx, :]

                    _, c, a = sess.run([optimizer, cost, accuracy],
                                       feed_dict={input_data: batch_xs,
                                                  actual_labels: batch_ys,
                                                  initial_state: np.zeros([self.batch_size, self.hidden_layer_size])
                                                  }
                                       )
                    epoch_loss += c
                    epoch_acc += a
                    step += 1

                # Print the loss and accuracy to the console
                print("Epoch", epoch, "completed out of", self.epochs, "-- loss:", epoch_loss, " -- accuracy:",
                      epoch_acc / step)

                # Save values for TensorBoard at the end of every epoch
                summary_str = sess.run(summary, feed_dict={input_data: trX,
                                                           actual_labels: trY,
                                                           initial_state: np.zeros(
                                                               [trX.shape[0],
                                                                self.hidden_layer_size]
                                                           )
                                                           }
                                       )
                summary_writer.add_summary(summary_str, epoch)
                summary_writer.flush()

            # Save the trained model as well
            saver = tf.train.Saver()
            saver.save(sess, self.model_filename)

            # Finally, compute and print the accuracy on the test data
            a = sess.run(accuracy, feed_dict={input_data: teX, actual_labels: teY,
                                              initial_state: np.zeros([teX.shape[0], self.hidden_layer_size])})
            print("Accuracy on test:", a)


    def predict(self, seq):
        """
        Predict the next word for an input sentence
        :param seq: the string immediately before the word to predict. Must contain at least chunk_size words.
        :return:
        """

        # First define every variable that will be restored
        tf.reset_default_graph()
        input_data = tf.placeholder("float", [None, self.chunk_size, self.input_layer_size])
        initial_state = tf.placeholder("float", [None, self.hidden_layer_size])
        prediction = tf.nn.softmax(self.inference(input_data, initial_state))
        predicted_labels = tf.argmax(prediction, 1)

        # Build the input data
        # Convert seq to a one-hot representation.
        words = [word for word in seq.split() if not word.startswith("-")]
        x = np.zeros([1, self.chunk_size, self.input_layer_size])
        for i in range(self.chunk_size):
            word = words[len(words) - self.chunk_size + i] # index the word list, not the raw string
            index = self.dictionary.get(word, self.dictionary[self.unknown_word_symbol])
            x[0][i][index] = 1
        feed_dict = {
            input_data: x, # (1, chunk_size, vocabulary_size)
            initial_state: np.zeros([1, self.hidden_layer_size])
        }

        # Prepare tf.Session()
        with tf.Session() as sess:
            # Load the saved model. All variables must be defined before loading.
            saver = tf.train.Saver()
            saver.restore(sess, self.model_filename)

            # Compute the prediction with the loaded model
            u, v = sess.run([prediction, predicted_labels], feed_dict=feed_dict)

            keys = list(self.dictionary.keys())


            # Print the probability of each word to the console
            for i in range(self.vocabulary_size):
                c = self.unknown_word_symbol if i == (self.vocabulary_size - 1) else keys[i]
                print(c, ":", u[0][i])

            print("Prediction:", seq + " " + ("<???>" if v[0] == (self.vocabulary_size - 1) else keys[v[0]]))

        return u[0]


def build_dict():
    cp = Corpus()
    cp.build_dict()

if __name__ == "__main__":
    #build_dict()

    ln = Language()

    # Call this to train
    #ln.train()

    # Predict a word with the saved model
    ln.predict("some of them looks like")

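Section 4: Bidirectional RNN

A model that improves accuracy by taking future information into account in addition to past information
Applications
- Polishing text
- Machine translation
- etc.
The check question presents a program that performs the forward pass of a bidirectional RNN. For the forward direction, W_f is the weight from the input to the hidden layer and U_f is the weight from the previous step's hidden-layer output to the hidden layer; the backward direction likewise has parameters W_b and U_b. V is the weight from the combined hidden representation of both directions to the output layer. The function _rnn performs the RNN forward pass and returns the sequence of hidden states. Which option fits blank (か)?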

(1) h_f + h_b[::-1]
(2) h_f * h_b[::-1]
(3) np.concatenate([h_f, h_b[::-1]], axis=0)
(4) np.concatenate([h_f, h_b[::-1]], axis=1)
Answer: (4)
- In a bidirectional RNN, the feature is the combination of the hidden representations from the forward and backward passes
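The question's program is not shown here, so the following is a reconstruction under assumptions rather than the original listing. The parameter names `W_f`, `U_f`, `W_b`, `U_b`, `V` and the helper name `_rnn` come from the question text; the body of `_rnn`, the tanh activation, the shapes, and the wrapper function `bidirectional_rnn` are hypothetical. With hidden states stored as rows of shape (T, H), concatenating along `axis=1` keeps the time axis and joins the two hidden vectors at each time step.

```python
import numpy as np

def _rnn(xs, W, U):
    """Hypothetical stand-in for the question's _rnn: returns hidden states, shape (T, H)."""
    h = np.zeros(U.shape[0])
    hs = []
    for x in xs:
        h = np.tanh(W @ x + U @ h)
        hs.append(h)
    return np.array(hs)

def bidirectional_rnn(xs, W_f, U_f, W_b, U_b, V):
    h_f = _rnn(xs, W_f, U_f)                 # forward pass over x_1 .. x_T
    h_b = _rnn(xs[::-1], W_b, U_b)           # backward pass over x_T .. x_1
    # Reverse h_b again so row t of both arrays refers to time t, then join the features.
    hs = np.concatenate([h_f, h_b[::-1]], axis=1)   # (T, 2H)
    return hs @ V.T                          # (T, output_dim)

rng = np.random.default_rng(0)
T, D, H, O = 4, 3, 5, 2                      # assumed toy sizes
xs = rng.normal(size=(T, D))
W_f, W_b = rng.normal(size=(H, D)), rng.normal(size=(H, D))
U_f, U_b = rng.normal(size=(H, H)), rng.normal(size=(H, H))
V = rng.normal(size=(O, 2 * H))
ys = bidirectional_rnn(xs, W_f, U_f, W_b, U_b, V)
print(ys.shape)   # (4, 2)
```

Because of the backward pass, the output at the first time step also depends on the last input, which is the "future information" the model is meant to use.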

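Section 5: Seq2Seq

Seq2Seq overview

Seq2Seq is a type of Encoder-Decoder model
- It is used for machine dialogue, machine translation, and similar tasks

Encoder RNN

A structure that takes the text data entered by the user, split into tokens such as words

Tokenizing
- Split the sentence into tokens such as words and convert each token to its ID
Embedding
- Convert each ID into a distributed-representation vector for that token
Encoder RNN
- Feed the vectors into the RNN in order

Procedure

1. Feed vector1 into the RNN and output a hidden state.
2. Feed the hidden state from step 1 and the next input, vector2, into the RNN and output a hidden state.
3. Repeat steps 1 and 2.
4. Keep the hidden state produced when the last vector is fed in as the final state.
The final state is called the thought vector and represents the meaning of the input sentence.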
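The encoder procedure can be sketched as follows. Everything concrete here is an assumption for illustration: whitespace tokenization, the toy vocabulary, the tanh RNN cell, and the random, untrained weights. The point is only the data flow from text to token IDs to embeddings to the final hidden state.

```python
import numpy as np

def encode(text, vocab, E, W_in, W):
    ids = [vocab[tok] for tok in text.lower().split()]   # tokenizing: text -> token IDs
    h = np.zeros(W.shape[0])                              # initial hidden state
    for i in ids:
        v = E[:, i]                                       # embedding: ID -> vector
        h = np.tanh(W_in @ v + W @ h)                     # feed the vector and previous hidden state
    return h                                              # final state (thought vector)

rng = np.random.default_rng(0)
vocab = {"i": 0, "like": 1, "apples": 2}                  # hypothetical toy vocabulary
embed_dim, hidden_dim = 4, 6
E = rng.normal(size=(embed_dim, len(vocab)))
W_in = rng.normal(size=(hidden_dim, embed_dim))
W = rng.normal(size=(hidden_dim, hidden_dim))

thought_vector = encode("I like apples", vocab, E, W_in, W)
print(thought_vector.shape)   # (6,)
```

Whatever the length of the input, the result has the same fixed size `hidden_dim`.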

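Decoder RNN

A structure in which the system generates the output data token by token, for example word by word

Procedure

1. Decoder RNN
- Set the final state of the Encoder RNN as the initial state of the Decoder RNN, feed in an embedding, and output the generation probability of each token
2. Sampling
- Choose a token at random according to the generation probabilities
3. Embedding
- Embed the token chosen in step 2 and use it as the next input to the Decoder RNN
4. Detokenize
- Repeat steps 1 to 3 and convert the tokens obtained in step 2 into a string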
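The decoding loop can be sketched in the same style. The start and end tokens, the toy vocabulary, the tanh cell, the maximum length, and the random, untrained weights are all assumptions of this illustration, so the generated text is meaningless; only the loop structure matters.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def decode(final_state, E, W_in, W, W_out, id_to_token, bos_id, eos_id, max_len=10, seed=0):
    rng = np.random.default_rng(seed)
    h = final_state                   # Encoder final state becomes the Decoder initial state
    token_id = bos_id                 # assumed start-of-sequence token
    out_ids = []
    for _ in range(max_len):
        h = np.tanh(W_in @ E[:, token_id] + W @ h)        # Decoder RNN step on the embedded token
        probs = softmax(W_out @ h)                        # generation probability of each token
        token_id = int(rng.choice(len(probs), p=probs))   # Sampling
        if token_id == eos_id:
            break
        out_ids.append(token_id)                          # embedded as the next input on the next pass
    return " ".join(id_to_token[i] for i in out_ids)      # Detokenize

rng = np.random.default_rng(1)
id_to_token = ["<bos>", "<eos>", "hello", "world"]        # hypothetical toy vocabulary
vocab_size, embed_dim, hidden_dim = len(id_to_token), 4, 6
E = rng.normal(size=(embed_dim, vocab_size))
W_in = rng.normal(size=(hidden_dim, embed_dim))
W = rng.normal(size=(hidden_dim, hidden_dim))
W_out = rng.normal(size=(vocab_size, hidden_dim))
final_state = rng.normal(size=hidden_dim)                 # stands in for the Encoder's thought vector

reply = decode(final_state, E, W_in, W, W_out, id_to_token, bos_id=0, eos_id=1)
print(reply)
```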

Answer: (2)
- (1) describes a bidirectional RNN
- (3) describes a parse tree
- (4) describes an LSTM

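HRED

Problem with Seq2Seq

It can only handle one question and one answer at a time
- It simply keeps responding to each question with no context at all

What is HRED?

Seq2Seq + Context RNN
It generates the next utterance from the past n-1 utterances
- With Seq2Seq, responses ignored the context of the conversation
- With HRED, the response follows the flow of the preceding words, so more human-like text is generated
Context RNN
- A structure that gathers the sequence of sentence representations produced by the Encoder and converts it into a vector representing the whole conversation context so far
- This makes it possible to reply in a way that takes the history of past utterances into account

Problems with HRED

The stochastic diversity exists only in the surface wording; there is no diversity in the "flow" of the conversation
- Given the same context (list of utterances), the content of the answer always follows the same conversational flow
It tends to give short answers that carry little information
- It tends to learn short, common answers

VHRED

What is VHRED?

HRED with the latent-variable concept of the VAE added
A structure that addresses the problems of HRED by adding the VAE's latent variable

Seq2Seq
- Keeps responding to each question with no context at all
HRED
- The response follows the flow of the preceding words, so more human-like text can be generated
VHRED
- A structure that addresses the problems of HRED by adding the VAE's latent variable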

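VAE

Autoencoder

A form of unsupervised learning
Concrete example
- For MNIST, a neural network that takes a 28x28 image of a digit and outputs the same image
Structure of an autoencoder
- The neural network that converts the input data into the latent variable z is the Encoder; conversely, the neural network that takes the latent variable z as input and reconstructs the original image is the Decoder
An advantage is that it can perform dimensionality reduction

VAE

With an ordinary autoencoder
- The data is packed into the latent variable z in some way, but the structure of that representation is unknown
A VAE assumes the probability distribution z∼N(0,1) for this latent variable z
A VAE makes it possible to pack the data into a structured form, namely the probability distribution of the latent variable z

Probability distribution z∼N(0,1)

Answer: (1)
- The word w is a one-hot vector, which is converted into another feature by word embedding.
- Using the embedding matrix E, this can be written as E.dot(w).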
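A small check of what `E.dot(w)` does: multiplying the embedding matrix by a one-hot vector simply selects one column of `E`. The sizes, the random matrix, and the chosen word ID are assumptions for illustration.

```python
import numpy as np

vocab_size, embed_dim = 6, 3
rng = np.random.default_rng(0)
E = rng.normal(size=(embed_dim, vocab_size))   # embedding matrix, one column per word

w = np.zeros(vocab_size)
w[2] = 1.0                                      # one-hot vector for word ID 2

e = E.dot(w)                                    # embedding as a matrix-vector product
print(np.allclose(e, E[:, 2]))                  # True: same as picking column 2
```

This is why practical implementations usually replace the product with a direct index lookup.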

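Section 6: Word2vec

RNNs had the problem that variable-length strings such as words cannot be fed to a neural network directly
To solve this, words must be represented in a fixed-length form
Build a vocabulary from the training data
- Example: I want to eat apples. I like apples.
- →{apples,eat,I,like,to,want}
- To input "apples", the input layer receives its one-hot vector, which is (1, 0, 0, 0, 0, 0) under the ordering above (in practice there is one one-hot vector for each word in the dictionary)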
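The vocabulary and the one-hot vector from this example can be reproduced in a few lines. The tokenization (stripping periods and splitting on spaces) and the case-insensitive alphabetical ordering are assumptions chosen to match the vocabulary listed above.

```python
import numpy as np

corpus = "I want to eat apples. I like apples."
words = corpus.replace(".", "").split()
vocabulary = sorted(set(words), key=str.lower)   # ['apples', 'eat', 'I', 'like', 'to', 'want']
word_to_id = {word: i for i, word in enumerate(vocabulary)}

def to_one_hot(word):
    vec = np.zeros(len(vocabulary), dtype=int)
    vec[word_to_id[word]] = 1
    return vec

print(vocabulary)
print(to_one_hot("apples"))   # [1 0 0 0 0 0]
```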

Advantage of Word2vec

It made learning distributed representations from large-scale data feasible at realistic computation speed and memory usage

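Section 7: Attention Mechanism

Problem with Seq2Seq

It is hard to handle long sentences
Whether the input is 2 words or 100 words, it has to be packed into a fixed-dimensional vector

Solution

A mechanism is needed in which the dimension of the sequence's internal representation grows as the sentence gets longer
- That mechanism is the Attention Mechanism
  - It learns the degree of relevance, i.e. which words of the input and the output are related to each other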
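A minimal sketch of the idea, assuming dot-product scoring (one of several scoring functions in use) and random, untrained states: the decoder looks at every encoder hidden state, so the representation it draws on grows with the input length, while the relevance weights say which input positions matter for the current output.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention(decoder_state, encoder_states):
    """Dot-product attention: encoder_states is (T, H), decoder_state is (H,)."""
    scores = encoder_states @ decoder_state   # relevance of each input position
    weights = softmax(scores)                 # normalised so the weights sum to 1
    context = weights @ encoder_states        # weighted sum of encoder states, shape (H,)
    return context, weights

rng = np.random.default_rng(0)
H = 8                                          # assumed hidden size
decoder_state = rng.normal(size=H)
for T in (2, 100):                             # short and long input sentences
    encoder_states = rng.normal(size=(T, H))   # one hidden state per input token
    context, weights = attention(decoder_state, encoder_states)
    print(T, weights.shape, context.shape)
```

The number of attention weights equals the number of input tokens, so a 100-word sentence is no longer squeezed into the same single vector as a 2-word one.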

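Concrete example

RNN
- A neural network suited to processing time-series data
word2vec
- A method for obtaining distributed-representation vectors of words
Seq2Seq
- A network that produces one time series from another time series
Attention
- A method that assigns weights to the relevance between the elements of time-series data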
