
[Rabbit Challenge] Deep Learning (Part 2, Day 4)

Deep Learning

Posted at 2019-07-23

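Section 1: TensorFlow implementation exercises

Networks are rarely implemented from scratch in NumPy; in practice they are built with libraries such as TensorFlow, Keras, and PyTorch, which wrap the low-level operations.

TensorFlow

A deep learning framework developed by Google and one of the most widely used. The code in this section uses the TensorFlow 1.x API (tf.Session, tf.placeholder).

constant

Defines a constant.
A Tensor is not evaluated until a Session is started and the Tensor is run.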

constant

import tensorflow as tf
import numpy as np

# Define a constant
a = tf.constant(np.arange(4), dtype=tf.float32, shape=[2,2])

print('a:', a)    # Prints only the Tensor object, not its values; keep this in mind when debugging.

sess = tf.Session()    # Start a Session

print('a:', sess.run(a))    # The Tensor is evaluated only when it is run.

placeholder

A placeholder is like a box whose value can be supplied later.
It is used, for example, to feed x one batch at a time during training.

placeholder

import tensorflow as tf
import numpy as np

# Define a placeholder (an empty box whose value is fed in later)
x = tf.placeholder(dtype=tf.float32, shape=[None,3])

print('x:', x)   # Not run yet, so only the Tensor object is printed.

sess = tf.Session()

X = np.random.rand(2,3)
print('X:', X)

# Feed X[0] into the placeholder
# reshape from (3,) to (1,3) to match the placeholder shape
print('x:', sess.run(x, feed_dict={x:X[0].reshape(1,-1)}))
# Feed X[1] into the placeholder
print('x:', sess.run(x, feed_dict={x:X[1].reshape(1,-1)}))

variables

Define an operation and use it to update the value of the variable x.

variables

import tensorflow as tf

# Define a constant
a = tf.constant(10)
print('a:', a)
# Define a variable
x = tf.Variable(1)
print('x:', x)

calc_op = x * a    # Define the update expression

# Operation that assigns the result back to x
update_x = tf.assign(x, calc_op)

sess = tf.Session()

# Initialize the variables
init = tf.global_variables_initializer()
sess.run(init)

print(sess.run(x))

sess.run(update_x)    # x is updated each time update_x is run.
print(sess.run(x))

sess.run(update_x)
print(sess.run(x))

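Linear regression

[try] Change the value of noise

Increasing noise scatters the data and obscures the linear relationship, so the fitted line predicts the data less accurately.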
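
As a rough illustration of the noise effect described above, the following sketch fits a line to the same d = 3x + 2 data at several noise levels. It uses ordinary least squares via np.polyfit rather than the TensorFlow model in this article, and the noise levels, the seed, and the helper name fit_error are assumptions chosen for illustration.

```python
import numpy as np

def fit_error(noise, n=100, seed=0):
    """Fit d = W*x + b by least squares and return (W, b, mse)."""
    rng = np.random.RandomState(seed)
    x = rng.rand(n)
    d = 3 * x + 2 + noise * rng.randn(n)
    W, b = np.polyfit(x, d, 1)            # degree-1 polynomial: slope, intercept
    mse = np.mean((W * x + b - d) ** 2)   # residual error on the noisy data
    return W, b, mse

for noise in (0.0, 0.3, 3.0):
    W, b, mse = fit_error(noise)
    print('noise=%.1f  W=%.3f  b=%.3f  mse=%.4f' % (noise, W, b, mse))
```

With no noise the fit recovers W = 3 and b = 2; as noise grows, the residual error grows and the estimated coefficients drift away from the true values.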

linear

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
## Jupyter notebook magic that displays plots inline
%matplotlib inline

iters_num = 300    ## number of training iterations
plot_interval = 10  ## print the loss every plot_interval iterations

# Generate data
n = 100
x = np.random.rand(n)
d = 3 * x + 2

# Add noise
noise = 0.3
d = d + noise * np.random.randn(n)

# Inputs
## placeholder
xt = tf.placeholder(tf.float32)
dt = tf.placeholder(tf.float32)

# Initialize the variables to be optimized
## Variable
W = tf.Variable(tf.zeros([1]))
b = tf.Variable(tf.zeros([1]))

y = W * xt + b

# Loss function: mean squared error
## reduce_mean: computes the mean of the elements of the given tensor
loss = tf.reduce_mean(tf.square(y - dt))    ## mean squared error
optimizer = tf.train.GradientDescentOptimizer(0.1)    ## GradientDescentOptimizer: gradient descent with learning rate 0.1
train = optimizer.minimize(loss)    ## training op that minimizes loss with the optimizer defined above

###### Setup ends here.

# Initialization
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# Prepare the generated data as training data
x_train = x.reshape(-1,1)
d_train = d.reshape(-1,1)

# Training
for i in range(iters_num):
    sess.run(train, feed_dict={xt:x_train,dt:d_train})    ## run the train op with sess.run
    if (i+1) % plot_interval == 0:
        loss_val = sess.run(loss, feed_dict={xt:x_train,dt:d_train})    ## feed the training data into placeholders xt and dt via feed_dict
        W_val = sess.run(W)
        b_val = sess.run(b)
        print('Generation: ' + str(i+1) + '. loss = ' + str(loss_val))

print(W_val)
print(b_val)

# Prediction function
def predict(x):
    return W_val * x + b_val

fig = plt.figure()
subplot = fig.add_subplot(1, 1, 1)
plt.scatter(x, d)
linex = np.linspace(0, 1, 2)
liney = predict(linex)
subplot.plot(linex,liney)
plt.show()

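Nonlinear regression

[try] Regress data generated by the following equation.

$$ y=30x^2+0.5x+0.2 $$

Raising the AdamOptimizer learning rate from 0.001 (the value used for the cubic function) to 0.01 made the loss converge.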

nonlinear

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
%matplotlib inline

iters_num = 10000
plot_interval = 100

# Generate data
n=100
x = np.random.rand(n).astype(np.float32) * 4 - 2    ## uniform samples in [-2, 2)
d =  30 * x ** 2 + 0.5 * x + 0.2

# Add noise
noise = 0.05
d = d + noise * np.random.randn(n)

# Model
## Note that no separate bias b is used; the x^0 column plays that role.
xt = tf.placeholder(tf.float32, [None, 3])    ## each sample has three features, x^0, x^1, x^2, so the placeholder has 3 columns
dt = tf.placeholder(tf.float32, [None, 1])
## Three weights, initialized randomly from a normal distribution with standard deviation 0.01.
W = tf.Variable(tf.random_normal([3, 1], stddev=0.01))
y = tf.matmul(xt,W)

# Loss function: mean squared error
loss = tf.reduce_mean(tf.square(y - dt))
## Raising the AdamOptimizer learning rate from 0.001 (used for the cubic function) to 0.01 made the loss converge.
optimizer = tf.train.AdamOptimizer(0.01)
train = optimizer.minimize(loss)

# Initialization
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# Prepare the generated data as training data
d_train = d.reshape(-1,1)
x_train = np.zeros([n, 3])
## Fill x_train with x^0, x^1, x^2 (column j holds x**j).
for i in range(n):
    for j in range(3):
        x_train[i, j] = x[i]**j

# Training
for i in range(iters_num):
    if (i+1) % plot_interval == 0:
        loss_val = sess.run(loss, feed_dict={xt:x_train, dt:d_train})
        W_val = sess.run(W)
        print('Generation: ' + str(i+1) + '. loss = ' + str(loss_val))
    sess.run(train, feed_dict={xt:x_train,dt:d_train})

print(W_val[::-1])    ## reversed so the coefficients print in the order x^2, x^1, x^0

# Prediction function
def predict(x):
    result = 0.
    for i in range(0,3):
        result += W_val[i,0] * x ** i
    return result

fig = plt.figure()
subplot = fig.add_subplot(1,1,1)
plt.scatter(x ,d)
linex = np.linspace(-2,2,100)
liney = predict(linex)
subplot.plot(linex,liney)
plt.show()

Classification, 1 layer (mnist)

[try] Define x (input), d (target data), W (weights), and b (bias)

mnist(single_layer)

import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

iters_num = 100
batch_size = 100
plot_interval = 1

# -------------- fill in here ------------------------
x = tf.placeholder(tf.float32, [None, 784])    ## 28x28 images flattened to 784 values
d = tf.placeholder(tf.float32, [None, 10])    ## output is a 10-class classification (digits 0-9)
W = tf.Variable(tf.random_normal([784, 10], stddev=0.01))    ## maps 784 inputs to 10 outputs, so the shape is 784x10
b = tf.Variable(tf.zeros([10]))    ## same dimension as the output
# ------------------------------------------------------

y = tf.nn.softmax(tf.matmul(x, W) + b)

# Cross entropy
cross_entropy = -tf.reduce_sum(d * tf.log(y), reduction_indices=[1])    ## cross entropy of each sample
loss = tf.reduce_mean(cross_entropy)    ## mean cross entropy over the batch
train = tf.train.GradientDescentOptimizer(0.1).minimize(loss)    ## gradient descent with learning rate 0.1
# --------------- the line above combines the following two statements ---------------------
# optimizer = tf.train.GradientDescentOptimizer(0.1)    ## gradient descent with learning rate 0.1
# train = optimizer.minimize(loss)    ## minimize loss with the optimizer defined above
# ------------------------------------------------------------------

# Per-sample correctness
correct = tf.equal(tf.argmax(y, 1), tf.argmax(d, 1))
# Accuracy
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

accuracies = []
for i in range(iters_num):
    x_batch, d_batch = mnist.train.next_batch(batch_size)    ## fetch a random batch of 100 samples from the training set
    sess.run(train, feed_dict={x: x_batch, d: d_batch})    ## pass the batch via feed_dict
    if (i+1) % plot_interval == 0:
        print(sess.run(correct, feed_dict={x: mnist.test.images, d: mnist.test.labels}))
        accuracy_val = sess.run(accuracy, feed_dict={x: mnist.test.images, d: mnist.test.labels})
        accuracies.append(accuracy_val)
        print('Generation: ' + str(i+1) + '. accuracy = ' + str(accuracy_val))
        
        
lists = range(0, iters_num, plot_interval)
plt.plot(lists, accuracies)
plt.title("accuracy")
plt.ylim(0, 1.0)
plt.show()

show_data

print(x_batch[0])
print(d_batch[0])
plt.imshow(x_batch[0].reshape(28,28))

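Classification, 3 layers (mnist)

[try] Change the size of the hidden layers

A larger layer means more parameters and therefore slower computation. Design the network with the trade-off between accuracy and speed in mind.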
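
To make the growth in parameters concrete, the sketch below counts the weights and biases of a fully connected network. The helper count_params is hypothetical; the first size list matches the 784 -> 600 -> 300 -> 10 network in this section, and the second (doubled hidden layers) is an arbitrary comparison.

```python
def count_params(layer_sizes):
    """Number of weights and biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out   # weight matrix + bias vector
    return total

print(count_params([784, 600, 300, 10]))    # 654310
print(count_params([784, 1200, 600, 10]))   # 1668610
```

Doubling both hidden layers raises the parameter count by roughly 2.5 times, which is why training slows down.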

[try] Change the optimizer

Compare optimizers by the accuracy reached once it has plateaued with respect to the number of training iterations.

mnist(3_layers)

import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
import matplotlib.pyplot as plt

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

iters_num = 3000
batch_size = 100
plot_interval = 100

hidden_layer_size_1 = 600
hidden_layer_size_2 = 300

dropout_rate = 0.5

x = tf.placeholder(tf.float32, [None, 784])
d = tf.placeholder(tf.float32, [None, 10])
W1 = tf.Variable(tf.random_normal([784, hidden_layer_size_1], stddev=0.01))    ## hidden layer 1
W2 = tf.Variable(tf.random_normal([hidden_layer_size_1, hidden_layer_size_2], stddev=0.01))    ## hidden layer 2
W3 = tf.Variable(tf.random_normal([hidden_layer_size_2, 10], stddev=0.01))    ## output layer

b1 = tf.Variable(tf.zeros([hidden_layer_size_1]))    ## hidden layer 1
b2 = tf.Variable(tf.zeros([hidden_layer_size_2]))    ## hidden layer 2
b3 = tf.Variable(tf.zeros([10]))    ## output layer

z1 = tf.sigmoid(tf.matmul(x, W1) + b1)    ## sigmoid
z2 = tf.sigmoid(tf.matmul(z1, W2) + b2)    ## sigmoid

keep_prob = tf.placeholder(tf.float32)
drop = tf.nn.dropout(z2, keep_prob)    ## apply dropout after z2

y = tf.nn.softmax(tf.matmul(drop, W3) + b3)
loss = tf.reduce_mean(-tf.reduce_sum(d * tf.log(y), reduction_indices=[1]))

# optimizer = tf.train.GradientDescentOptimizer(0.5)
# optimizer = tf.train.MomentumOptimizer(0.1, 0.9)    ## learning_rate=0.1, momentum=0.9
# optimizer = tf.train.AdagradOptimizer(0.1)
# optimizer = tf.train.RMSPropOptimizer(0.001)
optimizer = tf.train.AdamOptimizer(1e-4)

train = optimizer.minimize(loss)
correct = tf.equal(tf.argmax(y, 1), tf.argmax(d, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

accuracies = []
for i in range(iters_num):
    x_batch, d_batch = mnist.train.next_batch(batch_size)
    sess.run(train, feed_dict={x:x_batch, d:d_batch, keep_prob:(1 - dropout_rate)})    ## pass x, d, and keep_prob (= 1 - dropout_rate) via feed_dict
    if (i+1) % plot_interval == 0:
        accuracy_val = sess.run(accuracy, feed_dict={x:mnist.test.images, d:mnist.test.labels, keep_prob:1.0})    ## no dropout when measuring accuracy; keep_prob = 1.0 disables dropout
        accuracies.append(accuracy_val)
        print('Generation: ' + str(i+1) + '. accuracy = ' + str(accuracy_val))
    
lists = range(0, iters_num, plot_interval)
plt.plot(lists, accuracies)
plt.title("accuracy")
plt.ylim(0, 1.0)
plt.show()

Classification with a CNN (mnist)

[try] Change the dropout rate to 0

Accuracy on the test data does not improve, and generalization performance drops.
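
For reference, the dropout operation itself can be sketched in NumPy. This is a minimal sketch assuming the "inverted dropout" behavior of tf.nn.dropout in TensorFlow 1.x, where each unit is kept with probability keep_prob and the kept units are scaled by 1 / keep_prob. The function name dropout and the array sizes are illustrative.

```python
import numpy as np

def dropout(h, keep_prob, rng):
    """Inverted dropout: zero units with probability 1 - keep_prob and
    rescale the survivors by 1 / keep_prob so the expected value is unchanged."""
    mask = rng.rand(*h.shape) < keep_prob
    return h * mask / keep_prob

rng = np.random.RandomState(0)
h = np.ones((1000, 100))
h_train = dropout(h, keep_prob=0.5, rng=rng)   # training: about half the units are zeroed
h_eval = dropout(h, keep_prob=1.0, rng=rng)    # keep_prob = 1.0: nothing is dropped
print(h_train.mean(), (h_train == 0).mean(), np.array_equal(h_eval, h))
```

A dropout rate of 0 corresponds to keep_prob = 1.0, which turns the layer into the identity, so the regularization effect disappears.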

mnist(CNN)

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
import matplotlib.pyplot as plt

iters_num = 300
batch_size = 100
plot_interval = 10

dropout_rate = 0.5

# placeholder
x = tf.placeholder(tf.float32, shape=[None, 784])
d = tf.placeholder(tf.float32, shape=[None, 10])

# Reshape each image from a 784-element vector to a 28x28 image with 1 channel
x_image = tf.reshape(x, [-1,28,28,1])

# Weights and bias of the first layer
W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))    ## 5x5 filter; expands 1 input channel to 32 output channels
b_conv1 = tf.Variable(tf.constant(0.1, shape=[32]))

# Convolution and pooling of the first layer
# strides[0] = strides[3] = 1 (fixed)
## Convolve x_image with W_conv1. With stride 1, padding='SAME' keeps the spatial size unchanged.
## Add the bias and apply relu.
## Passing the filter to conv2d is all that is needed to implement the convolution.
h_conv1 = tf.nn.relu(tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)
# For an n*n pooling window, use ksize=[1, n, n, 1]
## max_pool takes the window size and strides and performs max pooling.
h_pool1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# Second layer
W_conv2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))
b_conv2 = tf.Variable(tf.constant(0.1, shape=[64]))
h_conv2 = tf.nn.relu(tf.nn.conv2d(h_pool1, W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2)
h_pool2 = tf.nn.max_pool(h_conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# Fully connected layer with relu on the features reduced by the first and second layers
W_fc1 = tf.Variable(tf.truncated_normal([7 * 7 * 64, 1024], stddev=0.1))
b_fc1 = tf.Variable(tf.constant(0.1, shape=[1024]))
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Dropout
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Softmax on the result
W_fc2 = tf.Variable(tf.truncated_normal([1024, 10], stddev=0.1))
b_fc2 = tf.Variable(tf.constant(0.1, shape=[10]))
y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

# Cross entropy
loss = -tf.reduce_sum(d * tf.log(y_conv))

train = tf.train.AdamOptimizer(1e-4).minimize(loss)

correct = tf.equal(tf.argmax(y_conv,1), tf.argmax(d,1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)


accuracies = []
for i in range(iters_num):
    x_batch, d_batch = mnist.train.next_batch(batch_size)
    sess.run(train, feed_dict={x: x_batch, d: d_batch, keep_prob: 1-dropout_rate})
    if (i+1) % plot_interval == 0:
        accuracy_val = sess.run(accuracy, feed_dict={x:x_batch, d: d_batch, keep_prob: 1.0})    ## note: accuracy is measured on the training batch, not on test data
        accuracies.append(accuracy_val)
        print('Generation: ' + str(i+1) + '. accuracy = ' + str(accuracy_val))
    
    
lists = range(0, iters_num, plot_interval)
plt.plot(lists, accuracies)
plt.title("accuracy")
plt.ylim(0, 1.0)
plt.show()
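
The flattened size 7 * 7 * 64 in the CNN code above follows from the layer shapes. The sketch below traces the spatial size through the network, assuming the rule that padding='SAME' yields an output length of ceil(input / stride). The helper name same_out is hypothetical.

```python
import math

def same_out(size, stride):
    """Output length of a convolution or pooling layer with padding='SAME'."""
    return math.ceil(size / stride)

size = 28
size = same_out(size, 1)   # conv1, stride 1 -> 28
size = same_out(size, 2)   # pool1, stride 2 -> 14
size = same_out(size, 1)   # conv2, stride 1 -> 14
size = same_out(size, 2)   # pool2, stride 2 -> 7
flat = size * size * 64    # 64 channels after conv2
print(size, flat)          # 7 3136
```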

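Comprehension check

Briefly describe the characteristics of VGG, GoogLeNet, and ResNet.

VGG: 2014. A simple architecture built by stacking the pattern Convolution, Convolution, max_pool. It has more parameters than GoogLeNet and ResNet.
GoogLeNet: Uses the inception module. It is characterized by dimensionality reduction with $1 \times 1$ convolutions and by sparse computation obtained from using filters of several sizes in parallel.
ResNet: Uses skip connections (the identity module) to form residual connections, which make it possible to train much deeper networks.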

Keras

Linear regression

linear

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

iters_num = 1000
plot_interval = 10

x = np.linspace(-1, 1, 200)
np.random.shuffle(x)
d = 0.5 * x + 2 + np.random.normal(0, 0.05, (200,))    ## add normally distributed noise

from keras.models import Sequential
from keras.layers import Dense    ## fully connected layer

# Build the model
model = Sequential()
model.add(Dense(input_dim=1, units=1))    ## units is the Keras 2 name for the old output_dim argument

# Show the model
model.summary()

# Compile the model
model.compile(loss='mse', optimizer='sgd')    ## set the loss and the optimizer

# train
for i in range(iters_num):
    loss = model.train_on_batch(x, d)
    if (i+1) % plot_interval == 0:
        print('Generation: ' + str(i+1) + '. loss = ' + str(loss))

W, b = model.layers[0].get_weights()
print('W:', W)
print('b:', b)

y = model.predict(x)
plt.scatter(x, d)
plt.plot(x, y)
plt.show()

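Simple perceptron

OR circuit

[try] Change np.random.seed(0) to np.random.seed(1)

Different initial values lead to different accuracy, so the seed is worth examining when every bit of accuracy matters.

[try] Change the number of epochs to 100

Accuracy goes up (the loss goes down). Since generalization is what matters, increase the number of epochs only until performance on the validation data stops improving, and do not train beyond that point.

[try] Change to the AND and XOR circuits

AND is learned accurately.
XOR is not learned well: it is not linearly separable, so a single-layer perceptron cannot represent it.
- Adding a hidden layer with a ReLU activation makes it work.
- Adding more neurons also helps.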
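
To see why a hidden layer with ReLU is enough, the following NumPy sketch computes XOR with a two-unit hidden layer. The weights are hand-picked for illustration (an assumption, not the result of training the Keras model in this article).

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden layer: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
W1 = np.array([[1, 1],
               [1, 1]])
b1 = np.array([0, -1])
# Output layer: y = h1 - 2 * h2
W2 = np.array([[1], [-2]])

H = relu(X @ W1 + b1)
Y = H @ W2
print(Y.ravel())   # [0 1 1 0]
```

The second hidden unit only activates for the input (1, 1), and subtracting it twice cancels the output there, which a single linear layer cannot do.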

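[try] Switch to the XOR circuit and change the batch size to 10

With a batch size of 10, all 4 samples are used in a single update.
With a batch size of 1, each update uses only 1 sample, so 4 updates are needed to cover the data.
Batch sizes are generally set to powers of 2, which GPUs handle efficiently.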
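
The relation between batch size and the number of updates can be written down directly. This sketch assumes the usual behavior of splitting the data into ceil(n / batch_size) batches per epoch; the helper name updates_per_epoch is hypothetical.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates in one epoch (the last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)

n_samples = 4   # the four rows of the XOR truth table
for batch_size in (1, 2, 4, 10):
    print(batch_size, updates_per_epoch(n_samples, batch_size))
```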

[try] Change the number of epochs to 300

Perceptron

# Load modules
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD

# Fix the random seed
np.random.seed(0)

# Build a simple perceptron with a sigmoid activation
model = Sequential()
model.add(Dense(input_dim=2, units=1))
model.add(Activation('sigmoid'))
model.summary()

## Compile the model
model.compile(loss='binary_crossentropy', optimizer=SGD(lr=0.1))    ## set the loss and the optimizer

# Training inputs X and target data T
## OR circuit
# X = np.array( [[0,0], [0,1], [1,0], [1,1]] )
# T = np.array( [[0], [1], [1], [1]] )

## AND circuit
# X = np.array( [[0,0], [0,1], [1,0], [1,1]] )
# T = np.array( [[0], [0], [0], [1]] )

## XOR circuit
X = np.array( [[0,0], [0,1], [1,0], [1,1]] )
T = np.array( [[0], [1], [1], [0]] )

# Training
model.fit(X, T, epochs=100, batch_size=1)    ## epochs sets how many times training passes over the data

# Classify the training inputs themselves
Y = model.predict_classes(X, batch_size=1)

print("TEST")
print(Y == T)

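Classification (iris)

Iris classification

Given a 4-dimensional input x (sepal length, sepal width, petal length, petal width), classify each sample into one of 3 species.
To change the SGD learning rate, import SGD and pass SGD(lr=0.1) as the optimizer.

[try] Change the hidden-layer activation to sigmoid

sigmoid and ReLU reached similarly high accuracy.
Note that with deeper networks sigmoid is prone to vanishing gradients.

[try] Import SGD and change the optimizer to SGD(lr=0.1)

The default is lr=0.01.
- Accuracy rises at an earlier stage.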

iris

import matplotlib.pyplot as plt
from sklearn import datasets
iris = datasets.load_iris()
x = iris.data
d = iris.target

from sklearn.model_selection import train_test_split
x_train, x_test, d_train, d_test = train_test_split(x, d, test_size=0.2)    ## split into training and test data; test_size is the fraction used for testing

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD

# Model definition (2 layers)
model = Sequential()
model.add(Dense(12, input_dim=4))    ## 4-dimensional input into a hidden layer of size 12
model.add(Activation('relu'))    ## apply relu
# model.add(Activation('sigmoid'))    ## try: change the activation function
model.add(Dense(3, input_dim=12))    ## output layer of size 3
model.add(Activation('softmax'))    ## apply softmax at the end
model.summary()

## The targets are the integers 0, 1, 2, so sparse_categorical_crossentropy is used.
### For one-hot targets, use categorical_crossentropy instead.
## metrics=['accuracy'] reports accuracy during training.
# model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.compile(optimizer=SGD(lr=0.1), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

## Train on x_train and d_train.
## x_test and d_test are passed as validation_data: not used for training, only for monitoring accuracy.
history = model.fit(x_train, d_train, batch_size=5, epochs=20, verbose=1, validation_data=(x_test, d_test))
loss = model.evaluate(x_test, d_test, verbose=0)

# Accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.ylim(0, 1.0)
plt.show()

Classification (mnist)

784->512->512->10

[try] Change one_hot_label of load_mnist to False (this causes an error)

[try] Change the loss function to sparse_categorical_crossentropy

When the labels are one-hot, use categorical_crossentropy.
When the labels are integer class indices (not one-hot), use sparse_categorical_crossentropy.
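
The two losses compute the same quantity and differ only in the label format they expect. The NumPy sketch below illustrates this; the function names mirror the Keras loss names but are plain re-implementations written for this example, and the probabilities are made-up values.

```python
import numpy as np

def categorical_crossentropy(d_onehot, y):
    """Cross entropy for one-hot labels, shape (n, classes)."""
    return -np.sum(d_onehot * np.log(y), axis=1)

def sparse_categorical_crossentropy(d_index, y):
    """Cross entropy for integer class labels, shape (n,)."""
    return -np.log(y[np.arange(len(d_index)), d_index])

y = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])        # predicted probabilities
d_index = np.array([0, 1])             # integer labels
d_onehot = np.eye(3)[d_index]          # the same labels, one-hot encoded

print(categorical_crossentropy(d_onehot, y))
print(sparse_categorical_crossentropy(d_index, y))
```

Both calls print the same values, which is why switching one_hot_label without switching the loss leads to a shape error rather than a different result.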

[try] Change the arguments of Adam

Raising lr too far lowers accuracy.

mnist

# Import the required libraries
import sys, os
sys.path.append(os.pardir)  # setting for importing files from the parent directory
import keras
import matplotlib.pyplot as plt
from data.mnist import load_mnist

(x_train, d_train), (x_test, d_test) = load_mnist(normalize=True, one_hot_label=True)

# Import the required libraries; Adam is used as the optimizer
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam

# Build the model
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))
model.summary()

# Batch size and number of epochs
batch_size = 128
epochs = 20

model.compile(loss='categorical_crossentropy', 
              optimizer=Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False), 
              metrics=['accuracy'])

history = model.fit(x_train, d_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(x_test, d_test))
loss = model.evaluate(x_test, d_test, verbose=0)
print('Test loss:', loss[0])
print('Test accuracy:', loss[1])
# Accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
# plt.ylim(0, 1.0)
plt.show()

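CNN classification (mnist)

Training takes a long time, so the example runs on a reduced dataset.
Keras computes the intermediate shapes automatically.
To use vgg16, the images must be at least 48 pixels on each side and have 3 channels.
- Write a script that resizes the images, and replicate the single channel with repeat to obtain 3 channels.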

CNN

# Import the required libraries
import sys, os
sys.path.append(os.pardir)  # setting for importing files from the parent directory
import keras
import matplotlib.pyplot as plt
from data.mnist import load_mnist

(x_train, d_train), (x_test, d_test) = load_mnist(normalize=True, one_hot_label=True)

# ------------ shrink the dataset for a trial run --------------------
x_train = x_train[:1000]
x_test = x_test[:1000]
d_train = d_train[:1000]
d_test = d_test[:1000]
# ------------ shrink the dataset for a trial run --------------------

# Settings for feeding the data as image arrays
batch_size = 128
num_classes = 10
# epochs = 20
epochs = 3   # fewer epochs for a trial run

## Reshape to 28x28x1.
img_rows, img_cols = 28, 28

x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)    


# Import the required libraries; Adam is used as the optimizer
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D    ## these make convolution layers easy to implement
from keras.optimizers import Adam

model = Sequential()
## 32: number of output channels; kernel_size: size of the convolution filter
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 padding='same',    ## added: padding that keeps the 28x28 size
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))    ## 64 channels; padding='same' added
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())    ## flatten to 1 dimension
model.add(Dense(128, activation='relu'))    ## fully connected layer with 128 units
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax')) ## fully connected layer with 10 units (number of classes)
model.summary()

# Batch size and number of epochs
batch_size = 128
epochs = 20    ## note: this overrides the trial value epochs = 3 set above

model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])
history = model.fit(x_train, d_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(x_test, d_test))

# Accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
# plt.ylim(0, 1.0)
plt.show()

cifar10

The cifar10 dataset
- 32x32-pixel color images
- 10 labels: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
- Training samples: 50000, test samples: 10000
  
  http://www.cs.toronto.edu/~kriz/cifar.html

cifar10

# Import the CIFAR-10 dataset
from keras.datasets import cifar10
(x_train, d_train), (x_test, d_test) = cifar10.load_data()

# Preprocessing of CIFAR-10
from keras.utils import to_categorical

# Normalize the features
## Scale the pixel values to the range 0-1, which generally makes training more stable.
x_train = x_train/255.
x_test = x_test/255.

# Convert the class labels to one-hot vectors
d_train = to_categorical(d_train, 10)
d_test = to_categorical(d_test, 10)

# Build the CNN
import keras
from keras.models import Sequential
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.normalization import BatchNormalization    ## added
import numpy as np

model = Sequential()

model.add(Conv2D(32, (3, 3), padding='same',input_shape=x_train.shape[1:]))
model.add(BatchNormalization())    ## added
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(BatchNormalization())    ## added
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(BatchNormalization())    ## added
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(BatchNormalization())    ## added
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
 
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))
 
# Compile
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])

# Training
history = model.fit(x_train, d_train, epochs=20, validation_data=(x_test, d_test))

# Save the model
## The trained model is saved so that it can be loaded again later.
model.save('./CIFAR-10.h5')
# model.load_weights('./CIFAR-10.h5')

# Evaluate and print the result
print(model.evaluate(x_test, d_test))

RNN

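Predicting binary addition
- The bits are added starting from the least significant one and a carry propagates to the next position, so the task is treated as a sequence problem for an RNN.
Keras RNN documentation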
https://keras.io/ja/layers/recurrent/#simplernn
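
The sequential structure of binary addition can be seen in a plain Python version, sketched below. The carry plays the role that the recurrent state plays in the RNN; the function names are hypothetical and the bit order (least significant bit first) is chosen to match how the bits are fed as time steps.

```python
def add_binary_lsb_first(a_bits, b_bits):
    """Add two bit sequences given least-significant bit first."""
    carry = 0                      # plays the role of the recurrent state
    out = []
    for a, b in zip(a_bits, b_bits):
        s = a + b + carry
        out.append(s % 2)          # output bit at this time step
        carry = s // 2             # state passed to the next time step
    return out

def to_bits(n, width=8):
    return [(n >> i) & 1 for i in range(width)]   # LSB first

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

print(from_bits(add_binary_lsb_first(to_bits(90), to_bits(37))))   # 127
```

Each output bit depends on the current input pair and on one value carried over from the previous step, which is exactly the kind of dependency a recurrent layer can learn.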

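[try] Change the number of RNN output nodes to 128

Accuracy improves.

[try] Change the RNN output activation function to sigmoid

Accuracy drops compared with relu.

[try] Change the RNN output activation function to tanh

Accuracy is on par with relu.

[try] Change the optimizer to adam

Accuracy improves compared with SGD(lr=0.1).

[try] Set the RNN input dropout to 0.5

Dropout improves generalization but slows down convergence.

[try] Set the RNN recurrent dropout to 0.3

Same as above.

[try] Set unroll of the RNN to True

Unrolling the network tends to use more memory but can speed up computation.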

[try] LSTM and GRU can also be implemented.

Use LSTM or GRU when the sequences are long.

RNN

import sys, os
sys.path.append(os.pardir)  # setting for importing files from the parent directory
import numpy as np
import matplotlib.pyplot as plt

import keras
from keras.models import Sequential
from keras.layers.core import Dense, Dropout,Activation
from keras.layers.wrappers import TimeDistributed
from keras.optimizers import SGD
from keras.layers.recurrent import SimpleRNN, LSTM, GRU


# Prepare the data
# Number of binary digits
binary_dim = 8
# Maximum value + 1
largest_number = pow(2, binary_dim)

# Binary representations of all numbers below largest_number (least significant bit first)
binary = np.unpackbits(np.array([range(largest_number)], dtype=np.uint8).T,axis=1)[:, ::-1]


# Initialize A and B (a + b = d); integer division keeps the upper bound an int
a_int = np.random.randint(largest_number // 2, size=20000)
a_bin = binary[a_int] # binary encoding
b_int = np.random.randint(largest_number // 2, size=20000)
b_bin = binary[b_int] # binary encoding

x_int = []
x_bin = []
for i in range(10000):
    x_int.append(np.array([a_int[i], b_int[i]]).T)
    x_bin.append(np.array([a_bin[i], b_bin[i]]).T)

x_int_test = []
x_bin_test = []
for i in range(10001, 20000):
    x_int_test.append(np.array([a_int[i], b_int[i]]).T)
    x_bin_test.append(np.array([a_bin[i], b_bin[i]]).T)

x_int = np.array(x_int)
x_bin = np.array(x_bin)
x_int_test = np.array(x_int_test)
x_bin_test = np.array(x_bin_test)


# Target data
d_int = a_int + b_int
d_bin = binary[d_int][0:10000]
d_bin_test = binary[d_int][10001:20000]

model = Sequential()

model.add(SimpleRNN(units=16,    ## number of output units: 16 -> 128 in the [try]
               return_sequences=True,
               input_shape=[8, 2],    ## 8 time steps, 2 features (the bits of a and b)
               go_backwards=False,
               activation='relu',
#                activation='sigmoid',
#                activation='tanh',
#                dropout=0.5,
#                recurrent_dropout=0.3,
               unroll = True,
            ))
# Output layer
model.add(Dense(1, activation='sigmoid', input_shape=(-1,2)))    ## reduce to 1 dimension with Dense and output through sigmoid
model.summary()
# model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.1), metrics=['accuracy'])
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
# model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])

history = model.fit(x_bin, d_bin.reshape(-1, 8, 1), epochs=5, batch_size=2)

# Print the test results
score = model.evaluate(x_bin_test, d_bin_test.reshape(-1,8,1), verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

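Section 2: Reinforcement learning

A branch of machine learning whose goal is to build an agent that chooses actions in an environment so as to maximize reward over the long term.
- A mechanism that improves the principle for deciding actions based on the benefit (reward) obtained as a result of those actions.

Comprehension check

Application areas of reinforcement learning → board games, video games
Environment: the board, agent: the player, action: game moves, reward: winning or losing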

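The exploration-exploitation trade-off

The agent acts on incomplete knowledge, collects data as it goes, and gradually finds the optimal actions.

If it always takes only the action that past data says is best, it can never discover an even better action = too little exploration.
If it always takes only unknown actions, it cannot make use of past experience = too little exploitation.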
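
One common way to balance the two, not covered in this article and shown here only as an illustration, is the epsilon-greedy rule: with probability epsilon take a random action, otherwise take the best-known one. The sketch below applies it to a multi-armed bandit; the reward means, step count, and function name run_bandit are assumptions.

```python
import numpy as np

def run_bandit(epsilon, true_means, steps=2000, seed=0):
    """Epsilon-greedy on a multi-armed bandit; returns the average reward."""
    rng = np.random.RandomState(seed)
    n_arms = len(true_means)
    counts = np.zeros(n_arms)
    values = np.zeros(n_arms)            # running estimate of each arm's reward
    total = 0.0
    for _ in range(steps):
        if rng.rand() < epsilon:
            a = rng.randint(n_arms)      # explore: try a random action
        else:
            a = int(np.argmax(values))   # exploit: use the best-known action
        r = true_means[a] + rng.randn()  # noisy reward
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]   # incremental mean
        total += r
    return total / steps

true_means = [0.1, 0.5, 0.9]             # hypothetical reward means
for eps in (0.0, 0.1, 1.0):
    print(eps, run_bandit(eps, true_means))
```

Here epsilon = 0 corresponds to pure exploitation and epsilon = 1 to pure exploration; intermediate values trade one against the other.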

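How reinforcement learning differs

Supervised and unsupervised learning aim to find patterns in the data and to make predictions from that data.
Reinforcement learning aims to find a good policy.

Action-value function

There are two functions that express value: the state-value function and the action-value function.
- State-value function: focuses on the value of a state.
- Action-value function: focuses on the value of a state combined with an action.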
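
The two functions are linked through the policy: the value of a state is the policy-weighted average of the action values in that state. A small NumPy sketch with made-up numbers (2 states, 3 actions):

```python
import numpy as np

# Hypothetical example: 2 states, 3 actions
Q = np.array([[1.0, 0.0, 2.0],     # Q(s0, a) for each action a
              [0.5, 0.5, 0.5]])    # Q(s1, a)
pi = np.array([[0.2, 0.3, 0.5],    # pi(a | s0)
               [1.0, 0.0, 0.0]])   # pi(a | s1)

V = np.sum(pi * Q, axis=1)         # V(s) = sum_a pi(a|s) Q(s, a)
print(V)
```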

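Policy function

In policy-based reinforcement learning, the function that gives the probability of taking each action in a given state.

Policy gradient method

Policy iteration methods
- Methods that model the policy and optimize it. $$ \theta^{(t+1)}=\theta^{(t)}+\epsilon \nabla J (\theta)$$
- J (how good the policy is) must be defined.
Ways to define J
- Average reward: the mean of all the values produced by the actions taken.
- Discounted sum of rewards: rewards further in the future are added with progressively smaller weight (decay).
Defining the action-value function $Q(s,a)$ to match the definition chosen above gives the policy gradient theorem. $$\nabla _ \theta J(\theta)=\mathbb{E}_{\pi_{\theta}}[(\nabla _\theta \log \pi _\theta (a|s)Q^\pi (s,a))]$$
Key points in deriving the policy gradient theorem
- State-value function $$V(s)=\sum_a \pi(a|s)Q(s,a)$$
- Bellman equation $$Q(s,a)=\sum_{s'}P(s'|s,a)[r(s,a,s')+\gamma V(s')]$$
- The log-derivative trick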
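
The update rule and the policy gradient theorem can be tried out on a toy problem. The sketch below assumes a single state, three actions with fixed hypothetical action values Q, and a softmax policy with parameters theta; because the action space is tiny, the expectation in the theorem is evaluated exactly instead of being estimated from samples.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - np.max(theta))
    return z / z.sum()

Q = np.array([0.1, 0.5, 0.9])   # hypothetical action values Q(s, a), single state
theta = np.zeros(3)             # parameters of a softmax policy
epsilon = 0.5                   # learning rate

def J(theta):
    return float(np.dot(softmax(theta), Q))   # expected reward under the policy

J_start = J(theta)
for t in range(200):
    pi = softmax(theta)
    # grad log pi(a) = onehot(a) - pi for a softmax policy (one row per action)
    grad_log_pi = np.eye(3) - pi
    # policy gradient theorem: E_pi[ grad log pi(a) * Q(a) ], expectation taken exactly
    grad_J = np.sum(pi[:, None] * grad_log_pi * Q[:, None], axis=0)
    theta = theta + epsilon * grad_J          # theta <- theta + epsilon * grad J
J_end = J(theta)
print(J_start, J_end, softmax(theta))
```

The policy shifts its probability toward the action with the largest Q, and J increases accordingly. In practice the expectation is replaced by samples collected while following the policy.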
