
Trying Chainer on Google Colaboratory, vol. 2: Understanding Hidden Layers


Note: this article does not cover the general characteristics of neural networks or how CNNs work; I assume you have already studied those on your own.

Previous post

Trying Chainer on Google Colaboratory, vol. 1: Running the Hands-On as Written

Progress

Using the notebook 「00 Colaboratory で Chainer を動かしてみよう」, I am experimenting with how the number of hidden layers, the number of nodes per layer, the activation function, the optimizer, and the training parameters affect training time and accuracy. This post examines the effect of the number of hidden layers.

What I have learned so far

Factor                    Training time      Accuracy                   Notes
Number of hidden layers   +0.3 s per layer   plateaus around 8 layers   -
Nodes per hidden layer    -                  -                          -
Activation function       -                  -                          -
Number of epochs          -                  -                          -
batchsize                 -                  -                          -
optimizer                 -                  -                          -

What is a layer?

(This hardly needs explaining; a web search turns up plenty of material.)
The sequence of inputs to a neural network is called the input layer, and the sequence of outputs the output layer. The layers of neurons between them are called hidden layers (or intermediate layers). Stacking many hidden layers, i.e. learning "deep", is said to let the machine represent complex features.
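The input → hidden → output flow can be sketched in plain NumPy. This is only an illustration: the 784/200/10 sizes match the MNIST model used below, but the random weights and variable names here are my own, not part of the Hands-On.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input layer: one 784-dimensional sample (e.g. a flattened 28x28 image)
x = rng.standard_normal(784)

# Hidden layer: 200 neurons with a tanh activation (illustrative random weights)
W1 = rng.standard_normal((200, 784)) * 0.01
h = np.tanh(W1 @ x)

# Output layer: 10 raw scores, one per class
W2 = rng.standard_normal((10, 200)) * 0.01
y = W2 @ h

print(h.shape, y.shape)  # hidden activations and class scores
```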

Code to experiment with

I modify the parameters and functions in the code below. The rest of the code is the same as what I posted in Vol. 1.

class_model.py
# Aim for over 88% accuracy on the test data, within at most 10 epochs and 100 seconds of training time
import chainer.functions as F
import chainer.links as L
from chainer import Chain
class MLPNew(Chain):

  def __init__(self):
      super(MLPNew, self).__init__()
      with self.init_scope():
          # Add more layers?
          self.l1 = L.Linear(784, 200) # Increase output node as (784, 300)?
          self.l2 = L.Linear(200, 200) # Increase nodes as (300, 300)?
          self.l3 = L.Linear(200, 10)  # Increase nodes as (300, 10)?

  def forward(self, x):
      h1 = F.tanh(self.l1(x))  # Replace F.tanh with F.sigmoid or F.relu?
      h2 = F.tanh(self.l2(h1)) # Replace F.tanh with F.sigmoid or F.relu?
      y = self.l3(h2)
      return y

do_train_and_validate.py
from chainer import optimizers  # needed for optimizers.SGD() below

device = 0
n_epoch = 5     # Add more epochs?
batchsize = 256 # Increase/Decrease mini-batch size?

model = MLPNew()
classifier_model = L.Classifier(model)
optimizer = optimizers.SGD() # Default SGD(). Try another optimizer such as Adam()? (MomentumSGD and AdaGrad also exist)

train_and_validate(
    classifier_model, optimizer, train, validation, n_epoch, batchsize, device)

Detailed results

Default (settings as in the Hands-On)

  • Hidden layers: 3
  • Nodes per hidden layer: 200
  • Activation function: tanh
  • Number of epochs: 5
  • batchsize: 256
  • optimizer: SGD
epoch       main/loss   main/accuracy  val/main/loss  val/main/accuracy  elapsed_time
1           1.36351     0.623465       0.977643       0.690527           3.01746       
2           0.85074     0.7251         0.782766       0.736035           6.52711       
3           0.721985    0.760557       0.696563       0.759961           10.1278       
4           0.653269    0.780692       0.644298       0.779688           13.557        
5           0.607298    0.795272       0.608763       0.790332           16.9682

Test accuracy: 0.7873047

Hidden layers +1

First, I added one hidden layer (four hidden layers in total).

add_1layer.py
class MLPNew(Chain):

  def __init__(self):
      super(MLPNew, self).__init__()
      with self.init_scope():
          # Add more layers?
          self.l1 = L.Linear(784, 200) # Increase output node as (784, 300)?
          self.l2 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l3 = L.Linear(200, 200) # Increase nodes as (300, 300)?
          self.l4 = L.Linear(200, 10)  # Increase nodes as (300, 10)?

  def forward(self, x):
      h1 = F.tanh(self.l1(x))  # Replace F.tanh with F.sigmoid or F.relu?
      h2 = F.tanh(self.l2(h1)) # ← added
      h3 = F.tanh(self.l3(h2)) # Replace F.tanh with F.sigmoid or F.relu?
      y = self.l4(h3)
      return y
epoch       main/loss   main/accuracy  val/main/loss  val/main/accuracy  elapsed_time
1           1.37984     0.61067        0.96462        0.684277           3.35165       
2           0.834291    0.725361       0.760951       0.73877            7.14185       
3           0.698771    0.765064       0.671574       0.76709            10.8874       
4           0.628342    0.786332       0.618533       0.784277           14.6952       
5           0.582528    0.800881       0.581186       0.798828           18.4509

Test accuracy: 0.79609376

Compared with the default, training time grew by about 0.3 s per epoch and accuracy rose by about 0.9%. The training and test results are close, so there seems to be no overfitting.

Hidden layers +5

A single extra layer does not show a clear trend, so I added five (eight hidden layers in total).

add_5layer.py
import chainer.functions as F
import chainer.links as L
from chainer import Chain
class MLPNew(Chain):

  def __init__(self):
      super(MLPNew, self).__init__()
      with self.init_scope():
          # Add more layers?
          self.l1 = L.Linear(784, 200) # Increase output node as (784, 300)?
          self.l2 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l3 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l4 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l5 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l6 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l7 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l8 = L.Linear(200, 10)  # Increase nodes as (300, 10)?

  def forward(self, x):
      h1 = F.tanh(self.l1(x))  # Replace F.tanh with F.sigmoid or F.relu?
      h2 = F.tanh(self.l2(h1)) # ← added
      h3 = F.tanh(self.l3(h2)) # Replace F.tanh with F.sigmoid or F.relu?
      h4 = F.tanh(self.l4(h3)) # ← added
      h5 = F.tanh(self.l5(h4)) # ← added
      h6 = F.tanh(self.l6(h5)) # ← added
      h7 = F.tanh(self.l7(h6)) # ← added
      y = self.l8(h7)
      return y

epoch       main/loss   main/accuracy  val/main/loss  val/main/accuracy  elapsed_time
1           1.34231     0.607502       0.895956       0.707617           4.68054       
2           0.74458     0.750861       0.673352       0.761719           9.61153       
3           0.604405    0.791306       0.588284       0.790918           14.6184       
4           0.539773    0.811982       0.542256       0.809473           19.625        
5           0.503576    0.822957       0.514283       0.814648           24.7382

Test accuracy: 0.819043

Compared with the default, training time grew by about 1.6 s per epoch and accuracy rose by about 2.3%. Training and test results again differ little, so there is no sign of overfitting here either.

Hidden layers +15

Finally, I added fifteen layers (eighteen hidden layers in total).

add_15layer.py
import chainer.functions as F
import chainer.links as L
from chainer import Chain
class MLPNew(Chain):

  def __init__(self):
      super(MLPNew, self).__init__()
      with self.init_scope():
          # Add more layers?
          self.l1 = L.Linear(784, 200) # Increase output node as (784, 300)?
          self.l2 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l3 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l4 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l5 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l6 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l7 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l8 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l9 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l10 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l11 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l12 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l13 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l14 = L.Linear(200, 200) # Increase nodes as (300, 300)? # ← added
          self.l15 = L.Linear(200, 10)  # Increase nodes as (300, 10)?

  def forward(self, x):
      h1 = F.tanh(self.l1(x))  # Replace F.tanh with F.sigmoid or F.relu?
      h2 = F.tanh(self.l2(h1)) # ← added
      h3 = F.tanh(self.l3(h2)) # Replace F.tanh with F.sigmoid or F.relu?
      h4 = F.tanh(self.l4(h3)) # ← added
      h5 = F.tanh(self.l5(h4)) # ← added
      h6 = F.tanh(self.l6(h5)) # ← added
      h7 = F.tanh(self.l7(h6)) # ← added
      h8 = F.tanh(self.l8(h7)) # ← added
      h9 = F.tanh(self.l9(h8)) # ← added
      h10 = F.tanh(self.l10(h9)) # ← added
      h11 = F.tanh(self.l11(h10)) # ← added
      h12 = F.tanh(self.l12(h11)) # ← added
      h13 = F.tanh(self.l13(h12)) # ← added
      h14 = F.tanh(self.l14(h13)) # ← added
      y = self.l15(h14)
      return y
epoch       main/loss   main/accuracy  val/main/loss  val/main/accuracy  elapsed_time
1           1.18994     0.62785        0.788837       0.729004           7.36658       
2           0.670227    0.770974       0.617476       0.77832            15.1356       
3           0.556932    0.805008       0.548517       0.802148           22.7699       
4           0.508303    0.821807       0.509836       0.816406           30.4146       
5           0.481139    0.830128       0.498664       0.824609           38.0188 

Test accuracy: 0.81943357

Compared with the default, training time grew by about 4.3 s per epoch while accuracy rose by about 2.3%. There is still no sign of overfitting, but the accuracy is almost identical to the 8-layer run, so there is no need to go this deep.

Summary

Each additional hidden layer adds about 0.3 s of training time per epoch, and about eight hidden layers are plenty: I confirmed that more layers are not automatically better. This makes me want to work through ゼロから作る Deep Learning (Deep Learning from Scratch) again.
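The 0.3 s/layer figure can be sanity-checked from the final elapsed_time values logged above (runs with 3, 4, 8, and 18 total layers, 5 epochs each):

```python
# Final elapsed_time after 5 epochs, keyed by total layer count (from the logs above)
elapsed = {3: 16.9682, 4: 18.4509, 8: 24.7382, 18: 38.0188}

base_n, base_t = 3, elapsed[3]
for n in (4, 8, 18):
    # Extra seconds per added layer, per epoch, relative to the 3-layer default
    per_layer = (elapsed[n] - base_t) / (n - base_n) / 5
    print(f"{n} layers: {per_layer:.2f} s/layer/epoch")
```

All three runs come out close to 0.3 s per layer per epoch, consistent with the table at the top.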

Next time

Understanding the number of nodes in a deep network.
