
Trying Chainer on Google Colaboratory vol.3 ~Understanding nodes~

Last updated at 2018-12-01, posted at 2018-12-01

Note: this article does not cover the general characteristics of neural networks or how CNNs work. I assume you have already studied those on your own.

Previous article

Trying Chainer on Google Colaboratory vol.2 ~Understanding hidden layers~

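Progress

Using the hands-on notebook "00 Colaboratory で Chainer を動かしてみよう", I am experimenting to see how the number of hidden layers, the activation function, the optimizer, and the training parameters affect training time and accuracy. This time I checked the effect of the node count.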

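Table of what I have learned so far

Factor	Training time	Accuracy	Notes
Number of hidden layers	+0.3 s per layer	Plateaus at about 8 layers	-
Node count of hidden layers	Almost unchanged	+1.5% at 500 nodes	Any value seems fine with a GPU
Activation function	-	-	-
Number of epochs	-	-	-
batchsize	-	-	-
optimizer	-	-	-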

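What is a node?

(If you want more detail or diagrams, please search the web.)
A node is the basic unit that makes up a neural network. Nodes lined up in a column form a layer. Each node receives weighted input values and a bias. These inputs are fed into the activation function inside the node, which produces an output value. That output then becomes an input to the nodes in the next layer. Because a node plays the same kind of role as a neuron in the human brain, it is also called a neuron.

Note: I plan to cover activation functions in the next article.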
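The description above can be sketched in a few lines of plain Python. This is only an illustration of what one node computes, not Chainer's implementation: `node_output` is a hypothetical helper, the input values are made up, and tanh is used because it is the activation function in this article's model.

```python
import math


def node_output(inputs, weights, bias):
    """Return tanh(sum(w_i * x_i) + b) for a single node."""
    # Weighted sum of the incoming values plus the bias.
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    # The activation function turns the sum into the node's output.
    return math.tanh(pre_activation)


# Hypothetical example values, chosen only for illustration.
x = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 0.0]
b = 0.0

# 0.5*1.0 - 0.25*2.0 + 0.0*3.0 + 0.0 = 0.0, and tanh(0.0) = 0.0
y = node_output(x, w, b)
print(y)
```

A layer with 200 nodes repeats this computation 200 times on the same inputs, each time with its own weights and bias.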

Code to experiment with

I change the parameters and functions in the code below. The rest of the code is the same as what I posted in vol.1.

class_model.py

# Goal: exceed 88% accuracy on the test data, using at most 10 epochs and at most 100 seconds of training time
import chainer.functions as F
import chainer.links as L
from chainer import Chain
class MLPNew(Chain):
  
  def __init__(self):
      super(MLPNew, self).__init__()
      with self.init_scope():
          # Add more layers?
          self.l1 = L.Linear(784, 200) # Increase output node as (784, 300)?
          self.l2 = L.Linear(200, 200) # Increase nodes as (300, 300)?
          self.l3 = L.Linear(200, 10)  # Increase nodes as (300, 10)?
  
  def forward(self, x):
      h1 = F.tanh(self.l1(x))  # Replace F.tanh with F.sigmoid or F.relu?
      h2 = F.tanh(self.l2(h1)) # Replace F.tanh with F.sigmoid or F.relu?
      y = self.l3(h2)
      return y

do_train_and_validate.py

from chainer import optimizers  # needed for optimizers.SGD() below

device = 0
n_epoch = 5     # Add more epochs?
batchsize = 256 # Increase/Decrease mini-batch size?

model = MLPNew()
classifier_model = L.Classifier(model)
optimizer = optimizers.SGD() # Default SGD(). Use other optimizer, Adam()?(Are there Momentum and AdaGrad?)

train_and_validate(
    classifier_model, optimizer, train, validation, n_epoch, batchsize, device)

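Detailed results

Default (settings as given in the hands-on)

Hidden layers : 3 (counted as the Linear links l1 to l3)
Node count of hidden layers : 200
Activation function : tanh
Number of epochs : 5
batchsize : 256
optimizer : SGD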

epoch       main/loss   main/accuracy  val/main/loss  val/main/accuracy  elapsed_time
1           1.42886     0.597377       1.00901        0.687109           2.84684       
2           0.875153    0.71893        0.801961       0.734766           5.95564       
3           0.73895     0.755148       0.713522       0.753906           9.16102       
4           0.668023    0.77563        0.659217       0.772656           12.3336       
5           0.622229    0.790645       0.623375       0.784473           15.4588

Test accuracy: 0.7852539

Node count: all 300

I increased the node count of each hidden layer by 100.

node300.py

  def __init__(self):
      super(MLPNew, self).__init__()
      with self.init_scope():
          # Add more layers?
          self.l1 = L.Linear(784, 300) # Increase output node as (784, 300)?
          self.l2 = L.Linear(300, 300) # Increase nodes as (300, 300)?
          self.l3 = L.Linear(300, 10)  # Increase nodes as (300, 10)?

epoch       main/loss   main/accuracy  val/main/loss  val/main/accuracy  elapsed_time
1           1.34655     0.604532       0.95886        0.695215           2.78899       
2           0.839066    0.728005       0.772067       0.741602           5.92131       
3           0.713855    0.766406       0.687474       0.771484           9.0102        
4           0.645723    0.787827       0.63594        0.784668           12.1561       
5           0.60076     0.800921       0.600889       0.79668            15.4437

Test accuracy: 0.79365236

Compared with the default, training time is almost unchanged and test accuracy rose by about 0.8 percentage points. There is no sign of overfitting.
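One rough way to check for overfitting is to compare the training accuracy (main/accuracy) with the validation accuracy (val/main/accuracy) at the final epoch. The sketch below does that for the default run and the 300-node run. `accuracy_gap` is a hypothetical helper, and the article does not state which criterion was actually used, so treat this as one possible check.

```python
def accuracy_gap(train_accuracy, validation_accuracy):
    """Return training accuracy minus validation accuracy."""
    return train_accuracy - validation_accuracy


# Epoch-5 values copied from the logs above.
default_gap = accuracy_gap(0.790645, 0.784473)
node300_gap = accuracy_gap(0.800921, 0.79668)

print(f"default : {default_gap:.4f}")
print(f"node 300: {node300_gap:.4f}")
```

Both gaps are below one percentage point, and the validation loss is still falling at epoch 5, which is consistent with the impression that the model is not overfitting.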

Node count: all 50

I checked what happens when the node count is reduced.

node50.py

  def __init__(self):
      super(MLPNew, self).__init__()
      with self.init_scope():
          # Add more layers?
          self.l1 = L.Linear(784, 50) # Increase output node as (784, 300)?
          self.l2 = L.Linear(50, 50) # Increase nodes as (300, 300)?
          self.l3 = L.Linear(50, 10)  # Increase nodes as (300, 10)?

epoch       main/loss   main/accuracy  val/main/loss  val/main/accuracy  elapsed_time
1           1.5931      0.548091       1.21115        0.650488           2.81381       
2           1.04272     0.682131       0.940917       0.691602           5.9543        
3           0.858707    0.722576       0.816952       0.723047           9.10377       
4           0.761147    0.751315       0.742234       0.744434           12.4361       
5           0.697557    0.772256       0.688601       0.766895           15.6419  

Test accuracy: 0.77304685

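Compared with the default, test accuracy dropped by about 1.2 percentage points (0.785 to 0.773) and training time increased by 0.2 seconds. There is no sign of overfitting here either.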

Node count: all 500

I increased the node count to a somewhat extreme value.

node500.py

  def __init__(self):
      super(MLPNew, self).__init__()
      with self.init_scope():
          # Add more layers?
          self.l1 = L.Linear(784, 500) # Increase output node as (784, 300)?
          self.l2 = L.Linear(500, 500) # Increase nodes as (300, 300)?
          self.l3 = L.Linear(500, 10)  # Increase nodes as (300, 10)?

epoch       main/loss   main/accuracy  val/main/loss  val/main/accuracy  elapsed_time
1           1.22904     0.643116       0.885          0.713477           2.85555       
2           0.779155    0.748958       0.726497       0.757227           6.11784       
3           0.67144     0.780469       0.656133       0.778418           9.2994        
4           0.612749    0.798908       0.609428       0.79043            12.4798       
5           0.573848    0.808494       0.578746       0.801855           15.6547

Test accuracy: 0.8011719

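Compared with the default, test accuracy rose by about 1.5 percentage points and training time increased by 0.2 seconds. There is no sign of overfitting here either.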

Node count: 500→200

I made the node count decrease from one layer to the next.

node500_200.py

  def __init__(self):
      super(MLPNew, self).__init__()
      with self.init_scope():
          # Add more layers?
          self.l1 = L.Linear(784, 500) # Increase output node as (784, 300)?
          self.l2 = L.Linear(500, 200) # Increase nodes as (300, 300)?
          self.l3 = L.Linear(200, 10)  # Increase nodes as (300, 10)?

epoch       main/loss   main/accuracy  val/main/loss  val/main/accuracy  elapsed_time
1           1.32913     0.62223        0.957114       0.696777           2.78969       
2           0.835964    0.73119        0.774449       0.738086           5.99716       
3           0.712941    0.765905       0.690429       0.766113           9.16817       
4           0.646313    0.785037       0.639813       0.782617           12.3917       
5           0.602354    0.799519       0.604576       0.794531           15.5959

Test accuracy: 0.79248047

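Compared with the default, test accuracy rose by about 0.7 percentage points and training time increased by 0.1 seconds. There is no sign of overfitting here either.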

Node count: 200→500

Finally, I made the node count increase from one layer to the next.

node200_500.py

  def __init__(self):
      super(MLPNew, self).__init__()
      with self.init_scope():
          # Add more layers?
          self.l1 = L.Linear(784, 200) # Increase output node as (784, 300)?
          self.l2 = L.Linear(200, 500) # Increase nodes as (300, 300)?
          self.l3 = L.Linear(500, 10)  # Increase nodes as (300, 10)?

epoch       main/loss   main/accuracy  val/main/loss  val/main/accuracy  elapsed_time
1           1.30508     0.619121       0.928058       0.69209            2.82672       
2           0.811347    0.732252       0.748695       0.739551           5.99507       
3           0.691686    0.770473       0.667777       0.772754           9.25792       
4           0.626781    0.791135       0.617296       0.791895           12.4678       
5           0.584437    0.803586       0.583613       0.800195           15.647

Test accuracy: 0.7953125

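Compared with the default, test accuracy rose by about 1 percentage point and training time increased by 0.2 seconds. There is no sign of overfitting here either.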

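Summary

Changing the node count left training time almost unchanged and improved test accuracy by up to about 1.5 percentage points. For this data, there also seems to be no need to vary the node count from one hidden layer to the next.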
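The comparison can be reproduced from the reported numbers. The short script below computes each run's difference from the default in percentage points. The accuracies are copied from the result sections, and the dictionary keys are labels made up for this sketch.

```python
# Test accuracies copied from the result sections above.
test_accuracy = {
    "default (200)": 0.7852539,
    "all 300": 0.79365236,
    "all 50": 0.77304685,
    "all 500": 0.8011719,
    "500 -> 200": 0.79248047,
    "200 -> 500": 0.7953125,
}

baseline = test_accuracy["default (200)"]

# Difference from the default run, in percentage points.
delta_points = {
    name: (accuracy - baseline) * 100
    for name, accuracy in test_accuracy.items()
}

for name, delta in delta_points.items():
    print(f"{name:14s} {delta:+.2f}")
```

The all-500 run shows the largest gain. Each configuration was run only once, so small differences between runs may partly reflect random initialization.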

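This experiment used a GPU as the accelerator. Applying weights and adding a bias for every node is exactly the kind of parallel work that GPUs are good at, so without an accelerator (that is, computing on the CPU only) I suspect the node count would affect training time much more (unverified).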
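To get a feel for how much the amount of computation differs between configurations, the sketch below counts the trainable parameters of each network. It assumes every `L.Linear(n_in, n_out)` holds an n_in × n_out weight matrix plus n_out biases. `count_parameters` and the configuration labels are hypothetical names for this illustration.

```python
def count_parameters(layer_sizes):
    """Count the weights and biases of a fully connected network.

    layer_sizes lists the node count of each layer from input to output,
    e.g. [784, 200, 200, 10] for the default MLPNew.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weight matrix
        total += n_out         # bias vector
    return total


# Layer sizes of the configurations tried in this article.
configs = {
    "all 50": [784, 50, 50, 10],
    "default (200)": [784, 200, 200, 10],
    "all 300": [784, 300, 300, 10],
    "all 500": [784, 500, 500, 10],
    "500 -> 200": [784, 500, 200, 10],
    "200 -> 500": [784, 200, 500, 10],
}

for name, sizes in configs.items():
    print(f"{name:14s} {count_parameters(sizes):>7d}")
```

The all-500 network has roughly 15 times as many parameters as the all-50 network, yet the elapsed times above differ by only a fraction of a second. That fits the guess that the GPU absorbs the extra work in parallel.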

Next time

Understanding activation functions in deep learning.
