# Trying out various optimization algorithms with a neural network


# Introduction

Using MNIST as a benchmark, I trained a simple neural network with each of the following learning-rate optimization algorithms and compared their accuracy on the evaluation dataset.

- SGD
- Momentum SGD
- AdaGrad
- RMSprop
- AdaDelta
- Adam

# Implementation

optimizers.py

```python
from collections import OrderedDict

import numpy as np
import theano
import theano.tensor as T


def build_shared_zeros(shape, name):
    # Zero-initialized shared variable, used below for optimizer state.
    return theano.shared(np.zeros(shape, dtype=theano.config.floatX), name=name, borrow=True)


class Optimizer(object):
    def __init__(self, params=None):
        if params is None:
            raise NotImplementedError()
        self.params = params

    def updates(self, loss=None):
        # Mapping from each shared variable to the expression that replaces it.
        self.updates = OrderedDict()
        if loss is None:
            raise NotImplementedError()
        self.gparams = [T.grad(loss, param) for param in self.params]
```

Note that `self.updates` here holds the update expressions applied when the weights and other parameters are updated.
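The updates-dict pattern can be illustrated without Theano. The sketch below is a minimal plain-Python analogue (all names and values are hypothetical): each parameter maps to the value that replaces it after one step.

```python
# Minimal sketch of the updates-dict pattern, without Theano.
# Each parameter maps to the value that replaces it after one step.
params = {"w": 3.0, "b": -1.0}   # hypothetical parameter values
grads = {"w": 0.5, "b": -0.2}    # hypothetical gradients
learning_rate = 0.1

# Build all update expressions first, then apply them in one go,
# mirroring how Theano consumes the OrderedDict.
updates = {name: params[name] - learning_rate * grads[name] for name in params}
params.update(updates)           # w -> 2.95, b -> -0.98
```

In Theano, the same dictionary is handed to `theano.function(..., updates=...)`, which applies all updates atomically per call.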

## SGD

optimizers.py

```python
class SGD(Optimizer):
    def __init__(self, learning_rate=0.01, params=None):
        super(SGD, self).__init__(params=params)
        self.learning_rate = learning_rate

    def updates(self, loss=None):
        super(SGD, self).updates(loss=loss)
        for param, gparam in zip(self.params, self.gparams):
            self.updates[param] = param - self.learning_rate * gparam
        return self.updates
```
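As a sanity check of the update rule itself (separate from the Theano class), here is a small NumPy-free sketch of plain SGD minimizing the toy function f(x) = x², whose gradient is 2x:

```python
# Plain SGD on f(x) = x^2 (gradient 2x): param <- param - learning_rate * grad.
x = 5.0
learning_rate = 0.1
for _ in range(100):
    grad = 2 * x
    x = x - learning_rate * grad
# x shrinks by a factor of (1 - 0.2) per step, so it is essentially 0 here.
```

With a fixed learning rate the step length is proportional to the gradient, which is why SGD slows down near a minimum.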

## Momentum SGD

optimizers.py

```python
class MomentumSGD(Optimizer):
    def __init__(self, learning_rate=0.01, momentum=0.9, params=None):
        super(MomentumSGD, self).__init__(params=params)
        self.learning_rate = learning_rate
        self.momentum = momentum
        self.vs = [build_shared_zeros(t.shape.eval(), 'v') for t in self.params]

    def updates(self, loss=None):
        super(MomentumSGD, self).updates(loss=loss)
        for v, param, gparam in zip(self.vs, self.params, self.gparams):
            # Velocity: decay the previous velocity, then add the gradient step.
            _v = v * self.momentum
            _v = _v - self.learning_rate * gparam
            self.updates[param] = param + _v
            self.updates[v] = _v
        return self.updates
```
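The same velocity-based rule can be sketched in plain Python on the toy function f(x) = x²; this is only an illustration of the update equations above, not the Theano code:

```python
# Momentum SGD on f(x) = x^2: the velocity v accumulates past gradients,
# which smooths the trajectory compared to plain SGD.
x, v = 5.0, 0.0
learning_rate, momentum = 0.1, 0.9
for _ in range(300):
    grad = 2 * x
    v = momentum * v - learning_rate * grad
    x = x + v
```

Because the velocity carries over between steps, the iterate can overshoot and oscillate before settling, but it typically escapes shallow plateaus faster than plain SGD.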

## AdaGrad

optimizers.py

```python
class AdaGrad(Optimizer):
    def __init__(self, learning_rate=0.01, eps=1e-6, params=None):
        super(AdaGrad, self).__init__(params=params)
        self.learning_rate = learning_rate
        self.eps = eps
        self.accugrads = [build_shared_zeros(t.shape.eval(), 'accugrad') for t in self.params]

    def updates(self, loss=None):
        super(AdaGrad, self).updates(loss=loss)
        for accugrad, param, gparam in zip(self.accugrads, self.params, self.gparams):
            # Accumulate squared gradients; the step shrinks as they grow.
            agrad = accugrad + gparam * gparam
            dx = - (self.learning_rate / T.sqrt(agrad + self.eps)) * gparam
            self.updates[param] = param + dx
            self.updates[accugrad] = agrad
        return self.updates
```
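A NumPy sketch of the AdaGrad rule on f(x) = x² shows the key behavior: the accumulated squared gradient monotonically shrinks the effective step. (The learning rate here is chosen for the toy problem, not taken from the class default.)

```python
import numpy as np

# AdaGrad on f(x) = x^2: the accumulated squared gradient
# divides the learning rate, so steps shrink over time.
x = 5.0
learning_rate, eps = 0.5, 1e-6
accugrad = 0.0
for _ in range(500):
    grad = 2 * x
    accugrad += grad * grad
    x = x - (learning_rate / np.sqrt(accugrad + eps)) * grad
```

The accumulator only ever grows, which is AdaGrad's known weakness: on long runs the effective learning rate can decay to nearly zero, motivating RMSprop and AdaDelta below.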

## RMSprop

optimizers.py

```python
class RMSprop(Optimizer):
    def __init__(self, learning_rate=0.001, alpha=0.99, eps=1e-8, params=None):
        super(RMSprop, self).__init__(params=params)
        self.learning_rate = learning_rate
        self.alpha = alpha
        self.eps = eps
        self.mss = [build_shared_zeros(t.shape.eval(), 'ms') for t in self.params]

    def updates(self, loss=None):
        super(RMSprop, self).updates(loss=loss)
        for ms, param, gparam in zip(self.mss, self.params, self.gparams):
            # Exponential moving average of squared gradients.
            _ms = ms * self.alpha
            _ms += (1 - self.alpha) * gparam * gparam
            self.updates[ms] = _ms
            self.updates[param] = param - self.learning_rate * gparam / T.sqrt(_ms + self.eps)
        return self.updates
```
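The corresponding NumPy sketch on f(x) = x² illustrates how the moving average, unlike AdaGrad's unbounded accumulator, keeps the normalized step roughly constant:

```python
import numpy as np

# RMSprop on f(x) = x^2: divide by a moving RMS of gradients, so the
# effective step stays on the order of learning_rate instead of decaying.
x = 5.0
learning_rate, alpha, eps = 0.01, 0.99, 1e-8
ms = 0.0
for _ in range(2000):
    grad = 2 * x
    ms = alpha * ms + (1 - alpha) * grad * grad
    x = x - learning_rate * grad / np.sqrt(ms + eps)
```

Near the minimum the iterate hovers within a band on the order of the learning rate rather than converging exactly, which is typical of RMSprop without a decaying schedule.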

## AdaDelta

optimizers.py

```python
class AdaDelta(Optimizer):
    def __init__(self, rho=0.95, eps=1e-6, params=None):
        super(AdaDelta, self).__init__(params=params)
        self.rho = rho
        self.eps = eps
        self.accugrads = [build_shared_zeros(t.shape.eval(), 'accugrad') for t in self.params]
        self.accudeltas = [build_shared_zeros(t.shape.eval(), 'accudelta') for t in self.params]

    def updates(self, loss=None):
        super(AdaDelta, self).updates(loss=loss)
        for accugrad, accudelta, param, gparam in zip(
                self.accugrads, self.accudeltas, self.params, self.gparams):
            # Step size is the ratio of two moving RMS values: no learning rate.
            agrad = self.rho * accugrad + (1 - self.rho) * gparam * gparam
            dx = - T.sqrt((accudelta + self.eps) / (agrad + self.eps)) * gparam
            self.updates[accudelta] = self.rho * accudelta + (1 - self.rho) * dx * dx
            self.updates[param] = param + dx
            self.updates[accugrad] = agrad
        return self.updates
```
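A NumPy sketch of the same rule on f(x) = x² makes AdaDelta's distinctive behavior visible: with `accudelta` initialized to zero, the first steps are tiny and the method only gradually picks up speed, so progress on this toy problem is slow:

```python
import numpy as np

# AdaDelta on f(x) = x^2: the step is the ratio of two moving RMS values,
# so there is no explicit learning rate.  It starts very slowly because
# accudelta begins at zero, then gradually accelerates.
x = 5.0
rho, eps = 0.95, 1e-6
accugrad, accudelta = 0.0, 0.0
for _ in range(1000):
    grad = 2 * x
    accugrad = rho * accugrad + (1 - rho) * grad * grad
    dx = -np.sqrt((accudelta + eps) / (accugrad + eps)) * grad
    accudelta = rho * accudelta + (1 - rho) * dx * dx
    x = x + dx
```

Because `|dx|` is bounded by roughly the RMS ratio times the gradient, the iterate creeps toward the minimum rather than jumping; on real networks this slow warm-up is usually negligible relative to the number of minibatch updates.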

## Adam

optimizers.py

```python
class Adam(Optimizer):
    def __init__(self, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8,
                 gamma=1 - 1e-8, params=None):
        super(Adam, self).__init__(params=params)
        self.alpha = alpha
        self.b1 = beta1
        self.b2 = beta2
        self.gamma = gamma
        self.t = theano.shared(np.float32(1))
        self.eps = eps

        self.ms = [build_shared_zeros(t.shape.eval(), 'm') for t in self.params]
        self.vs = [build_shared_zeros(t.shape.eval(), 'v') for t in self.params]

    def updates(self, loss=None):
        super(Adam, self).updates(loss=loss)
        # beta1 decays slightly each step via gamma.
        self.b1_t = self.b1 * self.gamma ** (self.t - 1)

        for m, v, param, gparam \
                in zip(self.ms, self.vs, self.params, self.gparams):
            _m = self.b1_t * m + (1 - self.b1_t) * gparam
            _v = self.b2 * v + (1 - self.b2) * gparam ** 2

            # Bias correction for the zero-initialized moment estimates.
            m_hat = _m / (1 - self.b1 ** self.t)
            v_hat = _v / (1 - self.b2 ** self.t)

            self.updates[param] = param - self.alpha * m_hat / (T.sqrt(v_hat) + self.eps)
            self.updates[m] = _m
            self.updates[v] = _v
        self.updates[self.t] = self.t + 1.0
        return self.updates
```
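The core of the rule can again be sketched in NumPy on f(x) = x². For simplicity this omits the `gamma` decay of beta1 used in the class above, and uses a larger alpha than the class default so the toy problem converges quickly:

```python
import numpy as np

# Adam on f(x) = x^2: first and second moment estimates with bias
# correction.  (The gamma decay of beta1 from the class is omitted here.)
x = 5.0
alpha, b1, b2, eps = 0.1, 0.9, 0.999, 1e-8
m, v = 0.0, 0.0
for t in range(1, 1001):
    grad = 2 * x
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)      # correct the zero-initialization bias
    v_hat = v / (1 - b2 ** t)
    x = x - alpha * m_hat / (np.sqrt(v_hat) + eps)
```

The bias correction matters most in the first few steps, when `m` and `v` are still dominated by their zero initialization; without it the early effective step would be far too small.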

# Experiments

I trained on MNIST for 20 epochs per run and averaged over 30 random seeds. For the learning rates and the detailed network settings, see my GitHub repository.

# Results

Hmm, the upper part of the plot is too cluttered to read, so let's zoom in.

SGD has dropped off the bottom of the chart.