More than 5 years have passed since last update.

On an OverflowError in Chainer's MNIST example

Notes from when an OverflowError occurred while running Chainer (v1.4.1)'s MNIST example on a GPU.

load MNIST dataset
epoch 1
graph generated
train mean loss=0.189192938904, accuracy=0.9427833369
test  mean loss=0.0907700556988, accuracy=0.971400004625
epoch 2
train mean loss=0.0753641784944, accuracy=0.977200009624
test  mean loss=0.0728686049528, accuracy=0.977100006938
epoch 3
train mean loss=0.0482070475052, accuracy=0.984483343263
test  mean loss=0.0654721940898, accuracy=0.981700006723
epoch 4
train mean loss=0.0369998926547, accuracy=0.98763334175
test  mean loss=0.0861884641147, accuracy=0.976300006509
epoch 5
train mean loss=0.028689230929, accuracy=0.990883341332
test  mean loss=0.0656151846884, accuracy=0.982000008225
epoch 6
train mean loss=0.0242170301934, accuracy=0.991766673426
test  mean loss=0.092080684273, accuracy=0.976500005126
epoch 7
train mean loss=0.0214745011753, accuracy=0.993416672448
test  mean loss=0.0633907641986, accuracy=0.984200007915
epoch 8
train mean loss=0.0201626637221, accuracy=0.993866672119
test  mean loss=0.0928991907273, accuracy=0.979100006819
epoch 9
train mean loss=0.0160159983088, accuracy=0.994700004657
test  mean loss=0.0739389384927, accuracy=0.982000007629
epoch 10
train mean loss=0.0127062624457, accuracy=0.99568333745
test  mean loss=0.0723534981629, accuracy=0.983200008869
epoch 11
train mean loss=0.0158423399801, accuracy=0.994966671367
test  mean loss=0.0910608516628, accuracy=0.980800007582
epoch 12
Traceback (most recent call last):
  File "train_mnist.py", line 88, in <module>
    optimizer.update(model, x, t)
  File "/***/.pyenv/versions/anaconda-2.4.0/lib/python2.7/site-packages/chainer-1.4.1-py2.7-linux-x86_64.egg/chainer/optimizer.py", line 386, in update
    self.update_one(param, states[name])
  File "/***/.pyenv/versions/anaconda-2.4.0/lib/python2.7/site-packages/chainer-1.4.1-py2.7-linux-x86_64.egg/chainer/optimizer.py", line 402, in update_one
    self.update_one_gpu(param, state)
  File "/***/.pyenv/versions/anaconda-2.4.0/lib/python2.7/site-packages/chainer-1.4.1-py2.7-linux-x86_64.egg/chainer/optimizers/adam.py", line 43, in update_one_gpu
    'adam')(param.grad, self.lr, 1 - self.beta1, 1 - self.beta2,
  File "/***/.pyenv/versions/anaconda-2.4.0/lib/python2.7/site-packages/chainer-1.4.1-py2.7-linux-x86_64.egg/chainer/optimizers/adam.py", line 56, in lr
    fix1 = 1. - self.beta1 ** self.t
OverflowError: (34, 'Numerical result out of range')

The overflow seems to be raised while adam.py computes $\beta_1^t$.
Monitoring self.beta1 ** self.t showed that the OverflowError occurs on the iteration right after self.beta1 ** self.t reaches 2.3571703009247695e-308.
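A back-of-the-envelope check of when this happens: assuming Adam's default $\beta_1 = 0.9$, we can solve $\beta_1^t < \mathrm{DBL\_MIN}$ for $t$ to estimate the update step at which the power leaves the normal double range:

```python
import math
import sys

beta1 = 0.9  # Adam's default beta1

# Solve beta1 ** t < DBL_MIN for t: the step at which the power
# drops below the smallest positive normal double (~2.225e-308).
t_underflow = math.log(sys.float_info.min) / math.log(beta1)
print(int(t_underflow))  # roughly 6700 update steps
```

So with the default $\beta_1$, the error is expected after a few thousand parameter updates, which matches it showing up partway through training.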

Let's check the limits of Python's float type:

Python 2.7.10 |Anaconda 2.4.0 (64-bit)| (default, Oct 19 2015, 18:04:42) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import sys
>>> sys.float_info.min
2.2250738585072014e-308

Indeed, the value falls below the smallest (normal) value a float can represent, which is why the OverflowError is raised.
While checking this, I ran into slightly odd behavior in Python (2.7.10 | Anaconda 2.4.0 (64-bit)).


>>> 10 ** -307
1e-307
>>> 10 ** -308
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: (34, 'Numerical result out of range')
>>> 10 ** -350
0.0

Computing 10 ** -308 raises an OverflowError, yet 10 ** -350 happily evaluates to 0.0.
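The difference seems to come down to where the result lands. 10 ** -308 is a nonzero subnormal (below sys.float_info.min but above the smallest subnormal, about 5e-324), so libm's pow sets ERANGE and Python 2 turns that into an OverflowError; 10 ** -350 underflows to exactly 0.0, which is returned silently. A sketch of those ranges (float literals are used here so it also runs on Python 3, where ** simply returns the subnormal):

```python
import sys

min_normal = sys.float_info.min  # smallest positive *normal* double
min_subnormal = 5e-324           # smallest positive subnormal double

x = 1e-308  # between the two bounds: a nonzero subnormal
y = 1e-350  # below the subnormal range: rounds to exactly 0.0

print(0.0 < x < min_normal)  # True
print(y == 0.0)              # True
```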

As a stopgap, I rewrote adam.py as follows.

adam.py
    @property
    def lr(self):
        # Note: sys must be imported at the top of adam.py.
        # By the time t is this large, beta ** t is already
        # negligibly small, so the bias corrections are ~1.
        if -self.t <= sys.float_info.min_10_exp:
            fix1 = 1.
            fix2 = 1.
        else:
            fix1 = 1. - self.beta1 ** self.t
            fix2 = 1. - self.beta2 ** self.t
        return self.alpha * math.sqrt(fix2) / fix1

The OverflowError no longer occurs, but hmm......
Setting $\beta_1$ to a slightly larger value such as 0.99 is another option, but that only postpones the underflow rather than fixing it......
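For what it's worth, another workaround is to route the power through math.exp, whose underflow in CPython quietly returns 0.0 instead of raising, so the bias correction degrades gracefully to 1.0. This is only a sketch (bias_correction is a hypothetical helper, not Chainer's actual fix):

```python
import math

def bias_correction(beta, t):
    """Compute 1 - beta ** t without risking OverflowError.

    math.exp() returns 0.0 on underflow in CPython, so once
    beta ** t is too small to represent, the result is simply 1.0.
    """
    return 1.0 - math.exp(t * math.log(beta))

print(bias_correction(0.9, 10))      # matches 1 - 0.9 ** 10
print(bias_correction(0.9, 100000))  # 1.0, no OverflowError
```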
