Stochastic Gradient Descent > keyword > is coalesced | the learning rate schedule | (batch size B) with values above 10 taking advantage of the speed-up of matrix-matrix products over matrix-vector products

Posted at 2017-08-17

How large should the batch size be for stochastic gradient descent?
https://stats.stackexchange.com/questions/140811/how-large-should-the-batch-size-be-for-stochastic-gradient-descent

Words and phrases I did not know from sabalaba's answer:

be coalesced

your algorithm will run faster if the memory accesses are coalesced, i.e. when you read the memory in order and don't jump around randomly.
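As a rough CPU-side analogy, the same principle shows up with NumPy: summing an array in memory order is faster than reading the same elements in a shuffled order. This is a minimal sketch, not GPU coalescing itself, and the array size is an arbitrary choice:

```python
import time
import numpy as np

a = np.arange(10_000_000, dtype=np.float64)

# Sequential access: read the elements in memory order.
t0 = time.perf_counter()
s_seq = a.sum()
t_seq = time.perf_counter() - t0

# Random access: read the same elements in a shuffled order.
idx = np.random.permutation(a.size)
t0 = time.perf_counter()
s_rand = a[idx].sum()
t_rand = time.perf_counter() - t0

print(f"sequential: {t_seq:.3f}s  random: {t_rand:.3f}s")
```

On a GPU the effect is even stronger, because a warp's in-order reads can be merged (coalesced) into far fewer memory transactions.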

the learning rate schedule

I like to think of epsilon as a function from the epoch count to a learning rate. This function is called the learning rate schedule.

Related: http://machinelearningmastery.com/using-learning-rate-schedules-deep-learning-models-python-keras/
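A minimal sketch of a schedule in this sense, written as a plain Python function from the epoch count to a learning rate; the step-decay form and the parameter names (initial_lr, drop, epochs_per_drop) are assumptions for illustration, loosely following the Keras article linked above:

```python
import math

def step_decay_schedule(epoch, initial_lr=0.1, drop=0.5, epochs_per_drop=10):
    """Map the epoch count to a learning rate (a 'learning rate schedule').

    Multiplies the learning rate by `drop` every `epochs_per_drop` epochs.
    """
    return initial_lr * math.pow(drop, math.floor(epoch / epochs_per_drop))

for epoch in [0, 5, 10, 20, 30]:
    print(epoch, step_decay_schedule(epoch))
```

In Keras such a function can be passed to a LearningRateScheduler callback, as the linked article shows.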

advantage of the speed-up

Comment by Antoine (Mar 24 at 9:44):

e.g. B = 32 is a good default value, with values above 10 taking advantage of the speed-up of matrix-matrix products over matrix-vector products.
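To see this speed-up concretely, here is a minimal NumPy sketch comparing B separate matrix-vector products against a single matrix-matrix product over the same batch; the 1000x1000 weight matrix and B = 32 are arbitrary choices for illustration:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((1000, 1000))   # weight matrix
X = rng.standard_normal((1000, 32))     # a batch of B = 32 input vectors

# B separate matrix-vector products, one column at a time.
t0 = time.perf_counter()
ys = [W @ X[:, i] for i in range(X.shape[1])]
t_mv = time.perf_counter() - t0

# One matrix-matrix product over the whole batch.
t0 = time.perf_counter()
Y = W @ X
t_mm = time.perf_counter() - t0

print(f"matrix-vector loop: {t_mv:.4f}s  matrix-matrix: {t_mm:.4f}s")
```

With a BLAS-backed NumPy, the matrix-matrix version is typically several times faster, since the weight matrix is loaded once and reused across all B columns.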
