
Machine Learning Study Notes (Week 10)

## Gradient Descent with Large Datasets

  • Batch gradient descent: use all $m$ examples in each iteration

      • very computationally expensive when $m$ is large

  • Stochastic gradient descent: use 1 example in each iteration (see the SGD sketch after this list)

      • We can plot $\text{cost}\bigl(\theta,(x^{(i)},y^{(i)})\bigr)$, averaged over the last (say) 1000 examples, to monitor how stochastic gradient descent is doing.

      • If we reduce the learning rate $\alpha$ (and run stochastic gradient descent long enough), it is possible that we find a better set of parameters than with a larger $\alpha$.

      • If we want stochastic gradient descent to converge to a (local) minimum rather than wander or oscillate around it, we should slowly decrease $\alpha$ over time.

  • Mini-batch gradient descent: use $b$ examples in each iteration, where $b$ is the mini-batch size (typically 2–100); see the mini-batch sketch below.

  • Map-reduce (parallelized data processing)

      • Use multiple computers or cores to parallelize the learning algorithm; see the map-reduce sketch below.
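The following is a minimal Python sketch of stochastic gradient descent (not from the original notes), assuming linear regression with the squared-error cost $\text{cost}\bigl(\theta,(x^{(i)},y^{(i)})\bigr)=\frac{1}{2}\bigl(\theta^{T}x^{(i)}-y^{(i)}\bigr)^2$. It records the cost averaged over the last 1000 examples for monitoring and slowly decays $\alpha$ over time. Names such as `sgd_linear_regression`, `alpha0`, and `avg_window` are illustrative choices, not part of the course material.

```python
import numpy as np

def sgd_linear_regression(X, y, alpha0=0.01, epochs=5, avg_window=1000):
    """Stochastic gradient descent for linear regression (squared-error cost).

    Returns the learned theta plus the cost averaged over every block of
    `avg_window` examples, which can be plotted to check convergence.
    """
    m, n = X.shape
    theta = np.zeros(n)
    recent_costs, avg_costs = [], []
    t = 0  # total number of parameter updates so far

    for _ in range(epochs):
        # Randomly shuffle the training examples before each pass.
        for i in np.random.permutation(m):
            t += 1
            # Slowly decrease alpha over time: const1 / (t + const2) schedule.
            alpha = alpha0 * 1000.0 / (t + 1000.0)

            x_i, y_i = X[i], y[i]
            err = x_i @ theta - y_i

            # Cost of this single example, recorded *before* the update.
            recent_costs.append(0.5 * err ** 2)
            if len(recent_costs) == avg_window:
                avg_costs.append(np.mean(recent_costs))
                recent_costs = []

            # Update theta using only this one training example.
            theta -= alpha * err * x_i

    return theta, avg_costs
```

Plotting `avg_costs` against the update count gives the monitoring curve described above: a noisy but downward-trending line indicates the algorithm is working.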
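Below is a corresponding mini-batch sketch under the same linear-regression assumption; `minibatch_gradient_descent` and its defaults are illustrative. Each update averages the gradient over $b$ examples, which sits between batch ($m$ examples) and stochastic (1 example) gradient descent and allows a vectorized gradient computation per batch.

```python
import numpy as np

def minibatch_gradient_descent(X, y, alpha=0.01, b=10, epochs=10):
    """Mini-batch gradient descent for linear regression, b examples per update."""
    m, n = X.shape
    theta = np.zeros(n)

    for _ in range(epochs):
        order = np.random.permutation(m)
        for start in range(0, m, b):
            idx = order[start:start + b]
            Xb, yb = X[idx], y[idx]

            # Average gradient over the b examples in this mini-batch.
            grad = Xb.T @ (Xb @ theta - yb) / len(idx)
            theta -= alpha * grad

    return theta
```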
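Finally, a map-reduce style sketch of one batch-gradient computation, again only illustrative: the training set is split across workers (here simulated with `multiprocessing.Pool`), each worker computes a partial gradient sum over its slice (the "map" step), and the partial sums are added and divided by $m$ (the "reduce" step). The same idea applies when the workers are separate machines.

```python
import numpy as np
from multiprocessing import Pool

def _partial_gradient(args):
    """Map step: one worker computes the gradient sum over its data slice."""
    X_chunk, y_chunk, theta = args
    return X_chunk.T @ (X_chunk @ theta - y_chunk)

def mapreduce_gradient(X, y, theta, n_workers=4):
    """Reduce step: add the workers' partial sums and divide by m."""
    X_chunks = np.array_split(X, n_workers)
    y_chunks = np.array_split(y, n_workers)
    with Pool(n_workers) as pool:
        partials = pool.map(
            _partial_gradient,
            [(Xc, yc, theta) for Xc, yc in zip(X_chunks, y_chunks)],
        )
    return sum(partials) / len(y)
```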
