
Machine Learning Study Notes (Week 3)


#Supervised Learning

##Classification Problem

1. Hypothesis Function:

  • "Sigmoid Function" or "Logistic Function":
    $g(z) = \frac{1}{1+e^{-z}} $
```math
h_\theta(x) = g(\theta^Tx) = \frac{1}{1 + e^{-\theta^Tx}}
```

*(Figure: the logistic function curve)*

  • Interpretation:
    the probability that our output is 1
    $h_\theta(x) = P(y=1|x;\theta) = 1 - P(y=0|x;\theta) $

  • Decision Boundary:

    • when
      $h_\theta(x) \geq 0.5 \rightarrow y=1, h_\theta(x) < 0.5 \rightarrow y=0$
      means $g(z) \geq 0.5 \rightarrow z \geq 0 \rightarrow y=1 $
    • z is the input (e.g. $z = \theta^Tx$)
    • the decision boundary can be any shape (a worked example follows the figures below):

*(Figures: a linear decision boundary and a non-linear decision boundary)*
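For example, if $\theta = \begin{bmatrix}-3 & 1 & 1\end{bmatrix}^T$, then $y=1$ whenever $-3 + x_1 + x_2 \geq 0$, so the decision boundary is the straight line $x_1 + x_2 = 3$.

As a minimal Octave sketch of the hypothesis (the helper name `sigmoid` and the use of a design matrix `X` with a leading column of ones are assumptions, not from the notes):

```matlab
% sigmoid.m -- elementwise logistic function g(z) = 1 / (1 + exp(-z))
function g = sigmoid(z)
  g = 1 ./ (1 + exp(-z));   % works on scalars, vectors, and matrices
end
```

With `X` an $m \times (n+1)$ design matrix and `theta` an $(n+1) \times 1$ vector, `sigmoid(X * theta)` evaluates $h_\theta(x^{(i)})$ for every training example at once, and `sigmoid(X * theta) >= 0.5` gives the corresponding predictions.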

2. Cost Function:

```math
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Bigl[ y^{(i)}\log\bigl(h_\theta(x^{(i)})\bigr) + (1-y^{(i)})\log\bigl(1-h_\theta(x^{(i)})\bigr) \Bigr]
```

Vectorized implementation:
$h=g(X\theta)$
$J(\theta)=\frac{1}{m} \bigl(-y^T\log(h)-(1-y)^T\log(1-h)\bigr)$
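A minimal Octave sketch of this vectorized cost, assuming `X` ($m \times (n+1)$, with a leading column of ones), `y` ($m \times 1$ of 0/1 labels), `theta` ($(n+1) \times 1$), and the `sigmoid` helper above:

```matlab
m = length(y);                  % number of training examples
h = sigmoid(X * theta);         % h = g(X*theta), m x 1 vector of hypotheses
J = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h));   % scalar cost
```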

3. Gradient Descent:
Repeat {

```math
\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_j^{(i)}
```

}
Vectorized implementation:
$\theta := \theta - \frac{\alpha}{m}X^T\bigl(g(X\theta)-\vec{y}\bigr)$
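A hedged sketch of the vectorized update loop; `alpha` (the learning rate) and `num_iters` are assumed hyperparameters, not values from the notes:

```matlab
m = length(y);
for iter = 1:num_iters
  % simultaneous update of every theta_j
  theta = theta - (alpha/m) * X' * (sigmoid(X * theta) - y);
end
```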

##Advanced Optimization

  • Optimization algorithms:
    • Gradient descent
    • Conjugate gradient
    • BFGS
    • L-BFGS

Code:
First, we need to provide a function that evaluates both
$J(\theta)$ and
$\frac{\partial}{\partial\theta_j}J(\theta)$:

```matlab
function [jVal, gradient] = costFunction(theta)
  jVal = [...code to compute J(theta)...];
  gradient = [...code to compute derivative of J(theta)...];
end
```

Then we use the "fminunc()" optimization function, along with "optimset()", which creates an object containing the options we want to send to "fminunc()".

```matlab
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
```
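The skeleton above can be filled in for (unregularized) logistic regression. The body below is an illustrative sketch, not the course's exact file; since "fminunc()" expects a function of `theta` only, the data `X` and `y` are bound with an anonymous function:

```matlab
function [jVal, gradient] = costFunction(theta, X, y)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                          % sigmoid hypothesis
  jVal = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h));   % logistic cost J(theta)
  gradient = (1/m) * X' * (h - y);                         % partial derivatives w.r.t. each theta_j
end
```

It would then be called as `fminunc(@(t) costFunction(t, X, y), initialTheta, options)`.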

##Multiclass Classification: One-vs-all
Multiclass means $y \in \{0,1,\dots,n\}$. Simply apply the same logistic regression algorithm to each class:

Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y = i$.

To make a prediction on a new $x$, pick the class $i$ that maximizes $h_\theta^{(i)}(x)$.

*(Figure: one-vs-all classification)*
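A minimal sketch of the prediction step, assuming `all_theta` is a $K \times (n+1)$ matrix whose $i$-th row holds the parameters trained for class $i$ (the variable names are illustrative):

```matlab
probs = sigmoid(X * all_theta');        % m x K matrix; column i holds P(y = i | x; theta)
[~, predictions] = max(probs, [], 2);   % for each example, pick the class with the largest probability
```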

##The Problem of Overfitting

Too many features or an overly complex hypothesis function leads to high variance (overfitting):

*(Figure: overfitting example)*

1. Regularized Cost Function:
regularize all of our theta parameters in a single summation:

```math
\min_\theta \; J(\theta) = \frac{1}{2m}\Biggl[ \sum_{i=1}^m \bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)^2 + \lambda\sum_{j=1}^n\theta_j^2 \Biggr]
```

The $\lambda$, or lambda, is the regularization parameter.
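A hedged Octave sketch of this regularized cost for linear regression; `lambda` is assumed given, and `theta(1)` (the $\theta_0$ term in Octave's 1-based indexing) is left out of the penalty:

```matlab
m = length(y);
h = X * theta;                                      % linear hypothesis
penalty = (lambda/(2*m)) * sum(theta(2:end).^2);    % theta_0 is not regularized
J = (1/(2*m)) * sum((h - y).^2) + penalty;
```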

2. Regularized Gradient Descent:

  • Regularized Linear Regression:

Repeat {

```math
\theta_0 := \theta_0 - \alpha\frac{1}{m} \sum_{i=1}^m \bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_0^{(i)} \\
\theta_j := \theta_j - \alpha \Biggl[ \Bigl(\frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_j^{(i)}\Bigr)+\frac{\lambda}{m}\theta_j \Biggr] \qquad j\in\{1,2,\dots,n\}
```

}

The update rule for $\theta_j$ can also be represented as:
$\theta_j := \theta_j(1-\alpha\frac{\lambda}{m})-\alpha\frac{1}{m} \sum_{i=1}^m \bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_j^{(i)}$

Intuitively, you can see it as reducing the value of $\theta_j$ by some amount on every update.
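A minimal sketch of one such update for linear regression, keeping $\theta_0$ (stored as `theta(1)`) unpenalized; `alpha` and `lambda` are assumed hyperparameters:

```matlab
grad = (1/m) * X' * (X * theta - y);                    % unregularized gradient for all j
grad(2:end) = grad(2:end) + (lambda/m) * theta(2:end);  % add the regularization term for j >= 1
theta = theta - alpha * grad;                           % simultaneous update
```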

  • Normal Equation:

$\theta = (X^TX+\lambda L)^{-1}X^Ty$
where $L$ =

```math
\begin{bmatrix}
0 & & & & \\
  & 1 & & & \\
  & & 1 & & \\
  & & & \ddots & \\
  & & & & 1
\end{bmatrix} \qquad (n+1) \times (n+1)
```
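A sketch of this regularized normal equation in Octave; `n` is the number of features, and `pinv` is used in place of an explicit inverse (an implementation choice, not from the notes):

```matlab
L = eye(n + 1);                             % (n+1) x (n+1) identity
L(1, 1) = 0;                                % do not penalize theta_0
theta = pinv(X' * X + lambda * L) * X' * y;
```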
  • Regularized Logistic Regression:

Repeat {

```math
\theta_0 := \theta_0 - \alpha\frac{1}{m} \sum_{i=1}^m \bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_0^{(i)} \\
\theta_j := \theta_j - \alpha \Biggl[ \Bigl(\frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_j^{(i)}\Bigr)+\frac{\lambda}{m}\theta_j \Biggr] \qquad j\in\{1,2,\dots,n\}
```

}

Cost function (regularized):
```math
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Bigl[ y^{(i)}\log\bigl(h_\theta(x^{(i)})\bigr) + (1-y^{(i)})\log\bigl(1-h_\theta(x^{(i)})\bigr) \Bigr] + \frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2
```
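Combining this cost with the regularized gradient above gives a function in the same `[jVal, gradient]` form used with "fminunc()" earlier. The name `costFunctionReg` is illustrative, not from the notes:

```matlab
function [jVal, gradient] = costFunctionReg(theta, X, y, lambda)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                          % sigmoid hypothesis
  jVal = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h)) ...
         + (lambda/(2*m)) * sum(theta(2:end).^2);          % regularized cost (theta_0 excluded)
  gradient = (1/m) * X' * (h - y);
  gradient(2:end) = gradient(2:end) + (lambda/m) * theta(2:end);
end
```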
