## Support Vector Machines (SVM)
### 1. SVM Hypothesis
$$\min_\theta \; C\sum_{i=1}^m \Bigl[ y^{(i)}\,\text{cost}_1(\theta^T x^{(i)}) + (1-y^{(i)})\,\text{cost}_0(\theta^T x^{(i)}) \Bigr] + \frac{1}{2}\sum_{j=1}^n \theta_j^2$$
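The two cost terms are piecewise-linear surrogates for the logistic costs: flat at zero once the margin is satisfied, linear otherwise. Below is a minimal NumPy sketch of the objective, assuming the common hinge forms $\text{cost}_1(z) = \max(0, 1-z)$ and $\text{cost}_0(z) = \max(0, 1+z)$; the function names are illustrative, not from any particular library.

```python
import numpy as np

def cost1(z):
    # Surrogate cost for y = 1: zero once z >= 1, linear below that.
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    # Surrogate cost for y = 0: zero once z <= -1, linear above that.
    return np.maximum(0.0, 1.0 + z)

def svm_objective(theta, X, y, C):
    """Regularized SVM objective from the formula above.

    X is the (m, n+1) design matrix with x_0 = 1; y holds labels in {0, 1}.
    """
    z = X @ theta
    data_term = C * np.sum(y * cost1(z) + (1 - y) * cost0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)  # theta_0 is not regularized
    return data_term + reg_term
```

Note that $\theta_0$ is excluded from the regularization term, matching the sum starting at $j = 1$.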
### 2. Kernels
- Gaussian kernel (a NumPy sketch follows the notes below):
$$f_1 = \text{similarity}(x, l^{(1)}) = \exp\Bigl(-\frac{\|x - l^{(1)}\|^2}{2\sigma^2}\Bigr) = \exp\Bigl(-\frac{\sum_{j=1}^n (x_j - l_j^{(1)})^2}{2\sigma^2}\Bigr)$$
- SVM with Kernels
Hypothesis: Given $x$, compute features $f \in \mathbb{R}^{m+1}$ (one similarity feature per training-example landmark $l^{(i)} = x^{(i)}$, plus $f_0 = 1$).
Predict "$y = 1$" if $\theta^T f \geq 0$.
Training:
$$\min_\theta \; C \sum_{i=1}^m \Bigl[ y^{(i)}\,\text{cost}_1(\theta^T f^{(i)}) + (1-y^{(i)})\,\text{cost}_0(\theta^T f^{(i)}) \Bigr] + \frac{1}{2}\sum_{j=1}^n \theta_j^2$$
(with kernels $n = m$: one parameter per landmark)
Notes:
- $C$ $\bigl(= \frac{1}{\lambda}\bigr)$:
  - Large $C$: lower bias, higher variance (less regularization; prone to overfitting).
  - Small $C$: higher bias, lower variance (more regularization; prone to underfitting).
- $\sigma^2$:
  - Large $\sigma^2$: features $f_i$ vary more smoothly; higher bias, lower variance.
  - Small $\sigma^2$: features $f_i$ vary more sharply; lower bias, higher variance.
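To make the kernelized hypothesis concrete, here is a small NumPy sketch that computes the similarity features $f$ from landmarks (placed at the training examples, as above) and applies the decision rule $\theta^T f \geq 0$. All names are illustrative.

```python
import numpy as np

def gaussian_kernel(x, l, sigma):
    # f = exp(-||x - l||^2 / (2 sigma^2)), the similarity defined above.
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma ** 2))

def kernel_features(x, landmarks, sigma):
    # Map x to f in R^{m+1}: f_0 = 1, then one similarity per landmark.
    f = [gaussian_kernel(x, l, sigma) for l in landmarks]
    return np.concatenate(([1.0], f))

def predict(theta, x, landmarks, sigma):
    # Hypothesis: predict y = 1 exactly when theta^T f >= 0.
    return int(theta @ kernel_features(x, landmarks, sigma) >= 0)

# Usage: with landmarks at the m training examples,
#   theta = ...                # learned by minimizing the objective above
#   predict(theta, x_new, X_train, sigma=0.5)
```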
### 3. Using an SVM
Use an SVM software package (e.g. liblinear, libsvm) to solve for the parameters $\theta$. You need to specify:
- Choice of parameter $C$
- Choice of kernel: no kernel ("linear kernel"), Gaussian kernel, ...
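As one concrete example, scikit-learn's `SVC` exposes exactly these two choices. Its `gamma` parameter plays the role of $\frac{1}{2\sigma^2}$ in the Gaussian similarity above; this is a sketch of one package, not the only option.

```python
from sklearn.svm import SVC

# No kernel ("linear kernel"): the decision function is theta^T x directly.
linear_clf = SVC(kernel="linear", C=1.0)

# Gaussian kernel: scikit-learn calls it "rbf", and gamma = 1 / (2 * sigma^2)
# in the similarity formula above.
sigma = 0.5
rbf_clf = SVC(kernel="rbf", C=1.0, gamma=1.0 / (2.0 * sigma ** 2))

# rbf_clf.fit(X_train, y_train)
# rbf_clf.predict(X_test)
```

Perform feature scaling before using the Gaussian kernel, since the similarity depends on raw distances between features.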
### 4. Logistic Regression vs. SVMs
$n$ = number of features, $m$ = number of training examples
- if $n$ is large (relative to $m$): use logistic regression, or SVM without a kernel ("linear kernel") -- e.g. $n = 10{,}000$, $m = 10$ to $1{,}000$
- if $n$ is small and $m$ is intermediate: use SVM with a Gaussian kernel -- e.g. $n = 1{,}000$, $m = 10$ to $10{,}000$
- if $n$ is small and $m$ is large: create/add more features, then use logistic regression or SVM without a kernel (the Gaussian kernel becomes slow to train at this scale) -- e.g. $n = 1{,}000$, $m = 50{,}000+$
- A neural network is likely to work well in most of these settings, but may be slower to train.
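A hedged sketch that encodes these rules of thumb as code; the exact cutoffs are illustrative judgment calls, not part of the guideline.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def pick_classifier(n_features, m_examples):
    """Rule-of-thumb model choice per the guidelines above."""
    if n_features >= m_examples:        # n large relative to m
        return SVC(kernel="linear")     # or LogisticRegression()
    if m_examples <= 10_000:            # n small, m intermediate
        return SVC(kernel="rbf")        # Gaussian kernel
    # n small, m large: engineer more features first, then go linear
    return LogisticRegression()
```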