Unsupervised Learning
Clustering: K-means Algorithm
K-means Algorithm:
- Randomly initialize K cluster centroids $\mu_1,\mu_2,...,\mu_K$
- Repeat until convergence:
  - Cluster assignment: for i = 1:m, set $c^{(i)}$ := index of the cluster centroid closest to $x^{(i)}$
  - Move centroids: for k = 1:K, set $\mu_k$ := average (mean) of the points assigned to cluster k
Input:
- K (number of clusters)
- Training set {$x^{(1)},x^{(2)},...,x^{(m)}$}
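A rough Octave sketch of the full loop (the function name kmeans_simple and the max_iters parameter are illustrative, not from the notes; X is an m×n matrix with one training example per row):

```matlab
function [c, mu] = kmeans_simple(X, K, max_iters)
  % X: m x n data matrix, one training example per row
  m = size(X, 1);
  % random initialization: pick K distinct training examples as centroids
  perm = randperm(m);
  mu = X(perm(1:K), :);
  c = zeros(m, 1);
  for iter = 1:max_iters
    % cluster assignment step: c(i) := index of centroid closest to x(i)
    for i = 1:m
      dists = sum((mu - X(i, :)).^2, 2);  % squared distances to all K centroids
      [~, c(i)] = min(dists);
    end
    % move centroid step: mu(k,:) := mean of points assigned to cluster k
    for k = 1:K
      if any(c == k)                       % skip empty clusters
        mu(k, :) = mean(X(c == k, :), 1);
      end
    end
  end
end
```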
K-means Optimization Objective:
$\min J(c^{(1)},...,c^{(m)},\mu_1,...,\mu_K) = \frac{1}{m}\sum_{i=1}^m \|x^{(i)}-\mu_{c^{(i)}}\|^2$
- $c^{(i)}$ = index of the cluster (1,2,...,K) to which example $x^{(i)}$ is currently assigned
- $\mu_k$ = cluster centroid k
- $\mu_{c^{(i)}}$ = cluster centroid of the cluster to which example $x^{(i)}$ has been assigned
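For illustration, with X (m×n data), c (assignments), and mu (K×n centroids) as in the sketch above, the distortion can be computed in one vectorized Octave line:

```matlab
% J = average squared distance from each x(i) to its assigned centroid mu_{c(i)}
J = mean(sum((X - mu(c, :)).^2, 2));
```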
Random Initialization
- should have K < m
- randomly pick K training examples
- Set $\mu_1,...,\mu_K$ equal to these K examples
For i = 1:100 {
Randomly initialize K-means
Run K-means. Get $c^{(1)},...,c^{(m)},\mu_1,...,\mu_K$
Compute cost function (distortion) $J(c^{(1)},...,c^{(m)},\mu_1,...,\mu_K)$
}
=> Pick the clustering that gave the lowest cost J
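A hedged Octave sketch of this restart loop, reusing the illustrative kmeans_simple function from above (the inner iteration count of 10 is an arbitrary choice):

```matlab
best_J = Inf;
for i = 1:100
  % randomly initialize and run K-means (kmeans_simple is the sketch above)
  [c, mu] = kmeans_simple(X, K, 10);
  % compute the cost function (distortion) for this run
  J = mean(sum((X - mu(c, :)).^2, 2));
  if J < best_J            % keep the clustering with the lowest cost so far
    best_J = J;
    best_c = c;
    best_mu = mu;
  end
end
```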
Dimensionality Reduction: Principal Component Analysis (PCA)
Purposes:
- To reduce the dimension of the input data so as to speed up a learning algorithm
- To compress data to reduce the memory or disk space needed
- To visualize high-dimensional data (by choosing k=2 or 3)
PCA Algorithm
- Data preprocessing (feature scaling/mean normalization)
- Compute "covariance matrix"
Σ(Sigma) = \frac{1}{m}\sum_{i=1}^n(x^{(i)})(x^{(i)})^T
- Compute "eigenvectors" of matrix $\Sigma$: [U,S,V] = svd(Sigma);
- Take the first k columns of U: $U_{reduce} = U(:, 1:k)$
- Project each example: $z = U_{reduce}' * x$
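Collected into a brief Octave sketch (X is m×n with one example per row; the target dimension k is assumed to be chosen by the user):

```matlab
m = size(X, 1);
% data preprocessing: mean normalization (add feature scaling if needed)
X_norm = X - mean(X);
% covariance matrix: Sigma = (1/m) * sum_i x(i)*x(i)', computed vectorized
Sigma = (1 / m) * (X_norm' * X_norm);
% eigenvectors of Sigma via singular value decomposition
[U, S, V] = svd(Sigma);
U_reduce = U(:, 1:k);      % first k principal components (n x k)
% row-wise matrix form of z = U_reduce' * x
Z = X_norm * U_reduce;     % m x k compressed representation
```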
Reconstruction from compressed representation:
- $x_{approx} = U_{reduce} * z$
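Continuing the sketch, the reconstruction in matrix form (each row of X_approx is $U_{reduce} \cdot z^{(i)}$; adding the mean back, if it was subtracted during preprocessing, recovers the original scale):

```matlab
% map each compressed z(i) back to the original n-dimensional space
X_approx = Z * U_reduce';        % m x n approximation of X_norm
X_approx = X_approx + mean(X);   % undo the mean normalization
```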