
Machine Learning Study Notes (Week 9)

Posted at 2020-06-11

Anomaly Detection

Algorithm (based on Gaussian Distribution)

1. Choose features $x_i$ that you think might be indicative of anomalous examples.

2. Fit parameters $\mu_1,...,\mu_n,\ \sigma_1^2,...,\sigma_n^2$:

```math
\mu_j = \frac{1}{m}\sum_{i=1}^m x_j^{(i)},\qquad
\sigma_j^2 = \frac{1}{m}\sum_{i=1}^m\bigl(x_j^{(i)}-\mu_j\bigr)^2
```

3. Given a new example $x$, compute $p(x)$:

```math
p(x) = \prod_{j=1}^n p(x_j;\mu_j,\sigma_j^2) = \prod_{j=1}^n \frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\Bigl(-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}\Bigr)
```

Anomaly if $p(x)<\varepsilon$
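A minimal NumPy sketch of steps 1 to 3 (the data and the threshold $\varepsilon$ below are made up for illustration):

```python
import numpy as np

def fit_gaussian(X):
    """Estimate per-feature mean and variance from the training set (m x n)."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)  # (1/m) * sum((x - mu)^2), matching the formulas above
    return mu, sigma2

def p(X, mu, sigma2):
    """Product of the n univariate Gaussian densities for each example."""
    densities = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return densities.prod(axis=1)

# Hypothetical usage: train on (assumed) normal examples, flag low-density points.
X_train = np.random.randn(1000, 2)           # made-up normal data
mu, sigma2 = fit_gaussian(X_train)
X_new = np.array([[0.1, -0.2], [5.0, 5.0]])  # second point is an obvious outlier
epsilon = 1e-3                               # in practice, chosen on a CV set (below)
print(p(X_new, mu, sigma2) < epsilon)        # [False  True]
```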

Developing and Evaluating

Assume we have some labeled data of anomalous and non-anomalous examples ($y=0$ if normal, $y=1$ if anomalous), split as follows:

Training set (60%): $x^{(1)},x^{(2)},...,x^{(m)}$ (assumed normal / not anomalous)

Cross validation set (20%): $(x_{cv}^{(1)},y_{cv}^{(1)}),...,(x_{cv}^{(m_{cv})},y_{cv}^{(m_{cv})})$

Test set (20%): $(x_{test}^{(1)},y_{test}^{(1)}),...,(x_{test}^{(m_{test})},y_{test}^{(m_{test})})$

Aircraft engines example

10000 good (normal) engines, 20 flawed engines (anomalous)

Training set: 6000 good engines
CV set: 2000 good engines, 10 anomalous
Test set: 2000 good engines, 10 anomalous

#### Evaluation

Fit model $p(x)$ on the training set.
On a cross validation set, predict:

  • y=1 if p(x) < ε (anomaly)
  • y=0 if p(x) ≥ ε (normal)

Possible evaluation metrics:

  • Precision/Recall
  • F1 score

The cross validation set can also be used to choose the parameter ε.
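A sketch of choosing ε by F1 score (assumes p_cv holds the computed densities $p(x_{cv}^{(i)})$ and y_cv the 0/1 labels):

```python
import numpy as np

def select_epsilon(p_cv, y_cv):
    """Scan candidate thresholds and keep the one with the best F1 on the CV set."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
        pred = p_cv < eps                      # predicted anomalies
        tp = np.sum((pred == 1) & (y_cv == 1))
        fp = np.sum((pred == 1) & (y_cv == 0))
        fn = np.sum((pred == 0) & (y_cv == 1))
        if tp == 0:                            # F1 undefined without true positives
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1
```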

Anomaly Detection vs. Supervised Learning

| Anomaly Detection | Supervised Learning |
| --- | --- |
| Very small number of positive examples ($y=1$); 0 to 20 is common. Large number of negative examples ($y=0$). | Large number of positive and negative examples. |
| Many different "types" of anomalies; hard for any algorithm to learn from the positive examples what anomalies look like, and future anomalies may look nothing like any anomalous example seen so far. | Enough positive examples for the algorithm to get a sense of what positive examples are like; future positive examples are likely to be similar to those in the training set. |

Non-Gaussian features

Transform the data: $\log(x),\ \log(x+c),\ x^{\frac{1}{2}},\ x^{\frac{1}{3}},\dots$

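A quick way to check whether a transform helps (the feature and the constant $c$ below are made up):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.random.exponential(size=10000)   # made-up right-skewed feature

fig, axes = plt.subplots(1, 2)
axes[0].hist(x, bins=50)                # heavily skewed: a poor Gaussian fit
axes[1].hist(np.log(x + 0.1), bins=50)  # log(x + c) looks much closer to Gaussian
plt.show()
```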

Multivariate Gaussian Distribution (Optional)

Purpose: capture correlations between features, so that an unusual combination of feature values is flagged even when each feature is individually within its normal range.

Anomaly detection with the multivariate Gaussian

1. Fit model $p(x)$ by setting

```math
\mu = \frac{1}{m}\sum_{i=1}^m x^{(i)},\qquad
\Sigma = \frac{1}{m}\sum_{i=1}^m (x^{(i)}-\mu)(x^{(i)}-\mu)^T
\qquad (\mu \in R^n,\ \Sigma \in R^{n\times n})
```

2. Given a new example $x$, compute

```math
p(x) = \frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}\exp\Bigl(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\Bigr)
\qquad (|\Sigma| = \text{determinant of }\Sigma)
```

Flag an anomaly if $p(x) < \varepsilon$
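A minimal sketch with SciPy (training data and threshold are made up; np.cov with bias=True matches the $\frac{1}{m}$ estimate above):

```python
import numpy as np
from scipy.stats import multivariate_normal

X_train = np.random.randn(500, 2)                 # made-up (normal) training data

mu = X_train.mean(axis=0)
Sigma = np.cov(X_train, rowvar=False, bias=True)  # bias=True -> divide by m, as above

model = multivariate_normal(mean=mu, cov=Sigma)
x_new = np.array([3.0, -3.0])                     # unusual combination of features
epsilon = 1e-3
print(model.pdf(x_new) < epsilon)                 # True -> flag as anomaly
```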

Relationship with the original model

  • $\Sigma$ measures the correlations between features. If all off-diagonal entries of $\Sigma$ are 0, the model is identical to the original Gaussian model.

  • Changing those 0s to positive numbers models a positive correlation between the features.

  • Similarly, changing them to negative numbers models a negative correlation.

Original Model vs. Multivariate Gaussian

Original model:

  • Extra features (e.g. $x_3 = x_1/x_2$) must be created manually to capture unusual combinations of features.
  • Computationally cheaper; scales to large $n$.
  • Works even when the training set size $m$ is small.

Multivariate Gaussian:

  • Automatically captures correlations between features.
  • Computationally more expensive (the $n \times n$ matrix $\Sigma$ must be inverted).
  • Needs $m > n$ (a common rule of thumb is $m \geq 10n$), otherwise $\Sigma$ is non-invertible.

# Recommender Systems

(Figure: example ratings matrix, with movies as rows, users as columns, and some entries unrated.)

$r(i,j)=1$ if user j has rated movie i (0 otherwise)
$y^{(i,j)}=$ rating by user j on movie i (if defined)
$\theta^{(j)}=$ parameter vector for user j
$x^{(i)}=$ feature vector for movie i
For user j, movie i, predicted rating: $(\theta^{(j)})^T(x^{(i)})$
$n_u=$ no. of users
$n_m=$ no. of movies
$m^{(j)}=$ no. of movies rated by user j

Content-based Recommendations

We know the content: each movie $i$ comes with a feature vector $x^{(i)}$ describing it. We are trying to learn each user's preferences $\theta^{(j)}$ from their ratings.

### Optimization algorithm

```math
\min_{\theta^{(1)},...,\theta^{(n_u)}} \frac{1}{2}\sum_{j=1}^{n_u} \sum_{i:r(i,j)=1} \Bigl((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\Bigr)^2+\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^n\bigl(\theta_k^{(j)}\bigr)^2
\qquad (\theta^{(j)} \in R^{n+1})
```

Gradient descent update:

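Written out, this is the usual gradient step on the objective above (the bias term $k=0$ is not regularized):

```math
\theta_k^{(j)} := \theta_k^{(j)} - \alpha \sum_{i:r(i,j)=1} \bigl((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\bigr)x_k^{(i)} \quad (k=0)\\
\theta_k^{(j)} := \theta_k^{(j)} - \alpha \Biggl(\sum_{i:r(i,j)=1} \bigl((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\bigr)x_k^{(i)} + \lambda\,\theta_k^{(j)}\Biggr) \quad (k \neq 0)
```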

Collaborative Filtering

We know neither the movie features nor the users' preferences. Instead, we try to learn both $x^{(i)}$ and $\theta^{(j)}$ from the ratings: given $\theta$ we could estimate $x$, given $x$ we could estimate $\theta$, and collaborative filtering learns both simultaneously.

$x \in R^n$
$\theta \in R^n$
Don't need $x_0$ and $\theta_0$

Collaborative filtering algorithm

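Putting the two estimation problems together, the algorithm minimizes a single objective over both $x$ and $\theta$ (the sums over $k$ start at 1 since there is no $x_0$ or $\theta_0$):

```math
J(x^{(1)},...,x^{(n_m)},\theta^{(1)},...,\theta^{(n_u)}) = \frac{1}{2}\sum_{(i,j):r(i,j)=1}\Bigl((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\Bigr)^2+\frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}\bigl(x_k^{(i)}\bigr)^2+\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\bigl(\theta_k^{(j)}\bigr)^2
```

1. Initialize $x^{(1)},...,x^{(n_m)},\theta^{(1)},...,\theta^{(n_u)}$ to small random values.
2. Minimize $J$ using gradient descent (or an advanced optimization algorithm).
3. For a user with parameters $\theta$ and a movie with learned features $x$, predict a rating of $\theta^Tx$.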

#### Vectorization: Low-rank matrix factorization

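Writing the movie feature vectors as the rows of a matrix $X$ and the user parameter vectors as the rows of $\Theta$, all predicted ratings at once are the low-rank product $X\Theta^T$. A minimal NumPy sketch of gradient descent on $J$ in this vectorized form (all sizes, data, and hyperparameters below are made up):

```python
import numpy as np

n_m, n_u, n = 5, 4, 3                     # made-up sizes: movies, users, features
Y = np.random.randint(0, 6, (n_m, n_u))   # made-up ratings matrix
R = np.random.rand(n_m, n_u) < 0.7        # R[i, j] = 1 where user j rated movie i

X = np.random.randn(n_m, n) * 0.01        # small random init, as in step 1 above
Theta = np.random.randn(n_u, n) * 0.01
alpha, lam = 0.01, 1.0

for _ in range(2000):                     # plain batch gradient descent on J
    E = (X @ Theta.T - Y) * R             # errors, counted only where a rating exists
    X_grad = E @ Theta + lam * X
    Theta_grad = E.T @ X + lam * Theta
    X -= alpha * X_grad
    Theta -= alpha * Theta_grad

predictions = X @ Theta.T                 # predicted rating of movie i by user j
```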

Find related products (movies)

For each product i, we learn a feature vector $x^{(i)} \in R^n$.

Find the products (movies) j with the smallest $||x^{(i)}-x^{(j)}||$.
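A small sketch of this lookup (assumes a learned feature matrix X, e.g. from the snippet above):

```python
import numpy as np

def most_related(X, i, k=5):
    """Return the k movies whose feature vectors are closest to movie i."""
    dists = np.linalg.norm(X - X[i], axis=1)  # ||x^(i) - x^(j)|| for every j
    dists[i] = np.inf                          # exclude the movie itself
    return np.argsort(dists)[:k]

# e.g. most_related(X, 0) lists the movies most similar to movie 0
```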

Mean Normalization

If a user has rated no movies, regularization drives every component of that user's $\theta^{(j)}$ to 0, so all of their predicted ratings come out as 0, which is unhelpful. Mean normalization fixes this: compute each movie's average rating $\mu_i$ over the users who rated it, learn on the mean-subtracted ratings $y^{(i,j)} - \mu_i$, and predict $(\theta^{(j)})^Tx^{(i)} + \mu_i$. A brand-new user is then simply predicted each movie's average rating $\mu_i$.
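A sketch of the normalization step (Y and R as in the earlier factorization snippet; assumes every movie has at least one rating):

```python
import numpy as np

def mean_normalize(Y, R):
    """Subtract each movie's mean rating, computed only over rated entries."""
    mu = (Y * R).sum(axis=1) / R.sum(axis=1)  # per-movie average of rated entries
    Y_norm = (Y - mu[:, None]) * R            # unrated entries stay at 0
    return Y_norm, mu

# After training on Y_norm instead of Y, predict X @ Theta.T + mu[:, None]
```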
