
Machine learning notes

Last updated at 2019-09-22. Posted at 2019-09-22.

These are my machine learning notes.

I want to predict estimated house values, so I am writing down what I found while researching the topic.

Statistical knowledge

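Covariance describes how two variables vary together. A large negative covariance means one variable tends to decrease as the other increases, a covariance near zero means there is no clear linear relationship, and a large positive covariance means the two tend to increase together.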

$$\mathrm{Covariance}(x, y) = \frac{\sum_{k=1}^{n} (x_k - u)(y_k - v)}{n - 1}$$

where u and v are the means of x and y.

reference: https://www.youtube.com/watch?v=0nZT9fqr2MU
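
To make the three cases concrete, here is a minimal sketch of the covariance formula in plain Python. The function name `sample_covariance` and the toy size and price numbers are made up for illustration, and u and v are taken to be the sample means of x and y.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    u = sum(xs) / n  # mean of x
    v = sum(ys) / n  # mean of y
    return sum((x - u) * (y - v) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical toy data: house sizes and three different price patterns
sizes = [50, 60, 70, 80, 90]
prices_up = [150, 180, 210, 240, 270]    # price rises with size
prices_down = [270, 240, 210, 180, 150]  # price falls with size
prices_flat = [210, 180, 240, 180, 210]  # no linear trend with size

cov_up = sample_covariance(sizes, prices_up)      # large positive
cov_down = sample_covariance(sizes, prices_down)  # large negative
cov_flat = sample_covariance(sizes, prices_flat)  # zero
print(cov_up, cov_down, cov_flat)
```

Note that the size of the covariance depends on the units of the data, so "large" only has meaning relative to the scale of x and y.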

Pearson's r measures the strength of the linear relationship between two variables and is always between -1 and 1.

reference: https://www.youtube.com/watch?v=2B_UW-RweSE
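
As a rough sketch, Pearson's r can be computed as the covariance divided by the product of the two standard deviations (the n - 1 factors cancel). The function name `pearson_r` and the sample values below are hypothetical, chosen only to show the range of r.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    u = sum(xs) / n  # mean of x
    v = sum(ys) / n  # mean of y
    co_dev = sum((x - u) * (y - v) for x, y in zip(xs, ys))
    dev_x = sum((x - u) ** 2 for x in xs)
    dev_y = sum((y - v) ** 2 for y in ys)
    if dev_x == 0 or dev_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return co_dev / math.sqrt(dev_x * dev_y)


# Hypothetical toy data
xs = [1, 2, 3, 4, 5]
r_perfect = pearson_r(xs, [10, 20, 30, 40, 50])   # exact increasing line
r_negative = pearson_r(xs, [50, 40, 30, 20, 10])  # exact decreasing line
r_noisy = pearson_r(xs, [2, 1, 4, 3, 5])          # increasing, with scatter
print(r_perfect, r_negative, r_noisy)
```

Unlike covariance, r does not change when a variable is rescaled, which is why it is easier to compare across pairs of features.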

R-squared, or the coefficient of determination, can be thought of as a percentage: it is the proportion of the variance in the target variable that the regression model explains. If the coefficient is 0.80, the model explains 80% of the variance in the target. It does not mean that 80% of the points lie on the regression line.

reference: https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/coefficient-of-determination-r-squared/
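
A minimal sketch of how R-squared is computed, using the common definition 1 - SS_res / SS_tot. The function name `r_squared` and the five true and predicted values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean of the true values
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical true and predicted values
y_true = [3, -0.5, 2, 7, 4.2]
y_pred = [2.5, 0.0, 2.1, 7.8, 5.3]

score = r_squared(y_true, y_pred)
perfect = r_squared(y_true, y_true)  # predictions equal the truth
mean_only = r_squared(y_true, [sum(y_true) / len(y_true)] * len(y_true))
print(round(score, 3), perfect, mean_only)
```

A model that always predicts the mean scores 0, a perfect model scores 1, and a model worse than the mean can score below 0.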

Useful machine learning techniques

How to show Pearson's r between features

example

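# Calculate and show the correlation matrix
# Assumes `data` is a pandas DataFrame of numeric features
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

cols = list(data.columns)  # assumption: label the axes with the column names
cm = np.corrcoef(data.values.T)  # rows of data.values.T are the features
sns.set(font_scale=1.5)
hm = sns.heatmap(cm,
                 cbar=True,
                 annot=True,
                 square=True,
                 fmt='.2f',
                 annot_kws={'size': 15},
                 yticklabels=cols,
                 xticklabels=cols)
plt.show()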

reference: https://towardsdatascience.com/machine-learning-project-predicting-boston-house-prices-with-regression-b4e47493633d

How to measure the quality of models
example

# Import 'r2_score'
from sklearn.metrics import r2_score

def performance_metric(y_true, y_predict):
    """ Calculates and returns the performance score between
        true (y_true) and predicted (y_predict) values based on the metric chosen. """

    score = r2_score(y_true, y_predict)

    # Return the score
    return score


We have to choose which maximum depth is suitable for the model, so we visualize models with different maximum depths.

```python:example
# Produce learning curves for varying training set sizes and maximum depths
# (`vs` is assumed to be the visuals.py helper module from the referenced project)
import visuals as vs
vs.ModelLearning(features, prices)
```

The plots make it easy to see which depth is good for the model.

example

# Produce a complexity curve for varying maximum depths
vs.ModelComplexity(X_train, y_train)

For example, if the maximum depth is 1, the model scores poorly on both the training and validation data. This is a symptom of underfitting, that is, high bias.
If the depth is at its maximum, the model learns the training data almost perfectly but returns poor results on test data. This is a symptom of overfitting, that is, high variance.

reference: https://towardsdatascience.com/machine-learning-project-predicting-boston-house-prices-with-regression-b4e47493633d
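
To see the effect of depth without the project's helper module, here is a small sketch that fits a hand-written one-feature regression tree at several maximum depths and compares R-squared on training and validation data. Everything in it is an assumption made for illustration: the tree is a toy stand-in, not scikit-learn's decision tree or the referenced project's code, and the noisy sine data is synthetic.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(points, max_depth):
    """Fit a toy regression tree on (x, y) pairs sorted by x.

    Returns a float (leaf prediction) or a (threshold, left, right) tuple.
    """
    ys = [y for _, y in points]
    if max_depth == 0 or len(points) < 2:
        return mean(ys)
    # Try every split position and keep the one with the lowest total error
    best_i = min(range(1, len(points)),
                 key=lambda i: sse(ys[:i]) + sse(ys[i:]))
    threshold = (points[best_i - 1][0] + points[best_i][0]) / 2
    return (threshold,
            build_tree(points[:best_i], max_depth - 1),
            build_tree(points[best_i:], max_depth - 1))


def predict(tree, x):
    # Walk down the tree until a leaf (a plain float) is reached
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree


def r2(points, tree):
    ys = [y for _, y in points]
    preds = [predict(tree, x) for x, _ in points]
    m = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - m) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic data: a sine curve plus noise, split alternately into two sets
rng = random.Random(0)
data = [(i / 10, math.sin(i / 10) + rng.gauss(0, 0.3)) for i in range(60)]
train, valid = data[0::2], data[1::2]

scores = {}
for depth in (1, 3, len(train)):  # len(train) lets the tree grow fully
    tree = build_tree(train, depth)
    scores[depth] = (r2(train, tree), r2(valid, tree))
    print(f"max_depth={depth:2d}  train R2={scores[depth][0]:.2f}  "
          f"validation R2={scores[depth][1]:.2f}")
```

With a fully grown tree every training point gets its own leaf, so the training score reaches 1 while the validation score stays lower. That gap is the overfitting symptom described above, and a low score on both sets at depth 1 would be the underfitting one.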
