# These are my machine learning notes.
I would like to predict house values, so I will write down the things I found while searching.
## Statistical knowledge
- Covariance measures the direction of the relationship between two variables: it can be large negative, near zero, or large positive.

  Covariance = \frac{\sum_{k=1}^{n} (x_k - u)(y_k - v)}{n - 1}

  where u and v are the means of x and y.
reference: https://www.youtube.com/watch?v=0nZT9fqr2MU
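The formula above can be checked directly in NumPy (a minimal sketch with made-up arrays; `np.cov` uses the same n-1 denominator by default):

```python
import numpy as np

# Two small example arrays (made up for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

# Covariance by the formula: sum((x - mean_x) * (y - mean_y)) / (n - 1)
n = len(x)
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov returns the covariance matrix; the off-diagonal entry is Cov(x, y)
cov_numpy = np.cov(x, y)[0, 1]

print(cov_manual, cov_numpy)  # both 10 / 3, large and positive
```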
- Pearson's r measures the strength of the linear relationship between two variables and is always between -1 and 1.
reference: https://www.youtube.com/watch?v=2B_UW-RweSE
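Pearson's r is just the covariance rescaled by the two standard deviations, which is why it always falls in [-1, 1]. A small sketch with made-up data:

```python
import numpy as np

# Example data with a strong positive linear relationship (made up)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Pearson's r = Cov(x, y) / (std_x * std_y)
r = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))

# np.corrcoef gives the same value in the off-diagonal entry
r_np = np.corrcoef(x, y)[0, 1]

print(r, r_np)  # both close to 1, since the relationship is nearly linear
```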
- R-squared, or the coefficient of determination, can be thought of as a percentage: it tells you how much of the variance in the dependent variable is explained by the regression model. If the coefficient is 0.80, the model explains 80% of the variance.
reference: https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/coefficient-of-determination-r-squared/
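R-squared compares the model's squared error against the squared error of always predicting the mean. A minimal sketch of the formula with made-up values:

```python
import numpy as np

# Made-up true values and predictions from some model
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.2])

# R^2 = 1 - SS_res / SS_tot:
# SS_res is the model's sum of squared residuals,
# SS_tot is the squared error of always predicting the mean
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(r2)  # close to 1 because the predictions are close to the true values
```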
## Useful machine learning techniques
- How to show Pearson's r between features:
```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Calculate and show the correlation matrix
# (assumes `data` is a pandas DataFrame of the features)
cols = data.columns.values
cm = np.corrcoef(data[cols].values.T)
sns.set(font_scale=1.5)
hm = sns.heatmap(cm,
                 cbar=True,
                 annot=True,
                 square=True,
                 fmt='.2f',
                 annot_kws={'size': 15},
                 yticklabels=cols,
                 xticklabels=cols)
plt.show()
```
- How to measure the quality of models. For example, with R-squared:

```python
# Import 'r2_score'
from sklearn.metrics import r2_score

def performance_metric(y_true, y_predict):
    """ Calculates and returns the performance score between
        true (y_true) and predicted (y_predict) values based on the metric chosen. """
    score = r2_score(y_true, y_predict)

    # Return the score
    return score
```
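As a quick sanity check of this helper (a sketch assuming scikit-learn is installed; the values are made up):

```python
from sklearn.metrics import r2_score

# Made-up true and predicted values just for illustration
y_true = [3.0, -0.5, 2.0, 7.0]
y_predict = [2.5, 0.0, 2.0, 8.0]

score = r2_score(y_true, y_predict)
print(score)  # about 0.95: the predictions track the true values closely
```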
- We have to choose which maximum depth is suitable for the model, so we visualize models that have different maximum depths.
```python
# Produce learning curves for varying training set sizes and maximum depths
vs.ModelLearning(features, prices)

# Produce complexity curve for varying training set sizes and maximum depths
vs.ModelComplexity(X_train, y_train)
```

From these curves it is easy to see which depth is good for the model.
For example, if the depth is one, the model scores poorly on both the training and validation data. This is a symptom of underfitting, i.e. high bias.
If the depth is at its maximum, the model learns the training data almost perfectly but still returns poor results on the test data. This is a symptom of overfitting, i.e. high variance.
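The same bias/variance pattern can be reproduced without the project's `vs` helpers. A minimal sketch with scikit-learn's `DecisionTreeRegressor` on synthetic data (the data and parameter choices here are assumptions, not part of the original project):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# Synthetic noisy data standing in for features and prices (made up)
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for depth in (1, 20):
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    # R^2 on training and test data for this depth
    scores[depth] = (model.score(X_train, y_train),
                     model.score(X_test, y_test))
    print(depth, scores[depth])
```

At depth 1 both scores are low (underfitting, high bias); at depth 20 the training score is near 1 while the test score lags behind (overfitting, high variance).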