
Machine learning notes

Posted at 2019-09-22

These are my machine learning notes.

I want to predict estimated house values, so I am writing down what I found while researching.

Statistical knowledge

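  • Covariance measures how two variables vary together. It can be a large negative covariance (one variable rises as the other falls), a near-zero covariance (no linear relationship), or a large positive covariance (both rise together).
Covariance = 
\frac{\sum_{k=1}^n (x_k-u)(y_k-v)}{n-1} 
where u and v are the means of x and y.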

      reference: https://www.youtube.com/watch?v=0nZT9fqr2MU

  • Pearson's r measures the strength of the linear relationship between two variables and is always between -1 and 1.

      reference: https://www.youtube.com/watch?v=2B_UW-RweSE

  • R-squared, or the coefficient of determination, is the proportion of the variance in the dependent variable that the regression model explains, so it can be read as a percentage. If the coefficient is 0.80, the model explains 80% of the variance in the target; it does not mean that 80% of the points lie on the regression line.

      reference: https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/coefficient-of-determination-r-squared/
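
The three statistics above can be computed by hand and checked against NumPy. This is only a minimal sketch: the arrays `x` and `y` are made-up numbers, and the helper names `covariance`, `pearson_r`, and `r_squared` are assumptions for illustration, not taken from the references.

```python
import numpy as np

def covariance(x, y):
    """Sample covariance: sum((x - u) * (y - v)) / (n - 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u, v = x.mean(), y.mean()
    return ((x - u) * (y - v)).sum() / (len(x) - 1)

def pearson_r(x, y):
    """Pearson's r: covariance divided by the product of the sample standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return covariance(x, y) / (x.std(ddof=1) * y.std(ddof=1))

def r_squared(y_true, y_pred):
    """R-squared: 1 - (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

# Made-up example data (hypothetical): floor area vs. price
x = np.array([50.0, 60.0, 80.0, 100.0, 120.0])
y = np.array([150.0, 180.0, 230.0, 310.0, 350.0])

# Fit a straight line by least squares and predict with it
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept

# Each pair of values should agree up to floating-point error
print(covariance(x, y), np.cov(x, y)[0, 1])
print(pearson_r(x, y), np.corrcoef(x, y)[0, 1])
print(r_squared(y, y_pred), pearson_r(x, y) ** 2)
```

The last line relies on the fact that, for a straight-line fit with an intercept on a single feature, R-squared equals the square of Pearson's r.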

Useful machine learning techniques

  • How to show Pearson's r between features
example
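# Calculate and show the correlation matrix
# Assumes `data` is a pandas DataFrame of numeric features
# and `cols` is the list of its column names, in the same order.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

cm = np.corrcoef(data.values.T)  # rows of data.values.T are the features
sns.set(font_scale=1.5)
hm = sns.heatmap(cm,
                cbar=True,
                annot=True,
                square=True,
                fmt='.2f',
                annot_kws={'size': 15},
                yticklabels=cols,
                xticklabels=cols)
plt.show()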

reference: https://towardsdatascience.com/machine-learning-project-predicting-boston-house-prices-with-regression-b4e47493633d

  • How to measure the quality of a model
example
# Import 'r2_score'

from sklearn.metrics import r2_score

def performance_metric(y_true, y_predict):
    """ Calculates and returns the performance score between 
        true (y_true) and predicted (y_predict) values based on the metric chosen. """

    score = r2_score(y_true, y_predict)

    # Return the score
    return score
  • We have to choose which maximum depth suits the model, so we visualize models trained with different maximum depths.
example
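# Produce learning curves for varying training set sizes and maximum depths
# (`vs` is the visuals helper module from the referenced project, imported there as `import visuals as vs`)
vs.ModelLearning(features, prices)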

The learning curves make it easy to see which depth is good for the model.

example
# Produce a complexity curve for varying maximum depths
vs.ModelComplexity(X_train, y_train)

For example, if the depth is one, the model scores poorly on both the training and the validation data. This is a symptom of underfitting, that is, high bias.
If the depth is at its maximum, the model fits the training data almost perfectly but returns poor results on test data. This is a symptom of overfitting, that is, high variance.

reference: https://towardsdatascience.com/machine-learning-project-predicting-boston-house-prices-with-regression-b4e47493633d
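
The underfitting and overfitting pattern described above can be reproduced without the project's `vs` helper. The sketch below swaps in scikit-learn's `validation_curve` with a `DecisionTreeRegressor`; it is not how `vs.ModelComplexity` is implemented, and the data is a synthetic noisy sine wave (an assumption for illustration) rather than real housing data.

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (hypothetical): one feature, noisy sine target
rng = np.random.RandomState(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

# Score trees of depth 1..10 with 5-fold cross-validation, using R-squared
depths = np.arange(1, 11)
train_scores, valid_scores = validation_curve(
    DecisionTreeRegressor(random_state=0), X, y,
    param_name="max_depth", param_range=depths,
    cv=5, scoring="r2",
)

# Average over the folds: one training and one validation score per depth
train_mean = train_scores.mean(axis=1)
valid_mean = valid_scores.mean(axis=1)

for d, tr, va in zip(depths, train_mean, valid_mean):
    print(f"max_depth={d:2d}  train R2={tr:.2f}  validation R2={va:.2f}")

# The depth with the highest validation score is a reasonable choice
best_depth = depths[valid_mean.argmax()]
print("best max_depth by validation R2:", best_depth)
```

At shallow depths both scores should be low (underfitting). At large depths the training score keeps rising while the validation score stops improving or drops (overfitting).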
