More than 5 years have passed since last update.

Coursera Machine LearningをPythonで実装 - [Week2]単回帰分析、重回帰分析

Last updated at 2018-05-08Posted at 2018-04-20

機械学習の学習素材として大人気の、スタンフォード大学のAndrew Ng先生によるCourseraの講義を、復習がてらにPythonで書き換えてみました。講義ではOctave/Matlabを使っていますが、機械学習やるからにはPythonでやりたいですよね。Honor Codeに引っかからない程度で（これをそのままコピペしても課題はできません）。2018年4月時点のものです。

追記：gistからソースファイルをダウンロードできます
単回帰分析：https://gist.github.com/koshian2/c26ee4d139fa1083cabeb8aa2a53696b
重回帰分析：https://gist.github.com/koshian2/8b45ff6cf6d65bf30da7935689790402

#単回帰分析
##（おすすめ1）sklearn.linear_modelのLinearRegressionを使う

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm

# データ
X = np.array([6.1101,5.5277,8.5186,7.0032,5.8598,8.3829,7.4764,8.5781,6.4862,5.0546,5.7107,14.164,5.734,8.4084,5.6407,5.3794,6.3654,5.1301,6.4296,7.0708,6.1891,20.27,5.4901,6.3261,5.5649,18.945,12.828,10.957,13.176,22.203,5.2524,6.5894,9.2482,5.8918,8.2111,7.9334,8.0959,5.6063,12.836,6.3534,5.4069,6.8825,11.708,5.7737,7.8247,7.0931,5.0702,5.8014,11.7,5.5416,7.5402,5.3077,7.4239,7.6031,6.3328,6.3589,6.2742,5.6397,9.3102,9.4536,8.8254,5.1793,21.279,14.908,18.959,7.2182,8.2951,10.236,5.4994,20.341,10.136,7.3345,6.0062,7.2259,5.0269,6.5479,7.5386,5.0365,10.274,5.1077,5.7292,5.1884,6.3557,9.7687,6.5159,8.5172,9.1802,6.002,5.5204,5.0594,5.7077,7.6366,5.8707,5.3054,8.2934,13.394,5.4369])
y = np.array([17.592,9.1302,13.662,11.854,6.8233,11.886,4.3483,12,6.5987,3.8166,3.2522,15.505,3.1551,7.2258,0.71618,3.5129,5.3048,0.56077,3.6518,5.3893,3.1386,21.767,4.263,5.1875,3.0825,22.638,13.501,7.0467,14.692,24.147,-1.22,5.9966,12.134,1.8495,6.5426,4.5623,4.1164,3.3928,10.117,5.4974,0.55657,3.9115,5.3854,2.4406,6.7318,1.0463,5.1337,1.844,8.0043,1.0179,6.7504,1.8396,4.2885,4.9981,1.4233,-1.4211,2.4756,4.6042,3.9624,5.4141,5.1694,-0.74279,17.929,12.054,17.054,4.8852,5.7442,7.7754,1.0173,20.992,6.6799,4.0259,1.2784,3.3411,-2.6807,0.29678,3.8845,5.7014,6.7526,2.0576,0.47953,0.20421,0.67861,7.5435,5.3436,4.2415,6.7981,0.92695,0.152,2.8214,1.8451,4.2959,7.2029,1.9869,0.14454,9.0551,0.61705])

# Scikit-learnの組み込み最小二乗法
from sklearn.linear_model import LinearRegression
print("LinearRegression on sklearn.linear_model ...")
regr = LinearRegression(normalize=False) #Trueにした場合は平均とL2ノルムでスケール調整が行われる
regr.fit(X.reshape(-1, 1), y) #データが1変数の場合はreshapeするのが大事
print("Intercept:", regr.intercept_) #切片（定数項）
print("Coefficients: ", regr.coef_) #xの係数
predicts = regr.predict(np.array([3.5, 7]).reshape(-1, 1))
print("For population = [35k, 70k], we predict a profit of \n", predicts * 10000)
print()

忙しい人向け。これで全部結果が出ます。SGDRegressorが勾配降下法を使うのに対して、LinearRegressionは正規方程式で計算します。
正規方程式は一発で解が出て便利ですが、逆行列の計算にO(N^3)のオーダーがかかるので、万、十万単位のデータを扱うときなど、データ数が多いときは最急降下法を使ったほうが速くなる、とのことです（講義より）。

LinearRegression on sklearn.linear_model ...
Intercept: -3.89578087831
Coefficients:  [ 1.19303364]
For population = [35k, 70k], we predict a profit of
 [  2798.36876352  44554.54631015]

##（おすすめ2）sklearn.linear_modelのSGDRegressorを使う

# Scikit-learnの組み込み勾配降下法による回帰
from sklearn.linear_model import SGDRegressor
print("SGDRegressor on sklearn.linear_model ...")
#shuffle=Falseにすると最急降下法になるはず。max_iterはScikit-learn0.19以降
regr = SGDRegressor(eta0=0.01, learning_rate="constant", max_iter=1500, shuffle=False)
regr.fit(X.reshape(-1, 1), y)
print("Intercept:", regr.intercept_)
print("Coefficients: ", regr.coef_)
predicts = regr.predict(np.array([3.5, 7]).reshape(-1, 1))
print("For population = [35k, 70k], we predict a profit of \n", predicts * 10000)
print()

このSGDRegressorは確率的勾配降下法なのですが、シャッフルをしなければ最急降下法に近い処理になるので、最急降下法としても使えます。自分で実装する場合と設定値は同じになるよう努めましたが、内部の実装が違うため答えの値は変わります。設定値を変えていろいろ試してみてください。

SGDRegressor on sklearn.linear_model ...
Intercept: [-3.84811446]
Coefficients:  [ 1.05704926]
For population = [35k, 70k], we predict a profit of
 [ -1484.42048023  35512.30359183]

##自分で実装する
暇な人向け。自分で確認したい場合の方法。1回やれば十分なので、あとは上記の組み込みの方法に任せましょう。

###1.データをプロットする

# プロット
plt.plot(X, y, "o", color="r")
plt.show()

###2.コスト関数

# コスト関数
def compute_cost(X, y, theta):
    return np.sum((np.dot(X, theta) - y) ** 2) / 2 / len(y)

print("Testing the cost function ...")
X = np.array((np.ones(len(y)), X)).T
theta = np.zeros(2)
J = compute_cost(X, y, theta)
print("With theta = [0, 0]\nCost computed =", J)
J = compute_cost(X, y, np.array([-1,2]))
print("With theta = [-1, 2]\nCost computed =", J)
print()

Testing the cost function ...
With theta = [0, 0]
Cost computed = 32.0727338775
With theta = [-1, 2]
Cost computed = 54.242455082

###3.最急降下法

def gradient_descent(X, y, theta, alpha, num_iters):
    J_history = []
    theta_out = theta
    for i in range(num_iters):
        theta_out -= alpha / len(y) * np.dot(X.T, np.dot(X, theta_out) - y)
        J_history.append(compute_cost(X, y, theta_out))
    return theta_out, J_history

print("Running Gradient Descent ...")
iterations = 1500;
alpha = 0.01;
theta, J_history = gradient_descent(X, y, theta, alpha, iterations)
print("Theta found by gradient descent:", theta)

Running Gradient Descent ...
Theta found by gradient descent: [-3.63029144  1.16636235]

###4.近似直線のプロット

# 近似直線のプロット
plt.plot(X[:,1], y, "o", label="Training data", color="r")
plt.plot(X[:,1], np.dot(X, theta), label="Linear regression", color="b")
plt.legend()
plt.show()

###5.推定値

# xが 35,000 70,000のときの推定値
predict1 = np.dot(np.array([1, 3.5]), theta)
print("For population = 35,000, we predict a profit of ", predict1 * 10000)
predict2 = np.dot(np.array([1, 7]), theta)
print("For population = 70,000, we predict a profit of ", predict2 * 10000)

For population = 35,000, we predict a profit of  4519.7678677
For population = 70,000, we predict a profit of  45342.4501294

###6.誤差関数をプロット

# Jのグラフ化
theta0_vals = np.linspace(-10, 10, 100)
theta1_vals = np.linspace(-1, 4, 100)
theta0, theta1 = np.meshgrid(theta0_vals, theta1_vals)
J_vals = np.zeros((len(theta0_vals), len(theta1_vals)))
for i in range(len(theta0_vals)):
    for j in range(len(theta1_vals)):
        t = np.array([theta0_vals[i], theta1_vals[j]])
        J_vals[i, j] = compute_cost(X, y, t)
# Surface Plot
fig = plt.figure()
ax = Axes3D(fig)
ax.plot_surface(theta0, theta1, J_vals, rstride=1, cstride=1, cmap=cm.coolwarm)
plt.show()
print()

#重回帰分析
同様に重回帰分析も行います。
##（おすすめ1）sklearn.linear_modelのLinearRegressionを使う

import numpy as np
import matplotlib.pyplot as plt

# データ
X_data = np.array([[2104,3],[1600,3],[2400,3],[1416,2],[3000,4],[1985,4],[1534,3],[1427,3],[1380,3],[1494,3],[1940,4],[2000,3],[1890,3],[4478,5],[1268,3],[2300,4],[1320,2],[1236,3],[2609,4],[3031,4],[1767,3],[1888,2],[1604,3],[1962,4],[3890,3],[1100,3],[1458,3],[2526,3],[2200,3],[2637,3],[1839,2],[1000,1],[2040,4],[3137,3],[1811,4],[1437,3],[1239,3],[2132,4],[4215,4],[2162,4],[1664,2],[2238,3],[2567,4],[1200,3],[852,2],[1852,4],[1203,3]])
y = np.array([399900,329900,369000,232000,539900,299900,314900,198999,212000,242500,239999,347000,329999,699900,259900,449900,299900,199900,499998,599000,252900,255000,242900,259900,573900,249900,464500,469000,475000,299900,349900,169900,314900,579900,285900,249900,229900,345000,549000,287000,368500,329900,314000,299000,179900,299900,239500])

# Scikit-learnの組み込み最小二乗法
from sklearn.linear_model import LinearRegression
print("LinearRegression on sklearn.linear_model ...")
regr = LinearRegression(normalize=False)
regr.fit(X_data, y)
print("Intercept:", regr.intercept_) #切片（定数項）
print("Coefficients: ", regr.coef_) #xの係数
print("Predicted price of a 1650 sq-ft, 3 br house  ... (using LinearRegression):")
predict = regr.predict(np.array([[1650, 3]]))
print("$", predict)
print()

使い方は単回帰のときと全く変わりません。

LinearRegression on sklearn.linear_model ...
Intercept: 89597.9095428
Coefficients:  [  139.21067402 -8738.01911233]
Predicted price of a 1650 sq-ft, 3 br house  ... (using LinearRegression):
$ [ 293081.4643349]

##（おすすめ2）sklearn.linear_modelのSGDRegressorを使う

# データの標準化
def feature_normalize(X):
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0)
    X_norm = (X-mu)/sigma
    return X_norm, mu, sigma

X, mu, sigma = feature_normalize(X_data)

# Scikit-learnの組み込み勾配降下法による回帰
from sklearn.linear_model import SGDRegressor
print("SGDRegressor on sklearn.linear_model ...")
regr = SGDRegressor(eta0=0.01, learning_rate="constant", max_iter=400, shuffle=False)
regr.fit(X, y)
print("Intercept:", regr.intercept_)
print("Coefficients: ", regr.coef_)
predict = regr.predict((np.array([[1650, 3]] - mu) / sigma))
print("Predicted price of a 1650 sq-ft, 3 br house  ... (using SGDRegressor):")
print("$", predict)
print()

最急降下法で求めるときは、データのスケールをカラム単位で調整すると収束が速くなるそうです。これはお任せでやる場合も、自分で実装する場合もこれは変わりません。スケール調整する場合は、推定値を求めるときに、代入するパラメーターのスケールを調整してから代入しないといけないので注意しましょう。

SGDRegressor on sklearn.linear_model ...
Intercept: [ 339319.80384871]
Coefficients:  [ 106385.95873423   -7991.38807423]
Predicted price of a 1650 sq-ft, 3 br house  ... (using SGDRegressor):
$ [ 293673.79997156]

スケール調整済みなので、スケールを調整せずにLinearRegressionで求めた場合の係数とこの係数では、意味合いが違います。具体的な値を代入して推定値を求めると、LinearRegressionの場合と似たような結果が得られることを確認できます。

##自分で実装する
データのスケール調整は既に行っているものとします。SGDRegressorの例の続きからの想定です。

###1.最急降下法

# Xに切片を追加
X = np.c_[np.ones(len(X)), X]

# コスト関数
def compute_cost_multi(X, y, theta):
    return np.sum((np.dot(X, theta) - y) ** 2) / 2 / len(y)

# 最急降下法
def gradient_descent_multi(X, y, theta, alpha, num_iters):
    J_history = []
    theta_out = theta
    for i in range(num_iters):
        theta_out -= alpha / len(y) * np.dot(X.T, np.dot(X, theta_out) - y)
        J_history.append(compute_cost_multi(X, y, theta_out))
    return theta_out, J_history

print("Running gradient descent ...")

alpha = 0.01
num_iters = 400
theta = np.zeros(3)
theta, J_history = gradient_descent_multi(X, y, theta, alpha, num_iters)

###2.収束のプロット

# 収束グラフのプロット
plt.plot(np.arange(len(J_history)), J_history, color="b")
plt.xlabel("Number of iterations")
plt.ylabel("Cost J")
plt.show()

###3.最急降下法の結果

# 最急降下法の結果の表示（スケール調整しているので調整前とは結果が違うので注意）
print("Theta computed from gradient descent:")
print(theta)
print("Predicted price of a 1650 sq-ft, 3 br house  ... (using gradient descent):")
params = np.append(1, (np.array([1650, 3]) - mu) / sigma)
price = np.dot(theta, params.T)
print("$", price)
print()

Running gradient descent ...
Theta computed from gradient descent:
[ 334302.06399328   99411.44947359    3267.01285407]
Predicted price of a 1650 sq-ft, 3 br house  ... (using gradient descent):
$ 289221.547371

実装が悪かったのか、SGDRegressorのときとだいぶ異なる係数が得られました。

###4.正規方程式

# 正規方程式
def normal_eqn(X, y):
    return np.dot(np.dot(np.linalg.pinv(np.dot(X.T, X)), X.T), y)
X = np.c_[np.ones(len(X_data)), X_data]
theta = normal_eqn(X, y)
print("Theta computed from the normal equations:")
print(theta)
print("Predicted price of a 1650 sq-ft, 3 br house  ... (using normal equations):")
price = np.dot(np.array([1, 1650, 3]), theta)
print("$", price)

やっていることはLinearRegressionと同じなので、同一の結果が得られたことに注目（誤差の関係で多少ずれる）。

Theta computed from the normal equations:
[ 89597.90954361    139.21067402  -8738.01911255]
Predicted price of a 1650 sq-ft, 3 br house  ... (using normal equations):
$ 293081.464335

次回に進む
Coursera Machine LearningをPythonで実装 - [Week3]ロジスティック回帰

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up