Implementing Coursera Machine Learning in Python - [Week 3] Logistic Regression

Posted at 2018-04-25

Continuing from Week 2, I implemented Week 3 in Python. The output is reproduced to match the Octave results as closely as possible.

Update: the source files can be downloaded from the following gists.

  1. Without regularization, implemented from scratch: https://gist.github.com/koshian2/8fb75a0eecc89acd9357587dfd32487a
  2. Without regularization, built-in: https://gist.github.com/koshian2/2417fee334185624b8d72362afddae79
  3. With regularization, implemented from scratch: https://gist.github.com/koshian2/9969f09e987c7d6b54519fb5d4a4a3b2
  4. With regularization, built-in: https://gist.github.com/koshian2/4d96660fd3bd27401884a220689fed45

#Previous articles in this series
Implementing Coursera Machine Learning in Python - [Week 2] Simple and Multiple Linear Regression

#Logistic regression without regularization
##Using the built-in sklearn.linear_model.LogisticRegression (recommended)
This is what you would use in practice. Be careful: regularization is enabled by default! The argument C is the inverse of the regularization parameter λ, so passing a sufficiently large value makes it effectively equivalent to no regularization.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Data
X_data = np.array([[34.6236596245169,78.0246928153624],[30.286710768226,43.894997524001],[35.8474087699387,72.9021980270836],[60.1825993862097,86.3085520954682],[79.0327360507101,75.3443764369103],[45.0832774766833,56.3163717815305],[61.1066645368476,96.5114258848962],[75.0247455673888,46.5540135411653],[76.0987867022625,87.420569719268],[84.4328199612003,43.533393310721],[95.8615550709357,38.2252780579509],[75.0136583895824,30.6032632342801],[82.3070533739948,76.481963302356],[69.3645887597093,97.718691961886],[39.5383391436722,76.0368108511588],[53.9710521485623,89.207350137502],[69.0701440628302,52.7404697301676],[67.9468554771161,46.6785741067312],[70.6615095549943,92.9271378936483],[76.9787837274749,47.5759636497553],[67.3720275457087,42.8384383202917],[89.6767757507207,65.7993659274523],[50.534788289883,48.855811527642],[34.2120609778678,44.2095285986628],[77.9240914545704,68.9723599933059],[62.2710136700463,69.9544579544758],[80.1901807509566,44.8216289321835],[93.114388797442,38.800670337132],[61.8302060231259,50.2561078924462],[38.7858037967942,64.9956809553957],[61.379289447425,72.8078873131709],[85.4045193941164,57.0519839762712],[52.1079797319398,63.1276237688171],[52.0454047683182,69.4328601204522],[40.2368937354511,71.1677480218487],[54.6351055542481,52.2138858806112],[33.9155001090688,98.8694357422061],[64.1769888749448,80.9080605867081],[74.7892529594154,41.5734152282443],[34.1836400264419,75.2377203360134],[83.9023936624915,56.3080462160532],[51.5477202690618,46.8562902634997],[94.4433677691785,65.5689216055905],[82.3687537571391,40.6182551597061],[51.0477517712886,45.82270145776],[62.2226757612018,52.0609919483667],[77.1930349260136,70.4582000018095],[97.7715992800023,86.7278223300282],[62.0730637966764,96.7688241241398],[91.5649744980744,88.6962925454659],[79.9448179406693,74.1631193504375],[99.2725269292572,60.9990309984498],[90.5467141139985,43.3906018065002],[34.5245138532,60.3963424583717],[50.2864961189907,49.8045388132305],[49.5866772163203,59.8089509945326],[97.6456339600776,68.861572724206],[32.577200168093,95.5985476138787],[74.2486913672159,69.8245712265719],[71.7964620586337,78.4535622451505],[75.3956114656803,85.7599366733161],[35.2861128152619,47.0205139472341],[56.2538174971162,39.2614725105801],[30.0588224466979,49.5929738672368],[44.6682617248089,66.4500861455891],[66.5608944724295,41.0920980793697],[40.4575509837516,97.5351854890993],[49.0725632190884,51.8832118207396],[80.2795740146699,92.1160608134408],[66.7467185694403,60.9913940274098],[32.7228330406032,43.3071730643006],[64.0393204150601,78.0316880201823],[72.3464942257992,96.227592967614],[60.4578857391895,73.0949980975803],[58.840956217268,75.8584483127904],[99.8278577969212,72.3692519338388],[47.2642691084817,88.4758649955978],[50.4581598028598,75.8098595298245],[60.4555562927153,42.5084094357221],[82.2266615778556,42.7198785371645],[88.9138964166533,69.8037888983547],[94.8345067243019,45.6943068025075],[67.3192574691752,66.5893531774791],[57.2387063156986,59.5142819801295],[80.3667560017127,90.9601478974695],[68.4685217859111,85.5943071045201],[42.0754545384731,78.8447860014804],[75.477702005339,90.4245389975396],[78.6354243489801,96.6474271688564],[52.348003987941,60.7695052560259],[94.0943311251679,77.1591050907389],[90.4485509709636,87.508791764847],[55.4821611406958,35.5707034722886],[74.4926924184304,84.8451368493013],[89.8458067072097,45.3582836109165],[83.4891627449823,48.3802857972817],[42.2617008099817,87.1038509402545],[99.3150088051039,68.7754094720661],[55.340017560037,64.9319380069486],[74.7758930009276,89.5298128951327]])
y = np.array([0,0,0,1,1,0,1,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,0,0,1,1,1,0,0,0,1,1,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1,1,1,1,1,1,0,0,0,1,0,1,1,1,0,0,0,0,0,1,0,1,1,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,0,1,1,0,1,1,0,1,1,1,1,1,1,1])

# Built-in logistic regression
# Regularization is on by default, so a sufficiently large C is effectively no regularization
regr = LogisticRegression(max_iter=400, C=1e10)
regr.fit(X_data, y)
print("Intercept : ", regr.intercept_)
print("Coef : ", regr.coef_)
print("Expected theta (approx):")
print(" -25.161\n 0.206\n 0.201\n")

# Predict for concrete values
# predict_proba estimates probabilities; predict returns 0 or 1
prob = regr.predict_proba(np.array([45, 85]).reshape(1, -1)) 
print("For a student with scores 45 and 85, we predict an admission probability of ")
# The first value is the probability that y=0 (i.e., the probability of not being admitted)
print("[Not admitted, Admitted] = ", prob) 
print("Expected value of admitted: 0.775 +/- 0.002\n")

# Accuracy
print("Train Accuracy:")
print(regr.score(X_data, y)*100)
print("Expected accuracy (approx): 89.0")

Usage is the same as for multiple regression. One small difference: when you want predicted probabilities, use predict_proba(); predict() returns 0 or 1. predict_proba() returns two values: the first is the probability that y=0 and the second is the probability that y=1, and the two should sum to 1. Here we want the probability that y=1 (that the student will be admitted), so we look at the second value.
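
If you only want the admission probability, you can slice out the y=1 column directly. A minimal sketch, reusing the regr fitted above:

# Take only the second column of predict_proba (probability of y=1, i.e. admission)
admission_prob = regr.predict_proba(np.array([[45, 85]]))[:, 1]
print(admission_prob)  # roughly [0.775]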

Output (solver = the default liblinear)
Intercept :  [-24.9560464]
Coef :  [[ 0.20459008  0.19981009]]
Expected theta (approx):
 -25.161
 0.206
 0.201

For a student with scores 45 and 85, we predict an admission probability of
[Not admitted, Admitted] =  [[ 0.22541844  0.77458156]]
Expected value of admitted: 0.775 +/- 0.002

Train Accuracy:
89.0
Expected accuracy (approx): 89.0

Incidentally, changing LogisticRegression's solver from the default to newton-cg changes the intercept and coefficients. In this example the accuracy stays the same, but the values come closer to the Octave results.
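
For reference, a minimal sketch of the only change needed (the rest of the script stays as above):

# Same setup as before, but with the Newton-CG solver
regr = LogisticRegression(max_iter=400, C=1e10, solver="newton-cg")
regr.fit(X_data, y)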

Output (solver="newton-cg")
Intercept :  [-25.16132692]
Coef :  [[ 0.20623166  0.20147155]]
Expected theta (approx):
 -25.161
 0.206
 0.201

For a student with scores 45 and 85, we predict an admission probability of
[Not admitted, Admitted] =  [[ 0.22370936  0.77629064]]
Expected value of admitted: 0.775 +/- 0.002

Train Accuracy:
89.0
Expected accuracy (approx): 89.0

Some solvers dropped the accuracy to nearly 60%, probably because they differ in convergence speed and in sensitivity to the regularization parameter. Try experimenting with the solver, the maximum number of iterations, and so on.
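
If you want to try this yourself, here is a minimal sketch of such an experiment (the solver names are standard scikit-learn options; the exact accuracies, and any convergence warnings, will depend on your scikit-learn version):

# Compare a few solvers with the same C and max_iter
for solver in ["liblinear", "newton-cg", "lbfgs", "sag"]:
    regr = LogisticRegression(max_iter=400, C=1e10, solver=solver)
    regr.fit(X_data, y)
    print(solver, regr.score(X_data, y) * 100)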

##Implementing it yourself
This is the from-scratch implementation, as in the course exercise. Doing it once is enough; you would hardly ever use this in practice.

###1. Loading and plotting the data

import numpy as np
import matplotlib.pyplot as plt

from scipy.optimize import fmin

X_data = np.array([[34.6236596245169,78.0246928153624],[30.286710768226,43.894997524001],[35.8474087699387,72.9021980270836],[60.1825993862097,86.3085520954682],[79.0327360507101,75.3443764369103],[45.0832774766833,56.3163717815305],[61.1066645368476,96.5114258848962],[75.0247455673888,46.5540135411653],[76.0987867022625,87.420569719268],[84.4328199612003,43.533393310721],[95.8615550709357,38.2252780579509],[75.0136583895824,30.6032632342801],[82.3070533739948,76.481963302356],[69.3645887597093,97.718691961886],[39.5383391436722,76.0368108511588],[53.9710521485623,89.207350137502],[69.0701440628302,52.7404697301676],[67.9468554771161,46.6785741067312],[70.6615095549943,92.9271378936483],[76.9787837274749,47.5759636497553],[67.3720275457087,42.8384383202917],[89.6767757507207,65.7993659274523],[50.534788289883,48.855811527642],[34.2120609778678,44.2095285986628],[77.9240914545704,68.9723599933059],[62.2710136700463,69.9544579544758],[80.1901807509566,44.8216289321835],[93.114388797442,38.800670337132],[61.8302060231259,50.2561078924462],[38.7858037967942,64.9956809553957],[61.379289447425,72.8078873131709],[85.4045193941164,57.0519839762712],[52.1079797319398,63.1276237688171],[52.0454047683182,69.4328601204522],[40.2368937354511,71.1677480218487],[54.6351055542481,52.2138858806112],[33.9155001090688,98.8694357422061],[64.1769888749448,80.9080605867081],[74.7892529594154,41.5734152282443],[34.1836400264419,75.2377203360134],[83.9023936624915,56.3080462160532],[51.5477202690618,46.8562902634997],[94.4433677691785,65.5689216055905],[82.3687537571391,40.6182551597061],[51.0477517712886,45.82270145776],[62.2226757612018,52.0609919483667],[77.1930349260136,70.4582000018095],[97.7715992800023,86.7278223300282],[62.0730637966764,96.7688241241398],[91.5649744980744,88.6962925454659],[79.9448179406693,74.1631193504375],[99.2725269292572,60.9990309984498],[90.5467141139985,43.3906018065002],[34.5245138532,60.3963424583717],[50.2864961189907,49.8045388132305],[49.5866772163203,59.8089509945326],[97.6456339600776,68.861572724206],[32.577200168093,95.5985476138787],[74.2486913672159,69.8245712265719],[71.7964620586337,78.4535622451505],[75.3956114656803,85.7599366733161],[35.2861128152619,47.0205139472341],[56.2538174971162,39.2614725105801],[30.0588224466979,49.5929738672368],[44.6682617248089,66.4500861455891],[66.5608944724295,41.0920980793697],[40.4575509837516,97.5351854890993],[49.0725632190884,51.8832118207396],[80.2795740146699,92.1160608134408],[66.7467185694403,60.9913940274098],[32.7228330406032,43.3071730643006],[64.0393204150601,78.0316880201823],[72.3464942257992,96.227592967614],[60.4578857391895,73.0949980975803],[58.840956217268,75.8584483127904],[99.8278577969212,72.3692519338388],[47.2642691084817,88.4758649955978],[50.4581598028598,75.8098595298245],[60.4555562927153,42.5084094357221],[82.2266615778556,42.7198785371645],[88.9138964166533,69.8037888983547],[94.8345067243019,45.6943068025075],[67.3192574691752,66.5893531774791],[57.2387063156986,59.5142819801295],[80.3667560017127,90.9601478974695],[68.4685217859111,85.5943071045201],[42.0754545384731,78.8447860014804],[75.477702005339,90.4245389975396],[78.6354243489801,96.6474271688564],[52.348003987941,60.7695052560259],[94.0943311251679,77.1591050907389],[90.4485509709636,87.508791764847],[55.4821611406958,35.5707034722886],[74.4926924184304,84.8451368493013],[89.8458067072097,45.3582836109165],[83.4891627449823,48.3802857972817],[42.2617008099817,87.1038509402545],[99.3150088051039,68.7754094720661],[55.340017560037,64.9319380069486],[74.7758930009276,89.5298128951327]])
y = np.array([0,0,0,1,1,0,1,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,0,0,1,1,1,0,0,0,1,1,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1,1,1,1,1,1,0,0,0,1,0,1,1,1,0,0,0,0,0,1,0,1,1,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,0,1,1,0,1,1,0,1,1,1,1,1,1,1])

# Plot the data
plt.plot(X_data[y==1, 0], X_data[y==1, 1], "b+", label="Admitted")
plt.plot(X_data[y==0, 0], X_data[y==0, 1], "yo", label="Not admitted")
plt.xlabel("Exam 1 score")
plt.ylabel("Exam 2 score")
plt.legend()
plt.show()

ml_wk3_1.png

###2. Defining the cost function

# Cost function
def cost_function(theta, X, y):
    h_theta = 1 / (1 + np.exp(np.dot(-X, theta)))
    J = np.sum(-y * np.log(h_theta) - (1 - y) * np.log(1 - h_theta)) / len(y)
    grad = np.zeros(len(theta))
    for j in range(len(theta)):
        grad[j] = sum((h_theta - y) * X[:, j]) / len(y)
    return J, grad

###3. Testing the cost function

# Test the cost and gradient
X = np.c_[np.ones(len(y)), X_data]
test_theta = np.array([-24, 0.2, 0.2])
cost, grad = cost_function(test_theta, X, y)
print("Cost at test theta:")
print(cost)
print("Expected cost (approx): 0.218")
print("Gradient at test theta:")
print(grad)
print("Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n")
Output
Cost at test theta:
0.218330193827
Expected cost (approx): 0.218
Gradient at test theta:
[ 0.04290299  2.56623412  2.64679737]
Expected gradients (approx):
 0.043
 2.566
 2.647

###4. A minimization example with fmin (digression)
The counterpart of Octave's fminunc is the fmin function in scipy.optimize. It has a somewhat idiosyncratic calling convention, so it may take a little getting used to.

Reference: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin.html

Here we take the example from the fminunc documentation, minimizing the function $f(x) = 3x_1^2+2x_1x_2+x_2^2-4x_1+5x_2$, and solve it with SciPy's fmin.

# Use scipy.optimize.fmin as a substitute for fminunc (the solution is [2.25, -4.75])
# https://jp.mathworks.com/help/optim/ug/fminunc.html
def fun(x):
    return 3*x[0]**2 + 2*x[0]*x[1] + x[1]**2 - 4*x[0] + 5*x[1]
print(fmin(fun, np.array([1,1]), xtol=1e-8))
print()
Output
Optimization terminated successfully.
         Current function value: -16.375000
         Iterations: 72
         Function evaluations: 147
[ 2.25       -4.74999996]

"Optimization terminated successfully." is a message fmin prints on its own (you can hide it by setting disp=False). We obtained almost exactly the expected solution.
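
For example, a minimal sketch of the same call with the message suppressed:

# Suppress fmin's convergence message
print(fmin(fun, np.array([1, 1]), xtol=1e-8, disp=False))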

###5. Minimizing the cost function with fmin

# Optimize with fmin
initial_theta = np.zeros(X.shape[1])
# fmin cannot optimize a function with multiple return values, so wrap it in another function
cost_wrap = lambda theta,X,y : cost_function(theta, X, y)[0]
result = fmin(cost_wrap, initial_theta, args=(X, y, ), full_output=True, disp=False)
theta, cost = result[0], result[1]
print("Cost at theta found by fmin:", cost)
print("Expected cost (approx): 0.203")
print("theta:")
print(theta)
print("Expected theta (approx):")
print(" -25.161\n 0.206\n 0.201\n")

This is the main event: minimizing the cost function with fmin. Normally it returns only θ, so add the full_output=True option when you also want the cost or the number of iterations.
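
As a minimal sketch of what else full_output=True gives you (fmin returns xopt, fopt, the iteration count, the number of function evaluations, and a warning flag, in that order):

# The extra values returned by fmin when full_output=True
# result = (xopt, fopt, iterations, funcalls, warnflag)
print("iterations:", result[2])
print("function evaluations:", result[3])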

Passing a function with multiple return values directly to fmin raises an error, so we first wrap it in another function. I wrote up the details (my own article, for what it's worth) in the post below.

Reference: How to fix the ValueError raised when applying scipy.optimize.fmin to a function with multiple return values

Output
Cost at theta found by fmin: 0.20349770159
Expected cost (approx): 0.203
theta:
[-25.16130062   0.20623142   0.20147143]
Expected theta (approx):
 -25.161
 0.206
 0.201

###6. Plotting the decision boundary

# Plot the decision boundary
plt.plot(X_data[y==1, 0], X_data[y==1, 1], "b+", label="Admitted")
plt.plot(X_data[y==0, 0], X_data[y==0, 1], "yo", label="Not admitted")
plot_x = np.array([np.min(X[:, 1]) - 2, np.max(X[:, 1]) + 2])
plot_y = (-1 / theta[2]) * (theta[1] * plot_x + theta[0])
plt.plot(plot_x, plot_y, label="Decision Boundary")
plt.xlabel("Exam 1 score")
plt.ylabel("Exam 2 score")
plt.legend()
plt.show()

ml_wk3_2.png

###7. Defining the sigmoid function and the prediction function

# Sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Prediction function
def predict(theta, X):
    prob = sigmoid(np.dot(X, theta))
    return prob >= 0.5

###8. Prediction and accuracy

# Prediction and accuracy
prob = sigmoid(np.dot(np.array([1, 45, 85]), theta))
print("For a student with scores 45 and 85, we predict an admission probability of ")
print(prob)
print("Expected value: 0.775 +/- 0.002\n")

p = predict(theta, X)
print("Train Accuracy:")
print(np.mean(p == y)*100)
print("Expected accuracy (approx): 89.0")
Output
For a student with scores 45 and 85, we predict an admission probability of
0.776291590411
Expected value: 0.775 +/- 0.002

Train Accuracy:
89.0
Expected accuracy (approx): 89.0

#Logistic regression with regularization
##Using the built-in sklearn.linear_model.LogisticRegression (recommended)
In the unregularized section we were already using a function that regularizes by default, so all we change is the C parameter. Keep in mind that C is the inverse of λ: a larger C means weaker regularization, and a smaller C means stronger regularization.

To experience overfitting firsthand, we deliberately create clearly redundant features. In practice this is a step you would never need to take.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

X_data = np.array([[0.051267,0.69956],[-0.092742,0.68494],[-0.21371,0.69225],[-0.375,0.50219],[-0.51325,0.46564],[-0.52477,0.2098],[-0.39804,0.034357],[-0.30588,-0.19225],[0.016705,-0.40424],[0.13191,-0.51389],[0.38537,-0.56506],[0.52938,-0.5212],[0.63882,-0.24342],[0.73675,-0.18494],[0.54666,0.48757],[0.322,0.5826],[0.16647,0.53874],[-0.046659,0.81652],[-0.17339,0.69956],[-0.47869,0.63377],[-0.60541,0.59722],[-0.62846,0.33406],[-0.59389,0.005117],[-0.42108,-0.27266],[-0.11578,-0.39693],[0.20104,-0.60161],[0.46601,-0.53582],[0.67339,-0.53582],[-0.13882,0.54605],[-0.29435,0.77997],[-0.26555,0.96272],[-0.16187,0.8019],[-0.17339,0.64839],[-0.28283,0.47295],[-0.36348,0.31213],[-0.30012,0.027047],[-0.23675,-0.21418],[-0.06394,-0.18494],[0.062788,-0.16301],[0.22984,-0.41155],[0.2932,-0.2288],[0.48329,-0.18494],[0.64459,-0.14108],[0.46025,0.012427],[0.6273,0.15863],[0.57546,0.26827],[0.72523,0.44371],[0.22408,0.52412],[0.44297,0.67032],[0.322,0.69225],[0.13767,0.57529],[-0.0063364,0.39985],[-0.092742,0.55336],[-0.20795,0.35599],[-0.20795,0.17325],[-0.43836,0.21711],[-0.21947,-0.016813],[-0.13882,-0.27266],[0.18376,0.93348],[0.22408,0.77997],[0.29896,0.61915],[0.50634,0.75804],[0.61578,0.7288],[0.60426,0.59722],[0.76555,0.50219],[0.92684,0.3633],[0.82316,0.27558],[0.96141,0.085526],[0.93836,0.012427],[0.86348,-0.082602],[0.89804,-0.20687],[0.85196,-0.36769],[0.82892,-0.5212],[0.79435,-0.55775],[0.59274,-0.7405],[0.51786,-0.5943],[0.46601,-0.41886],[0.35081,-0.57968],[0.28744,-0.76974],[0.085829,-0.75512],[0.14919,-0.57968],[-0.13306,-0.4481],[-0.40956,-0.41155],[-0.39228,-0.25804],[-0.74366,-0.25804],[-0.69758,0.041667],[-0.75518,0.2902],[-0.69758,0.68494],[-0.4038,0.70687],[-0.38076,0.91886],[-0.50749,0.90424],[-0.54781,0.70687],[0.10311,0.77997],[0.057028,0.91886],[-0.10426,0.99196],[-0.081221,1.1089],[0.28744,1.087],[0.39689,0.82383],[0.63882,0.88962],[0.82316,0.66301],[0.67339,0.64108],[1.0709,0.10015],[-0.046659,-0.57968],[-0.23675,-0.63816],[-0.15035,-0.36769],[-0.49021,-0.3019],[-0.46717,-0.13377],[-0.28859,-0.060673],[-0.61118,-0.067982],[-0.66302,-0.21418],[-0.59965,-0.41886],[-0.72638,-0.082602],[-0.83007,0.31213],[-0.72062,0.53874],[-0.59389,0.49488],[-0.48445,0.99927],[-0.0063364,0.99927],[0.63265,-0.030612]])
y = np.array([1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])

## The part below is only for demonstrating overfitting
# Build the feature matrix used to induce overfitting
def map_feature(X1, X2):
    degree = 6
    out = np.ones(len(X1))
    for i in range(1, degree+1):
        for j in range(i+1):
            out = np.c_[out, X1**(i-j) * X2**j]
    return out

# Plot the decision boundary
def plot_decision_boundary(regr, X, y, title):
    u = np.linspace(-1, 1.5, 50)
    v = np.linspace(-1, 1.5, 50)
    z = np.zeros((len(v), len(u)))
    for i in range(len(u)):
        for j in range(len(v)):
            # Note: indices are assigned transposed (z[j, i]) so the plot comes out correctly
            z[j, i] = regr.predict_proba(map_feature(np.array([u[i]]), np.array([v[j]])))[0][1]

    plt.plot(X_data[y==1, 0], X_data[y==1, 1], "b+", label="y = 1")
    plt.plot(X_data[y==0, 0], X_data[y==0, 1], "yo", label="y = 0")
    plt.contour(u, v, z, levels=[0.5],label="Decision boundary")
    plt.xlabel("Microchip Test 1")
    plt.ylabel("Microchip Test 2")
    plt.legend()
    plt.title(title)

plt.figure(figsize=(10,8))
plt.subplots_adjust(wspace=0.2, hspace=0.3)

## ここからメイン
X = map_feature(X_data[:,0], X_data[:,1])
CC = np.array([1e10, 100, 1, 0.01])
for i, c in enumerate(CC):
    regr = LogisticRegression(C = c, solver="newton-cg")
    regr.fit(X, y)
    accur = np.around(regr.score(X, y) * 100, 2)
    # One subplot per value of C (2x2 grid)
    plt.subplot(2, 2, i+1)
    plot_decision_boundary(regr, X, y, f"built-in, λ-1={c}, accuracy={accur}")

plt.show()

ml_wk3_3.png

Top left = overfitting, the C=100 and C=1 panels look about right, and bottom right = underfitting. The plotted line is where the predict_proba estimate equals 0.5. The problem with overfitting is that the model is optimized too closely to the training data, so it no longer fits test data or more general data.
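
To make the connection explicit: the sigmoid equals 0.5 exactly when its argument is zero, so the 0.5 probability contour is the set of points where the linear score vanishes:

$$h_\theta(x) = \frac{1}{1+e^{-\theta^T x}} = 0.5 \iff \theta^T x = 0$$

This is also why the from-scratch version later draws the contour at level 0 of $\theta^T x$ rather than at a probability of 0.5.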

The key point here is that an overfitted model looks deceptively good: its accuracy on the training data goes up. Indeed, the overfitted case has the highest accuracy of the four. Strictly speaking, then, you should look at accuracy on test data rather than on the training data.
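
If you want to check this, here is a minimal sketch (not part of the original exercise) that holds out part of the data with scikit-learn's train_test_split; the exact numbers depend on the random split:

# Evaluate accuracy on held-out data instead of the training data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
regr = LogisticRegression(C=1e10, solver="newton-cg")
regr.fit(X_train, y_train)
print("Train accuracy:", regr.score(X_train, y_train) * 100)
print("Test accuracy :", regr.score(X_test, y_test) * 100)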

##Implementing it yourself
For those who want to follow the steps in detail.

###1. Plotting the data

import numpy as np
import matplotlib.pyplot as plt

from scipy.optimize import fmin_bfgs

X_data = np.array([[0.051267,0.69956],[-0.092742,0.68494],[-0.21371,0.69225],[-0.375,0.50219],[-0.51325,0.46564],[-0.52477,0.2098],[-0.39804,0.034357],[-0.30588,-0.19225],[0.016705,-0.40424],[0.13191,-0.51389],[0.38537,-0.56506],[0.52938,-0.5212],[0.63882,-0.24342],[0.73675,-0.18494],[0.54666,0.48757],[0.322,0.5826],[0.16647,0.53874],[-0.046659,0.81652],[-0.17339,0.69956],[-0.47869,0.63377],[-0.60541,0.59722],[-0.62846,0.33406],[-0.59389,0.005117],[-0.42108,-0.27266],[-0.11578,-0.39693],[0.20104,-0.60161],[0.46601,-0.53582],[0.67339,-0.53582],[-0.13882,0.54605],[-0.29435,0.77997],[-0.26555,0.96272],[-0.16187,0.8019],[-0.17339,0.64839],[-0.28283,0.47295],[-0.36348,0.31213],[-0.30012,0.027047],[-0.23675,-0.21418],[-0.06394,-0.18494],[0.062788,-0.16301],[0.22984,-0.41155],[0.2932,-0.2288],[0.48329,-0.18494],[0.64459,-0.14108],[0.46025,0.012427],[0.6273,0.15863],[0.57546,0.26827],[0.72523,0.44371],[0.22408,0.52412],[0.44297,0.67032],[0.322,0.69225],[0.13767,0.57529],[-0.0063364,0.39985],[-0.092742,0.55336],[-0.20795,0.35599],[-0.20795,0.17325],[-0.43836,0.21711],[-0.21947,-0.016813],[-0.13882,-0.27266],[0.18376,0.93348],[0.22408,0.77997],[0.29896,0.61915],[0.50634,0.75804],[0.61578,0.7288],[0.60426,0.59722],[0.76555,0.50219],[0.92684,0.3633],[0.82316,0.27558],[0.96141,0.085526],[0.93836,0.012427],[0.86348,-0.082602],[0.89804,-0.20687],[0.85196,-0.36769],[0.82892,-0.5212],[0.79435,-0.55775],[0.59274,-0.7405],[0.51786,-0.5943],[0.46601,-0.41886],[0.35081,-0.57968],[0.28744,-0.76974],[0.085829,-0.75512],[0.14919,-0.57968],[-0.13306,-0.4481],[-0.40956,-0.41155],[-0.39228,-0.25804],[-0.74366,-0.25804],[-0.69758,0.041667],[-0.75518,0.2902],[-0.69758,0.68494],[-0.4038,0.70687],[-0.38076,0.91886],[-0.50749,0.90424],[-0.54781,0.70687],[0.10311,0.77997],[0.057028,0.91886],[-0.10426,0.99196],[-0.081221,1.1089],[0.28744,1.087],[0.39689,0.82383],[0.63882,0.88962],[0.82316,0.66301],[0.67339,0.64108],[1.0709,0.10015],[-0.046659,-0.57968],[-0.23675,-0.63816],[-0.15035,-0.36769],[-0.49021,-0.3019],[-0.46717,-0.13377],[-0.28859,-0.060673],[-0.61118,-0.067982],[-0.66302,-0.21418],[-0.59965,-0.41886],[-0.72638,-0.082602],[-0.83007,0.31213],[-0.72062,0.53874],[-0.59389,0.49488],[-0.48445,0.99927],[-0.0063364,0.99927],[0.63265,-0.030612]])
y = np.array([1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])

# Plot the data
plt.plot(X_data[y==1, 0], X_data[y==1, 1], "b+", label="y = 1")
plt.plot(X_data[y==0, 0], X_data[y==0, 1], "yo", label="y = 0")
plt.xlabel("Microchip Test 1")
plt.ylabel("Microchip Test 2")
plt.legend()
plt.show()

ml_wk3_4.png

###2. Creating the features used to reproduce overfitting

# Build the feature matrix used to induce overfitting
def map_feature(X1, X2):
    degree = 6
    out = np.ones(len(X1))
    for i in range(1, degree+1):
        for j in range(i+1):
            out = np.c_[out, X1**(i-j) * X2**j]
    return out

X = map_feature(X_data[:,0], X_data[:,1])
initial_theta = np.zeros(X.shape[1])
lambda_ = 1

This step only exists to reproduce overfitting; it is not essential.

###3. Cost function with regularization

# Regularized cost function
def cost_function_reg(theta, X, y, lambda_):
    h_theta = 1 / (1 + np.exp(np.dot(-X, theta)))
    J = np.sum(-y * np.log(h_theta) - (1 - y) * np.log(1 - h_theta)) / len(y)
    J += lambda_ / 2 / len(y) * np.sum(theta[1:] ** 2) # do not regularize θ0

    grad = np.zeros(len(theta))
    # do not regularize θ0
    grad[0] = np.sum((h_theta - y) * X[:, 0]) / len(y)
    for j in range(1, len(theta)):
        grad[j] = np.sum((h_theta - y) * X[:, j]) / len(y) + lambda_ / len(y) * theta[j]
    return J, grad

Not regularizing $\theta_0$ is the fiddly part. Details like this are exactly why it is better to leave things to the built-in implementation.
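
For reference, written out as formulas, what cost_function_reg computes (with $m$ training examples; $\theta_0$ appears in neither the penalty term nor the extra gradient term) is:

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log h_\theta(x^{(i)}) - (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

$$\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j \quad (j \geq 1)$$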

###4. Testing the cost function

# Initial cost and gradient with regularization
cost, grad = cost_function_reg(initial_theta, X, y, lambda_)
print("Cost at initial theta (zeros):", cost)
print("Expected cost (approx): 0.693")
print("Gradient at initial theta (zeros) - first five values only:")
print(grad[:5])
print("Expected gradients (approx) - first five values only:")
print(" 0.0085\n 0.0188\n 0.0001\n 0.0503\n 0.0115\n")

# Cost and gradient with λ=10 and all elements of θ set to 1
test_theta = np.ones(X.shape[1])
cost, grad = cost_function_reg(test_theta, X, y, 10)
print("Cost at test theta (with lambda = 10):", cost)
print("Expected cost (approx): 3.16")
print("Gradient at test theta - first five values only:")
print(grad[:5])
print("Expected gradients (approx) - first five values only:")
print(" 0.3460\n 0.1614\n 0.1948\n 0.2269\n 0.0922\n")
Output
Cost at initial theta (zeros): 0.69314718056
Expected cost (approx): 0.693
Gradient at initial theta (zeros) - first five values only:
[  8.47457627e-03   1.87880932e-02   7.77711864e-05   5.03446395e-02
   1.15013308e-02]
Expected gradients (approx) - first five values only:
 0.0085
 0.0188
 0.0001
 0.0503
 0.0115

Cost at test theta (with lambda = 10): 3.16450933162
Expected cost (approx): 3.16
Gradient at test theta - first five values only:
[ 0.34604507  0.16135192  0.19479576  0.22686278  0.09218568]
Expected gradients (approx) - first five values only:
 0.3460
 0.1614
 0.1948
 0.2269
 0.0922

Exactly the same as Octave (as it should be).

###5. Optimization

# Optimize
# Change the λ parameter here
lambda_ = 1e-20
cost_wrap = lambda theta, X, y, lambda_ : cost_function_reg(theta, X, y, lambda_)[0]
# Set disp=True if you want to see the details
result = fmin_bfgs(cost_wrap, initial_theta, args=(X, y, lambda_, ), full_output=True, disp=False)
theta, cost = result[0], result[1]

The simplex method used by fmin did not work very well here, so I minimized with the BFGS method instead. This gives results similar to the built-in version.
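
Incidentally, fmin_bfgs can also take the analytic gradient through its fprime argument instead of approximating it by finite differences. A minimal sketch of that alternative (not what the code above does), reusing cost_function_reg and the variables defined above:

# Hand BFGS the analytic gradient returned by cost_function_reg
grad_wrap = lambda theta, X, y, lambda_: cost_function_reg(theta, X, y, lambda_)[1]
result = fmin_bfgs(cost_wrap, initial_theta, fprime=grad_wrap,
                   args=(X, y, lambda_, ), full_output=True, disp=False)
theta, cost = result[0], result[1]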

###6. Plotting the decision boundary

# Plot the decision boundary
def plot_decision_boundary(theta, X, y, title):
    u = np.linspace(-1, 1.5, 50)
    v = np.linspace(-1, 1.5, 50)
    z = np.zeros((len(v), len(u)))
    for i in range(len(u)):
        for j in range(len(v)):
            # Note: indices are assigned transposed (z[j, i]) so the plot comes out correctly
            z[j, i] = np.dot(map_feature(np.array([u[i]]), np.array([v[j]])), theta)

    plt.plot(X_data[y==1, 0], X_data[y==1, 1], "b+", label="y = 1")
    plt.plot(X_data[y==0, 0], X_data[y==0, 1], "yo", label="y = 0")
    plt.contour(u, v, z, levels=[0],label="Decision boundary")
    plt.xlabel("Microchip Test 1")
    plt.ylabel("Microchip Test 2")
    plt.legend()
    plt.title(title)

###7. Accuracy

# Sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Prediction function
def predict(theta, X):
    prob = sigmoid(np.dot(X, theta))
    return prob >= 0.5

p = predict(theta, X)
# Accuracy
accur = np.around(np.mean(p == y)*100, 2)
print("Train Accuracy:", accur)
print("Expected accuracy (with lambda = 1, Octave): 83.1 (approx)")

plot_decision_boundary(theta, X, y, f"algo=bfgs, lambda={lambda_}, accuracy={accur}")
plt.show()

###8. Plots for each λ
I plotted the results for each λ. Comparing with the built-in case, you can see that λ and C are reciprocals of each other.

λ=1e-10 (overfitting: should be regularized a bit more)
ml_wk3_5.png

λ=1 (just right)
ml_wk3_6.png

λ=100 (underfitting: the regularization should be weakened a bit)
ml_wk3_7.png

In practice it is not that λ=1 is automatically right and λ=100 automatically wrong; λ has to be tuned for the situation at hand. That's it for this week.

Next article:
Implementing Coursera Machine Learning in Python - [Week 4] Neural Networks (1)
