Drawing R glmnet-style graphs in Python for sparse modeling with Lasso

Posted at 2020-08-14 (last updated at 2020-08-14)

I wanted to draw graphs in Python like the ones R's glmnet produces

The problem with Lasso regression in Python

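Plenty of people run Lasso regression in Python with scikit-learn, and I am one of them. But whenever I see the Lasso graphs that people produce with R's glmnet, I get envious.
The R graph looks like this. It is reproduced from the article "R/glmnet パッケージで LASSO によるスパース推定を行う方法" (How to do sparse estimation with LASSO using the R/glmnet package).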

It is easy to read, isn't it? You can see at a glance how the regression coefficients change as $λ$ (called alpha in Python's scikit-learn) varies. But scikit-learn does not ship a ready-made plot like this.
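For reference, the role of alpha can be written down explicitly. The objective below is the one scikit-learn documents for its `Lasso` estimator; the symbols $n$ (number of samples), $X$, $y$, and $w$ (coefficient vector) are notation introduced here, not taken from the original post.

$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

The larger alpha is, the more the $\lVert w \rVert_1$ penalty dominates and the more coefficients are driven to exactly zero, which is what the coefficient-path graph visualizes.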

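I want to make graphs like this too! So I wrote a Python script. Without a graph like this you cannot really do sparse modeling. Printing only the score and the regression coefficients from scikit-learn feels like missing the real strength of sparse modeling with Lasso.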

Making glmnet-style graphs in Python

So I built it myself with a for loop.

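I considered several approaches but settled on the straightforward one: generate candidate values with numpy, feed them one at a time into alpha (lambda) in a for loop, refit the Lasso regression each time, and record the regression coefficients and scores.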

# -*- coding: utf-8 -*-
 
from sklearn import datasets
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso


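# Load the Boston housing dataset
# (load_boston was removed in scikit-learn 1.2, so this script needs an older version)
boston = datasets.load_boston()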

alpha_lasso = []
coef_list = []
intercept_list = []
train_score = []
test_score = []

# print(boston['feature_names'])
# Split into features and target
X = boston['data']
y = boston['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8,random_state = 0)

# Set the range of alpha values to search

lasso_1 = np.linspace(0.01,100,1000)

for i in lasso_1:
    # Create and train the Lasso regression model
    Lasso_regr = Lasso(alpha = i, max_iter=10000)

    Lasso_regr.fit(X_train,y_train)
    pre_train = Lasso_regr.predict(X_train)
    pre_test = Lasso_regr.predict(X_test)


    # Print the results
    print("alpha=",i)
    print("Fit on training data")
    print("Training score (R^2) =", Lasso_regr.score(X_train, y_train))
    print("Fit on test data")
    print("Test score (R^2) =", Lasso_regr.score(X_test, y_test))

    alpha_lasso.append(i)
    coef_list.append(Lasso_regr.coef_)
    intercept_list.append(Lasso_regr.intercept_)
    train_score.append(Lasso_regr.score(X_train, y_train))
    test_score.append(Lasso_regr.score(X_test, y_test))

df_count = pd.Series(alpha_lasso,name = 'alpha')
df_coef= pd.DataFrame(coef_list,columns = boston.feature_names)
df_inter = pd.Series(intercept_list,name = 'intercept')
df_train_score = pd.Series(train_score,name = 'train_score')
df_test_score = pd.Series(test_score,name = 'test_score')

# Plot alpha against the regression coefficients
plt.plot(df_count,df_coef)
plt.xscale('log')
plt.legend(labels = df_coef.columns,loc='lower right',fontsize=7)
plt.xlabel('alpha')
plt.ylabel('coefficient')
plt.title('alpha vs coefficient graph like R/glmnet')

plt.show()

# Plot alpha against the train/test scores
df_score = pd.concat([df_train_score,df_test_score], axis=1)
plt.plot(df_count,df_score)
plt.xscale('log')
plt.legend(labels = df_score.columns,loc='lower right',fontsize=8)
plt.xlabel('alpha')
plt.ylabel('r2_score')
plt.title('alpha vs score(train/test)')

plt.show()

Here are the resulting graphs.

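Comparing the graph of the coefficients being sparsified with the graph of the scores like this shows at a glance roughly where alpha should be set. Rather than the default of 1, a value around 0.5 looks good. With a larger alpha the score drops; with a smaller alpha the score does not improve, yet features that the sparsification had judged unnecessary come back into the model. So around 0.5 seems about right.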
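The reading-off-the-graph rule above (take the largest alpha before the score starts to drop) can be sketched in code. This is only an illustration under stated assumptions: the helper `pick_alpha`, its `tolerance` parameter, and the synthetic data from `make_regression` are hypothetical stand-ins and are not part of the script above, so the chosen value will not be the 0.5 seen on the Boston data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split


def pick_alpha(alphas, test_scores, tolerance=0.01):
    """Return the largest alpha whose test score is within `tolerance` of the best.

    Hypothetical helper: a larger alpha means a sparser model, so among the
    alphas that score about as well as the best one, prefer the largest.
    """
    alphas = np.asarray(alphas)
    test_scores = np.asarray(test_scores)
    good_enough = test_scores >= test_scores.max() - tolerance
    return alphas[good_enough].max()


# Synthetic stand-in data (assumption: not the dataset used in the article)
X, y = make_regression(n_samples=300, n_features=10, n_informative=4,
                       noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0)

# Log-spaced candidates, evaluated the same way as in the for loop above
alphas = np.logspace(-2, 2, 40)
test_scores = []
n_nonzero = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10000).fit(X_train, y_train)
    test_scores.append(model.score(X_test, y_test))
    n_nonzero.append(int(np.count_nonzero(model.coef_)))

chosen = pick_alpha(alphas, test_scores)
idx = int(np.argmax(alphas == chosen))
print("chosen alpha:", chosen)
print("test R^2:", test_scores[idx])
print("non-zero coefficients:", n_nonzero[idx])
```

Like the visual comparison, this picks alpha by looking at the test score. If the test set also has to give an unbiased final score, cross-validation on the training data (for example scikit-learn's `LassoCV`) would probably be the safer choice.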

If you do sparse modeling with Lasso!

Working through this made it clear that if you do sparse modeling with Lasso, you need these two graphs in Python as well.
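As a possible variation on the first of the two graphs, the coefficient values can also be computed in a single call with scikit-learn's `lasso_path` function instead of a for loop. The sketch below is not the method used in this article: the helper names `coefficient_paths` and `plot_paths`, the feature labels, and the synthetic data are assumptions made for illustration. Since `lasso_path` does not fit an intercept, the data is centered first, and the alphas are spaced logarithmically to suit the log-scaled x axis.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path


def coefficient_paths(X, y, alphas):
    """Return (alphas, coefs) with coefs shaped (n_alphas, n_features)."""
    # lasso_path does not fit an intercept, so center X and y beforehand.
    X_centered = X - X.mean(axis=0)
    y_centered = y - y.mean()
    path_alphas, coefs, _ = lasso_path(X_centered, y_centered, alphas=alphas)
    # lasso_path returns coefs as (n_features, n_alphas), with the alphas
    # sorted from largest to smallest; transpose so each column is a feature.
    return path_alphas, coefs.T


def plot_paths(path_alphas, coefs, names):
    """Draw the alpha vs coefficient graph in the same style as the script above."""
    import matplotlib.pyplot as plt

    plt.plot(path_alphas, coefs)
    plt.xscale('log')
    plt.legend(names, loc='lower right', fontsize=7)
    plt.xlabel('alpha')
    plt.ylabel('coefficient')
    plt.title('alpha vs coefficient graph like R/glmnet')
    plt.show()


# Synthetic stand-in data (assumption: not the dataset used in the article)
X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=10.0, random_state=0)
names = [f"x{i}" for i in range(X.shape[1])]  # hypothetical feature labels

alphas = np.logspace(-2, 2, 50)
path_alphas, coefs = coefficient_paths(X, y, alphas)
print(coefs.shape)
```

Calling `plot_paths(path_alphas, coefs, names)` then draws the figure. The train/test score graph still needs a fitted model per alpha, so the for loop remains the simpler route for that one.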
