sklearn's Pipeline apparently lets you write simpler code

Comparing code without and with Pipeline

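I don't fully understand the benefits of Pipeline yet,
but I'd like to compare the same code written without and with it.
My impression so far is that the Pipeline version will be more convenient
later on, when I need to build many models.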
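
To make that impression a bit more concrete, here is a minimal sketch of reusing one preprocessing step across several models. It is not part of the original comparison: the synthetic data from `make_regression`, the `candidates` dictionary, and the choice of `Ridge` and `Lasso` with those `alpha` values are all assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (an assumption, not the dataset used in this article)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=1)

# Hypothetical set of candidate estimators to compare
candidates = {
    'ols': LinearRegression(),
    'ridge': Ridge(alpha=1.0),
    'lasso': Lasso(alpha=0.1),
}

scores = {}
for name, estimator in candidates.items():
    # Same scaling step for every model; only the final estimator changes
    pipe = Pipeline([('sc', StandardScaler()), ('es', estimator)])
    pipe.fit(X_train, y_train)
    # For regressors, score() returns R^2 on the given data
    scores[name] = pipe.score(X_test, y_test)

for name, r2 in scores.items():
    print(name, round(r2, 3))
```

Each pass through the loop fits a fresh scaler on the training data only, so there is no need to repeat the `fit` / `transform` bookkeeping by hand for every model.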

Python 3.6.4

Without Pipeline

test_normal.py
import numpy as np
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

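# Load the data into DataFrames
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this script needs an older scikit-learn release.
dataset = load_boston()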
X = pd.DataFrame(dataset.data, columns=dataset.feature_names)
y = pd.DataFrame(dataset.target, columns=['y'])

# Split into training and test sets
X_train,X_test,y_train,y_test = train_test_split(X,
                                                 y,
                                                 test_size=0.20,
                                                 random_state=1)

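# Flatten the targets to 1-D arrays
# (DataFrame.as_matrix() was removed in pandas 1.0; .values works on old and new versions)
y_train = y_train.values.ravel()
y_test = y_test.values.ravel()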

# Standardize (fit the scaler on the training data only)
sc = StandardScaler()
sc.fit(X_train)
X_train = sc.transform(X_train)
X_test = sc.transform(X_test)

# Fit the model
ols = LinearRegression()
ols.fit(X_train, y_train)

# Coefficient of determination (R^2)
print(r2_score(y_test, ols.predict(X_test)))

With Pipeline

test_pipeline.py
import numpy as np
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

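# Load the data into DataFrames
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this script needs an older scikit-learn release.
dataset = load_boston()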
X = pd.DataFrame(dataset.data, columns=dataset.feature_names)
y = pd.DataFrame(dataset.target, columns=['y'])

# Split into training and test sets
X_train,X_test,y_train,y_test = train_test_split(X,
                                                 y,
                                                 test_size=0.20,
                                                 random_state=1)

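# Flatten the targets to 1-D arrays
# (DataFrame.as_matrix() was removed in pandas 1.0; .values works on old and new versions)
y_train = y_train.values.ravel()
y_test = y_test.values.ravel()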

# Replace the manual steps with a Pipeline
from sklearn.pipeline import Pipeline

# Each Pipeline step is a (name, estimator) tuple; the name is an arbitrary label
# sc: StandardScaler (standardization)
# es: Estimator

pip_ols = Pipeline([('sc',StandardScaler()),
                     ('es',LinearRegression())])

# Fit (scaling and regression run in sequence)
pip_ols.fit(X_train, y_train)

# Coefficient of determination (R^2)
print(r2_score(y_test, pip_ols.predict(X_test)))
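
As a sanity check on the comparison above, the following sketch fits both versions on the same data and checks that their predictions agree. It also shows what the step names are for. Because `load_boston` is gone from current scikit-learn releases, it uses synthetic data from `make_regression`, which is an assumption of this sketch rather than the data used in the scripts above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption for illustration)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=1)

# Without Pipeline: scale by hand, then fit
sc = StandardScaler().fit(X_train)
ols = LinearRegression().fit(sc.transform(X_train), y_train)
manual_pred = ols.predict(sc.transform(X_test))

# With Pipeline: the same two steps, chained
pip_ols = Pipeline([('sc', StandardScaler()),
                    ('es', LinearRegression())])
pip_ols.fit(X_train, y_train)
pipeline_pred = pip_ols.predict(X_test)

# Both approaches should produce numerically matching predictions
same = np.allclose(manual_pred, pipeline_pred)
print(same)

# The step names give access to the fitted objects inside the pipeline
fitted_scaler = pip_ols.named_steps['sc']
fitted_model = pip_ols.named_steps['es']
print(fitted_model.coef_.shape)

# They also prefix parameter names as <step name>__<parameter>
# (this changes the setting only; the pipeline is not refit here)
pip_ols.set_params(es__fit_intercept=False)
```

So the names passed to `Pipeline` are free-form labels, and their main use is addressing individual steps and their parameters later on.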