
Trying the ELO Competition, part 4: Building a Prediction Model

Last updated at 2023-08-18, posted at 2023-08-16

What is this article?

This is a learning log for the ELO MERCHANT CATEGORY RECOMMENDATION competition.
In this part I build a prediction model (▶ previous article).

Environment

VSCode

Article outline

Install the required libraries
Multiple regression model
Training results
Brief discussion
Submit

Install the required libraries

I switched to VSCode partway through this series, so the libraries were not installed yet.

pip install statsmodels

pip install scikit-learn

import numpy as np  # needed for np.sqrt in the RMSE calculation below
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import LabelEncoder

Using a linear regression model

# Split into target and explanatory variables, dropping unneeded columns at the same time
x = train_df.drop(["card_id", "first_active_month", "target"], axis=1)
y = train_df["target"]
x.head()

# "hist_category_3_freq" takes three values (A, B, C), so apply label encoding
le = LabelEncoder()
le = le.fit(x["hist_category_3_freq"])
x["hist_category_3_freq"] = le.transform(x["hist_category_3_freq"])

x.head()  # encoded in a single column as A=0, B=1, C=2
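
To double-check the mapping, the fitted encoder can be inspected directly. The sketch below uses a made-up toy column rather than the real data. It also shows one-hot encoding via `pandas.get_dummies` as an alternative, since label encoding implies an order A < B < C that a linear model will treat as numeric; whether that order is meaningful for this column is not established in this article.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy column standing in for "hist_category_3_freq" (values are made up)
toy = pd.DataFrame({"hist_category_3_freq": ["A", "C", "B", "A"]})

le = LabelEncoder().fit(toy["hist_category_3_freq"])

# classes_ is sorted, so each class's position is its integer code
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)

# Alternative: one column per category, with no implied ordering
one_hot = pd.get_dummies(toy["hist_category_3_freq"], prefix="hist_category_3_freq")
print(one_hot.columns.tolist())
```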

# Split the data into training data and test data
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

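# Train the linear regression model on the training split only,
# so that the test split stays unseen for evaluation
model_linear = LinearRegression()
model_linear.fit(X_train, y_train)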

# Predict on the test data
y_pred = model_linear.predict(X_test)

# Evaluate with root mean squared error (RMSE)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("Test RMSE:", rmse)
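
An RMSE value is easier to judge next to a naive baseline. The following is a minimal sketch on synthetic data (not the competition data; the coefficients and sample size are made up for illustration) that compares the linear model's test RMSE with the RMSE of always predicting the training mean.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: three features with a known linear signal plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training split only, then score on the held-out split
model = LinearRegression().fit(X_train, y_train)
rmse_model = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

# Baseline: predict the training mean for every test row
baseline_pred = np.full(len(y_test), y_train.mean())
rmse_baseline = np.sqrt(mean_squared_error(y_test, baseline_pred))

print("model RMSE:   ", rmse_model)
print("baseline RMSE:", rmse_baseline)
```

If the model's RMSE is only slightly below the baseline's, the features are adding little predictive power.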

import statsmodels.api as sm

# Obtain R-squared, partial regression coefficients, and so on
# Use add_constant to add a column for the intercept
X_train_with_constant = sm.add_constant(X_train)

# Create an OLS (ordinary least squares) model
model_least_squares = sm.OLS(y_train, X_train_with_constant)

results = model_least_squares.fit()
print(results.summary())

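Training results

The prediction results for target are shown below. Predictive accuracy appears to be quite low: an R-squared of 0.009 means the model explains less than 1% of the variance in target.

Metric	Value	Note
RMSE	1.693	Closer to 0 means higher accuracy
R-squared	0.009	Closer to 1 means higher accuracy
Adjusted R-squared	0.009	Alternative to R-squared that accounts for the number of explanatory variables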
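
As a side note on why the two R-squared rows can coincide, here is a small sketch of the standard adjusted R-squared formula. The sample sizes and feature count below are hypothetical, not taken from the competition data.

```python
def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# With many rows and few features the penalty is negligible
print(adjusted_r2(0.009, 100_000, 20))

# With few rows the same R-squared is penalized heavily
print(adjusted_r2(0.009, 100, 20))
```

When the number of rows is much larger than the number of explanatory variables, the adjustment barely changes the value, which is consistent with both rows showing 0.009 after rounding.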

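Brief discussion

In the summary output, several columns have a p-value (the P>|t| column) greater than 0.05; equivalently, their 95% confidence interval (the [0.025 0.975] columns) contains 0. For these columns there is no statistically significant linear relationship with target at the 5% level.

So the columns that are more strongly related to target may be the following...?

Strongly related to target?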
feature_1
feature_2
merchant_variety
installments_mean
hist_purchase_date_term
hist_purchase_date_freq
hist_lag_mean
hist_lag_min
hist_lag_max
hist_lag_var
authorized_flag_Y
hist_category_1_ratio_N

Since the features above appear to be related to target, they could probably serve as KPIs to refer to when planning measures.

(Continued in part 5)
