Improving Prediction Accuracy with Python and LightGBM: Parameter Tuning with Bayesian Optimization

Posted at 2023-05-18

Hello. In this post I walk through one way to tune the parameters of a LightGBM model in Python: Bayesian optimization. The sample code uses the Kaggle competition "Titanic: Machine Learning from Disaster".

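First, import the modules, load the data, and prepare the features:

import pandas as pd
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import cross_validate, train_test_split
from bayes_opt import BayesianOptimization

train_data = pd.read_csv('./train.csv')
test_data = pd.read_csv('./test.csv')  # Kaggle test set: has no 'Survived' labels

# Assumption: a minimal feature set built from the standard Titanic columns;
# replace this with your own preprocessing.
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']
X = train_data[features].copy()
for col in ['Sex', 'Embarked']:
    X[col] = X[col].astype('category')  # LightGBM handles pandas categoricals natively
y = train_data['Survived']

# Hold out 20% of the labelled rows for the final accuracy check.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)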

Next, set the baseline LightGBM parameters:

lgb_params = {
    'objective': 'binary',
    'learning_rate': 0.01,
    'max_depth': 6,
    'num_leaves': 31,
    'min_child_samples': 20,
    'reg_alpha': 1,
    'reg_lambda': 1,
    'n_estimators': 1000,
    'colsample_bytree': 0.7,
    'subsample': 0.8,  # LightGBM applies this only when subsample_freq > 0 (default is 0)
    'random_state': 0
}

Then, define the objective function for Bayesian optimization:

def lgb_cross_val(learning_rate, num_leaves, max_depth, min_child_samples, reg_alpha, reg_lambda, n_estimators, colsample_bytree, subsample):
    # The optimizer passes every parameter as a float,
    # so integer-valued parameters are truncated with int().
    params = {
        'objective': 'binary',
        'learning_rate': learning_rate,
        'max_depth': int(max_depth),
        'num_leaves': int(num_leaves),
        'min_child_samples': int(min_child_samples),
        'reg_alpha': reg_alpha,
        'reg_lambda': reg_lambda,
        'n_estimators': int(n_estimators),
        'colsample_bytree': colsample_bytree,
        'subsample': subsample,
        'random_state': 0
    }
    model = lgb.LGBMClassifier(**params)
    # Cross-validate on the training split only, so the held-out rows stay unseen during tuning.
    results = cross_validate(model, X_train, y_train, cv=5, scoring='accuracy')
    # Mean 5-fold validation accuracy is the value the optimizer maximizes.
    return results["test_score"].mean()
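
The int() calls in the objective are needed because the optimizer, as used here, proposes a continuous value for every bounded parameter, while LightGBM expects whole numbers for settings such as max_depth and num_leaves. As a minimal sketch, that conversion could be factored into a helper. `INT_PARAMS` and `cast_int_params` are hypothetical names introduced for illustration; they are not part of bayes_opt or LightGBM.

```python
# Hypothetical helper: the names below are illustrative, not library APIs.
INT_PARAMS = ("max_depth", "num_leaves", "min_child_samples", "n_estimators")

def cast_int_params(params, int_keys=INT_PARAMS):
    """Return a copy of params with the integer-valued keys truncated via int()."""
    casted = dict(params)  # copy so the caller's dict is left untouched
    for key in int_keys:
        if key in casted:
            casted[key] = int(casted[key])  # int() truncates toward zero: 14.9 -> 14
    return casted

# Made-up float suggestions, similar in shape to what the objective receives.
suggested = {"learning_rate": 0.0042, "max_depth": 14.9, "num_leaves": 57.3}
print(cast_int_params(suggested))
```

Because int() truncates rather than rounds, a bound such as (2, 15) for max_depth only yields 15 when the proposed value is exactly 15.0. If the top of the range matters, widening the upper bound slightly is one option.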

Now run the Bayesian optimization:

optimizer = BayesianOptimization(lgb_cross_val, {
    'learning_rate': (1e-4, 1e-2),
    'num_leaves': (10, 100),
    'max_depth': (2, 15),
    'min_child_samples': (5, 100),
    'reg_alpha': (0, 10),
    'reg_lambda': (0, 100),
    'n_estimators': (500, 3000),
    'colsample_bytree': (0.1, 1),
    'subsample': (0.1, 1)
}, random_state=0)

optimizer.maximize(init_points=10, n_iter=15)
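
The maximize call spends its budget in two phases: init_points randomly sampled evaluations, followed by n_iter evaluations proposed by the optimizer. Each evaluation runs the 5-fold cross-validation inside lgb_cross_val, so the number of LightGBM fits adds up quickly. A rough count using the values from this post (`count_model_fits` is a hypothetical helper, shown only to make the arithmetic explicit):

```python
def count_model_fits(init_points, n_iter, cv_folds):
    """Return (objective evaluations, total model fits) for one optimization run."""
    evaluations = init_points + n_iter          # random probes + guided probes
    return evaluations, evaluations * cv_folds  # each evaluation fits one model per fold

evaluations, fits = count_model_fits(init_points=10, n_iter=15, cv_folds=5)
print(evaluations, fits)  # 25 125
```

With n_estimators allowed to reach 3000, each of those fits can itself be slow, so the upper bound on n_estimators and the fold count are reasonable first things to lower if the run takes too long.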

Finally, train the model with the optimized parameters:

lgb_params.update(optimizer.max['params'])
lgb_params["max_depth"] = int(lgb_params["max_depth"])
lgb_params["num_leaves"] = int(lgb_params["num_leaves"])
lgb_params["min_child_samples"] = int(lgb_params["min_child_samples"])
lgb_params["n_estimators"] = int(lgb_params["n_estimators"])

model = lgb.LGBMClassifier(**lgb_params)
model.fit(X_train, y_train)

preds = model.predict(X_test)
accuracy = (preds == y_test).mean()
print('Accuracy:', accuracy)
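
The expression (preds == y_test).mean() works because comparing the two arrays element by element yields booleans, and the mean of booleans is the fraction that are True. The same computation in plain Python, on made-up labels (the values are illustrative only, not results from the Titanic data):

```python
def accuracy(predictions, labels):
    """Fraction of positions where the prediction equals the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    matches = [p == t for p, t in zip(predictions, labels)]
    return sum(matches) / len(matches)  # True counts as 1, False as 0

toy_preds = [1, 0, 1, 1]   # illustrative values only
toy_labels = [1, 0, 0, 1]
print(accuracy(toy_preds, toy_labels))  # 0.75
```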

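That is one way to tune a LightGBM model in Python with Bayesian optimization. Model accuracy depends on the dataset and the task, so this method will not always give the best result. Even so, it is an efficient way to search the parameter space and is worth trying.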
