More than 1 year has passed since last update.

Improving Predictive Model Accuracy with Python and LightGBM: Parameter Tuning with Bayesian Optimization

Hello. Today I will show how to use Bayesian optimization to tune the parameters of a LightGBM model in Python. The sample code uses the Kaggle competition "Titanic: Machine Learning from Disaster".

First, load the modules and the data:

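import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import cross_validate, train_test_split
from bayes_opt import BayesianOptimization

train_data = pd.read_csv('./train.csv')
test_data = pd.read_csv('./test.csv')  # Kaggle test set; it has no 'Survived' column

# Assumption: features are the numeric columns of train.csv, with 20% held out for the final check.
y = train_data['Survived']
X = train_data.drop(columns=['Survived', 'PassengerId']).select_dtypes(include='number')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)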
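
The later steps rely on a numeric feature matrix and a label vector. As a rough sketch of one way such a matrix could be built, the hypothetical helper `prepare_features` below (it does not appear in the article) keeps a few columns of the Kaggle Titanic `train.csv` and encodes `Sex` as 0/1. Missing values are left as NaN, which LightGBM can handle on its own. The small frame at the end is made-up data used only to show the call.

```python
import pandas as pd

# Hypothetical helper and column choice, for illustration only.
FEATURE_COLUMNS = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]


def prepare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a numeric feature frame built from Titanic-style columns."""
    features = df[FEATURE_COLUMNS].copy()
    # 'Sex' is the only string column in the selection; encode it as 0/1.
    features["Sex"] = features["Sex"].map({"male": 0, "female": 1})
    return features


# Tiny made-up frame with the same column layout as train.csv.
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 38.0, None],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Fare": [7.25, 71.28, 7.92],
})
X_sample = prepare_features(sample)
y_sample = sample["Survived"]
```

With the real data, the same call would be `prepare_features(train_data)`, with `train_data["Survived"]` as the labels.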

Next, set the baseline LightGBM parameters:

lgb_params = {
    'objective': 'binary',
    'learning_rate': 0.01,
    'max_depth': 6,
    'num_leaves': 31,
    'min_child_samples': 20,
    'reg_alpha': 1,
    'reg_lambda': 1,
    'n_estimators': 1000,
    'colsample_bytree': 0.7,
    'subsample': 0.8,
    'subsample_freq': 1,  # row subsampling only takes effect when this is > 0
    'random_state': 0
}
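
`max_depth` and `num_leaves` interact: a binary tree limited to depth d cannot have more than 2^d leaves, so a large `num_leaves` has no extra effect when `max_depth` is small. The sketch below makes that bound explicit. The function names are hypothetical, and the treatment of non-positive `max_depth` as "no limit" follows LightGBM's documented convention.

```python
def max_leaves_for_depth(max_depth: int) -> int:
    """Upper bound on the leaf count of a binary tree with at most max_depth levels of splits."""
    return 2 ** max_depth


def effective_num_leaves(num_leaves: int, max_depth: int) -> int:
    """Leaf budget that can actually be used under the depth limit."""
    # LightGBM treats max_depth <= 0 as "no depth limit".
    if max_depth <= 0:
        return num_leaves
    return min(num_leaves, max_leaves_for_depth(max_depth))


baseline = effective_num_leaves(num_leaves=31, max_depth=6)   # depth 6 allows up to 64 leaves
shallow = effective_num_leaves(num_leaves=100, max_depth=2)   # depth 2 allows at most 4 leaves
```

For the baseline above (`max_depth` 6, `num_leaves` 31) the depth limit is not binding, but it can become binding once both values are searched over wide ranges.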

Then define the objective function for Bayesian optimization:

def lgb_cross_val(learning_rate, num_leaves, max_depth, min_child_samples, reg_alpha, reg_lambda, n_estimators, colsample_bytree, subsample):
    # bayes_opt proposes floats, so integer-valued parameters are cast with int().
    params = {
        'objective': 'binary',
        'learning_rate': learning_rate,
        'max_depth': int(max_depth),
        'num_leaves': int(num_leaves),
        'min_child_samples': int(min_child_samples),
        'reg_alpha': reg_alpha,
        'reg_lambda': reg_lambda,
        'n_estimators': int(n_estimators),
        'colsample_bytree': colsample_bytree,
        'subsample': subsample,
        'subsample_freq': 1,  # row subsampling only takes effect when this is > 0
        'random_state': 0
    }
    model = lgb.LGBMClassifier(**params)
    # Tune on the training split only, so the hold-out data stays unseen.
    # The value to maximize is the mean 5-fold cross-validated accuracy.
    results = cross_validate(model, X_train, y_train, cv=5, scoring='accuracy')
    return results["test_score"].mean()
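
One detail of the integer casts is worth noting: `int()` truncates toward zero, so a suggestion such as 14.97 becomes 14 and the upper end of each range is almost never reached. A possible alternative, sketched here with a hypothetical helper `cast_int_params`, is to round instead. This is a design option, not what the article's code does.

```python
# Hypothetical helper; the key names match the parameters tuned in this article.
INT_PARAMS = ("max_depth", "num_leaves", "min_child_samples", "n_estimators")


def cast_int_params(raw_params: dict, int_keys=INT_PARAMS, rounding: bool = True) -> dict:
    """Return a copy of raw_params with integer-valued entries converted to int."""
    params = dict(raw_params)
    for key in int_keys:
        if key in params:
            value = params[key]
            # round() picks the nearest integer; int() alone drops the fraction.
            params[key] = int(round(value)) if rounding else int(value)
    return params


# Made-up suggestion in the shape an optimizer over continuous bounds would return.
suggested = {"max_depth": 14.97, "num_leaves": 31.2, "learning_rate": 0.005}
truncated = cast_int_params(suggested, rounding=False)
rounded = cast_int_params(suggested)
```

Whichever conversion is chosen, the same one should be used inside the objective function and when building the final model, so that the model trained at the end matches the one that was scored.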

Then run the Bayesian optimization:

optimizer = BayesianOptimization(lgb_cross_val, {
    'learning_rate': (1e-4, 1e-2),
    'num_leaves': (10, 100),
    'max_depth': (2, 15),
    'min_child_samples': (5, 100),
    'reg_alpha': (0, 10),
    'reg_lambda': (0, 100),
    'n_estimators': (500, 3000),
    'colsample_bytree': (0.1, 1),
    'subsample': (0.1, 1)
}, random_state=0)
optimizer.maximize(init_points=10, n_iter=15)
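
The `learning_rate` bounds span two orders of magnitude, so a linear search spends most of its samples near the upper end. A common alternative is to search the base-10 exponent and convert it back inside the objective function. The sketch below shows only that conversion; `log_learning_rate` and `to_learning_rate` are hypothetical names, not part of the article or of `bayes_opt`.

```python
import math

LR_BOUNDS = (1e-4, 1e-2)
# Bounds for the exponent, roughly (-4, -2).
LOG_LR_BOUNDS = (math.log10(LR_BOUNDS[0]), math.log10(LR_BOUNDS[1]))


def to_learning_rate(log_learning_rate: float) -> float:
    """Map a base-10 exponent back to a learning rate."""
    return 10.0 ** log_learning_rate


# Midpoint of each search space, to compare where the samples concentrate.
linear_mid = (LR_BOUNDS[0] + LR_BOUNDS[1]) / 2
log_mid = to_learning_rate((LOG_LR_BOUNDS[0] + LOG_LR_BOUNDS[1]) / 2)
```

Under this assumption the bounds dictionary would carry `'log_learning_rate': LOG_LR_BOUNDS` in place of `'learning_rate'`, and the objective function would call `to_learning_rate` before building the model.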

Finally, train the model with the optimized parameters:

# optimizer.max['params'] holds floats for every parameter, so cast the integer-valued ones.
lgb_params.update(optimizer.max['params'])
for name in ('max_depth', 'num_leaves', 'min_child_samples', 'n_estimators'):
    lgb_params[name] = int(lgb_params[name])

model = lgb.LGBMClassifier(**lgb_params)
model.fit(X_train, y_train)

preds = model.predict(X_test)
accuracy = (preds == y_test).mean()
print('Accuracy:', accuracy)
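
An accuracy figure is easier to interpret next to a trivial reference point. As a small sketch, the hypothetical helper below computes the accuracy of always predicting the most frequent class. The labels are made-up values, not Titanic results.

```python
from collections import Counter


def majority_baseline_accuracy(labels) -> float:
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


# Made-up labels for illustration: three of class 0, two of class 1.
example_labels = [0, 0, 0, 1, 1]
baseline_accuracy = majority_baseline_accuracy(example_labels)
```

Calling it on the hold-out labels gives the number a tuned model should clearly beat.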

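That is one example of how to tune parameters with Bayesian optimization when training a LightGBM model in Python. Model accuracy depends on the dataset and the task, so this method will not necessarily give the best result. Even so, I think it is worth trying as one way to search the parameter space efficiently.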
