Decision Boundaries of LightGBM with Default Parameters

Posted at 2024-05-12

It suddenly occurred to me that random forests and GBDT work differently, so I wanted to look at their decision boundaries and see how each one classifies the data.
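As a rough illustration of that difference, here is a minimal sketch using scikit-learn only. It assumes the same `make_blobs` data used later in this post; the names `rf`, `gb`, `tree_mean`, and `stage_acc` and the choice of 20 trees are mine, picked just to keep the example small. A random forest averages trees that were trained independently, while gradient boosting adds trees one stage at a time.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

x, y = make_blobs(n_samples=300, centers=4, random_state=0, cluster_std=0.60)

# Random forest: each tree is trained independently on a bootstrap sample,
# and the forest's class probabilities are the mean over all trees.
rf = RandomForestClassifier(n_estimators=20, random_state=0).fit(x, y)
tree_mean = np.mean([t.predict_proba(x) for t in rf.estimators_], axis=0)
rf_matches_average = np.allclose(tree_mean, rf.predict_proba(x))

# Gradient boosting: trees are added sequentially, each one fitted to the
# gradient of the loss of the ensemble built so far.
# staged_predict yields the predictions after each boosting stage.
gb = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(x, y)
stage_acc = [float(np.mean(pred == y)) for pred in gb.staged_predict(x)]

print("forest equals mean of its trees:", rf_matches_average)
print("training accuracy after first / last stage:", stage_acc[0], stage_acc[-1])
```

The first check shows that a forest's output is just the average of its trees, while `stage_acc` shows how the boosted model's training accuracy evolves as stages are added.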

Function for plotting the decision boundary

This is taken from one of my earlier articles.

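from sklearn import preprocessing
import matplotlib.pyplot as plt
import numpy as np


def showline_clf(x, y, model, modelname, x0="x0", x1="x1"):
    """Fit `model` on min-max scaled features and plot its decision boundary."""
    fig, ax = plt.subplots(figsize=(8, 6))
    # Prediction grid over the unit square. The original code read these limits
    # from the empty axes, whose default range is (0, 1) on both axes.
    X, Y = np.meshgrid(np.linspace(0, 1, 1000), np.linspace(0, 1, 1000))
    XY = np.column_stack([X.ravel(), Y.ravel()])
    # Scale each feature to [0, 1] so the grid above covers every data point.
    x = preprocessing.minmax_scale(x)
    # Note: the model is (re)fitted here on the scaled features.
    model.fit(x, y)
    Z = model.predict(XY).reshape(X.shape)
    # Shade the predicted class regions, then overlay the training points.
    plt.contourf(X, Y, Z, alpha=0.1, cmap="brg")
    plt.scatter(x[:, 0], x[:, 1], c=y, cmap="brg")
    plt.xlim(min(x[:, 0]), max(x[:, 0]))
    plt.ylim(min(x[:, 1]), max(x[:, 1]))
    plt.title(modelname)
    plt.colorbar()
    plt.xlabel(x0)
    plt.ylabel(x1)
    plt.show()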
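
One detail worth checking: `showline_clf` rescales the features with `minmax_scale` before fitting, and the prediction grid spans 0 to 1 on both axes. The short sketch below (the variable names `x_scaled`, `col_min`, and `col_max` are mine) confirms that the scaled data fills exactly that range, which is why the grid covers every point.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import make_blobs

# Same synthetic data as used in the rest of this post.
x, _ = make_blobs(n_samples=300, centers=4, random_state=0, cluster_std=0.60)

# minmax_scale rescales every column independently to the range [0, 1].
x_scaled = preprocessing.minmax_scale(x)
col_min = x_scaled.min(axis=0)
col_max = x_scaled.max(axis=0)

print("per-feature min:", col_min)
print("per-feature max:", col_max)
```

Because the model is fitted on the scaled features, the axes in the plots show scaled values rather than the raw `make_blobs` coordinates.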

LightGBM

from lightgbm import LGBMClassifier
from sklearn.datasets import make_blobs

x, y = make_blobs(n_samples=300, centers=4, random_state=0, cluster_std=0.60)
model = LGBMClassifier()
model.fit(x, y)
showline_clf(x, y, model, "LightGBM", x0="x0", x1="x1")

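I expected something more complicated, but the boundary is not that different from a single decision tree's. Given how the method works, I suppose that makes sense.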
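
One likely reason for the resemblance: every split in a decision tree tests a single feature against a threshold, so each tree carves the plane into axis-aligned rectangles, and a sum of such trees is still piecewise constant on rectangles. The sketch below illustrates this with a plain scikit-learn `DecisionTreeClassifier` rather than LightGBM itself; `splits` and `is_split` are names I made up for the example.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier

x, y = make_blobs(n_samples=300, centers=4, random_state=0, cluster_std=0.60)
xs = preprocessing.minmax_scale(x)
tree = DecisionTreeClassifier(random_state=0).fit(xs, y)

# Internal nodes store a non-negative feature index; leaves use a negative marker.
is_split = tree.tree_.feature >= 0
splits = [
    (int(f), float(t))
    for f, t in zip(tree.tree_.feature[is_split], tree.tree_.threshold[is_split])
]

# Every rule has the form "one feature <= threshold", i.e. a vertical or
# horizontal line in the (x0, x1) plane.
for feature, threshold in splits:
    print(f"x{feature} <= {threshold:.3f}")
```

Each printed rule corresponds to one straight, axis-parallel segment of the boundary, which matches the blocky regions in the plots.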

Random Forest

This code continues from the previous snippet.

from sklearn.ensemble import RandomForestClassifier

model2 = RandomForestClassifier()
model2.fit(x, y)
showline_clf(x, y, model2, "RandomForest", x0="x0", x1="x1")

Well, this is the kind of decision boundary I have been looking at since my student days...

scikit-learn's GradientBoostingClassifier

This code continues from the previous snippet.

from sklearn.ensemble import GradientBoostingClassifier

model3 = GradientBoostingClassifier()
model3.fit(x, y)
showline_clf(x, y, model3, "GBDT(scikit-learn)", x0="x0", x1="x1")

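The boundary is somewhat more complex than LightGBM's, but it is still far simpler than the random forest's.
For the details of how the method works, this site gives a clear explanation.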
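
To put a rough number on "simpler" versus "more complex", one option is to count how many neighbouring grid cells receive different predicted classes. This is only a crude proxy I am assuming for illustration, not a standard metric, and `boundary_cells` is a hypothetical helper. It uses the same scaling and unit-square grid as `showline_clf`, with a coarser resolution to keep it fast.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import make_blobs
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier


def boundary_cells(model, x, y, n=200):
    """Count adjacent grid-cell pairs whose predicted class differs (crude proxy)."""
    xs = preprocessing.minmax_scale(x)
    model.fit(xs, y)
    gx, gy = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
    z = model.predict(np.column_stack([gx.ravel(), gy.ravel()])).reshape(gx.shape)
    # Class changes between horizontal neighbours and between vertical neighbours.
    horizontal = np.count_nonzero(z[:, 1:] != z[:, :-1])
    vertical = np.count_nonzero(z[1:, :] != z[:-1, :])
    return int(horizontal + vertical)


x, y = make_blobs(n_samples=300, centers=4, random_state=0, cluster_std=0.60)
scores = {
    "RandomForest": boundary_cells(RandomForestClassifier(random_state=0), x, y),
    "GBDT(scikit-learn)": boundary_cells(GradientBoostingClassifier(random_state=0), x, y),
}
print(scores)
```

A larger count means a longer or more jagged boundary on the grid. The numbers depend on the grid resolution and the random seed, so they are best read alongside the plots rather than on their own.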

Summary

Some things you only notice once you visualize them.
