
Trying Out LightGBM on Kaggle

Posted at 2020-11-30

I tried out LightGBM in a Kaggle competition, so here are some notes on what I learned along the way.

train.py
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score

# train_df is assumed to be loaded beforehand (e.g. with pandas.read_csv)

# Target labels
y_train = train_df['answered_correctly']
X_train = train_df.drop(['answered_correctly', 'user_answer'], axis=1)

# Trained model for each fold
models = []

# Out-of-fold predictions (oof_train = [] would work too, but a preallocated
# array lets us fill in predictions in the original row order)
oof_train = np.zeros((len(X_train),))

# K-fold cross-validation: each row is sampled into exactly one validation fold
# (sampling without replacement), giving an estimate of generalization performance
cv = KFold(n_splits=5, shuffle=True, random_state=0)
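# For example, with 10 rows cv.split() would yield 5 pairs of index arrays
# such as train_index = [0, 1, 3, 4, 5, 6, 8, 9], valid_index = [2, 7];
# every row lands in exactly one validation fold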

categorical_features = ['user_id', 'content_type_id', 'task_container_id', 'prior_question_had_explanation']

params = {
    'objective': 'binary',  # objective function to minimize; 'binary' is for binary classification
    'max_bin': 300,
    'learning_rate': 0.05,
    'num_leaves': 40  # maximum number of leaves per tree (not the number of trees)
}

# KFold.split() takes the data to be split and yields (train_index, valid_index) pairs.
# enumerate adds a fold counter on top of that (fold_id is not actually used below).
for fold_id, (train_index, valid_index) in enumerate(cv.split(X_train)):
    # loc selects rows by label; because X_train keeps its default integer index,
    # the positional indices returned by split() also work as labels here
    X_tr = X_train.loc[train_index, :]
    X_val = X_train.loc[valid_index, :]
    y_tr = y_train[train_index]
    y_val = y_train[valid_index]

    lgb_train = lgb.Dataset(X_tr, y_tr, categorical_feature=categorical_features)
    lgb_eval = lgb.Dataset(X_val, y_val,
                           reference=lgb_train,  # link the validation set to the training data
                           categorical_feature=categorical_features)  # if omitted, inferred automatically

    # verbose_eval / early_stopping_rounds are the LightGBM 3.x API;
    # in 4.x use the lgb.log_evaluation and lgb.early_stopping callbacks instead
    model = lgb.train(
        params, lgb_train,
        valid_sets=[lgb_train, lgb_eval],
        verbose_eval=10,  # print the evaluation results every 10 rounds
        num_boost_round=1000,  # maximum number of boosting rounds
        early_stopping_rounds=10  # stop when the validation score stops improving for 10 rounds
    )

    # predict() returns the predicted probability of the positive class;
    # num_iteration=model.best_iteration uses the round that scored best
    # before early stopping kicked in
    oof_train[valid_index] = model.predict(X_val, num_iteration=model.best_iteration)
    models.append(model)

# AUC: Area Under the ROC Curve. The curve is traced by sweeping the classification
# threshold and plotting the true positive rate against the false positive rate.
print(roc_auc_score(y_train, oof_train))
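
To make that AUC comment concrete, here is a minimal sketch with toy labels and scores of my own (not from the competition data): roc_curve sweeps the threshold for you, and the area under the curve it returns is exactly what roc_auc_score reports.

auc_example.py
import numpy as np
from sklearn.metrics import roc_curve, auc, roc_auc_score

# Toy ground-truth labels and predicted probabilities
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# roc_curve sweeps the decision threshold and returns the false positive
# rate and true positive rate at each threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# The area under that curve is what roc_auc_score computes directly
print(auc(fpr, tpr))                   # 0.75
print(roc_auc_score(y_true, y_score))  # 0.75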
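One thing the snippet above never does is use the models list it builds. On Kaggle, a common pattern is to average the fold models' predictions at inference time. A minimal sketch, assuming X_test is a test-set DataFrame preprocessed with the same columns as X_train (X_test is my placeholder name, not from the original code):

predict_test.py
import numpy as np

# Average the positive-class probabilities of the five fold models.
# X_test (hypothetical here) must be preprocessed exactly like X_train.
y_pred = np.mean(
    [m.predict(X_test, num_iteration=m.best_iteration) for m in models],
    axis=0,
)

Averaging over folds tends to be more stable than retraining a single model on all the data, and it reuses the early-stopped best_iteration found in each fold.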