Daily Kaggle (Spaceship edition)

Beginner

Posted at 2025-04-22

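Working hard at Kaggle every day (Spaceship edition)

Goal: predict whether each passenger was Transported.

step1 Overview of the data
Judging from the problem setting alone, features such as cabin location look likely to matter?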
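
One quick way to check a hunch like this is to count missing values per column and compare the Transported rate across cabin decks. The sketch below uses a tiny made-up DataFrame rather than the real train.csv, and the names `toy`, `missing_per_column`, `deck`, and `rate_by_deck` are chosen here for illustration only.

```python
import pandas as pd

# Toy stand-in for the training data (values are made up for illustration).
toy = pd.DataFrame({
    "Cabin": ["B/0/P", "B/1/S", "F/0/P", "F/1/P", None],
    "Transported": [True, True, False, True, False],
})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()

# The deck is the first "/"-separated part of Cabin (NaN where Cabin is missing).
deck = toy["Cabin"].str.split("/").str[0]

# Share of transported passengers per deck; rows with a missing deck are skipped.
rate_by_deck = toy.groupby(deck)["Transported"].mean()
print(missing_per_column)
print(rate_by_deck)
```

On the real data, large differences between decks would support the guess that cabin location matters.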

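step2 Handling missing values and related preprocessing
Missing Age values are filled with the mean, and missing Destination and HomePlanet values with the mode. All other missing values are replaced with 0.
The Name and PassengerId columns are dropped.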
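
The imputation rules above can be sketched roughly as follows. This is a minimal illustration on a made-up DataFrame, not the code from the original notebook: the helper name `impute` is hypothetical, and the column names are assumed to match the competition's train.csv.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train.csv (values are made up for illustration).
toy = pd.DataFrame({
    "PassengerId": ["0001_01", "0002_01", "0003_01", "0004_01"],
    "HomePlanet": ["Earth", None, "Earth", "Europa"],
    "Destination": ["TRAPPIST-1e", "TRAPPIST-1e", None, "55 Cancri e"],
    "Age": [20.0, np.nan, 40.0, 30.0],
    "RoomService": [0.0, 100.0, np.nan, 50.0],
    "Name": ["A B", "C D", None, "E F"],
})

def impute(df):
    """Hypothetical helper: apply the fill rules described in the text."""
    # Drop the name and ID columns first; drop() returns a new DataFrame.
    df = df.drop(columns=["Name", "PassengerId"])
    # Age: fill with the mean of the observed values.
    df["Age"] = df["Age"].fillna(df["Age"].mean())
    # HomePlanet and Destination: fill with the most frequent value.
    for col in ["HomePlanet", "Destination"]:
        df[col] = df[col].fillna(df[col].mode()[0])
    # Every remaining column: fill missing values with 0.
    other = [c for c in df.columns if c not in ("Age", "HomePlanet", "Destination")]
    df[other] = df[other].fillna(0)
    return df

cleaned = impute(toy)
print(cleaned)
```

Filling with 0 is a simple default for the spending columns; whether it suits every remaining column is a judgment call.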
Cabin is split on "/" into three parts, each stored in its own new column.

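# Split "deck/num/side" into three new columns.
train[["Deck", "Cabin_num", "Side"]] = train["Cabin"].str.split("/", expand=True)
# Drop the original column now that its parts are stored separately.
try:
    train = train.drop('Cabin', axis=1)
except KeyError:
    print("Field does not exist")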
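
To see what the split produces, here is a small worked example on made-up Cabin values, including a missing one. Converting Cabin_num to a number with `pd.to_numeric` is an extra step assumed here for illustration; the original post does not say whether it was done.

```python
import numpy as np
import pandas as pd

# Made-up Cabin values in the "deck/num/side" format, plus one missing entry.
cabins = pd.DataFrame({"Cabin": ["B/0/P", "F/12/S", np.nan]})

# expand=True returns one column per "/"-separated part.
cabins[["Deck", "Cabin_num", "Side"]] = cabins["Cabin"].str.split("/", expand=True)

# Assumption: treat the cabin number as numeric instead of leaving it as text.
cabins["Cabin_num"] = pd.to_numeric(cabins["Cabin_num"])

cabins = cabins.drop(columns="Cabin")
print(cabins)
```

A missing Cabin stays missing in all three new columns, so those rows still need a fill value afterwards.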

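step3 Split the data for validation

import numpy as np
import tensorflow_decision_forests as tfdf

# Name of the target (label) column
label = 'Transported'
# Hold out 20% of the rows for testing.
def split_dataset(dataset, test_ratio=0.20):
  # Each row is held out with probability test_ratio, so the split is only roughly 80/20.
  test_indices = np.random.rand(len(dataset)) < test_ratio
  # ~ inverts the boolean mask: rows that are not held out go to training.
  return dataset[~test_indices], dataset[test_indices]

train_ds_pd, valid_ds_pd = split_dataset(train)
print("{} examples in training, {} examples in testing.".format(
    len(train_ds_pd), len(valid_ds_pd)))
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_ds_pd, label=label)
valid_ds = tfdf.keras.pd_dataframe_to_tf_dataset(valid_ds_pd, label=label)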
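
Because the mask is drawn at random, the split above changes on every run. If reproducible scores are wanted, one option is to seed the random generator. The sketch below is a variation assumed for illustration, not the original code: `split_dataset_seeded`, its `seed` parameter, and the `demo` DataFrame are hypothetical names.

```python
import numpy as np
import pandas as pd

def split_dataset_seeded(dataset, test_ratio=0.20, seed=42):
    """Hypothetical variant of split_dataset that gives the same split for the same seed."""
    rng = np.random.default_rng(seed)
    # Each row is held out with probability test_ratio.
    mask = rng.random(len(dataset)) < test_ratio
    return dataset[~mask], dataset[mask]

# Made-up data: 1000 rows with a single column.
demo = pd.DataFrame({"x": range(1000)})
train_part, valid_part = split_dataset_seeded(demo)
train_again, valid_again = split_dataset_seeded(demo)

print(len(train_part), len(valid_part))
```

The held-out share is still only approximately 20%, since each row is assigned independently.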

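step4 Analysis with a random forest model

# Assumed training step (not shown in the original post): default TF-DF random forest.
rf = tfdf.keras.RandomForestModel()
rf.compile(metrics=["accuracy"])
rf.fit(x=train_ds)

# Evaluate on the held-out split and print each metric.
evaluation = rf.evaluate(x=valid_ds,return_dict=True)

for name, value in evaluation.items():
  print(f"{name}: {value:.4f}")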

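Which features are the strongest?
CryoSleep was used the most, appearing as the root split 121 times.

inspector = rf.make_inspector()
# NUM_AS_ROOT: how many trees use each feature at their root node
inspector.variable_importances()["NUM_AS_ROOT"]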
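
NUM_AS_ROOT counts how often each feature is chosen for the first (root) split across the trees of the forest. The toy sketch below only illustrates that idea with a made-up list of root features; it is not how TF-DF computes the value internally, and `root_features` and `num_as_root` are hypothetical names.

```python
from collections import Counter

# Hypothetical root-split feature of each tree in a tiny 6-tree forest (made-up values).
root_features = ["CryoSleep", "Spa", "CryoSleep", "RoomService", "CryoSleep", "Spa"]

def num_as_root(roots):
    """Count how many trees use each feature at the root, most frequent first."""
    return Counter(roots).most_common()

ranking = num_as_root(root_features)
print(ranking)
```

A feature that is often picked at the root tends to separate the classes well on its own, which is why this count is a common first look at importance.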
