Daily Kaggle (spaceship edition)

Keeping up with daily Kaggle (spaceship edition)

Goal: predict whether each passenger was Transported.

Step 1: Data overview
Guessing from the problem setup, features such as the cabin location might be important.

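Step 2: Handling missing values and other preprocessing
Missing Age values are filled with the mean; the destination and the departure planet are filled with the mode. All other missing values are replaced with 0.
The name and ID columns are dropped.
The cabin value is split on "/" into three parts, each stored in a new column.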

# Split Cabin on "/" into three new columns.
train[["Deck", "Cabin_num", "Side"]] = train["Cabin"].str.split("/", expand=True)
# Drop the original column; errors="ignore" makes this a no-op if it is already gone.
train = train.drop(columns="Cabin", errors="ignore")
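
The missing-value handling described at the start of step 2 might look roughly like the sketch below. It runs on a toy DataFrame, and the column names (PassengerId, Name, Age, HomePlanet, Destination, RoomService) are assumptions for illustration, since the post does not list them.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the training data; column names are assumed, not taken from the post.
train = pd.DataFrame({
    "PassengerId": ["0001_01", "0002_01", "0003_01", "0004_01"],
    "Name": ["A", "B", None, "D"],
    "Age": [20.0, np.nan, 40.0, 30.0],
    "HomePlanet": ["Earth", None, "Earth", "Mars"],
    "Destination": ["TRAPPIST-1e", "TRAPPIST-1e", None, "55 Cancri e"],
    "RoomService": [0.0, 100.0, np.nan, 50.0],
})

# Age: fill missing values with the column mean.
train["Age"] = train["Age"].fillna(train["Age"].mean())

# Destination and departure planet: fill with the most frequent value (mode).
for col in ["HomePlanet", "Destination"]:
    train[col] = train[col].fillna(train[col].mode()[0])

# The name and ID columns are dropped.
train = train.drop(columns=["PassengerId", "Name"])

# Everything else: replace remaining missing values with 0.
train = train.fillna(0)

print(train)
```

Filling the categorical columns before the final fillna(0) keeps the number 0 out of the text columns.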

Step 3: Split the data for validation

import numpy as np
import tensorflow_decision_forests as tfdf

# Name of the label column.
label = 'Transported'

# Hold out roughly 20% of the rows for validation.
def split_dataset(dataset, test_ratio=0.20):
  test_indices = np.random.rand(len(dataset)) < test_ratio
  # ~ inverts the boolean mask, so rows not picked for validation go to training.
  return dataset[~test_indices], dataset[test_indices]

train_ds_pd, valid_ds_pd = split_dataset(train)
print("{} examples in training, {} examples in validation.".format(
    len(train_ds_pd), len(valid_ds_pd)))
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_ds_pd, label=label)
valid_ds = tfdf.keras.pd_dataframe_to_tf_dataset(valid_ds_pd, label=label)
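
Because the split above draws a fresh random mask on every run, the training and validation sets change each time. A seeded variant could make the split repeatable. The sketch below is one possible way to do that and is not from the original post: the function name split_dataset_seeded, the seed parameter, and the toy DataFrame are all hypothetical.

```python
import numpy as np
import pandas as pd

def split_dataset_seeded(dataset, test_ratio=0.20, seed=0):
    """Random row split; the same seed always yields the same split."""
    rng = np.random.default_rng(seed)
    # Each row is independently sent to validation with probability test_ratio.
    test_mask = rng.random(len(dataset)) < test_ratio
    return dataset[~test_mask], dataset[test_mask]

# Toy data standing in for the real training table.
toy = pd.DataFrame({"x": range(1000)})
train_part, valid_part = split_dataset_seeded(toy)
print(len(train_part), len(valid_part))
```

Since each row is drawn independently, the validation share is only approximately 20%, not exactly 20%.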

Step 4: Analysis with a random forest model

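# Assumption: the post does not show how rf was built; a default model is used here.
rf = tfdf.keras.RandomForestModel()
rf.compile(metrics=["accuracy"])
rf.fit(x=train_ds)
evaluation = rf.evaluate(x=valid_ds, return_dict=True)

for name, value in evaluation.items():
  print(f"{name}: {value:.4f}")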

Which features are the strongest?
CryoSleep was used the most: it was the root split in 121 trees.

inspector = rf.make_inspector()
inspector.variable_importances()["NUM_AS_ROOT"]
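
To illustrate what a NUM_AS_ROOT-style importance counts, here is a toy sketch that does not use the trained model. The list of root features is made up for illustration; only the counting idea (how many trees split on a given feature at the root) carries over.

```python
from collections import Counter

# Hypothetical forest: each tree is reduced to the feature used at its root split.
root_features = ["CryoSleep", "Spa", "CryoSleep", "RoomService", "CryoSleep"]

# NUM_AS_ROOT-style importance: how many trees use each feature at the root.
num_as_root = Counter(root_features).most_common()
print(num_as_root)
```

A feature that is often chosen at the root separates the whole training set well on its own, which is why a high count suggests a strong feature.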