Kaggle ~Titanic Survival Prediction~ Decision Tree #3

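Hello! :confused:
Continuing from the other day, I'll keep working on data analysis with a decision tree!
In my previous post, standardizing during preprocessing brought the score to 0.76.
This time I'll tweak the variables a little!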

Program contents

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import missingno as msno
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler

# Load the Kaggle Titanic data
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

# Fill missing ages with the median and encode Sex as 0/1
train["Age"] = train["Age"].fillna(train["Age"].median())
train.replace({"male":0,"female":1},inplace=True)
# One-hot encode Embarked
train = pd.concat([train, pd.get_dummies(train["Embarked"], prefix="Embarked")], axis=1).drop(columns=["Embarked"])
# Combine SibSp and Parch into a single family-count column
train["FamCnt"] = train["SibSp"] + train["Parch"]
train = train.drop(columns=["Name", "Ticket", "Cabin", "SibSp", "Parch"])
# Standardize the numeric columns (mean 0, standard deviation 1)
columns_to_standardize = ['Age', 'FamCnt', 'Fare']
scaler = StandardScaler()
train[columns_to_standardize] = scaler.fit_transform(train[columns_to_standardize])

I added SibSp and Parch together to create a new column called FamCnt.
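As a small illustration of what FamCnt holds, here is a minimal sketch on made-up rows (the `toy` DataFrame is hypothetical and not taken from train.csv). Because the value is the sum of siblings/spouses and parents/children aboard, it does not count the passenger, so 0 means the passenger is travelling alone.

```python
import pandas as pd

# Made-up rows for illustration only (not taken from train.csv).
toy = pd.DataFrame({
    "SibSp": [1, 0, 3, 0],
    "Parch": [0, 0, 2, 1],
})

# FamCnt = siblings/spouses + parents/children travelling with the passenger.
toy["FamCnt"] = toy["SibSp"] + toy["Parch"]

# Dropping the two source columns mirrors the preprocessing above.
toy = toy.drop(columns=["SibSp", "Parch"])
print(toy["FamCnt"].tolist())
```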

Removing outliers

train = train[(train['Age'] > -3) & (train['Age'] < 3)]
train = train[(train['FamCnt'] > -3) & (train['FamCnt'] < 3)]
train = train[(train['Fare'] > -3) & (train['Fare'] < 3)]
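Since Age, FamCnt, and Fare were standardized beforehand, the bounds -3 and 3 here mean "within 3 standard deviations of the mean". The sketch below shows the same filter on a made-up Fare column (the `toy` data and the name `filtered` are assumptions for illustration only).

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up fares for illustration: 19 ordinary values and one extreme value.
toy = pd.DataFrame({"Fare": [10.0] * 19 + [500.0]})

# After standardization each value is a z-score (mean 0, standard deviation 1).
toy["Fare"] = StandardScaler().fit_transform(toy[["Fare"]]).ravel()

# Keep only rows whose z-score lies strictly between -3 and 3.
filtered = toy[(toy["Fare"] > -3) & (toy["Fare"] < 3)]
print(len(toy), len(filtered))
```

In this toy example only the extreme fare falls outside the range, so one row is dropped.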

Preprocessing the test data

# Fill missing Age and Fare with the test-set medians
test['Age'] = test['Age'].fillna(test['Age'].median())
test['Fare'] = test['Fare'].fillna(test['Fare'].median())
# Encode Sex as 0/1 and one-hot encode Embarked, as for train
test.replace({"male":0,"female":1},inplace=True)
test = pd.concat([test, pd.get_dummies(test["Embarked"], prefix="Embarked")], axis=1).drop(columns=["Embarked"])
# Combine SibSp and Parch into FamCnt, as for train
test["FamCnt"] = test["SibSp"] + test["Parch"]
test = test.drop(columns=["Name", "Ticket", "Cabin", "SibSp", "Parch"])
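Note that Age, FamCnt, and Fare were standardized in train, while the test preprocessing here leaves them on their original scale. A decision tree learns its split thresholds on the training scale, so the two scales may not line up. One possible variation, which is not part of the code that produced the score in this post, is to reuse the scaler fitted on train. The sketch below uses made-up values (`toy_train` and `toy_test` are hypothetical).

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

cols = ["Age", "FamCnt", "Fare"]

# Made-up values for illustration only.
toy_train = pd.DataFrame({
    "Age": [20.0, 30.0, 40.0],
    "FamCnt": [0.0, 1.0, 2.0],
    "Fare": [10.0, 20.0, 30.0],
})
toy_test = pd.DataFrame({
    "Age": [30.0, 50.0],
    "FamCnt": [1.0, 3.0],
    "Fare": [20.0, 40.0],
})

scaler = StandardScaler()
# Learn the mean and standard deviation from the training rows only.
toy_train[cols] = scaler.fit_transform(toy_train[cols])
# Reuse those statistics on the test rows (transform, not fit_transform).
toy_test[cols] = scaler.transform(toy_test[cols])
print(toy_test)
```

A test row equal to the training mean maps to 0, which is what puts both tables on the same scale.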

Building the model and making predictions

# Columns used as model inputs (same list for train and test)
feature_cols = ['Pclass', 'Sex', 'Age', 'Fare', 'FamCnt', 'Embarked_C', 'Embarked_Q', 'Embarked_S']
train_features = train[feature_cols].values
train_target = train['Survived'].values

# Depth-limited decision tree with class weights balanced by class frequency
model = DecisionTreeClassifier(max_depth=5, class_weight='balanced', random_state=0)
model.fit(train_features, train_target)

# Predict survival for the test passengers
test_features = test[feature_cols].values
predict_test_target = model.predict(test_features)

Creating the CSV file

submission = pd.DataFrame({'PassengerId': test['PassengerId'], 'Survived': predict_test_target})
submission.to_csv('submission_Titanic_DecisionTreeClassifier.csv', index=False)

Score

(Image: screenshot of the Kaggle score)

The result was miserable...
Feeding the number of accompanying family members into training may have hurt the predictions.
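One way to check this hypothesis locally before submitting is to compare cross-validated accuracy with and without FamCnt. The following is only a sketch: `cv_accuracy` is a hypothetical helper, and `demo` is synthetic stand-in data so that the block runs on its own. With the real data, the preprocessed train DataFrame would be passed in instead.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def cv_accuracy(df, feature_cols, target_col="Survived"):
    """Mean 5-fold cross-validated accuracy for the given feature columns."""
    model = DecisionTreeClassifier(max_depth=5, class_weight="balanced", random_state=0)
    scores = cross_val_score(model, df[feature_cols].values, df[target_col].values, cv=5)
    return scores.mean()


# Synthetic stand-in data (assumption): replace with the preprocessed train.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Sex": rng.integers(0, 2, 200),
    "Pclass": rng.integers(1, 4, 200),
    "FamCnt": rng.integers(0, 6, 200),
})
# Synthetic labelling rule, not a claim about the real Titanic data.
demo["Survived"] = demo["Sex"]

with_fam = cv_accuracy(demo, ["Pclass", "Sex", "FamCnt"])
without_fam = cv_accuracy(demo, ["Pclass", "Sex"])
print(with_fam, without_fam)
```

If the score without FamCnt is clearly higher on the real data, that would support the idea that the new column is hurting. If the two are close, the cause is more likely elsewhere.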
I'll think it over again and post another update.
Thank you!
