Building the prediction model
Import the required libraries.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
Load the preprocessed training and test data files.
train = pd.read_csv("train_prep.csv")
test = pd.read_csv("test_prep.csv")
train.head(1)
| | Unnamed: 0 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | SexInt | AgeFillNa | FareFillNa | EmbarkedInt | NumFamily | IsAlone |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S | 0 | 22.0 | 7.25 | 0 | 1 | 0 |
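The `Unnamed: 0` column in the output above is most likely the DataFrame index. It would have been written to disk if the preprocessed files were saved with `to_csv` and its default `index=True`, though that is an assumption about the earlier preprocessing step. It does no harm here because `Unnamed: 0` is not among the explanatory variables. The sketch below reproduces the effect on a tiny hypothetical frame and shows that `index_col=0` avoids it. Saving with `index=False` would also work.

```python
import io

import pandas as pd

# Tiny hypothetical frame standing in for a preprocessed dataset
prep = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})

buf = io.StringIO()
prep.to_csv(buf)  # default index=True writes the index as an unnamed first column
csv_text = buf.getvalue()

# Read back without index_col: the old index becomes a data column named "Unnamed: 0"
with_extra = pd.read_csv(io.StringIO(csv_text))
print(list(with_extra.columns))  # ['Unnamed: 0', 'PassengerId', 'Survived']

# index_col=0 turns that first column back into the index instead
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(restored.columns))  # ['PassengerId', 'Survived']
```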
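From the training data, prepare the explanatory variables (inputs) and the target variable (output).
expvars = ["Pclass", "SexInt", "AgeFillNa", "FareFillNa", "IsAlone"]  # list of explanatory variables (features)
X_train = train[expvars].copy()  # inputs: one row per passenger, one column per feature
Y_train = train["Survived"]  # target: 1 = survived, 0 = died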
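Every column in `expvars` is numeric and has no missing values. The raw string column `Sex` is replaced by the encoded `SexInt`, and `Age` and `Fare` are replaced by their NaN-filled versions. The scikit-learn release used in this notebook is an older one, going by the `min_impurity_split` parameter in the fitted model's repr, and it does not accept NaN inputs. Newer releases may treat missing values differently, so check the version you are using. The following is a minimal pure-Python sketch of that check. `unusable_features` is a hypothetical helper, and the values are copied from the tables in this section.

```python
import math

expvars = ["Pclass", "SexInt", "AgeFillNa", "FareFillNa", "IsAlone"]

def unusable_features(row, names):
    """Hypothetical helper: names whose value in `row` is absent, non-numeric, or NaN."""
    bad = []
    for name in names:
        value = row.get(name)
        if not isinstance(value, (int, float)):  # absent (None) or a string such as "male"
            bad.append(name)
        elif math.isnan(value):  # a missing number, e.g. Age = NaN
            bad.append(name)
    return bad

# Values from the first training row (head(1) output above)
braund = {"Pclass": 3, "Sex": "male", "SexInt": 0, "Age": 22.0,
          "AgeFillNa": 22.0, "FareFillNa": 7.25, "IsAlone": 0}
# The raw Age of PassengerId 902 in the test table is NaN
ilieff_raw = {"Age": float("nan")}

print(unusable_features(braund, expvars))         # []
print(unusable_features(braund, ["Sex", "Age"]))  # ['Sex']
print(unusable_features(ilieff_raw, ["Age"]))     # ['Age']
```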
Fit a random forest classifier to the training data to build the prediction model.
clf = RandomForestClassifier()
clf.fit(X_train, Y_train)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_impurity_split=1e-07, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
n_estimators=10, n_jobs=1, oob_score=False, random_state=None,
verbose=0, warm_start=False)
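The repr above shows the settings that make this a *random* forest. With `bootstrap=True`, each of the `n_estimators=10` trees is trained on rows drawn with replacement. With `max_features='auto'`, each split in a classifier considers only about √(number of features) randomly chosen candidates, which is 2 of our 5 `expvars`. The pure-Python sketch below illustrates only that sampling. It uses a hypothetical 8-row training set and hypothetical helper names (`bootstrap_indices`, `n_split_candidates`) and does not grow any trees. scikit-learn's actual implementation differs in detail.

```python
import math
import random

def bootstrap_indices(n_samples, rng):
    """Row indices drawn with replacement: one tree's training set when bootstrap=True."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def n_split_candidates(n_features):
    """Candidate features per split under the sqrt rule (max_features='auto' for classifiers)."""
    return max(1, int(math.sqrt(n_features)))

expvars = ["Pclass", "SexInt", "AgeFillNa", "FareFillNa", "IsAlone"]
n_samples = 8       # hypothetical tiny training set, for illustration only
n_estimators = 10   # default value shown in the fitted model's repr

rng = random.Random(0)  # fixed seed so the sketch is reproducible
forest_plan = []
for _ in range(n_estimators):
    rows = bootstrap_indices(n_samples, rng)
    out_of_bag = sorted(set(range(n_samples)) - set(rows))  # rows this tree never sees
    # Candidates for a single split; the real algorithm redraws them at every split.
    first_split = rng.sample(expvars, n_split_candidates(len(expvars)))
    forest_plan.append((rows, out_of_bag, first_split))

for rows, oob, split in forest_plan[:2]:
    print(sorted(rows), oob, split)
```

Because each tree sees a different resample of rows and a different subset of features at each split, the trees disagree with one another. Averaging them is what smooths out the errors of any single tree.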
Checking the predictions
Use the trained model to predict survival for the passengers in the test data.
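X_test = test[expvars].copy()  # the same feature columns, taken from the test data
prediction = clf.predict(X_test)  # predicted class per passenger (1 = survived, 0 = died)
pred_proba = clf.predict_proba(X_test)  # per-class probabilities; column 1 is P(survived)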
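`predict` and `predict_proba` are linked. According to the scikit-learn documentation, `predict_proba` returns the mean of the trees' class-probability estimates. A single tree's estimate is the class fraction in the leaf that the sample lands in, so an impure leaf gives a fractional value such as 0.025. `predict` returns the class with the highest mean probability. With two classes, `PredictedSurvival` is therefore 1 exactly when `SurvivalProba` is above 0.5. A 0.5 tie falls to class 0, as seen for PassengerId 915 and 1299 in the table below. The pure-Python sketch below mimics that averaging. The helper names and per-tree estimates are hypothetical.

```python
def forest_predict_proba(tree_probas):
    """Mean of the per-tree class-probability vectors for one sample."""
    n_trees = len(tree_probas)
    n_classes = len(tree_probas[0])
    return [sum(p[k] for p in tree_probas) / n_trees for k in range(n_classes)]

def forest_predict(tree_probas, classes=(0, 1)):
    """Class with the highest mean probability; max() keeps the first of tied classes."""
    mean = forest_predict_proba(tree_probas)
    best = max(range(len(mean)), key=lambda k: mean[k])
    return classes[best]

# Hypothetical leaf estimates [P(died), P(survived)] from 10 trees for one passenger
seven_to_three = [[0.0, 1.0]] * 7 + [[1.0, 0.0]] * 3
even_split = [[0.0, 1.0]] * 5 + [[1.0, 0.0]] * 5
one_impure_leaf = [[0.75, 0.25]] + [[1.0, 0.0]] * 9  # one leaf holds a 3:1 class mix

print(forest_predict(seven_to_three))            # 1
print(forest_predict(even_split))                # 0 (a 0.5 tie goes to class 0)
print(forest_predict_proba(one_impure_leaf)[1])  # 0.025
```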
Let's display the predictions alongside the passenger data.
test["PredictedSurvival"] = prediction
test["SurvivalProba"] = pred_proba[:,1]
test[["PassengerId","Pclass","Name","Sex","Age","SibSp","Parch","Fare","Embarked","PredictedSurvival","SurvivalProba"]]
| | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Fare | Embarked | PredictedSurvival | SurvivalProba |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 7.8292 | Q | 0 | 0.000000 |
1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 7.0000 | S | 0 | 0.300000 |
2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 9.6875 | Q | 1 | 0.700000 |
3 | 895 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 8.6625 | S | 1 | 0.700000 |
4 | 896 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 12.2875 | S | 1 | 0.700000 |
5 | 897 | 3 | Svensson, Mr. Johan Cervin | male | 14.0 | 0 | 0 | 9.2250 | S | 0 | 0.200000 |
6 | 898 | 3 | Connolly, Miss. Kate | female | 30.0 | 0 | 0 | 7.6292 | Q | 0 | 0.100000 |
7 | 899 | 2 | Caldwell, Mr. Albert Francis | male | 26.0 | 1 | 1 | 29.0000 | S | 0 | 0.000000 |
8 | 900 | 3 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | female | 18.0 | 0 | 0 | 7.2292 | C | 1 | 1.000000 |
9 | 901 | 3 | Davies, Mr. John Samuel | male | 21.0 | 2 | 0 | 24.1500 | S | 0 | 0.000000 |
10 | 902 | 3 | Ilieff, Mr. Ylio | male | NaN | 0 | 0 | 7.8958 | S | 0 | 0.200000 |
11 | 903 | 1 | Jones, Mr. Charles Cresson | male | 46.0 | 0 | 0 | 26.0000 | S | 0 | 0.350000 |
12 | 904 | 1 | Snyder, Mrs. John Pillsbury (Nelle Stevenson) | female | 23.0 | 1 | 0 | 82.2667 | S | 1 | 1.000000 |
13 | 905 | 2 | Howard, Mr. Benjamin | male | 63.0 | 1 | 0 | 26.0000 | S | 0 | 0.000000 |
14 | 906 | 1 | Chaffee, Mrs. Herbert Fuller (Carrie Constance... | female | 47.0 | 1 | 0 | 61.1750 | S | 1 | 1.000000 |
15 | 907 | 2 | del Carlo, Mrs. Sebastiano (Argenia Genovesi) | female | 24.0 | 1 | 0 | 27.7208 | C | 1 | 1.000000 |
16 | 908 | 2 | Keane, Mr. Daniel | male | 35.0 | 0 | 0 | 12.3500 | Q | 0 | 0.025000 |
17 | 909 | 3 | Assaf, Mr. Gerios | male | 21.0 | 0 | 0 | 7.2250 | C | 1 | 0.800000 |
18 | 910 | 3 | Ilmakangas, Miss. Ida Livija | female | 27.0 | 1 | 0 | 7.9250 | S | 1 | 0.600000 |
19 | 911 | 3 | Assaf Khalil, Mrs. Mariana (Miriam")" | female | 45.0 | 0 | 0 | 7.2250 | C | 0 | 0.400000 |
20 | 912 | 1 | Rothschild, Mr. Martin | male | 55.0 | 1 | 0 | 59.4000 | C | 0 | 0.200000 |
21 | 913 | 3 | Olsen, Master. Artur Karl | male | 9.0 | 0 | 1 | 3.1708 | S | 0 | 0.200000 |
22 | 914 | 1 | Flegenheim, Mrs. Alfred (Antoinette) | female | NaN | 0 | 0 | 31.6833 | S | 1 | 0.900000 |
23 | 915 | 1 | Williams, Mr. Richard Norris II | male | 21.0 | 0 | 1 | 61.3792 | C | 0 | 0.500000 |
24 | 916 | 1 | Ryerson, Mrs. Arthur Larned (Emily Maria Borie) | female | 48.0 | 1 | 3 | 262.3750 | C | 1 | 1.000000 |
25 | 917 | 3 | Robins, Mr. Alexander A | male | 50.0 | 1 | 0 | 14.5000 | S | 0 | 0.000000 |
26 | 918 | 1 | Ostby, Miss. Helene Ragnhild | female | 22.0 | 0 | 1 | 61.9792 | C | 1 | 1.000000 |
27 | 919 | 3 | Daher, Mr. Shedid | male | 22.5 | 0 | 0 | 7.2250 | C | 1 | 0.800000 |
28 | 920 | 1 | Brady, Mr. John Bertram | male | 41.0 | 0 | 0 | 30.5000 | S | 1 | 0.800000 |
29 | 921 | 3 | Samaan, Mr. Elias | male | NaN | 2 | 0 | 21.6792 | C | 0 | 0.000000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
388 | 1280 | 3 | Canavan, Mr. Patrick | male | 21.0 | 0 | 0 | 7.7500 | Q | 0 | 0.000000 |
389 | 1281 | 3 | Palsson, Master. Paul Folke | male | 6.0 | 3 | 1 | 21.0750 | S | 0 | 0.400000 |
390 | 1282 | 1 | Payne, Mr. Vivian Ponsonby | male | 23.0 | 0 | 0 | 93.5000 | S | 0 | 0.100000 |
391 | 1283 | 1 | Lines, Mrs. Ernest H (Elizabeth Lindsey James) | female | 51.0 | 0 | 1 | 39.4000 | S | 1 | 0.900000 |
392 | 1284 | 3 | Abbott, Master. Eugene Joseph | male | 13.0 | 0 | 2 | 20.2500 | S | 0 | 0.100000 |
393 | 1285 | 2 | Gilbert, Mr. William | male | 47.0 | 0 | 0 | 10.5000 | S | 0 | 0.000000 |
394 | 1286 | 3 | Kink-Heilmann, Mr. Anton | male | 29.0 | 3 | 1 | 22.0250 | S | 0 | 0.300000 |
395 | 1287 | 1 | Smith, Mrs. Lucien Philip (Mary Eloise Hughes) | female | 18.0 | 1 | 0 | 60.0000 | S | 1 | 1.000000 |
396 | 1288 | 3 | Colbert, Mr. Patrick | male | 24.0 | 0 | 0 | 7.2500 | Q | 0 | 0.000000 |
397 | 1289 | 1 | Frolicher-Stehli, Mrs. Maxmillian (Margaretha ... | female | 48.0 | 1 | 1 | 79.2000 | C | 1 | 1.000000 |
398 | 1290 | 3 | Larsson-Rondberg, Mr. Edvard A | male | 22.0 | 0 | 0 | 7.7750 | S | 0 | 0.000000 |
399 | 1291 | 3 | Conlon, Mr. Thomas Henry | male | 31.0 | 0 | 0 | 7.7333 | Q | 0 | 0.000000 |
400 | 1292 | 1 | Bonnell, Miss. Caroline | female | 30.0 | 0 | 0 | 164.8667 | S | 1 | 1.000000 |
401 | 1293 | 2 | Gale, Mr. Harry | male | 38.0 | 1 | 0 | 21.0000 | S | 0 | 0.000000 |
402 | 1294 | 1 | Gibson, Miss. Dorothy Winifred | female | 22.0 | 0 | 1 | 59.4000 | C | 1 | 1.000000 |
403 | 1295 | 1 | Carrau, Mr. Jose Pedro | male | 17.0 | 0 | 0 | 47.1000 | S | 0 | 0.100000 |
404 | 1296 | 1 | Frauenthal, Mr. Isaac Gerald | male | 43.0 | 1 | 0 | 27.7208 | C | 0 | 0.100000 |
405 | 1297 | 2 | Nourney, Mr. Alfred (Baron von Drachstedt")" | male | 20.0 | 0 | 0 | 13.8625 | C | 0 | 0.300000 |
406 | 1298 | 2 | Ware, Mr. William Jeffery | male | 23.0 | 1 | 0 | 10.5000 | S | 0 | 0.000000 |
407 | 1299 | 1 | Widener, Mr. George Dunton | male | 50.0 | 1 | 1 | 211.5000 | C | 0 | 0.500000 |
408 | 1300 | 3 | Riordan, Miss. Johanna Hannah"" | female | NaN | 0 | 0 | 7.7208 | Q | 1 | 1.000000 |
409 | 1301 | 3 | Peacock, Miss. Treasteall | female | 3.0 | 1 | 1 | 13.7750 | S | 1 | 0.800000 |
410 | 1302 | 3 | Naughton, Miss. Hannah | female | NaN | 0 | 0 | 7.7500 | Q | 1 | 0.929808 |
411 | 1303 | 1 | Minahan, Mrs. William Edward (Lillian E Thorpe) | female | 37.0 | 1 | 0 | 90.0000 | Q | 1 | 1.000000 |
412 | 1304 | 3 | Henriksson, Miss. Jenny Lovisa | female | 28.0 | 0 | 0 | 7.7750 | S | 1 | 0.533333 |
413 | 1305 | 3 | Spector, Mr. Woolf | male | NaN | 0 | 0 | 8.0500 | S | 0 | 0.200000 |
414 | 1306 | 1 | Oliva y Ocana, Dona. Fermina | female | 39.0 | 0 | 0 | 108.9000 | C | 1 | 1.000000 |
415 | 1307 | 3 | Saether, Mr. Simon Sivertsen | male | 38.5 | 0 | 0 | 7.2500 | S | 0 | 0.000000 |
416 | 1308 | 3 | Ware, Mr. Frederick | male | NaN | 0 | 0 | 8.0500 | S | 0 | 0.200000 |
417 | 1309 | 3 | Peter, Master. Michael J | male | NaN | 1 | 1 | 22.3583 | C | 0 | 0.100000 |
418 rows × 11 columns
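Beyond scanning individual rows, aggregating the predictions by a passenger attribute gives a quick overview. Because `PredictedSurvival` is 0 or 1, its group mean is the share of passengers predicted to survive. The minimal pandas sketch below uses only the first ten rows of the table above (PassengerId 892 to 901), so its rates are illustrative and are not results for the full test set. On the real data, the same call would be `test.groupby("Sex")["PredictedSurvival"].mean()`.

```python
import pandas as pd

# First ten rows of the table above (PassengerId 892-901), only the columns used here
preview = pd.DataFrame({
    "PassengerId": range(892, 902),
    "Sex": ["male", "female", "male", "male", "female",
            "male", "female", "male", "female", "male"],
    "Pclass": [3, 3, 2, 3, 3, 3, 3, 2, 3, 3],
    "PredictedSurvival": [0, 0, 1, 1, 1, 0, 0, 0, 1, 0],
    "SurvivalProba": [0.0, 0.3, 0.7, 0.7, 0.7, 0.2, 0.1, 0.0, 1.0, 0.0],
})

# Share of passengers predicted to survive, per sex (groups are sorted by key)
by_sex = preview.groupby("Sex")["PredictedSurvival"].mean()
print(by_sex)
```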