1. Preparation
import os
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
# Load the Titanic data (the CSV path here is an assumption; adjust to your environment)
df = pd.read_csv("titanic.csv")
# Build the feature matrix X and the target y
# For simplicity, drop string columns and any columns that contain missing values
X = df.drop(["PassengerId", "Survived"], axis=1)
X = X.select_dtypes(exclude="object").dropna(how="any", axis=1)
y = df["Survived"]
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Fit a model, then create the predicted and true labels
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_true = y_test
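For reference, a quick sanity check on what the filtering leaves behind (a sketch; with the standard Kaggle train.csv the surviving numeric, fully populated columns are typically Pclass, SibSp, Parch, and Fare):

```python
# Columns left after dropping string columns and columns with missing values
print(X.columns.tolist())
# Default train_test_split holds out 25% of the rows for testing
print(X_train.shape, X_test.shape)
```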
2. Confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_true, y_pred)
print(cm)
#[[110 29]
# [ 35 49]]
TN, FP, FN, TP = cm.flatten()
print(TN, FP, FN, TP)
# 110 29 35 49
Actual | Predicted | Classification | Count |
---|---|---|---|
Negative | Negative | True Negative (TN) | 110 |
Negative | Positive | False Positive (FP) | 29 |
Positive | Negative | False Negative (FN) | 35 |
Positive | Positive | True Positive (TP) | 49 |
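Note that scikit-learn arranges the matrix with the actual label on the rows and the predicted label on the columns, so for labels 0/1 it reads [[TN, FP], [FN, TP]]. A minimal sketch that reproduces the four counts directly from y_true and y_pred:

```python
import numpy as np

y_t = np.asarray(y_true)
y_p = np.asarray(y_pred)
tn = np.sum((y_t == 0) & (y_p == 0))  # actual negative, predicted negative
fp = np.sum((y_t == 0) & (y_p == 1))  # actual negative, predicted positive
fn = np.sum((y_t == 1) & (y_p == 0))  # actual positive, predicted negative
tp = np.sum((y_t == 1) & (y_p == 1))  # actual positive, predicted positive
print(tn, fp, fn, tp)  # 110 29 35 49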
3. Accuracy
from sklearn.metrics import accuracy_score
ac = accuracy_score(y_true, y_pred)
print(round(ac, 3))
# 0.713
The fraction of all predictions that are correct.
$accuracy = (TP+TN)/(TP+TN+FP+FN) = (49+110)/(49+110+29+35) = 0.713$
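For illustration, the same value can be recomputed by hand from the TN, FP, FN, TP counts obtained in section 2:

```python
# Accuracy from the four confusion-matrix cells
manual_ac = (TP + TN) / (TP + TN + FP + FN)
print(round(manual_ac, 3))  # 0.713
```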
4. Precision
from sklearn.metrics import precision_score
ps = precision_score(y_true, y_pred)
print(round(ps, 3))
# 0.628
The fraction of predicted positives that are actually positive.
$precision = TP/(TP+FP) = 49/(49+29) = 0.628$
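Again, the hand computation from the counts matches precision_score:

```python
# Precision: of the rows predicted positive, how many are truly positive
manual_ps = TP / (TP + FP)
print(round(manual_ps, 3))  # 0.628
```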
5. Recall
from sklearn.metrics import recall_score
rs = recall_score(y_true, y_pred)
print(round(rs, 3))
# 0.583
The fraction of actual positives that are correctly predicted.
$recall = TP/(TP+FN) = 49/(49+35) = 0.583$
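A manual check from the counts:

```python
# Recall: of the truly positive rows, how many were predicted positive
manual_rs = TP / (TP + FN)
print(round(manual_rs, 3))  # 0.583
```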
6. F1 score
from sklearn.metrics import f1_score
fs = f1_score(y_true, y_pred)
print(round(fs, 3))
# 0.605
The harmonic mean of precision and recall.
$F1 = 2 \times Precision \times Recall/(Precision+Recall) = 2TP/(2TP+FP+FN) = 2 \times 49/(2 \times 49+29+35) = 0.605$
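Both forms of the formula give the same value when computed from the quantities above (ps and rs are the precision and recall computed in sections 4 and 5):

```python
# F1 as the harmonic mean of precision and recall
manual_f1 = 2 * ps * rs / (ps + rs)
print(round(manual_f1, 3))  # 0.605
# Equivalent form written directly from the confusion-matrix cells
print(round(2 * TP / (2 * TP + FP + FN), 3))  # 0.605
```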