1. Preparation
import os
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
# Load the Titanic data (the CSV path here is an assumption; adjust to your environment)
df = pd.read_csv("titanic.csv")
# Build the feature matrix X and the target y
# For simplicity, drop string columns and any columns that contain missing values
X = df.drop(["PassengerId", "Survived"], axis=1)
X = X.select_dtypes(exclude="object").dropna(how="any", axis=1)
y = df["Survived"]
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Fit a model, then create the predicted and true labels
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_true = y_test
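For reference, a quick sanity check on what the filtering leaves behind (a sketch; with the standard Kaggle train.csv the surviving numeric, fully populated columns are typically Pclass, SibSp, Parch, and Fare):

```python
# Columns left after dropping string columns and columns with missing values
print(X.columns.tolist())
# Default train_test_split holds out 25% of the rows for testing
print(X_train.shape, X_test.shape)
```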
2. Confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_true, y_pred)
print(cm)
#[[110 29]
# [ 35 49]]
TN, FP, FN, TP = cm.flatten()
print(TN, FP, FN, TP)
# 110 29 35 49
Actual | Predicted | Classification | Count |
---|---|---|---|
Negative | Negative | True Negative (TN) | 110 |
Negative | Positive | False Positive (FP) | 29 |
Positive | Negative | False Negative (FN) | 35 |
Positive | Positive | True Positive (TP) | 49 |
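Note that scikit-learn arranges the matrix with the actual label on the rows and the predicted label on the columns, so for labels 0/1 it reads [[TN, FP], [FN, TP]]. A minimal sketch that reproduces the four counts directly from y_true and y_pred:

```python
import numpy as np

y_t = np.asarray(y_true)
y_p = np.asarray(y_pred)
tn = np.sum((y_t == 0) & (y_p == 0))  # actual negative, predicted negative
fp = np.sum((y_t == 0) & (y_p == 1))  # actual negative, predicted positive
fn = np.sum((y_t == 1) & (y_p == 0))  # actual positive, predicted negative
tp = np.sum((y_t == 1) & (y_p == 1))  # actual positive, predicted positive
print(tn, fp, fn, tp)  # 110 29 35 49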
3. Accuracy
from sklearn.metrics import accuracy_score
ac = accuracy_score(y_true, y_pred)
print(round(ac, 3))
# 0.713
The fraction of all predictions that are correct.
$accuracy = (TP+TN)/(TP+TN+FP+FN) = (49+110)/(49+110+29+35) = 0.713$
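For illustration, the same value can be recomputed by hand from the TN, FP, FN, TP counts obtained in section 2:

```python
# Accuracy from the four confusion-matrix cells
manual_ac = (TP + TN) / (TP + TN + FP + FN)
print(round(manual_ac, 3))  # 0.713
```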
4. Precision
from sklearn.metrics import precision_score
ps = precision_score(y_true, y_pred)
print(round(ps, 3))
# 0.628
The fraction of predicted positives that are actually positive.
$precision = TP/(TP+FP) = 49/(49+29) = 0.628$
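Again, the hand computation from the counts matches precision_score:

```python
# Precision: of the rows predicted positive, how many are truly positive
manual_ps = TP / (TP + FP)
print(round(manual_ps, 3))  # 0.628
```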
5. Recall
from sklearn.metrics import recall_score
rs = recall_score(y_true, y_pred)
print(round(rs, 3))
# 0.583
The fraction of actual positives that are correctly predicted.
$recall = TP/(TP+FN) = 49/(49+35) = 0.583$
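A manual check from the counts:

```python
# Recall: of the truly positive rows, how many were predicted positive
manual_rs = TP / (TP + FN)
print(round(manual_rs, 3))  # 0.583
```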
6. F1 score
from sklearn.metrics import f1_score
fs = f1_score(y_true, y_pred)
print(round(fs, 3))
# 0.605
The harmonic mean of precision and recall.
$F1 = 2 \times Precision \times Recall/(Precision+Recall) = 2TP/(2TP+FP+FN) = 2 \times 49/(2 \times 49+29+35) = 0.605$
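Both forms of the formula give the same value when computed from the quantities above (ps and rs are the precision and recall computed in sections 4 and 5):

```python
# F1 as the harmonic mean of precision and recall
manual_f1 = 2 * ps * rs / (ps + rs)
print(round(manual_f1, 3))  # 0.605
# Equivalent form written directly from the confusion-matrix cells
print(round(2 * TP / (2 * TP + FP + FN), 3))  # 0.605
```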