
Machine Learning: Classification Techniques

Last updated at 2020-05-31, posted at 2020-05-31

Logistic regression

from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
X = data.data
y = 1 - data.target
# Flip the 0 and 1 labels so that malignant becomes the positive class (1)

X = X[:, :10]
from sklearn.linear_model import LogisticRegression
model_lor = LogisticRegression(max_iter=1000)
model_lor.fit(X, y)
y_pred = model_lor.predict(X)

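Confusion matrix

・A matrix with 2 rows × 2 columns is displayed
・It cross-tabulates the actual labels (rows) against the predicted labels (columns)
・The top-left cell is (0, 0), the bottom-right cell is (1, 1)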

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y, y_pred)
print(cm)
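To make the layout concrete, the four cells can also be counted by hand. The following is a minimal sketch on made-up toy labels; `y_true_toy`, `y_pred_toy`, and the helper `confusion_counts` are illustrative names that do not appear in the code above. For 0/1 labels the result should match what `confusion_matrix` returns.

```python
def confusion_counts(y_true, y_pred):
    """Count (tn, fp, fn, tp) for binary 0/1 labels."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # true positive
        elif actual == 0 and predicted == 1:
            fp += 1  # false positive
        elif actual == 1 and predicted == 0:
            fn += 1  # false negative
        else:
            tn += 1  # true negative
    return tn, fp, fn, tp


# Toy labels, invented purely for illustration
y_true_toy = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
y_pred_toy = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]

tn, fp, fn, tp = confusion_counts(y_true_toy, y_pred_toy)

# Same layout as the printed matrix: rows = actual, columns = predicted
matrix = [[tn, fp],
          [fn, tp]]
print(matrix)
```

With the `cm` computed above, `tn, fp, fn, tp = cm.ravel()` unpacks the same four counts.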

Accuracy

・The proportion of all predictions that were correct

from sklearn.metrics import accuracy_score
accuracy_score(y, y_pred)
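In terms of the confusion matrix, accuracy is the diagonal divided by the total. A small sketch with hypothetical counts (the numbers are invented for illustration, not taken from the breast cancer data):

```python
# Hypothetical confusion-matrix counts, chosen only for illustration
tn, fp, fn, tp = 50, 10, 5, 35

# Accuracy = correct predictions (diagonal) / all predictions
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)
```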

Precision

・Of the samples predicted as positive, the proportion that are actually positive
　(the right-hand column of the confusion matrix)

from sklearn.metrics import precision_score
precision_score(y, y_pred)
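Read off the right-hand column of the confusion matrix, precision can be sketched as follows. The counts are hypothetical and serve only as an illustration.

```python
# Hypothetical confusion-matrix counts, chosen only for illustration
tn, fp, fn, tp = 50, 10, 5, 35

# Precision = true positives / everything predicted positive (right column)
precision = tp / (tp + fp)
print(round(precision, 3))
```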

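Recall

・Of the samples that are actually positive, the proportion that were correctly predicted as positive
　(the bottom row of the confusion matrix)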


from sklearn.metrics import recall_score
recall_score(y, y_pred)
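Read off the bottom row of the confusion matrix, recall can be sketched as follows. The counts are hypothetical and serve only as an illustration.

```python
# Hypothetical confusion-matrix counts, chosen only for illustration
tn, fp, fn, tp = 50, 10, 5, 35

# Recall = true positives / everything actually positive (bottom row)
recall = tp / (tp + fn)
print(recall)
```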

F-score

・The harmonic mean of recall and precision
・Precision and recall are in a trade-off relationship

from sklearn.metrics import f1_score
f1_score(y, y_pred)
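The harmonic mean can be written out directly. This is a minimal sketch; `f1_from` is a hypothetical helper name and the input values are invented. The second call shows that the harmonic mean stays low when either value is low, unlike the arithmetic mean.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Balanced case: hypothetical precision 35/45 and recall 35/40
f1_balanced = f1_from(35 / 45, 35 / 40)

# Lopsided case: perfect precision but very low recall
f1_lopsided = f1_from(1.0, 0.1)
arithmetic_mean = (1.0 + 0.1) / 2

print(round(f1_balanced, 3))
print(round(f1_lopsided, 3), "vs arithmetic mean", arithmetic_mean)
```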

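Predicted probability

・Expresses how likely a sample is to be classified as 0 or as 1, as continuous values between 0 and 1 (the two values sum to 1)
・scikit-learn uses 0.5 as the threshold by default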


# model_lor.predict_proba(X)

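import numpy as np
from sklearn.metrics import recall_score

# Lower the threshold to 0.1: predict 1 whenever P(class 1) exceeds 0.1
# (np.int was removed from recent NumPy versions, so use the built-in int)
y_pred2 = (model_lor.predict_proba(X)[:, 1] > 0.1).astype(int)
print(confusion_matrix(y, y_pred2))

print(accuracy_score(y, y_pred2))
print(recall_score(y, y_pred2))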
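The effect of moving the threshold can be seen on a tiny example. This is a sketch on made-up data; `y_true_toy`, `proba_toy`, and `precision_recall_at` are illustrative names, and the probabilities are invented rather than produced by `model_lor`. Lowering the threshold raises recall at the cost of precision, and raising it does the opposite.

```python
def precision_recall_at(y_true, proba, threshold):
    """Return (precision, recall) when predicting 1 for proba > threshold."""
    y_hat = [1 if p > threshold else 0 for p in proba]
    tp = sum(1 for t, h in zip(y_true, y_hat) if t == 1 and h == 1)
    fp = sum(1 for t, h in zip(y_true, y_hat) if t == 0 and h == 1)
    fn = sum(1 for t, h in zip(y_true, y_hat) if t == 1 and h == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels and invented class-1 probabilities
y_true_toy = [0, 0, 0, 0, 0, 1, 1, 1]
proba_toy = [0.05, 0.2, 0.3, 0.6, 0.15, 0.4, 0.7, 0.95]

results = {thr: precision_recall_at(y_true_toy, proba_toy, thr)
           for thr in (0.1, 0.5, 0.9)}
for thr, (prec, rec) in results.items():
    print(f"threshold={thr}: precision={prec:.2f}, recall={rec:.2f}")
```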

ROC curve and AUC (still need to study this)

・AUC: Area Under the Curve
・ROC: Receiver Operating Characteristic
・AUC is the area under the ROC curve
・ROC curve:
　Horizontal axis: false positive rate (FPR)
　Vertical axis: true positive rate (TPR)


from sklearn.metrics import roc_curve
probas = model_lor.predict_proba(X)
fpr, tpr, thresholds = roc_curve(y, probas[:, 1])

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

fig, ax = plt.subplots()
fig.set_size_inches(4.8, 5)

ax.step(fpr, tpr, 'gray')
ax.fill_between(fpr, tpr, 0, color='skyblue', alpha=0.8)
ax.set_xlabel('False Positive Rate')
ax.set_ylabel('True Positive Rate')
ax.set_facecolor('xkcd:white')
plt.show()

from sklearn.metrics import roc_auc_score
roc_auc_score(y, probas[:, 1])
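To see where the curve and its area come from, the points can be computed by hand: lower the threshold step by step, record (FPR, TPR) each time, then sum the trapezoids under the curve. The following is a minimal sketch on a four-sample toy example; `roc_points` and `trapezoid_area` are hypothetical helper names, and the sketch is not meant to reproduce scikit-learn's implementation.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists, one point per distinct score used as threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    fpr, tpr = [0.0], [0.0]  # start at the origin (nothing predicted positive)
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def trapezoid_area(xs, ys):
    """Area under the piecewise-linear curve through the points (xs, ys)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


# Toy labels and scores, invented for illustration
y_true_toy = [0, 0, 1, 1]
scores_toy = [0.1, 0.4, 0.35, 0.8]

fpr_toy, tpr_toy = roc_points(y_true_toy, scores_toy)
auc_toy = trapezoid_area(fpr_toy, tpr_toy)
print(fpr_toy, tpr_toy, auc_toy)
```

A classifier that ranks every positive above every negative gives an area of 1.0, while random ranking gives about 0.5.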
