
A memo of frequently used machine learning techniques with scikit-learn (for beginners)

Last updated at Posted at 2020-05-04

#Introduction
This is a summary of the techniques I use most often when doing machine learning. I will add to it and revise it as needed.

#Preprocessing
###Standardization

StandardScaler
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()  # create the instance
scaler.fit(pd_sample)      # compute the parameters (mean, standard deviation, etc.)
pd_sample_sc = scaler.transform(pd_sample)  # transform the data

# pd_sample_sc = scaler.fit_transform(pd_sample) does both steps in one call
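As a minimal sketch of what standardization does, the following runs `StandardScaler` on a small made-up DataFrame (the `height` and `weight` columns are hypothetical, not from the original data) and checks that each column ends up with mean 0 and standard deviation 1. Note that `transform` returns a NumPy array, so the column names have to be restored by hand.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical sample data; the column names are made up for illustration
pd_sample = pd.DataFrame({
    "height": [150.0, 160.0, 170.0, 180.0],
    "weight": [50.0, 60.0, 70.0, 80.0],
})

scaler = StandardScaler()
scaled = scaler.fit_transform(pd_sample)  # returns a NumPy array, not a DataFrame

# Put the result back into a DataFrame so the column names and index are kept
pd_sample_sc = pd.DataFrame(scaled, columns=pd_sample.columns, index=pd_sample.index)

print(scaler.mean_)               # per-column means learned by fit
print(pd_sample_sc.mean())        # approximately 0 for each column
print(pd_sample_sc.std(ddof=0))   # approximately 1 (population standard deviation)
```

When there is a train/test split, a common practice is to call `fit` on the training data only and then `transform` both sets with the same scaler.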

###Dummy variable encoding

get_dummies
import pandas as pd

# pandas.get_dummies() function: converts categorical columns into 0/1 dummy columns
pd_sample = pd.get_dummies(pd_sample)
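To make the effect of `get_dummies` concrete, here is a small sketch on a made-up DataFrame (the `age` and `city` columns are assumptions for illustration). Numeric columns are left as they are, and each category of a string column becomes its own column.

```python
import pandas as pd

# Hypothetical sample data with one numeric and one categorical column
pd_sample = pd.DataFrame({
    "age": [25, 32, 47],
    "city": ["Tokyo", "Osaka", "Tokyo"],
})

dummies = pd.get_dummies(pd_sample)
print(dummies.columns.tolist())  # ['age', 'city_Osaka', 'city_Tokyo']

# drop_first=True drops the first level of each categorical column,
# which avoids perfectly collinear columns in linear models
dummies_drop = pd.get_dummies(pd_sample, drop_first=True)
print(dummies_drop.columns.tolist())  # ['age', 'city_Tokyo']
```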

###Splitting into training and evaluation data

train_test_split
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)
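The call above uses the defaults (25% of the rows go to the test set, shuffled differently on every run). A sketch with the options that are often set explicitly, using made-up arrays for `X` and `y`:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples, 2 features, balanced binary labels
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 10 + [1] * 10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,    # fraction of rows used for evaluation (default is 0.25)
    random_state=0,   # fix the shuffle so the split is reproducible
    stratify=y,       # keep the class ratio the same in both splits
)

print(X_train.shape, X_test.shape)  # (14, 2) (6, 2)
print(np.bincount(y_test))          # [3 3]
```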

#Unsupervised learning
###Clustering

KMeans
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=4, random_state=0)  # define the K-means model
clusters = kmeans.fit(pd_sample)               # run the clustering
pd_sample['cluster'] = clusters.labels_        # store the cluster label of each row
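A self-contained sketch of the same flow on synthetic data (two made-up, well-separated blobs; the column names `x1` and `x2` are assumptions). It selects the feature columns explicitly, because once a `cluster` column has been added to `pd_sample`, refitting on the whole DataFrame would treat the labels as a feature.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two hypothetical, well-separated blobs of 20 points each
blob_a = rng.normal(loc=0.0, scale=0.5, size=(20, 2))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(20, 2))
pd_sample = pd.DataFrame(np.vstack([blob_a, blob_b]), columns=["x1", "x2"])

feature_cols = ["x1", "x2"]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(pd_sample[feature_cols])  # fit and return labels in one call
pd_sample["cluster"] = labels

print(pd_sample["cluster"].value_counts())  # number of rows per cluster
print(kmeans.cluster_centers_)              # one center per cluster
print(kmeans.inertia_)                      # within-cluster sum of squared distances
```

Comparing `inertia_` across several values of `n_clusters` is one simple way to choose the number of clusters.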

###Dimensionality reduction

PCA
from sklearn.decomposition import PCA

pca = PCA(n_components=2)         # define the PCA model
pca.fit(pd_sample)                # principal component analysis
x_pca = pca.transform(pd_sample)  # transform the data (returns an array object)
x_pca = pd.DataFrame(x_pca)       # store the result in a DataFrame again

# x_pca = pca.fit_transform(pd_sample) does both steps in one call
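After fitting, it is usually worth checking how much of the variance the kept components explain. A minimal sketch on synthetic data (the three columns `a`, `b`, `c` are made up, with `b` almost a multiple of `a`):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=50)
# Hypothetical data: "b" is strongly correlated with "a", "c" is independent
pd_sample = pd.DataFrame({
    "a": base,
    "b": 2.0 * base + rng.normal(scale=0.1, size=50),
    "c": rng.normal(size=50),
})

pca = PCA(n_components=2)
x_pca = pd.DataFrame(
    pca.fit_transform(pd_sample),
    columns=["PC1", "PC2"],
    index=pd_sample.index,
)

print(pca.explained_variance_ratio_)        # share of variance per component
print(pca.explained_variance_ratio_.sum())  # total share kept by the 2 components
print(pca.components_.shape)                # (2, 3): one row of loadings per component
```

PCA is sensitive to the scale of each column, so standardizing the data first is a common choice when the features use different units.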

#Supervised learning
###Regression model

LinearRegression
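from sklearn.linear_model import LinearRegression

model = LinearRegression()   # initialize the model
model.fit(X_train, y_train)  # train the model

# check the score (coefficient of determination R^2) on the training and evaluation data
print(model.score(X_train, y_train))
print(model.score(X_test, y_test))

# output the coefficient of each explanatory variable, which shows its contribution
coef = pd.DataFrame({"feature_names":X.columns, "coefficient":model.coef_})
print(coef)

# predict the regression value for unseen data
print(model.predict(x_pred))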
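The snippet above assumes `X`, `y`, and `x_pred` already exist. As an end-to-end sketch, the following generates synthetic data from a known linear relationship (the coefficients 3, -2 and intercept 1 are made up for illustration) and checks that the fitted model recovers them.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical data: y = 3*x1 - 2*x2 + 1 plus a little noise
X = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
y = 3.0 * X["x1"] - 2.0 * X["x2"] + 1.0 + rng.normal(scale=0.1, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

print(model.score(X_test, y_test))  # R^2 on the evaluation data
coef = pd.DataFrame({"feature_names": X.columns, "coefficient": model.coef_})
print(coef)                         # close to 3 and -2
print(model.intercept_)             # close to 1

# Unseen data must have the same columns as the training data
x_pred = pd.DataFrame({"x1": [0.5], "x2": [-1.0]})
print(model.predict(x_pred))        # close to 3*0.5 - 2*(-1.0) + 1 = 4.5
```

The size of a coefficient depends on the scale of its feature, so coefficients are only comparable as contributions when the features are on similar scales.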

###Classification model

DecisionTreeClassifier
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(random_state=0)  # initialize the model
model.fit(X_train, y_train)                     # train the model

# check the accuracy on the training and evaluation data
print(model.score(X_train, y_train))
print(model.score(X_test, y_test))

# output the importance of each explanatory variable, which shows its contribution
importance = pd.DataFrame({"feature_names":X.columns, "importance":model.feature_importances_})
print(importance)

# predict the class for unseen data
print(model.predict(x_pred))

# output the predicted probabilities of classes 0/1
print(model.predict_proba(x_pred))
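A self-contained sketch of the same steps on synthetic data, where the label depends only on the hypothetical feature `x1`, so the tree should assign it essentially all of the importance. The `max_depth=3` setting is an assumption added here to limit overfitting; the original uses the default (unlimited depth).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical data: the label is 1 when x1 > 0, and x2 is pure noise
X = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
y = (X["x1"] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

print(model.score(X_train, y_train))  # accuracy on the training data
print(model.score(X_test, y_test))    # accuracy on the evaluation data

importance = pd.DataFrame(
    {"feature_names": X.columns, "importance": model.feature_importances_}
)
print(importance)                     # x1 should dominate

print(model.classes_)                        # column order of predict_proba
print(model.predict_proba(X_test.iloc[:3]))  # one row per sample, rows sum to 1
```

A large gap between the training and evaluation accuracy is a typical sign of overfitting, and lowering `max_depth` is one way to reduce it.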

###Evaluating the classification model

# the metrics compare the true labels with the predicted labels, not with X_test
y_pred = model.predict(X_test)

# Accuracy = (TP+TN)/(TP+FN+FP+TN)
model.score(X_test, y_test)

# Confusion matrix (rows: true labels, columns: predicted labels)
from sklearn.metrics import confusion_matrix
matrix = confusion_matrix(y_test, y_pred)

# Heatmap of the confusion matrix
import matplotlib.pyplot as plt
import seaborn as sns
sns.heatmap(matrix, annot=True, cmap='Blues')
plt.xlabel('Prediction')
plt.ylabel('Target')
plt.show()

# Precision = TP/(TP+FP)
from sklearn.metrics import precision_score
precision_score(y_test, y_pred)

# Recall = TP/(TP+FN)
from sklearn.metrics import recall_score
recall_score(y_test, y_pred)

# F1 score = 2*(Precision*Recall)/(Precision+Recall)
from sklearn.metrics import f1_score
f1_score(y_test, y_pred)
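The formulas in the comments above can be checked by hand on a small example. The following sketch uses made-up label lists (`y_true` and `y_pred` are hypothetical) and compares the hand-computed values with the scikit-learn functions. Each metric function takes the true labels first and the predicted labels second.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical true labels and predictions for a binary problem
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 5 1 1 3

accuracy = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * (precision * recall) / (precision + recall)

print(accuracy, accuracy_score(y_true, y_pred))     # 0.8 0.8
print(precision, precision_score(y_true, y_pred))   # 0.75 0.75
print(recall, recall_score(y_true, y_pred))         # 0.75 0.75
print(f1, f1_score(y_true, y_pred))                 # 0.75 0.75
```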