Python analysis notes

Demo: how to build a model

Import modules

import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score, roc_curve
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
import numpy as np

Load the data

data = pd.read_csv("/content/train.csv", encoding="shift_jis")

Prepare the data

X = data.drop(columns=['PassengerId', 'Survived'])
y = data['Survived']
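
If train.csv is the Kaggle Titanic file (an assumption; the column names PassengerId and Survived suggest it), X still contains text columns such as Sex, and XGBClassifier does not accept plain text columns by default. A minimal sketch of one possible fix, using a small made-up frame and one-hot encoding with pd.get_dummies:

```python
import pandas as pd

# Hypothetical miniature of a Titanic-style table (values are made up).
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

X_sample = sample.drop(columns=["PassengerId", "Survived"])

# One-hot encode only the non-numeric columns; numeric columns pass through unchanged.
text_cols = X_sample.select_dtypes(exclude="number").columns.tolist()
X_encoded = pd.get_dummies(X_sample, columns=text_cols, dtype=int)

print(X_encoded.columns.tolist())
```

Free-text columns with many distinct values, such as Name, are usually dropped or converted into a smaller feature rather than one-hot encoded directly.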

Split into training and test data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
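
The stratify=y argument keeps the class ratio the same in both splits. A small sketch with made-up labels (the names X_demo and y_demo are illustrative only):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels: 70 negatives and 30 positives.
y_demo = np.array([0] * 70 + [1] * 30)
X_demo = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratify, both splits keep the 30% positive rate of the full data.
print(y_tr.mean(), y_te.mean())
```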

Train the XGBoost model

xgb_model = xgb.XGBClassifier(
    use_label_encoder=False,
    eval_metric='logloss',
    random_state=42
)
xgb_model.fit(X_train, y_train)

Predict on the test data

y_pred = xgb_model.predict(X_test)
y_pred_prob = xgb_model.predict_proba(X_test)[:, 1]
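
Here predict returns class labels, while predict_proba(...)[:, 1] returns the probability of the positive class. For a binary classifier the labels correspond to cutting that probability at 0.5. The sketch below uses made-up probabilities in place of y_pred_prob to show how a different cutoff could be applied; the value 0.3 is purely illustrative.

```python
import numpy as np

# Made-up positive-class probabilities standing in for y_pred_prob.
probs = np.array([0.10, 0.45, 0.55, 0.80, 0.30])

# Default-style labels: positive when the probability exceeds 0.5.
labels_default = (probs > 0.5).astype(int)

# A lower, hypothetical threshold flags more rows as positive.
threshold = 0.3
labels_custom = (probs >= threshold).astype(int)

print(labels_default.tolist())
print(labels_custom.tolist())
```

Lowering the threshold generally raises recall at the cost of precision, so the choice depends on which kind of error matters more.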

Evaluation metrics

conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_pred_prob)

Print the confusion matrix

print("Confusion Matrix:")
print(pd.DataFrame(
    conf_matrix,
    index=["Actual Negative", "Actual Positive"],
    columns=["Predicted Negative", "Predicted Positive"]
))
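
The four cells of a binary confusion matrix can also be unpacked directly, which makes it easy to check precision and recall by hand. A minimal sketch with made-up labels (y_true_demo and y_pred_demo are illustrative names):

```python
from sklearn.metrics import confusion_matrix

y_true_demo = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred_demo = [0, 1, 0, 1, 0, 1, 1, 0]

# For labels {0, 1}, rows are actual classes and columns are predicted classes,
# so ravel() yields the cells in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found
print(tn, fp, fn, tp, precision, recall)
```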

Show the classification report

print("\nClassification Report:")
print(class_report)

AUC score

print(f"\nROC AUC Score: {roc_auc:.2f}")
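
One way to read the ROC AUC score: it is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The sketch below checks this on four made-up scores against roc_auc_score.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true_demo = np.array([0, 0, 1, 1])
scores_demo = np.array([0.1, 0.4, 0.35, 0.8])

# Compare every positive score with every negative score.
pos = scores_demo[y_true_demo == 1]
neg = scores_demo[y_true_demo == 0]
diff = pos[:, None] - neg[None, :]

# Count pairs ranked correctly, giving half credit to ties.
pairwise_auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

print(pairwise_auc, roc_auc_score(y_true_demo, scores_demo))
```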

Plot the ROC curve

fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, label=f"ROC Curve (AUC = {roc_auc:.2f})", linewidth=2)
plt.plot([0, 1], [0, 1], 'k--', label="Random Guessing", linewidth=1)
plt.xlabel("False Positive Rate", fontsize=12)
plt.ylabel("True Positive Rate", fontsize=12)
plt.title("ROC Curve (XGBoost)", fontsize=14)
plt.legend(fontsize=12)
plt.grid(True)
plt.show()

Data processing

Extracting strings

Extract the string after the first comma in the "Name" column

df['Extracted_Name'] = df['Name'].str.split(',', n=1).str[1].str.strip()

Extract the string from the start until the first period ('.')

df['Title'] = df['Extracted_Name'].str.split('.', n=1).str[0].str.strip()

Take the part of the Name column that follows the comma.
→ Split that string at the first period and keep the part before it.
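
As a worked example of the two steps, assuming the names follow a "Surname, Title. Given names" layout (the frame df_demo and its two sample values are illustrative only):

```python
import pandas as pd

# Two sample names in the "Surname, Title. Given names" layout.
df_demo = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"]
})

# Step 1: keep everything after the first comma.
df_demo["Extracted_Name"] = df_demo["Name"].str.split(",", n=1).str[1].str.strip()

# Step 2: keep everything before the first period.
df_demo["Title"] = df_demo["Extracted_Name"].str.split(".", n=1).str[0].str.strip()

print(df_demo[["Extracted_Name", "Title"]])
```

Rows whose Name has no comma would produce a missing value in both new columns, so checking for NaN afterwards is a reasonable precaution.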

Standardization

Specify the columns to standardize

columns_to_normalize = ['A', 'B']

Create and apply the scaler

scaler = StandardScaler()
data[columns_to_normalize] = scaler.fit_transform(data[columns_to_normalize])

Specify the columns to standardize, then transform them all in one call.
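
A quick check of what the scaler does, using made-up data: the column names 'A', 'B' and 'C' and the frame demo are placeholders. After scaling, each selected column has mean 0 and standard deviation 1, and columns left out of the list are unchanged.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up data: 'A' and 'B' are scaled, 'C' is left as-is.
demo = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [10.0, 20.0, 30.0, 40.0],
    "C": [5, 6, 7, 8],
})
cols = ["A", "B"]

scaler = StandardScaler()
demo[cols] = scaler.fit_transform(demo[cols])

# StandardScaler uses the population standard deviation, hence ddof=0 here.
print(demo[cols].mean().round(6).tolist())
print(demo[cols].std(ddof=0).round(6).tolist())
```

When the data is split into training and test sets, a common practice is to call fit_transform on the training rows only and then transform on the test rows, so that test statistics do not leak into the scaler.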
