Materials Informatics with matminer (3): Predicting Steel Strength with XGBoost

Last updated at 2025-04-19. Posted at 2025-04-19.

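How to predict strength

I want to interpret the prediction results with SHAP, so I predict yield strength with XGBoost, a decision-tree ensemble method.
The data has already been preprocessed into a DataFrame containing the yield strength and the elemental composition.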

yield strength	Fe	C	Mn	Si	Cr	Ni
2411.5	0.62	0.000953	0.000521	0.00102	0.00011	0.192
1736.3	0.623	0.00854	0.000104	0.000203	0.147	9.71E-05
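
For readers who want to follow along without the preprocessing step, here is a minimal sketch that rebuilds a DataFrame with the same layout from the two sample rows shown above. The variable name `df` matches the one the program below expects. The real dataset has many more rows, and possibly more element columns than the six listed here.

```python
import pandas as pd

# Two sample rows copied from the table above (illustration only;
# the actual preprocessed dataset is much larger).
df = pd.DataFrame(
    {
        "yield strength": [2411.5, 1736.3],
        "Fe": [0.62, 0.623],
        "C": [0.000953, 0.00854],
        "Mn": [0.000521, 0.000104],
        "Si": [0.00102, 0.000203],
        "Cr": [0.00011, 0.147],
        "Ni": [0.192, 9.71e-05],
    }
)

# Target column goes to y, every other column becomes a feature in X.
X = df.drop(columns="yield strength")
y = df["yield strength"]
print(X.shape, y.shape)
```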

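Creating the prediction program with generative AI (Gemini)

I had Gemini write the prediction program. The instruction was a simple prompt like this:

Put yield strength in y and everything else in x, and run machine learning to predict y. The algorithm is XGBoost. Optimize the hyperparameters with optuna. Show the results in a graph.

That prompt alone produces the following program.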

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import xgboost as xgb
import optuna

# Prepare the data
# df is the preprocessed DataFrame described above (yield strength + element columns)
X = df.drop(columns="yield strength")
y = df["yield strength"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the objective function for Optuna
def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "min_child_weight": trial.suggest_int("min_child_weight", 1, 10),
        "objective": "reg:squarederror",  # Specify the objective function
        "random_state": 42
    }
    model = xgb.XGBRegressor(**params)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    return rmse

# Run Optuna optimization
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=200)  # Adjust n_trials as needed

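# Train the model with the best hyperparameters
# study.best_params contains only the tuned values, so pass the fixed settings again
# to reproduce the model that was evaluated in the best trial
best_params = study.best_params
best_model = xgb.XGBRegressor(**best_params, objective="reg:squarederror", random_state=42)
best_model.fit(X_train, y_train)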

# Make predictions
y_pred_train = best_model.predict(X_train)
y_pred = best_model.predict(X_test)

# Evaluate the model
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Best RMSE: {rmse}")
print(f"Best Hyperparameters: {best_params}")

# Plot the results
plt.figure(figsize=(8, 6))
plt.scatter(y_train, y_pred_train, alpha=0.5, label="Train")
plt.scatter(y_test, y_pred, alpha=0.5, label="Test")
# Diagonal (perfect prediction) line spanning the full range of measured values
plt.plot([y.min(), y.max()], [y.min(), y.max()], color='red', linestyle='--')
plt.xlabel("Actual Yield Strength")
plt.ylabel("Predicted Yield Strength")
plt.title("Actual vs. Predicted Yield Strength (XGBoost)")
plt.legend()
plt.grid(True)
plt.show()
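
One caveat about the generated program above: its `objective` function scores every Optuna trial on `X_test`, so the final test RMSE is measured on data that already guided the hyperparameter search and may look better than it would on unseen data. A common alternative is to hold out a separate validation split for tuning. The sketch below shows only the splitting step, using a hypothetical helper `split_train_val_test` and synthetic stand-in data; it is not part of the Gemini output.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_train_val_test(X, y, test_size=0.2, val_size=0.2, random_state=42):
    """Split into train / validation / test sets.

    val_size is a fraction of the data that remains after the test set
    has been removed (hypothetical helper, for illustration only).
    """
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=val_size, random_state=random_state
    )
    return X_train, X_val, X_test, y_train, y_val, y_test


# Synthetic stand-in data: 100 samples with 6 composition-like features.
rng = np.random.default_rng(0)
X_demo = rng.random((100, 6))
y_demo = rng.random(100)

X_tr, X_va, X_te, y_tr, y_va, y_te = split_train_val_test(X_demo, y_demo)
print(len(X_tr), len(X_va), len(X_te))
```

With this layout, the Optuna objective would fit on the training split and return the RMSE on the validation split, leaving the test split untouched until the final evaluation.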

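In the generated program, I only increased the number of Optuna trials; everything else is essentially as generated.
Running it produces a graph like this.

The predictions track the measured values well, although the training score is noticeably higher than the test score.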
R2 (Training Data): 0.971
R2 (Testing Data) : 0.845
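
The program listing prints only RMSE. One way to obtain R2 values like those above is scikit-learn's `r2_score`. The following is a minimal sketch: the helper name `report_r2` is hypothetical and the inputs are toy numbers, not the steel data. In the program above, it would be called with `y_train`, `y_pred_train`, `y_test`, and `y_pred`.

```python
from sklearn.metrics import r2_score


def report_r2(y_train, y_pred_train, y_test, y_pred_test):
    """Print and return the coefficient of determination for both splits."""
    r2_train = r2_score(y_train, y_pred_train)
    r2_test = r2_score(y_test, y_pred_test)
    print(f"R2 (Training Data): {r2_train:.3f}")
    print(f"R2 (Testing Data) : {r2_test:.3f}")
    return r2_train, r2_test


# Toy values for illustration only (not the article's results).
r2_tr, r2_te = report_r2(
    [1.0, 2.0, 3.0], [1.0, 2.0, 3.0],        # perfect fit on "train"
    [3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0],  # imperfect fit on "test"
)
```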
