
Visualizing the iris data with a random forest

Python

Posted at 2022-05-14

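A memo of the code, up to computing the feature importances.
My impression is that a larger value of the hyperparameter n_estimators tends to give a more accurate model. On iris, increasing n_estimators actually swaps the first- and second-ranked features by importance. Note that the code below sets no random_state, so both the train/test split and the forest change on every run, and part of the variation comes from that randomness.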
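To check this observation under more controlled conditions, here is a minimal sketch (not from the original post). It fixes the train/test split, refits the forest with a few seeds for each n_estimators value, and prints the mean test accuracy and the top-ranked feature per seed. The n_estimators values, the seeds, the stratified split, and the helper name `sweep` are arbitrary choices for illustration. A forest's impurity-based importances are averaged over its trees, so a larger forest should give more stable rankings. If the top two importances are close, a small forest can flip their order from one seed to the next.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
# Fixed, stratified split (an assumption) so that only the forest varies
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, stratify=iris.target, random_state=0
)

def sweep(n_values=(5, 50, 500), seeds=(0, 1, 2)):
    """Hypothetical helper: for each n_estimators, fit one forest per seed
    and collect the mean test accuracy and the top-ranked feature name."""
    results = {}
    for n in n_values:
        accs, tops = [], []
        for seed in seeds:
            model = RandomForestClassifier(n_estimators=n, random_state=seed)
            model.fit(X_train, y_train)
            accs.append(model.score(X_test, y_test))
            top_index = int(np.argmax(model.feature_importances_))
            tops.append(iris.feature_names[top_index])
        results[n] = (float(np.mean(accs)), tops)
    return results

for n, (acc, tops) in sweep().items():
    print(f"n_estimators={n:4d}  mean test accuracy={acc:.3f}  top feature per seed={tops}")
```

If the top feature changes between seeds only at small n_estimators, the swap is more likely noise in the importance estimate than a real change in the model.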

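```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Load the dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
x = iris.data
y = iris.target

# Split into training and test data (no random_state, so the split differs on every run)
x_train, x_test, y_train, y_test = train_test_split(x, y)

# Build the random forest model
# n_estimators: number of decision trees (default 100 since scikit-learn 0.22; 10 before that)
model = RandomForestClassifier(n_estimators=5000)
model.fit(x_train, y_train)
# Accuracy on the held-out test data
print("test accuracy:", model.score(x_test, y_test))

# Feature importances
feature = model.feature_importances_
# Feature names
label = df.columns
# Indices sorted by importance in ascending order; plt.barh draws the first
# bar at the bottom, so the most important feature ends up at the top
indices = np.argsort(feature)

# Plot (pos is used instead of x/y so the data variables are not overwritten)
pos = range(len(feature))
plt.barh(pos, feature[indices], align="center")
plt.yticks(pos, label[indices])
plt.xlabel("importance")
plt.ylabel("feature")
plt.show()
```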
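The plot sorts the importances in ascending order so that the largest bar lands at the top. If a plain table in descending order is also wanted, a minimal sketch with pandas could look like this. It fits its own forest so that it runs on its own; n_estimators=500 and random_state=0 are arbitrary choices, not the settings used above.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)

# Fixed random_state is an assumption for reproducibility, not in the original code
model = RandomForestClassifier(n_estimators=500, random_state=0)
model.fit(x_train, y_train)

# Importances as a Series indexed by feature name, most important first
ranking = pd.Series(model.feature_importances_, index=iris.feature_names)
ranking = ranking.sort_values(ascending=False)
print(ranking)
```

`ranking.index[0]` then gives the name of the top-ranked feature, which makes it easy to compare the top two across different n_estimators values.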
