0
1

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 1 year has passed since last update.

ランダムフォレストでirisデータを可視化

Posted at

特徴量重要度の産出までのコード備忘録として。
ハイパーパラメータのn_estimatorsはどうやら高いほどモデルの精度が上がる印象。実際irisではn_estimatorsが上がると特徴量重要度の1/2位が逆転する。

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# データセット読み込み
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
x = iris.data
y = iris.target

# 学習データとテストデータに分割
x_train, x_test, y_train, y_test = train_test_split(x, y)

# ランダムフォレストのモデル構築
# n_estimators : 使用する決定木数(デフォルト10)
model = RandomForestClassifier(n_estimators=5000)
model.fit(x_train, y_train)

#特徴量の重要度
feature = model.feature_importances_
#特徴量の名前
label = df.columns[0:]
#特徴量の重要度順(降順)
indices = np.argsort(feature)[::1]

# プロット
x = range(len(feature))
y = feature[indices]
y_label = label[indices]
plt.barh(x, y, align = 'center')
plt.yticks(x, y_label)
plt.xlabel("importance_num")
plt.ylabel("label")
plt.show()
0
1
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
1

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?