@matatabipower posted at 2023-06-27

I tried to visualize feature importance with xgboost and got a ValueError

Q&A

What I want to solve

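Addendum: I managed to resolve this.
The cause appears to have been that the target variable was set up for three classes while the data contained only two.
After switching to two classes and running it again, everything worked through to the feature importance plot.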

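I had been working on the forest cover type classification task on SIGNATE,
and I tried to see whether the same code could analyze the data I actually want to analyze.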
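
The cause described in the addendum (classes declared versus classes actually present) can be checked before training. The following is only a minimal sketch: `check_labels` and `remap_labels` are hypothetical helper names, not part of xgboost, and the sketch relies on the fact that `multi:softmax` expects integer labels in the range 0 to `num_class - 1`.

```python
def check_labels(labels, num_class):
    """Compare the class ids found in `labels` with the ids 0..num_class-1."""
    present = sorted(set(labels))
    expected = list(range(num_class))
    missing = [c for c in expected if c not in present]
    unexpected = [c for c in present if c not in expected]
    return {"present": present, "missing": missing, "unexpected": unexpected}


def remap_labels(labels):
    """Map arbitrary class ids to contiguous codes 0..K-1.

    Returns (codes, K, mapping) so that K can be passed as num_class and
    the mapping can be inverted to decode predictions.
    """
    classes = sorted(set(labels))
    mapping = {c: i for i, c in enumerate(classes)}
    return [mapping[c] for c in labels], len(classes), mapping


# Example with made-up labels: three classes declared, only two present.
report = check_labels([1, 2, 1, 2], num_class=3)
codes, n_classes, mapping = remap_labels([1, 2, 1, 2])
print(report)
print(codes, n_classes, mapping)
```

With the code in this question, the call would look like `check_labels(train_y, param['num_class'])`, run just before `xgb.train`.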

Problem / error encountered

  File "C:\Users\名前\miniconda3\envs\pythonenv\lib\site-packages\xgboost\plotting.py", line 74, in plot_importance
    raise ValueError(
ValueError: Booster.get_score() results in empty.  This maybe caused by having all trees as decision dumps.

The error occurs from this point on.

# Visualize feature importance
xgb.plot_importance(bst)
plt.show()

print(pred.shape)
print(pred)

print(test_y, pred)

print(r2_score(test_y,pred))
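
The ValueError is raised because `Booster.get_score()` returned an empty result, so one way to keep the rest of the script running is to look at that result before plotting. This is a hedged sketch, not a fix for the underlying cause: `plot_importance_if_any` is a hypothetical helper, and the plotting function is passed in as an argument (normally `xgb.plot_importance`).

```python
def plot_importance_if_any(booster, plot_fn, importance_type="weight"):
    """Call plot_fn only when the booster reports at least one split.

    `booster` is expected to provide get_score(importance_type=...), as
    xgboost.Booster does. `plot_fn` would normally be xgboost.plot_importance.
    This helper is hypothetical and not part of xgboost.
    """
    scores = booster.get_score(importance_type=importance_type)
    if not scores:
        # An empty dict means no feature was used in any split of any tree.
        print("No feature was used in any split; skipping the importance plot.")
        return None
    return plot_fn(booster, importance_type=importance_type)
```

In the script above this would replace the direct call, for example `plot_importance_if_any(bst, xgb.plot_importance)` followed by `plt.show()`.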

The code that precedes it is as follows.

import xgboost as xgb
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report

forest = pd.read_csv("./python/train2.csv")

import seaborn as sns
# Correlation heatmap of all columns
forest_corr = forest.corr()
print(forest_corr)
sns.heatmap(forest_corr, vmax=1, vmin=-1, center=0)
plt.show()

# Select the feature columns and the target
from sklearn.model_selection import train_test_split

forest_data = pd.DataFrame(forest, columns=["Elevation", "Aspect", "Slope", "Horizontal_Distance_To_Hydrology","Vertical_Distance_To_Hydrology", "Horizontal_Distance_To_Roadways", "Hillshade_9am",
"Hillshade_Noon","Hillshade_3pm","Horizontal_Distance_To_Fire_Points","Wilderness_Area1","Wilderness_Area2","Wilderness_Area3","Wilderness_Area4",
"Soil_Type1",
"Soil_Type2", "Soil_Type3", "Soil_Type4", "Soil_Type5", "Soil_Type6", "Soil_Type7", "Soil_Type8","Soil_Type9", "Soil_Type10", "Soil_Type11", "Soil_Type12", 
"Soil_Type13", "Soil_Type14", "Soil_Type15", "Soil_Type16", "Soil_Type17", "Soil_Type18", "Soil_Type19", "Soil_Type20", "Soil_Type21", "Soil_Type22", "Soil_Type23", "Soil_Type24",
"Soil_Type25", "Soil_Type26", "Soil_Type27", "Soil_Type28", "Soil_Type29", "Soil_Type30", "Soil_Type31", "Soil_Type32", "Soil_Type33", "Soil_Type34", "Soil_Type35", "Soil_Type36",
"Soil_Type37", "Soil_Type38", "Soil_Type39", "Soil_Type40"])
print(forest_data)
forest_target=pd.Series(forest["Cover_Type"])
print(forest_target)

# Split into training and test data (test 0.2, train 0.8)

train_x, test_x, train_y, test_y = train_test_split(forest_data, 
                                                    forest_target, 
                                                    test_size=0.2, 
                                                    shuffle=True)

# Convert to the data type used by xgboost
dtrain = xgb.DMatrix(train_x, label=train_y)

# Parameters: max_depth = maximum tree depth, eta = learning rate, objective = learning objective, num_class = number of classes
param = {'max_depth': 40, 'eta': 1, 'objective': 'multi:softmax', 'num_class':7}

# Training
num_round = 10
bst = xgb.train(param, dtrain, num_round)

# Prediction
dtest = xgb.DMatrix(test_x)
pred = bst.predict(dtest)


# Check accuracy
from sklearn.metrics import accuracy_score
from sklearn.metrics import r2_score

score = accuracy_score(test_y, pred)
print('score:{0:.4f}'.format(score))

Everything up to this point runs without problems, but the accuracy score is 0.

What I tried

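When I delete everything from the feature importance visualization onward, the code runs, so I think the problem lies in the feature importance visualization.
I currently have only 30 sets of data, and for this trial I used just 5 records, simply to check whether the code would run.
(I figured that 5 records would be enough to split into training and test data at 4:1.)
However, it works properly on the forest cover type classification, so is this error caused by having too little data after all?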
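
As a rough way to reason about the "too little data" hypothesis, the arithmetic below estimates whether a tree can split at all in the first boosting round. It rests on assumptions that are not stated anywhere in this post: that the softmax objective uses a per-row Hessian of 2p(1-p), that the initial class probabilities are uniform (p = 1/num_class), and that `min_child_weight` is left at its default of 1 so each child needs a Hessian sum of at least 1. The function names are hypothetical, and later rounds are not modeled.

```python
def first_round_hessian(num_class):
    """Per-row Hessian for one class, assuming uniform initial probabilities.

    Assumption: the softmax objective uses h = 2 * p * (1 - p).
    """
    p = 1.0 / num_class
    return 2.0 * p * (1.0 - p)


def can_split_first_round(n_rows, num_class, min_child_weight=1.0):
    """Return True if the most balanced split could satisfy min_child_weight.

    The smaller child of the most balanced split holds n_rows // 2 rows;
    if even that child falls short, no split is allowed and the tree
    stays a single leaf.
    """
    smaller_child_rows = n_rows // 2
    return smaller_child_rows * first_round_hessian(num_class) >= min_child_weight


# 4 training rows, as in the 5-record trial with a 4:1 split.
for k in (7, 3, 2):
    print(k, can_split_first_round(n_rows=4, num_class=k))
```

Under these assumptions, 4 training rows cannot produce a split with 7 or 3 classes but can with 2, which would be consistent with the addendum. If this is what happened, lowering `min_child_weight` in `param` or using more rows would be the things to try.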

I would appreciate any help you can offer.
Thank you in advance.
