Computing feature importance for a neural network model built with Keras (Permutation Importance)

Last updated at 2020-01-07 / Posted at 2020-01-07

Introduction

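I often look at feature importance with LightGBM, and I had long wanted to do the same for neural network models.
An acquaintance told me about a technique called Permutation Importance, so I tried it out.
This post is partly a memo for my own reference.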

What is Permutation Importance?

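In short, it shuffles the values of one feature (data column) at a time and re-scores the model. The more the score drops when a feature is shuffled, the more important that feature is considered to be. Repeating this for every feature yields an importance value for each one.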
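
The idea can be sketched from scratch in a few lines of NumPy. This is only an illustrative sketch, not eli5's implementation: the function name `permutation_importance`, the `n_repeats` parameter, the toy data, and the stand-in `toy_predict` function are all assumptions made for this example, and mean squared error is used as the metric.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Return the mean increase in MSE when each column of X is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((predict(X) - y) ** 2)  # error on the intact data
    importances = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_shuffled = X.copy()
            # Shuffle one column only, breaking its link to the target
            X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
            shuffled_error = np.mean((predict(X_shuffled) - y) ** 2)
            increases.append(shuffled_error - baseline)
        importances[col] = np.mean(increases)
    return importances


# Toy data (hypothetical): y depends strongly on column 0,
# weakly on column 1, and not at all on column 2
data_rng = np.random.default_rng(42)
X_demo = data_rng.normal(size=(500, 3))
y_demo = 3.0 * X_demo[:, 0] + 0.5 * X_demo[:, 1]


def toy_predict(X):
    """Stand-in for a trained model: it simply knows the true relationship."""
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]


demo_importances = permutation_importance(toy_predict, X_demo, y_demo)
print(demo_importances)
```

With this toy setup, the first column should receive the largest value and the unused third column a value of zero. Note that eli5 reports the drop in a score where higher is better, whereas this sketch reports the rise in error. The ranking idea is the same.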

Trying it out

The library I used is PermutationImportance from eli5.

To compute Permutation Importance for a model built with Keras, you need to use the sklearn wrapper.
For now, I tried it on a regression task.
To list the selected features from the results computed by PermutationImportance, I used SelectFromModel.

import eli5
import keras
import numpy as np
from eli5.sklearn import PermutationImportance
from keras.wrappers.scikit_learn import KerasClassifier, KerasRegressor

# Use whatever data you like (X is assumed to be a pandas DataFrame, since X.columns is used below).
X = ...
y = ...

def create_model():
    model = keras.models.Sequential([
        keras.layers.Input(shape=(X.shape[1],)),
        keras.layers.Dense(50, activation='relu'),
        keras.layers.Dense(25, activation='relu'),
        keras.layers.Dense(1, activation=None)
    ])
    model.compile(optimizer='adam', loss='mse')
    return model

# Build the model through the sklearn wrapper
my_model = KerasRegressor(build_fn=create_model, epochs=100, batch_size=64, verbose=0)
my_model.fit(X, y)

# Compute the importances
perm = PermutationImportance(my_model, random_state=1).fit(X, y)

# Display the importances (show_weights returns an HTML object that renders in a Jupyter notebook)
X_cols = np.array(X.columns.tolist(), dtype='str')
eli5.show_weights(perm, top=2500, feature_names=X_cols)

# Specify a threshold and output the features that contribute to the score
# (here, only features that improve the score by 0.001 or more are kept)
from sklearn.feature_selection import SelectFromModel
sel = SelectFromModel(perm, threshold=0.001, prefit=True)
print(list(X.iloc[:, sel.get_support()].columns))
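
As a rough illustration of what the threshold step amounts to, here is a simplified stand-in written with plain NumPy. It is not scikit-learn's implementation and ignores any preprocessing scikit-learn may apply to the importances. The feature names and importance values are hypothetical and made up purely for this example.

```python
import numpy as np

# Hypothetical feature names and importances, for illustration only
feature_names = np.array(["age", "income", "noise"])
importances = np.array([0.120, 0.0005, 0.0])

threshold = 0.001
mask = importances >= threshold  # True for features at or above the threshold
selected = feature_names[mask].tolist()
print(selected)
```

Here only "age" clears the threshold. In the code above, `sel.get_support()` plays the role of `mask`, and the importances come from the fitted `perm` object.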
