
【R/English】Permutation Feature Importance (PFI)

Posted at 2022-02-23

Objective

To explore Permutation Feature Importance (PFI) and run it in R. This article is based on 「機械学習を解釈する技術 ~Techniques for Interpreting Machine Learning~」 by Mitsunosuke Morishita. The book does not implement all of its methods in R, so I decided to write a brief note with an R script.

Permutation Feature Importance

"Permutation Feature Importance (PFI) is defined to be the decrease in a model score when a single feature value is randomly shuffled. This procedure breaks the relationship between the feature and the target, thus the drop in the model score is indicative of how much the model depends on the feature." (scikit-learn)

Here are five simple steps of PFI:

  1. Predict the target with ALL explanatory variables and calculate the prediction error, which serves as the baseline
  2. Pick one explanatory variable and permute (shuffle) its values in the data table. Predict the target again and calculate the prediction error
  3. Calculate the difference between the prediction errors from steps 1 and 2. This difference is the importance of the variable picked in step 2
  4. Repeat steps 2 and 3 for all explanatory variables
  5. Compare the importance of all variables and analyze
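
The five steps can be sketched in base R without any modeling package. This is only a minimal illustration on simulated data: the data frame `dat`, the helper `rmse`, and the use of `lm` as the model are assumptions made for the sketch, not part of the book or of DALEX.

```r
# Minimal PFI sketch in base R (simulated data; lm stands in for any model)
set.seed(1)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 3 * dat$x1 + 1 * dat$x2 + rnorm(n, sd = 0.5)  # x3 is pure noise

train_idx <- 1:400
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

model <- lm(y ~ x1 + x2 + x3, data = train)

# Hypothetical helper: root mean squared error
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Step 1: baseline error with all variables intact
baseline <- rmse(test$y, predict(model, newdata = test))

# Steps 2-4: shuffle one variable at a time and record the increase in error
features <- c("x1", "x2", "x3")
importance <- sapply(features, function(v) {
  shuffled <- test
  shuffled[[v]] <- sample(shuffled[[v]])  # break the link between v and y
  rmse(test$y, predict(model, newdata = shuffled)) - baseline
})

# Step 5: inspect the importances, largest first
print(sort(importance, decreasing = TRUE))
```

Because `y` is simulated to depend most strongly on `x1` and not at all on `x3`, the importance of `x1` should come out largest and that of `x3` close to zero.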

PFI

Execution with Real Data

Now, let's see how to run PFI with an actual dataset.

Get Dataset

# Set up
library(mlbench)
library(tidymodels)
library(DALEX)
library(ranger)
library(Rcpp)
library(corrplot)

data("BostonHousing")
df = BostonHousing
`%notin%` <- Negate(`%in%`)

Overview of the Dataset

Here is an overview of the dataset.
dataset_overvirew.png

head(df)

head().png

medv is our response variable; this is what we predict.

hist(df$medv, breaks = 20, main = 'Histogram of medv', xlab = 'medv ($ in 1,000)')

hist.png

Build a Model

We won't cover model building in detail in this article. I used a random forest (with the ranger engine) for the model.

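# Split the data into 80% training and 20% test
split = initial_split(df, 0.8)
train = training(split)
test = testing(split)

# Random forest via ranger; the seed is passed to the engine for reproducibility
model = rand_forest(trees = 100, min_n = 1, mtry = 13) %>% 
  set_engine(engine = "ranger", seed = 25) %>% 
  set_mode("regression")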

fit = model %>% 
  fit(medv ~., data=train)
fit

fit1.png

Predict medv

result = test %>% 
  select(medv) %>% 
  bind_cols(predict(fit, test))

metrics = metric_set(rmse, rsq)

result %>% 
  metrics(medv, .pred)

.metric  .estimator  .estimate
rmse     standard    3.8857
rsq      standard    0.8627
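
For readers unfamiliar with the two metrics, here is a small base R sketch of what they measure. The numbers in `truth` and `pred` are made up for illustration and are not taken from the model above. As far as I know, yardstick's `rsq` is the squared correlation between the truth and the prediction, which is what the sketch assumes.

```r
# Toy values, made up for illustration only
truth <- c(24.0, 21.6, 34.7, 33.4, 36.2)
pred  <- c(25.1, 22.0, 33.0, 31.9, 34.8)

# Root mean squared error: typical size of a prediction error, in the units of medv
rmse_manual <- sqrt(mean((truth - pred)^2))

# R squared, taken here as the squared correlation between truth and prediction
rsq_manual <- cor(truth, pred)^2

print(c(rmse = rmse_manual, rsq = rsq_manual))
```

A lower RMSE and an R squared closer to 1 both indicate a better fit. PFI later reuses RMSE as the loss whose increase is measured.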

Interpret Feature Importance

Use the function explain to create an explainer object that helps us to interpret the model.

explainer = fit %>% 
  explain(
    data = test %>% select(-medv),
    y = test$medv
  )

expmainer.png

Use the model_parts function to get PFI. Here you can see that rm and lstat are the top two most important variables for predicting medv. The dark blue box plots show the distribution of the loss across shuffles, since the importance is calculated multiple times.

  • loss_function: evaluation metric used as the loss
  • B: number of shuffles
  • type: how the importance is calculated; "difference" and "ratio" are applicable

pfi = explainer %>% 
  model_parts(
    loss_function = loss_root_mean_square,
    B = 10,
    type = "difference"
  )

plot(pfi)

pfi1.png
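
To make the roles of B and type concrete, here is a base R sketch on simulated data. It is not DALEX's internal code. The helper `permuted_losses` and the toy data are assumptions for illustration: the variable is shuffled B times, the spread of the resulting losses is what the box plots summarize, and the mean loss is then compared with the baseline either as a difference or as a ratio.

```r
set.seed(2)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 2 * dat$x1 + 0.5 * dat$x2 + rnorm(n, sd = 0.5)
model <- lm(y ~ x1 + x2, data = dat)

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
baseline <- rmse(dat$y, predict(model, newdata = dat))

# Hypothetical helper: loss after shuffling one variable, repeated B times
permuted_losses <- function(variable, B = 10) {
  replicate(B, {
    shuffled <- dat
    shuffled[[variable]] <- sample(shuffled[[variable]])
    rmse(dat$y, predict(model, newdata = shuffled))
  })
}

losses_x1 <- permuted_losses("x1", B = 10)

# The spread across the B shuffles is what a box plot summarizes
print(summary(losses_x1))

# Two ways to express importance relative to the baseline loss
importance_difference <- mean(losses_x1) - baseline
importance_ratio      <- mean(losses_x1) / baseline
print(c(difference = importance_difference, ratio = importance_ratio))
```

With "difference", an unimportant variable scores near 0. With "ratio", it scores near 1.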

FYI

| Method | Function |
|---|---|
| Permutation Feature Importance (PFI) | model_parts() |
| Partial Dependence (PD) | model_profile() |
| Individual Conditional Expectation (ICE) | predict_profile() |
| SHAP | predict_parts() |

Grouped Permutation Feature Importance (GPFI)

If some explanatory variables are correlated with each other, PFI won't work well. Let's say $X0$ and $X1$ are correlated. When we shuffle $X0$ to calculate its importance, the model can still use $X1$ for prediction. The performance of the model would not decrease much, because $X1$ carries much of the same information as $X0$. Thus, PFI will underestimate the importance of $X0$ (and, by the same argument, of $X1$). In the plot below, rad (index of accessibility to radial highways) and tax (full-value property-tax rate per $10,000) are strongly correlated. In a situation like this, we should shuffle both variables together. We should also use GPFI when a variable has been one-hot encoded into several columns, or when dealing with variables that belong together, such as latitude and longitude.

# Keep only the numeric columns (chas is a factor) and plot their correlations
num_df <- df[, sapply(df, is.numeric)]
corrplot(cor(num_df), method = 'number', order = 'alphabet')

corrplot.png

Rad and Tax
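
The effect of correlation can be reproduced with a small simulation in base R. Everything here is an assumption made for the sketch: two features `x0` and `x1` are noisy copies of one hidden signal `z`, `lm` stands in for the model, and `group_importance` is a hypothetical helper that shuffles a set of columns with one shared permutation so that the rows of the group stay together.

```r
set.seed(3)
n <- 1000
z <- rnorm(n)  # hidden signal shared by both features
dat <- data.frame(
  x0 = z + rnorm(n, sd = 0.3),
  x1 = z + rnorm(n, sd = 0.3)
)
dat$y <- 2 * z + rnorm(n, sd = 0.5)
model <- lm(y ~ x0 + x1, data = dat)

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
baseline <- rmse(dat$y, predict(model, newdata = dat))

# Shuffle a set of columns with ONE shared permutation, so the group's rows
# stay intact and only their link to the target is broken
group_importance <- function(vars) {
  idx <- sample(nrow(dat))
  shuffled <- dat
  shuffled[vars] <- dat[idx, vars, drop = FALSE]
  rmse(dat$y, predict(model, newdata = shuffled)) - baseline
}

imp <- c(
  x0       = group_importance("x0"),
  x1       = group_importance("x1"),
  "x0; x1" = group_importance(c("x0", "x1"))
)
print(imp)
```

Shuffled one at a time, each feature looks only moderately important, because the other one still carries the signal. Shuffled together, the pair shows a clearly larger increase in error.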

So let's run GPFI on our dataset. The model_parts function has a variable_groups argument, which takes a list. So make a list that contains the names of the explanatory variables to group, in this case rad and tax (see footnote 1). The source code of feature_importance is here.

GPFI.png

# Group tax and rad so they are shuffled together
paired_var = list(c("tax","rad"))

# Make a vector of the remaining explanatory variables.
# Do not forget to take out your response variable.
all_vars = c(colnames(df)[colnames(df) %notin% c("tax","rad","medv")])

# Combine the single variables and the group into one list
var_list = c(all_vars, paired_var)

pfi = explainer %>% 
  model_parts(
    loss_function = loss_root_mean_square,
    B = 10,
    type = "difference",
    variable_groups = var_list
  )

# Plot the grouped importance
plot(pfi)

pfi3.png
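
As a side note, the list of groups can also be built with explicit names. The sketch below uses only base R and types out the BostonHousing column names so that it runs on its own. The group name `tax_rad` is a hypothetical label, and I am assuming, without having checked every version, that the names of the list are used as the labels in the plot.

```r
# Column names of BostonHousing, typed out so the sketch runs without mlbench
all_cols <- c("crim", "zn", "indus", "chas", "nox", "rm", "age",
              "dis", "rad", "tax", "ptratio", "b", "lstat", "medv")

grouped     <- c("tax", "rad")
single_vars <- setdiff(all_cols, c(grouped, "medv"))  # drop the group and the response

# One list element per variable or group; each element is a character vector
var_list <- c(
  setNames(as.list(single_vars), single_vars),
  list(tax_rad = grouped)  # hypothetical label for the group
)
str(var_list)
```

The resulting `var_list` has the same structure as the one built above and could be passed to `variable_groups` in the same way.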

If you also keep tax and rad as individual variables in the plot, you can see that the importance of the group is dispersed between tax and rad when each is shuffled on its own.

# Group tax and rad so they are shuffled together
paired_var = list(c("tax","rad"))

# Make a vector of all explanatory variables, this time keeping tax and rad.
# Do not forget to take out your response variable.
all_vars = c(colnames(df)[colnames(df) %notin% c("medv")])

# Combine the single variables and the group into one list
var_list = c(all_vars, paired_var)

pfi = explainer %>% 
  model_parts(
    loss_function = loss_root_mean_square,
    B = 10,
    type = "difference",
    variable_groups = var_list
  )

# Plot the individual and grouped importance side by side
plot(pfi)

pfi2.png

Conclusion

PFI and GPFI are simple and useful methods for measuring the importance of explanatory variables in a model. On the other hand, PFI does not explain how each variable affects the model's predictions. That can be examined with Partial Dependence (PD).

References

https://scikit-learn.org/stable/modules/permutation_importance.html#:~:text=The%20permutation%20feature%20importance%20is,model%20depends%20on%20the%20feature.

Methods of Interpreting Machine Learning Qiita Links

  1. It may not be right to pair up tax and rad variables without decent causal inference.
