
【R/English】Permutation Feature Importance (PFI)

Posted at 2022-02-23

Objective

To explore Permutation Feature Importance (PFI) and run it in an R script. This article is based on 「機械学習を解釈する技術 ~Techniques for Interpreting Machine Learning~」 by Mitsunosuke Morishita. In the book, the author does not walk through every method in R, so I decided to make a brief note with R scripts.

Permutation Feature Importance

Permutation Feature Importance (PFI) is defined to be the decrease in a model score when a single feature value is randomly shuffled 1. This procedure breaks the relationship between the feature and the target; thus the drop in the model score is indicative of how much the model depends on the feature. -scikit-learn Here are the five simple steps of PFI (a minimal hand-rolled R sketch follows the list):

  1. Predict the target with ALL explanatory variables and calculate the prediction error; this is the baseline
  2. Pick one explanatory variable and permute/shuffle it in the dataset. Predict the target and calculate the prediction error
  3. Take the difference between the prediction errors from steps 1 and 2. That difference is the importance of the variable picked in step 2
  4. Repeat steps 2-3 for all explanatory variables
  5. Look at the importance of all variables and analyze
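To make the procedure concrete, here is a minimal hand-rolled PFI sketch in R. It uses a plain lm model on the Boston housing data purely as a stand-in, and the names model_lm, rmse_of, and baseline are my own, not from the book or DALEX:

library(mlbench)

data("BostonHousing")
df = BostonHousing

# A simple stand-in model; any fitted model works the same way
model_lm = lm(medv ~ ., data = df)

# RMSE helper
rmse_of = function(m, data) sqrt(mean((data$medv - predict(m, data))^2))

# Step 1: baseline error with all features intact
baseline = rmse_of(model_lm, df)

# Steps 2-4: shuffle one feature at a time and record the error increase
features = setdiff(colnames(df), "medv")
importance = sapply(features, function(v) {
  shuffled = df
  shuffled[[v]] = sample(shuffled[[v]])   # break the feature-target link
  rmse_of(model_lm, shuffled) - baseline  # step 3: difference = importance
})

# Step 5: inspect the importances
sort(importance, decreasing = TRUE)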

PFI

Execution with Real Data

Now, let's see how to run PFI with an actual dataset.

Get Dataset

# Set up
library(mlbench)     # BostonHousing dataset
library(tidymodels)  # modeling framework
library(DALEX)       # model interpretation (explain, model_parts)
library(ranger)      # random forest engine
library(Rcpp)
library(corrplot)    # correlation plot

data("BostonHousing")
df = BostonHousing

# Negation of %in%, used later to drop columns by name
`%notin%` <- Negate(`%in%`)

Overview of the Dataset

Here is an overview of the dataset.
dataset_overvirew.png

head(df)

head().png

medv is our response variable; this is what we predict.

hist(df$medv, breaks = 20, main = 'Histogram of medv', xlab = 'medv ($ in 1,000)') 

hist.png

Build a Model

We won't cover model building in detail in this article. I used a random forest (via the ranger engine) for the model.

split = initial_split(df, 0.8)
train = training(split)
test = testing(split)

model = rand_forest(trees = 100, min_n = 1, mtry = 13) %>% 
  set_engine("ranger", seed = 25) %>% 
  set_mode("regression")

fit = model %>% 
  fit(medv ~., data=train)
fit

fit1.png

Predict medv

result = test %>% 
  select(medv) %>% 
  bind_cols(predict(fit, test))

metrics = metric_set(rmse, rsq)

result %>% 
  metrics(medv, .pred)
.metric  .estimator  .estimate
rmse     standard       3.8857
rsq      standard       0.8627

Interpret Feature Importance

Use the function explain from DALEX to create an explainer object that helps us to interpret the model.

explainer = fit %>% 
  explain(
    data = test %>% select(-medv),
    y = test$medv
  )

expmainer.png

Use the model_parts function to get PFI. Here you can see rm and lstat are the top 2 most important variables for predicting medv. The dark blue box plot shows the distribution of the error loss, since we calculate it multiple times.

  • loss_function: the evaluation metric used as the loss
  • B: the number of shuffles
  • type: how the importance is calculated; "difference" or "ratio" are applicable
pfi = explainer %>% 
  model_parts(
    loss_function = loss_root_mean_square,
    B = 10,
    type = "difference"
  )

plot(pfi)

pfi1.png

FYI

Method                                      Function
Permutation Feature Importance (PFI)        model_parts()
Partial Dependence (PD)                     model_profile()
Individual Conditional Expectation (ICE)    predict_profile()
SHAP                                        predict_parts()
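As a quick illustration of the table above: Partial Dependence for a single variable can be computed from the same explainer with model_profile. A minimal sketch, using rm as an example variable:

# Partial Dependence of the prediction on rm, from the explainer above
pd = explainer %>% 
  model_profile(variables = "rm")

plot(pd)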

Grouped Permutation Feature Importance (GPFI)

If some explanatory variables are correlated with each other, PFI won't work well. Let's say $X_0$ and $X_1$ are correlated. While calculating the importance of $X_0$, the model can still use $X_1$ for prediction, so the performance of the model does not decrease much even though $X_0$ is shuffled. Thus, PFI will underestimate the importance of $X_0$ (and, by the same argument, of $X_1$). In the plot below, rad (index of accessibility to radial highways) and tax (full-value property-tax rate per $10,000) are highly correlated. In a situation like this, we should shuffle both variables together. In addition, we should use GPFI when variables are one-hot encoded, or when dealing with data like latitudes and longitudes.

# Correlation matrix of the numeric columns
num_df <- df[, unlist(lapply(df, is.numeric))]
corrplot(cor(num_df), method = 'number', order = 'alphabet')

corrplot.png

Rad and Tax
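Under the hood, grouped permutation simply applies one shared row shuffle to every column in the group, so their joint relationship with the target is broken together. A minimal hand-rolled sketch, reusing df, model_lm, rmse_of, and baseline from the earlier PFI sketch (my own illustration, not DALEX's implementation):

# Grouped shuffle: permute rad and tax with the SAME row order
shuffled = df
idx = sample(nrow(shuffled))
shuffled[, c("rad", "tax")] = shuffled[idx, c("rad", "tax")]

# Group importance = increase in error over the baseline
rmse_of(model_lm, shuffled) - baseline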

So let's run GPFI on our dataset. The model_parts function has a variable_groups argument. It takes a list object, so make a list that contains the names of the explanatory variables to be grouped, in this case rad and tax 1. The source code of feature_importance is here.

GPFI.png

# Make list
paired_var = list(c("tax","rad"))

# Make vector of explanatory variables. Do not forget to take out your response variable

all_vars = c(colnames(df)[colnames(df) %notin% c("tax","rad","medv")])

# Gather 
var_list = c(all_vars, paired_var)
                  
pfi = explainer %>% 
  model_parts(
    loss_function = loss_root_mean_square,
    B = 10,
    type = "difference",
    variable_groups = var_list
  )

# Plot the grouped PFI
plot(pfi)

pfi3.png

If you keep tax and rad individually in the plot as well, you can see that the importance is dispersed between tax and rad.

# Make list
paired_var = list(c("tax","rad"))

# Make vector of explanatory variables. Do not forget to take out your response variable

all_vars = c(colnames(df)[colnames(df) %notin% c("medv")])

# Gather 
var_list = c(all_vars, paired_var)
                  
pfi = explainer %>% 
  model_parts(
    loss_function = loss_root_mean_square,
    B = 10,
    type = "difference",
    variable_groups = var_list
  )

# Plot the grouped PFI
plot(pfi)

pfi2.png

Conclusion

PFI and GPFI are effective methods for calculating the importance of the explanatory variables in a model. On the other hand, PFI does not explain how each variable affects the prediction of the model. That can be done with Partial Dependence (PD).

References

https://scikit-learn.org/stable/modules/permutation_importance.html

Methods of Interpreting Machine Learning Qiita Links

  1. It may not be right to pair up the tax and rad variables without proper causal reasoning.
