Understanding Machine Learning Model Explanations with R

1. Introduction

In machine learning, model interpretability has long been an important research direction. This post introduces a powerful R package, explainer, which provides strong support for understanding complex classification and regression models. The package interprets complex models mainly through Shapley (SHAP) analysis, including data-driven feature descriptions for individual subgroups. It also supports multi-metric model evaluation, model fairness analysis, and decision curve analysis, and offers enhanced visualizations with interactive elements. Whether you are a data scientist, researcher, or machine learning enthusiast, this package can greatly facilitate your work on model interpretation.
2. Key Features
Detailed Model Explanation: Through Shapley analysis, it provides detailed explanations of the model’s predictions, helping users understand how the model makes predictions.
Multi-Metric Model Evaluation: It offers various metrics to evaluate model performance, including but not limited to accuracy, recall, F1 score, etc.
Model Fairness Assessment: It checks that the model performs fairly across different subgroups, helping avoid discriminatory predictions (a hedged example sketch follows this list).
Decision Curve Analysis: By plotting decision curves, it helps users assess the model’s performance at different thresholds, thereby selecting the best decision point.
Interactive Visualization: It provides interactive visualization tools, enabling users to intuitively understand the model’s predictions and explanations.
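Fairness assessment is the one feature in this list that the walkthrough below does not demonstrate directly. The package documents an eFairness() function for comparing performance across subgroups. The minimal sketch below is a hedged example: it assumes eFairness() accepts the same task/trained_model/splits arguments as the other explainer functions in this post, and its grouping arguments are illustrative only; verify the exact signature on the package reference site. It uses the classification objects built in Section 5.

# Hedged sketch: subgroup fairness check (run after Section 5).
# Assumption: eFairness() follows the task/trained_model/splits pattern of the
# other explainer functions; the grouping arguments are illustrative and should
# be checked against https://persimune.github.io/explainer.
fairness_plots <- eFairness(
  task = tk_class,
  trained_model = mod_class,
  splits = sp_class,
  target_variable = "Age",        # hypothetical grouping column (not in the BreastCancer data)
  var_levels = c("young", "old")  # hypothetical subgroup labels to compare
)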
3. Installation and Usage
install.packages("explainer")#install.packages("devtools")devtools::install_github("PERSIMUNE/explainer")
Load R Packages
library(explainer)
library(mlr3verse)
library(mlr3extralearners)  # provides lrn("regr.lightgbm") and lrn("classif.catboost")
library(dplyr)
4. Regression Model
Load Data
df <- read.csv("boston.csv") 
Build a Model with the mlr3 Machine Learning Framework
tk <- as_task_regr(df, target = "medv")  # Set task
sp <- partition(tk)                      # Data splitting
mod <- lrn("regr.lightgbm")              # Set model
mod$train(tk, sp$train)                  # Model training
mod$predict(tk, sp$test)                 # Prediction
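The final call above prints the test-set predictions. Capturing them in an object also allows a quick sanity check with standard mlr3 measures (plain mlr3, not explainer):

# Capture the test-set prediction and score it with an mlr3 measure
pred <- mod$predict(tk, sp$test)
pred$score(msr("regr.rmse"))  # RMSE on the held-out rows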
Calculate Shapley Values
SHAP_output <- eSHAP_plot_reg(
  task = tk,
  trained_model = mod,
  splits = sp,
  seed = 1
)
SHAP_output[[1]] # Interactive Shapley summary
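The summary plot is interactive. Assuming it is returned as an htmlwidget (for example a plotly object), it can be saved as a standalone HTML file for sharing:

# Assumption: SHAP_output[[1]] is an htmlwidget (e.g. plotly)
htmlwidgets::saveWidget(SHAP_output[[1]], "shap_summary_regression.html")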
Model Evaluation
regressmdl_eval_results <- regressmdl_eval(
  task = tk,
  trained_model = mod,
  splits = sp
)
# Results
regressmdl_eval_results
       MSE     RMSE      MAE R_squared
1 13.56915 3.683633 2.396991 0.8222139
The output reports the model's MSE, RMSE, MAE, and R-squared on the test split.
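As a quick consistency check, the RMSE is simply the square root of the MSE:

sqrt(13.56915)  # 3.683633, matching the reported RMSE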
5. Binary Classification Model
Load Data
data("BreastCancer", package = "mlbench")BC <- BreastCancer[, -1] %>% na.omit()
Build CatBoost Model
tk_class <- as_task_classif(BC, target = "Class")            # Set task
sp_class <- partition(tk_class)                              # Data splitting
mod_class <- lrn("classif.catboost", predict_type = "prob")  # Set model
mod_class$train(tk_class, sp_class$train)                    # Model training
mod_class$predict(tk_class, sp_class$test)                   # Prediction
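As with the regression model, the test-set prediction can be captured and scored with standard mlr3 measures before turning to explainer:

# Quick performance check with plain mlr3 measures
pred_class <- mod_class$predict(tk_class, sp_class$test)
pred_class$score(msrs(c("classif.acc", "classif.auc")))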
Model Evaluation
Confusion Matrix
confusionmatrix_plot <- eCM_plot(
  task = tk_class,
  trained_model = mod_class,
  splits = sp_class
)
Confusion Matrix for Training Set
confusionmatrix_plot$train_set
Confusion Matrix for Test Set
confusionmatrix_plot$test_set
Decision Curve
eDecisionCurve(
  task = tk_class,
  trained_model = mod_class,
  splits = sp_class,
  seed = 1
)
ROC Curve and PR Curve
eROC_plot(
  task = tk_class,
  trained_model = mod_class,
  splits = sp_class
)
Calculate ROC and PR Curve Thresholds
ePerformance(
  task = tk_class,
  trained_model = mod_class,
  splits = sp_class
)
Calculate Shapley Values
SHAP_output <- eSHAP_plot(
  task = tk_class,
  trained_model = mod_class,
  splits = sp_class,
  sample.size = 30,
  seed = 1,
  subset = .8
)
SHAP Visualization
SHAP_output[[1]]
Extract SHAP Values
shap_Mean_wide <- SHAP_output[[2]]  # SHAP values in wide format
shap_Mean_long <- SHAP_output[[3]]  # SHAP values in long format
shap <- SHAP_output[[4]]
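Before plotting, it is worth inspecting these objects, since their column names (which may vary by package version) are what the helpers below operate on:

# Inspect the extracted SHAP tables
dplyr::glimpse(shap_Mean_long)
head(shap_Mean_wide)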
Feature SHAP Values
ShapFeaturePlot(shap_Mean_long)
Partial Dependence Plot (PDP)
ShapPartialPlot(shap_Mean_long = shap_Mean_long)
SHAP Cluster Analysis
SHAP_plot_clusters <- SHAPclust(
  task = tk_class,
  trained_model = mod_class,
  splits = sp_class,
  shap_Mean_wide = shap_Mean_wide,
  shap_Mean_long = shap_Mean_long,
  num_of_clusters = 4,
  seed = 1,
  subset = .8
)
SHAP_plot_clusters[[1]]
References
https://persimune.github.io/explainer/index.html
For more R language content, follow the public account 【Data Statistics and Machine Learning】. Reply "explainer" in the account backend to obtain the data and code for free.
