Understanding Machine Learning Model Explanations with R

1. Introduction

In machine learning, model interpretability has long been an important research direction. This post introduces a powerful R package, explainer, which provides strong support for understanding complex classification and regression models. The package interprets complex models mainly through Shapley (SHAP) analysis, including data-driven feature descriptions for individual subgroups. It also supports multi-metric model evaluation, model fairness analysis, and decision curve analysis, and offers enhanced visualizations with interactive elements. Whether you are a data scientist, researcher, or machine learning enthusiast, this package can greatly facilitate your work on model interpretation.
2. Key Features
Detailed Model Explanation: Through Shapley analysis, it provides detailed explanations of the model’s predictions, helping users understand how the model makes predictions.
Multi-Metric Model Evaluation: It offers various metrics to evaluate model performance, including but not limited to accuracy, recall, F1 score, etc.
Model Fairness Assessment: It checks that the model performs fairly across different subgroups, helping avoid discriminatory predictions (a hedged example sketch follows this list).
Decision Curve Analysis: By plotting decision curves, it helps users assess the model’s performance at different thresholds, thereby selecting the best decision point.
Interactive Visualization: It provides interactive visualization tools, enabling users to intuitively understand the model’s predictions and explanations.
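Fairness assessment is the one feature in this list that the walkthrough below does not demonstrate directly. The package documents an eFairness() function for comparing performance across subgroups. The minimal sketch below is a hedged example: it assumes eFairness() accepts the same task/trained_model/splits arguments as the other explainer functions in this post, and its grouping arguments are illustrative only; verify the exact signature on the package reference site. It uses the classification objects built in Section 5.

# Hedged sketch: subgroup fairness check (run after Section 5).
# Assumption: eFairness() follows the task/trained_model/splits pattern of the
# other explainer functions; the grouping arguments are illustrative and should
# be checked against https://persimune.github.io/explainer.
fairness_plots <- eFairness(
  task = tk_class,
  trained_model = mod_class,
  splits = sp_class,
  target_variable = "Age",        # hypothetical grouping column (not in the BreastCancer data)
  var_levels = c("young", "old")  # hypothetical subgroup labels to compare
)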
3. Installation and Usage
install.packages("explainer")#install.packages("devtools")devtools::install_github("PERSIMUNE/explainer")
Load R Packages
library(explainer)
library(mlr3verse)
library(mlr3extralearners)  # provides lrn("regr.lightgbm") and lrn("classif.catboost")
library(dplyr)
4. Regression Model
Load Data
df <- read.csv("boston.csv") 
Build a Model with the mlr3 Machine Learning Framework
tk <- as_task_regr(df, target = "medv")  # Set task
sp <- partition(tk)                      # Data splitting
mod <- lrn("regr.lightgbm")              # Set model
mod$train(tk, sp$train)                  # Model training
mod$predict(tk, sp$test)                 # Prediction
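The final call above prints the test-set predictions. Capturing them in an object also allows a quick sanity check with standard mlr3 measures (plain mlr3, not explainer):

# Capture the test-set prediction and score it with an mlr3 measure
pred <- mod$predict(tk, sp$test)
pred$score(msr("regr.rmse"))  # RMSE on the held-out rows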
Calculate Shapley Values
SHAP_output <- eSHAP_plot_reg(
  task = tk,
  trained_model = mod,
  splits = sp,
  seed = 1
)
SHAP_output[[1]] # Interactive Shapley summary
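The summary plot is interactive. Assuming it is returned as an htmlwidget (for example a plotly object), it can be saved as a standalone HTML file for sharing:

# Assumption: SHAP_output[[1]] is an htmlwidget (e.g. plotly)
htmlwidgets::saveWidget(SHAP_output[[1]], "shap_summary_regression.html")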
Model Evaluation
regressmdl_eval_results <- regressmdl_eval(
  task = tk,
  trained_model = mod,
  splits = sp
)
# Results
regressmdl_eval_results
       MSE     RMSE      MAE R_squared
1 13.56915 3.683633 2.396991 0.8222139
The output reports the model's MSE, RMSE, MAE, and R-squared on the test split.
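As a quick consistency check, the RMSE is simply the square root of the MSE:

sqrt(13.56915)  # 3.683633, matching the reported RMSE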
5. Binary Classification Model
Load Data
data("BreastCancer", package = "mlbench")BC <- BreastCancer[, -1] %>% na.omit()
Build CatBoost Model
tk_class <- as_task_classif(BC, target = "Class")            # Set task
sp_class <- partition(tk_class)                              # Data splitting
mod_class <- lrn("classif.catboost", predict_type = "prob")  # Set model
mod_class$train(tk_class, sp_class$train)                    # Model training
mod_class$predict(tk_class, sp_class$test)                   # Prediction
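As with the regression model, the test-set prediction can be captured and scored with standard mlr3 measures before turning to explainer:

# Quick performance check with plain mlr3 measures
pred_class <- mod_class$predict(tk_class, sp_class$test)
pred_class$score(msrs(c("classif.acc", "classif.auc")))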
Model Evaluation
Confusion Matrix
confusionmatrix_plot <- eCM_plot(
  task = tk_class,
  trained_model = mod_class,
  splits = sp_class
)
Confusion Matrix for Training Set
confusionmatrix_plot$train_set
Confusion Matrix for Test Set
confusionmatrix_plot$test_set
Decision Curve
eDecisionCurve(
  task = tk_class,
  trained_model = mod_class,
  splits = sp_class,
  seed = 1
)
ROC Curve and PR Curve
eROC_plot(
  task = tk_class,
  trained_model = mod_class,
  splits = sp_class
)
Calculate ROC and PR Curve Thresholds
ePerformance(
  task = tk_class,
  trained_model = mod_class,
  splits = sp_class
)
Calculate Shapley Values
SHAP_output <- eSHAP_plot(
  task = tk_class,
  trained_model = mod_class,
  splits = sp_class,
  sample.size = 30,
  seed = 1,
  subset = .8
)
SHAP Visualization
SHAP_output[[1]]
Extract SHAP Values
shap_Mean_wide <- SHAP_output[[2]]  # SHAP values in wide format
shap_Mean_long <- SHAP_output[[3]]  # SHAP values in long format
shap <- SHAP_output[[4]]
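Before plotting, it is worth inspecting these objects, since their column names (which may vary by package version) are what the helpers below operate on:

# Inspect the extracted SHAP tables
dplyr::glimpse(shap_Mean_long)
head(shap_Mean_wide)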
Feature SHAP Values
ShapFeaturePlot(shap_Mean_long)
Partial Dependence Plot (PDP)
ShapPartialPlot(shap_Mean_long = shap_Mean_long)
SHAP Cluster Analysis
SHAP_plot_clusters <- SHAPclust(
  task = tk_class,
  trained_model = mod_class,
  splits = sp_class,
  shap_Mean_wide = shap_Mean_wide,
  shap_Mean_long = shap_Mean_long,
  num_of_clusters = 4,
  seed = 1,
  subset = .8
)
SHAP_plot_clusters[[1]]
References
https://persimune.github.io/explainer/index.html
For more R language content, follow the public account 【Data Statistics and Machine Learning】. Reply "explainer" in the account backend to obtain the data and code for free.
