Summary of Explainable Algorithms for Machine Learning Models

Summary of Model Explainability

Introduction

Currently, many machine learning models can make very good predictions, but they cannot explain well how they arrive at those predictions. Many data scientists find it difficult to understand why an algorithm produces a particular result. This matters because if we cannot know how an algorithm makes its predictions, it becomes very hard to debug the algorithm when problems arise.

This article introduces several common techniques that can improve the explainability of machine learning models, including their relative advantages and disadvantages. We categorize them into the following:

  1. Partial Dependence Plot (PDP)
  2. Individual Conditional Expectation (ICE)
  3. Permuted Feature Importance
  4. Global Surrogate
  5. Local Surrogate (LIME)
  6. Shapley Value (SHAP)

Six Major Explainability Techniques

01

Partial Dependence Plot (PDP)

PDP was invented over a decade ago. It shows the marginal effect that one or two features have on the predictions of a machine learning model, helping researchers determine how the predictions change as those features are varied.

[Figure: partial dependence plot with the feature value on the x-axis and the average predicted value on the y-axis]

In the above figure, the x-axis represents the value of the feature and the y-axis represents the predicted value. The solid line in the shaded area shows how the average prediction changes as the feature value changes. PDP can intuitively display the average marginal effect, but it may hide heterogeneous effects.

  • For example, one feature may be positively correlated with the predictions for half of the data and negatively correlated for the other half. The two effects cancel out in the average, so the PDP will just be a horizontal line.
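
As a concrete illustration, here is a minimal sketch of a PDP using scikit-learn's built-in display utility; the California housing dataset and gradient-boosting model below are only placeholders for whatever fitted estimator you want to inspect:

```python
# Minimal PDP sketch with scikit-learn (dataset and model are placeholders).
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Marginal effect of one feature, plus a two-feature interaction plot
PartialDependenceDisplay.from_estimator(
    model, X, features=["MedInc", ("MedInc", "AveOccup")]
)
plt.show()
```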

02

Individual Conditional Expectation (ICE)

ICE is very similar to PDP, but unlike PDP, which plots the average case, ICE shows the situation for each instance. ICE can help us explain how the model’s predictions change when a specific feature changes.

[Figure: ICE plot showing one prediction curve per instance for a single feature]

As shown in the figure, unlike PDP, ICE curves can reveal heterogeneous relationships. Their biggest drawback, however, is that the average effect is not as easy to read as it is from a PDP, so the two are often used together.
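
A minimal sketch of ICE curves with the same scikit-learn display; kind="both" overlays the PDP average on the per-instance curves, which is exactly the combined view suggested above (the dataset and model are the same placeholders as in the PDP sketch):

```python
# Minimal ICE sketch: one curve per sampled instance, plus the PDP average.
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

PartialDependenceDisplay.from_estimator(
    model, X, features=["MedInc"],
    kind="both",      # "individual" for ICE only; "both" also draws the PDP average
    subsample=50,     # plot a random subset of instances to keep the figure readable
    random_state=0,
)
plt.show()
```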

03

Permuted Feature Importance

Permuted Feature Importance defines feature importance by observing how the model's prediction error changes after a feature's values are shuffled. In other words, it quantifies how much each feature contributes to the final prediction.

[Figure: permutation feature importance ranking, with features ordered by their impact on model error]

As shown in the figure, feature f2 ranks highest and has the greatest impact on model error; shuffling f1 has almost no effect on the model, and the remaining features contribute negatively to it.
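
A minimal sketch with scikit-learn's permutation_importance: each feature is shuffled several times on a held-out set and the resulting drop in score is recorded (the dataset and model are illustrative placeholders):

```python
# Minimal permutation importance sketch (dataset and model are placeholders).
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Shuffle each feature 10 times on the validation set and record the score drop
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
for name, mean, std in sorted(
    zip(X.columns, result.importances_mean, result.importances_std),
    key=lambda t: t[1], reverse=True,
):
    print(f"{name:>12s}: {mean:.4f} +/- {std:.4f}")
```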

04

Global Surrogate

The Global Surrogate method employs a different approach. It approximates the predictions of a black-box model by training an interpretable model.

  • First, we use the trained black-box model to predict the dataset;
  • Then we train an interpretable model on the same dataset, using the black-box model's predictions as the target.

The trained interpretable model can approximate the original model, and all we need to do is explain that model.

  • Note: The surrogate model can be any interpretable model: linear models, decision trees, human-defined rules, etc.


Using an interpretable model to approximate a black-box model introduces additional error, but the additional error can be measured by R-squared.

  • Since the surrogate model is trained only based on the predictions of the black-box model and not the true results, the global surrogate model can only explain the black-box model and not the data.
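
A minimal sketch of a global surrogate, assuming a random forest as the black box and a shallow decision tree as the interpretable surrogate (both are illustrative choices, not prescribed by the method):

```python
# Global surrogate sketch: fit a shallow tree to the black box's predictions,
# then read the tree. R-squared against those predictions measures fidelity.
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = fetch_california_housing(return_X_y=True, as_frame=True)

black_box = RandomForestRegressor(random_state=0).fit(X, y)
y_black_box = black_box.predict(X)   # the surrogate is trained on these, not on y

surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y_black_box)
print("Surrogate fidelity (R^2 vs. black box):",
      r2_score(y_black_box, surrogate.predict(X)))
print(export_text(surrogate, feature_names=list(X.columns)))
```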

05

Local Surrogate (LIME)

LIME (Local Interpretable Model-agnostic Explanations) differs from the global surrogate because it does not attempt to explain the entire model. Instead, it trains interpretable models to approximate individual predictions. LIME tries to understand how predictions change when we perturb the data samples.

[Figure: LIME image explanation, a tree frog image segmented into interpretable superpixels]

The image on the left is divided into interpretable parts. Then, LIME generates a dataset of perturbed instances by “turning off” some interpretable components (in this case, making them gray). For each perturbed instance, the trained model can be used to obtain the probability of the tree frog being present in the image, and then a locally weighted linear model is learned on that dataset. Finally, the components with the highest positive weights are used as explanations.
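
A minimal sketch of LIME on tabular data, assuming the lime package is installed; the image example above would use lime.lime_image instead, but the idea is the same: perturb an instance and fit a locally weighted linear model (the dataset and classifier here are placeholders):

```python
# Minimal LIME sketch on tabular data (requires `pip install lime`).
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# Explain a single prediction with a locally weighted linear model
exp = explainer.explain_instance(data.data[0], model.predict_proba, num_features=5)
print(exp.as_list())   # (feature condition, weight) pairs of the local model
```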

06

Shapley Value (SHAP)

The concept of the Shapley value comes from game theory. We can explain a prediction by assuming that each feature value of an instance is a "player" in a game. The contribution of each player is measured by adding that player to, and removing them from, every subset of the remaining players. A player's Shapley value is the weighted sum of all these contributions. Shapley values are additive and locally accurate: if you sum the Shapley values of all features and add the baseline value, which is the average prediction, you get the exact prediction value. This is a property that many other methods do not have.

[Figure: SHAP force plot, per-feature contributions pushing the prediction from the baseline to the final value]

This figure shows the Shapley values for each feature, representing the contribution of moving the model result from the baseline value to the final prediction. Red indicates a positive contribution, while blue indicates a negative contribution.
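
A minimal sketch using the shap package's TreeExplainer (assumed installed); it also checks the additivity property described above, namely that the baseline plus the sum of a row's SHAP values reproduces that row's prediction (the dataset and model are placeholders):

```python
# Minimal SHAP sketch with TreeExplainer (requires `pip install shap`).
import numpy as np
import shap
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])        # (100, n_features) array
baseline = float(np.ravel(explainer.expected_value)[0])  # average model prediction

# Local accuracy: baseline + sum of a row's SHAP values equals its prediction
i = 0
print("baseline + sum(shap):", baseline + shap_values[i].sum())
print("model prediction    :", model.predict(X.iloc[[i]])[0])
```

The shap package also ships plotting helpers such as force_plot and summary_plot for visualizations like the one in the figure above.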

Conclusion

The explainability of machine learning models is a very active and important research area in machine learning. In this article, we introduced six commonly used algorithms for understanding machine learning models. You can use them according to your practical scenarios.
