A Survey on Interpretability of Machine Learning Models

Introduction: Research on model interpretability has become a hot topic at research conferences over the past two years, as practitioners are no longer satisfied with a model's performance alone and increasingly ask for the reasons behind it. Such questioning helps optimize models and features, deepens understanding of the models themselves, and improves the quality of the services built on them. This article summarizes relevant work on the interpretability of machine learning models.

——Survey——

The goal of a machine learning application is to output decisions or judgments. Interpretability is the degree to which humans can understand the reasons behind those decisions. The more interpretable a machine learning model is, the easier it is for people to understand why certain decisions or predictions were made. Model interpretability covers both understanding the internal mechanisms of the model and understanding its results. Its importance shows up in two phases: during modeling, it helps developers understand the model, compare and select models, and optimize or adjust them when necessary; during operation, it allows the model's internal mechanisms and its results to be explained to business stakeholders. For example, a fund recommendation model should be able to explain why a particular fund is recommended to a specific user.

A typical machine learning workflow is: collect data, clean it, train models, and then select the best model based on validation or test error or other evaluation metrics. When selecting a model, the first consideration is a low error rate and high accuracy; the second is the trade-off between accuracy and model complexity, because the more complex a model is, the harder it is to explain. A simple linear regression is very easy to explain, since it only considers a linear relationship between the independent and dependent variables; but for the same reason it cannot capture more complex relationships, and its predictive accuracy on the test set is likely to be lower. Deep neural networks sit at the other extreme: because they can perform abstraction at multiple levels, they can model very complex relationships between the independent and dependent variables and achieve very high accuracy. But this complexity also makes the model a black box. We cannot trace how the features combine to produce the model's predictions, so we can only fall back on accuracy, error rates, and similar proxy metrics to judge the model's reliability.
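To make this trade-off concrete, here is a minimal sketch comparing a directly readable linear model with a black-box ensemble. The diabetes dataset is purely an illustrative choice; any tabular regression task would do.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# A small tabular regression task (illustrative choice).
X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear = LinearRegression().fit(X_train, y_train)
boosted = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
print("linear R^2:", round(linear.score(X_test, y_test), 3))
print("boosted R^2:", round(boosted.score(X_test, y_test), 3))

# The linear model explains itself: each coefficient is the change in the
# prediction per unit change of one feature, holding the others fixed.
for name, coef in zip(X.columns, linear.coef_):
    print(f"{name}: {coef:+.1f}")
# The boosted ensemble has no such direct readout; post hoc tools
# (permutation importance, SHAP, LIME) are needed to interrogate it.
```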

In fact, every classification problem’s machine learning process should include model understanding and model explanation, for several reasons:

Model Improvement: Understanding which features drive a model's classifications and predictions helps us see why it makes certain decisions and which features play the most important role in them, so we can judge whether the model is reasonable. Consider a deep neural network trained to distinguish images of wolves from huskies. The model is trained on a large number of images and tested on some held-out ones, and 90% accuracy seems worth celebrating. However, without running an explainer we would not know that the model relies mainly on the background: wolf images usually have a snowy background, while husky images rarely do. We have inadvertently built a snow detector, and looking only at metrics like accuracy would never reveal this. Knowing how the model uses features for its predictions lets us judge intuitively whether it has captured meaningful features and whether its predictions can generalize to other samples.
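The explainer step can be carried out with a tool such as LIME, which is where this wolf/husky example originally comes from. Below is a hedged sketch of that check; the random image and the brightness-based classifier_fn are deliberate stand-ins for the real test photo and the trained wolf/husky model.

```python
import numpy as np
from lime import lime_image

rng = np.random.default_rng(0)
# Stand-in for one test photo (an RGB array).
image = rng.integers(0, 255, size=(64, 64, 3)).astype(np.uint8)

def classifier_fn(images):
    # Stand-in for the real model: scores a batch of images by mean pixel
    # brightness, i.e. a crude "snow detector" on purpose.
    brightness = images.mean(axis=(1, 2, 3)) / 255.0
    return np.stack([1 - brightness, brightness], axis=1)  # P(wolf), P(husky)

explainer = lime_image.LimeImageExplainer(random_state=0)
explanation = explainer.explain_instance(
    image, classifier_fn, top_labels=2, hide_color=0, num_samples=200
)
img, mask = explanation.get_image_and_mask(
    explanation.top_labels[0], positive_only=True, num_features=5, hide_rest=False
)
# Regions where mask == 1 are the superpixels that most supported the top
# label; for the wolf/husky model these would light up the snowy background
# rather than the animal.
```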

Model Trustworthiness and Transparency: Understanding machine learning models is essential for trusting them and for making their predictions transparent. It is unrealistic to let black-box models make decisions that affect people's lives, as in lending or criminal sentencing. Another area where trust in machine learning results matters is medicine, where model outputs can directly bear on a patient's life or death. Machine learning models can be very accurate at distinguishing malignant tumors from different types of benign tumors, but we still need experts to explain the diagnosis. Explaining why a model classifies a patient's tumor as benign or malignant goes a long way toward helping doctors trust and use such models to support their work. In the long run, a better understanding of machine learning models saves time and prevents losses: if a model makes unreasonable decisions, we can catch this before deploying it and causing harm.
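As a toy version of such an explanation, the sketch below reads per-feature contributions to one patient's prediction directly off a logistic regression. Scikit-learn's built-in breast-cancer dataset stands in for real clinical data; a deployed system would of course involve a vetted model and expert review.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(data.data, data.target)

# Contribution of each (standardized) feature to one patient's score:
i = 0
x_std = model.named_steps["standardscaler"].transform(data.data[i : i + 1])[0]
coefs = model.named_steps["logisticregression"].coef_[0]
contrib = x_std * coefs
top = np.argsort(np.abs(contrib))[::-1][:5]

print("prediction:", data.target_names[model.predict(data.data[i : i + 1])[0]])
for j in top:
    # Sign shows whether the feature pushed this patient's score toward
    # the positive class; magnitude shows how strongly.
    print(f"{data.feature_names[j]}: {contrib[j]:+.2f}")
```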

Identifying and Preventing Bias: Variance and bias are widely discussed topics in machine learning. Biased models are usually the product of biased data: if the training data contains subtle biases, the model will learn them and still appear to fit well. A famous example is the use of machine learning models to recommend prison sentences, which reproduced the judicial system's existing racial inequities. Other examples include recruiting models that encode gender bias for particular roles, such as male software engineers and female nurses. Machine learning models touch many aspects of our lives and are becoming ever more widespread, so as data scientists and decision-makers we have a responsibility to understand how the models we train and deploy make decisions, so that we can keep bias from being amplified and work to eliminate it.
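A minimal pre-deployment check along these lines is to compare the model's positive-prediction rate across groups (demographic parity). The sketch below uses synthetic stand-ins for the predictions and the protected attribute of a hypothetical hiring model.

```python
import numpy as np

rng = np.random.default_rng(0)
group = rng.choice(["A", "B"], size=1000)  # stand-in protected attribute
# Stand-in predictions, deliberately biased toward group A.
y_pred = rng.random(1000) < np.where(group == "A", 0.35, 0.20)

for g in ("A", "B"):
    rate = y_pred[group == g].mean()
    print(f"group {g}: positive rate = {rate:.2f}")
# A large gap between the groups' rates is a red flag worth investigating
# before the model is put in front of real candidates.
```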

Interpretability Characteristics:

❶ Importance: Understanding the “why” can help deepen our understanding of the problem, the data, and the reasons why models may fail.

❷ Classification: Interpretability of the data before modeling, interpretability of the model during the modeling phase, and interpretability of the results during the operational phase.

❸ Scope: Global interpretability, local interpretability, model transparency, model fairness, and model reliability (the global/local distinction is illustrated in the sketch after this list).

❹ Evaluation: Intrinsic or post hoc? Model-specific or model-agnostic? Local or global?

❺ Characteristics: Accuracy, fidelity, usability, reliability, robustness, generalizability, etc.

❻ Human-Readable Explanation: The degree to which humans can understand the reasons behind decisions, and the extent to which people can consistently predict the model's results.
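To illustrate the global-versus-local distinction from item ❸, here is a small sketch; the iris dataset and random forest are illustrative choices. Permutation importance summarizes what matters to the model as a whole, while a single tree's decision path explains one particular prediction.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

# Global: which features matter on average, across all predictions?
result = permutation_importance(
    model, data.data, data.target, n_repeats=10, random_state=0
)
for name, imp in zip(data.feature_names, result.importances_mean):
    print(f"{name}: {imp:.3f}")

# Local: why did the model label this one sample the way it did?
sample = data.data[:1]
path = model.estimators_[0].decision_path(sample)  # nodes visited by tree 0
print("class:", data.target_names[model.predict(sample)[0]],
      "| nodes visited in tree 0:", path.indices.tolist())
```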

Motivation

In the industry, the primary focus of data science or machine learning is to solve complex real-world problems that are more
