Machine Learning-Based COVID-19 Mortality Prediction Model

✦

Deep Learning Soup Group Literature Review Study – Article 102

✦

Machine Learning-Based COVID-19 Mortality Prediction Model and Identification of Patients at Low and High Risk of Dying

Deep Learning Soup Group

April 11, 2023

In 2021, Mohammad M. Banoei and colleagues at the University of Calgary developed a machine learning algorithm to predict mortality in hospitalized COVID-19 patients and to identify the most important predictors of death. They published their findings in the journal Critical Care (IF: 19.3, Medicine Zone 1) in an article titled “Machine-learning-based COVID-19 mortality prediction model and identification of patients at low and high risk of dying”.

DOI: https://doi.org/10.1186/s13054-021-03749-5

I. Research Background

COVID-19 has become a significant cause of morbidity and mortality worldwide. Its clinical features vary widely and can progress to multi-organ failure. Although the virus primarily affects the lungs, it can also cause cardiovascular, neurological, renal, and vascular complications, which makes patient prognosis difficult to predict. Earlier attempts to classify COVID-19 patients with traditional statistical analyses of clinical and biochemical markers were limited by the high complexity of the features and failed to predict mortality accurately. Data mining and machine learning methods have therefore been proposed as an alternative. Such methods have been applied to the diagnosis, stratification, and prognosis of COVID-19, building classification models from multimodal data to better predict mortality in hospitalized patients.

II. Dataset

In this retrospective study, clinical data were collected from 400 COVID-19 patients hospitalized at the University of Miami Miller School of Medicine since June 2020. A total of 250 variables, covering demographic information and clinical data, were recorded at different times during hospitalization. The patients were randomly divided into training and testing sets, and leave-one-out cross-validation was used during training.
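The leave-one-out scheme mentioned above can be illustrated with a minimal sketch. The function name `leave_one_out_splits` and the toy sample size are assumptions made for illustration only; the review does not describe the authors' actual code.

```python
from typing import Iterator, List, Tuple


def leave_one_out_splits(n_samples: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs, holding out one sample per round."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, [held_out]


# Toy example with 4 samples (hypothetical; not the study's data).
splits = list(leave_one_out_splits(4))
for train, test in splits:
    print(f"train={train} test={test}")
```

With n samples this produces n rounds, so the model is refitted n times. That cost is usually acceptable for a dataset of a few hundred patients.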

III. Methods

Before modeling, the variables were reduced using Variable Importance for the Projection (VIP) scores: factors that contributed little to prediction were eliminated, and only clinical and biochemical variables with VIP > 1.0 were used to build the prediction model. The authors then fitted the model with SIMPLS (Statistically Inspired Modification of Partial Least Squares), a linear machine learning method. In addition, a decision tree was built from the relationship between outcomes and predictors, as shown in Figure 1. The study also used latent class analysis and principal component analysis to cluster COVID-19 patients into high-risk and low-risk groups. Model performance was measured with Q2, the goodness of prediction, and R2Y, the goodness of fit (the proportion of variation in the outcome explained by the model). For both, values closer to 1 indicate better performance: R2Y ranges from 0 to 1, while Q2 has a maximum of 1 and can fall below 0 for a model that predicts worse than the outcome mean. A Q2 value in the range of 0.2-0.4 is considered to indicate moderate predictability.
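As a rough illustration of the two performance measures, the sketch below uses the conventional definitions from partial least squares practice: R2Y compares the fitted values with the observed outcome, and Q2 applies the same formula to cross-validated predictions. The review does not state exactly how the authors computed them, so the function names and the toy numbers here are assumptions.

```python
from typing import Sequence


def _one_minus_error_ratio(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Return 1 - (sum of squared errors) / (total sum of squares around the mean)."""
    mean_y = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - sse / tss


def r2y(y_true: Sequence[float], y_fitted: Sequence[float]) -> float:
    """Goodness of fit: y_fitted comes from a model trained on all samples."""
    return _one_minus_error_ratio(y_true, y_fitted)


def q2(y_true: Sequence[float], y_cv_predicted: Sequence[float]) -> float:
    """Goodness of prediction: each value was predicted while that sample was held out."""
    return _one_minus_error_ratio(y_true, y_cv_predicted)


# Hypothetical outcome (1 = died) with made-up fitted and cross-validated predictions.
outcome = [0, 0, 1, 1]
fitted = [0.1, 0.2, 0.8, 0.9]
cross_validated = [0.3, 0.4, 0.6, 0.5]
print(round(r2y(outcome, fitted), 2), round(q2(outcome, cross_validated), 2))
```

Q2 is normally lower than R2Y because held-out predictions are harder than fitting the training data. A large gap between the two is a common sign of overfitting.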

IV. Results and Conclusions

This study developed a machine learning-based model that uses the SIMPLS algorithm to predict mortality in hospitalized COVID-19 patients. Table 1 lists the predictors with VIP > 1.0. The most important clinical predictors of mortality are coronary artery disease, diabetes, altered mental status, age over 65, and dementia, while CRP, prothrombin, and lactate are the most important biochemical markers. In total, the model used 18 clinical predictors and 3 blood biochemical markers. As shown in Figure 2, the AUC values for the training and validation sets were 0.95 and 0.91, respectively, indicating that the model discriminates well between patients who died and those who survived. The Q2 value of 0.24 indicates moderate predictability. As seen in Figure 3, latent class analysis and principal component analysis can cluster COVID-19 patients on the basis of clinical data, allowing high-risk patients to be identified.
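To make the AUC figures more concrete, the sketch below computes AUC directly from its pairwise interpretation: the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor. The function name `roc_auc` and the four-patient example are hypothetical and are not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = died) and predicted risk scores for four patients.
died = [0, 0, 1, 1]
risk = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(died, risk)
print(auc)  # 3 of the 4 positive/negative pairs are ordered correctly -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so values of 0.95 and 0.91 mean that almost all died/survived pairs are ordered correctly.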

These results may help better manage COVID-19 patients and identify those at higher risk of death.

Glossary

1. Leave-One-Out Cross-Validation: k-fold cross-validation is a widely used machine learning technique that randomly divides the dataset into k parts; in each round, (k-1) parts form the training set and the remaining part forms the test set, so that every part is tested once. Leave-one-out cross-validation is the special case in which k equals the sample size n. Each data point is used once as the test set, while the remaining (n-1) data points serve as the corresponding training set.

2. Variable Importance for the Projection (VIP): A score used mainly for variable selection. Because it builds on partial least squares regression, VIP scoring remains usable when the sample is small and the independent variables are strongly correlated. It can be calculated with multivariate statistical analysis software such as SIMCA-P.
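For readers who want to see how a VIP score is formed, the following sketch applies the commonly used textbook formula: each variable's squared, normalized weight on every PLS component is weighted by the outcome variance that component explains. The weights, explained sums of squares, and variable names below are invented for illustration; in practice they would come from a fitted PLS model, and the software used by the authors may differ in detail.

```python
import math
from typing import List, Sequence


def vip_scores(weights: Sequence[Sequence[float]], ssy: Sequence[float]) -> List[float]:
    """Compute one VIP score per variable.

    weights[a][j] is the PLS weight of variable j on component a;
    ssy[a] is the outcome sum of squares explained by component a.
    """
    n_vars = len(weights[0])
    total_ssy = sum(ssy)
    scores = []
    for j in range(n_vars):
        acc = 0.0
        for w_a, ssy_a in zip(weights, ssy):
            norm_sq = sum(w * w for w in w_a)  # squared length of the weight vector
            acc += ssy_a * (w_a[j] ** 2) / norm_sq
        scores.append(math.sqrt(n_vars * acc / total_ssy))
    return scores


# Hypothetical model: 4 variables, 2 components (all numbers are made up).
names = ["x1", "x2", "x3", "x4"]
weights = [[0.8, 0.5, 0.3, 0.1], [0.2, 0.7, 0.1, 0.6]]
ssy = [6.0, 2.0]

vip = vip_scores(weights, ssy)
selected = [name for name, v in zip(names, vip) if v > 1.0]
print(selected)
```

With this formula the squared VIP scores always sum to the number of variables, so their average is exactly 1. This is why VIP > 1.0 is a natural cutoff for "more important than average".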


Figure 1: Decision Tree for Predicting COVID-19 Patient Mortality

Table 1: 21 Most Important Predictors

Figure 2: AUC of Training and Validation Sets

Figure 3: Clustering of COVID-19 Patients Based on Latent Class Analysis and Principal Component Analysis

Pepper soup transformed by: Geng Shi

The Deep Learning Soup AI Group is composed of a group of AI enthusiasts from Xuzhou Medical University and its affiliated hospital. We welcome everyone to communicate and learn with us!


Welcome to join us!

Member WeChat ID:cy2011mcu

When adding friends, please note:

Your Unit-Department-Name-Research Direction
