*For Medical Professionals Only

Published in: European Journal of Cardio-Thoracic Surgery
Impact Factor: 3.1
Publication Date: 2024.10.13
Innovations:
This study develops and evaluates an in-hospital/30-day mortality risk prediction model for adults undergoing cardiac surgery using an alternative machine learning algorithm (XGBoost).
Background:
Predictive models are embedded in international guidelines to help determine the most appropriate treatment and to support patient counselling. With falling mortality rates and the growth of minimally invasive and interventional alternatives, updated and more accurate models are needed. Accurate predictions also provide benchmarks for individual operations and institutional outcomes. Older models, such as the European System for Cardiac Operative Risk Evaluation (EuroSCORE and EuroSCORE II), require updating: their main limitations are poor calibration and overestimation of risk in the highest-risk groups, and their performance varies with time, center, and the inherent risk of the procedure, which has prompted the adoption of risk-adjusted mortality rates (RAMR).
Current models are based on logistic regression (LR), in which any complex interactions between input variables must be specified by the model developers. Alternative machine learning (ML) algorithms can learn such interactions from large amounts of data. The team's previous work showed that models built with alternative ML algorithms had statistically better discrimination and clinical utility than retrained LR models using the EuroSCORE II variables, with broadly similar calibration (an illustrative sketch of such a comparison is given below).
Here, the team builds on that work by using all variables routinely collected in the UK National Adult Cardiac Surgery Audit (NACSA), the largest dataset analyzed for this purpose to date, and applies comprehensive variable selection methods to develop a novel XGBoost-based model.
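The comparison described above can be pictured with the minimal sketch below, which fits a retrained LR model and a gradient-boosted tree model on the same predictor set and compares discrimination (AUC) on a held-out set. It uses synthetic data and hypothetical hyperparameters, not the NACSA data or the authors' code, and assumes Python with scikit-learn and xgboost installed.

```python
# Illustrative sketch only: synthetic data stand in for the EuroSCORE II variables;
# all settings are hypothetical, not those used in the study.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# ~2.7% event rate, mimicking the observed mortality, with 18 candidate predictors
X, y = make_classification(n_samples=50_000, n_features=18, n_informative=10,
                           weights=[0.973], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Retrained logistic regression: interaction terms must be added by hand if wanted
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Gradient-boosted trees: tree splits can capture variable interactions automatically
xgb = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05,
                    eval_metric="logloss", random_state=0).fit(X_train, y_train)

for name, model in [("LR", lr), ("XGBoost", xgb)]:
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```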
Results:
1. Patients
During the study period, 224,318 adults underwent cardiac surgery at 42 centers, with 6,100 deaths (2.72%) (Figure 1). Baseline variable differences between survivors and non-survivors are shown in Table 1.

[Figure 1: CONSORT diagram showing the flow of participants during the study]
Table 1: Baseline patient demographics

2. Final Model
The final model consists of the variables shown in Table 2.
Table 2: Variables used in the final model


3. XGBoost-23
The XGBoost model with the highest AUC in the training set included 27 variables, with an AUC of 0.837 (95% CI 0.837-0.838) and an F1 score of 0.277 (95% CI 0.276-0.279). However, visual inspection of the AUC curve and model outputs suggested that using fewer variables caused only a small loss of discriminative ability. The team therefore conducted a sensitivity analysis of the effect on model performance of reducing the number of variables to 20, 23, or 25 (an illustrative sketch of such a comparison follows Figure 2). Reducing from 23 to 20 variables led to a significant drop in net benefit (NB) for treated patients. The 20-variable model overestimated risk, while the 23- and 25-variable models underestimated risk, with minimal differences in clinical utility between the latter two. Weighing these trade-offs, the team chose the 23-variable XGBoost model (XGBoost-23), which achieved an AUC of 0.846 (95% CI 0.845-0.846) and an F1 score of 0.288 (95% CI 0.287-0.290) (Figure 2A) and showed good calibration even in patients with predicted risks above 30% (Figure 2B). As expected, few patients with predicted risks above 30% underwent surgery. Decision curve analysis (DCA; Figure 2C) showed a net benefit for treated patients at all threshold probabilities below 60%.



[Figure 2: Performance of the final model developed using XGBoost with feature selection. A) Discrimination. B) Calibration. C) Decision curve analysis.]
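As a rough illustration of the variable-count sensitivity analysis described above, the sketch below ranks candidate variables by importance on a full XGBoost model and then refits with the top 20, 23, and 25 variables, comparing discrimination on a held-out set. The data, hyperparameters, and classification threshold are synthetic and hypothetical; calibration and decision curve analysis, also assessed in the study, are not shown here.

```python
# Illustrative sketch of a variable-count sensitivity analysis; synthetic data and
# hypothetical settings, not the NACSA data or the authors' pipeline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, f1_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=50_000, n_features=27, n_informative=15,
                           weights=[0.973], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

params = dict(n_estimators=300, max_depth=4, learning_rate=0.05,
              eval_metric="logloss", random_state=0)

# Rank all 27 candidate variables once, using the full model's importances
full = XGBClassifier(**params).fit(X_train, y_train)
ranked = np.argsort(full.feature_importances_)[::-1]

# Refit with the top 20, 23, and 25 variables and compare discrimination
for k in (20, 23, 25):
    cols = ranked[:k]
    model = XGBClassifier(**params).fit(X_train[:, cols], y_train)
    prob = model.predict_proba(X_test[:, cols])[:, 1]
    auc = roc_auc_score(y_test, prob)
    f1 = f1_score(y_test, prob >= 0.5)   # the 0.5 threshold is an arbitrary choice here
    print(f"top {k} variables: AUC={auc:.3f}  F1={f1:.3f}")
```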
4. Variable Importance
Figure 3 shows the variable importance of the XGBoost-23 model fitted on the training/validation set. The most influential variables included type of surgery, age, creatinine clearance, preoperative emergency status, and New York Heart Association (NYHA) class (a minimal sketch of extracting such importances from a fitted model follows Figure 3).

[Figure 3: Variable importance of the XGBoost-23 final model. CrCl: creatinine clearance; NYHA: New York Heart Association class; CPS: preoperative emergency status; PVD: peripheral vascular disease; PrevOp: any previous cardiac surgery; PrevMI: number of previous myocardial infarctions; CardiacRhythm: preoperative cardiac rhythm; BMI: body mass index; HospCode: hospital identification code; Stroke: preoperative stroke; LVF2: left ventricular function, graded as in EuroSCORE II; PrevValve: previous valve surgery; PrevCABG: previous coronary artery bypass grafting; Ao.Arch.Procedure: aortic arch surgery; mechanicalSupport: preoperative need for mechanical support]
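Gain-based variable importance of the kind shown in Figure 3 can be extracted from any fitted XGBoost classifier. The sketch below illustrates this on synthetic data with placeholder variable names; it is not the actual XGBoost-23 model or its predictors.

```python
# Illustrative sketch: extracting gain-based variable importance from a fitted
# XGBoost classifier. Data and variable names are synthetic placeholders.
import pandas as pd
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=20_000, n_features=23, n_informative=12,
                           weights=[0.973], random_state=0)
feature_names = [f"var_{i:02d}" for i in range(23)]   # stand-ins for the 23 predictors

model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05,
                      eval_metric="logloss", random_state=0).fit(X, y)

# "gain": average improvement in the training loss when a variable is used in a split
gain = model.get_booster().get_score(importance_type="gain")
importance = (pd.Series({feature_names[int(k[1:])]: v for k, v in gain.items()})
                .sort_values(ascending=False))
print(importance.head(10))
```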
Conclusion:
The feature-selected XGBoost model demonstrates good discrimination, calibration, and clinical utility for predicting mortality after cardiac surgery. Prospective external validation of XGBoost-derived models is needed.
(Translation and verification: Sun Hua, Zhang Ying)