Interpretation of TRIPOD+AI for Multivariable Prediction Models


This article was published in: Chinese Journal of Internal Medicine, 2025, 64(1): 4-10.

Authors: Yan Minghai, Zhao Yanyan, Liu Xin, Li Wei, Wang Yang


Cite this article: Yan Minghai, Zhao Yanyan, Liu Xin, et al. Interpretation of the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis based on Regression or Machine Learning Methods (TRIPOD+AI) [J]. Chinese Journal of Internal Medicine, 2025, 64(1): 4-10. DOI: 10.3760/cma.j.cn112138-20240926-00609.

Abstract
With the widespread application of artificial intelligence (AI) and machine learning methods in the development of predictive models, the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) 2015 guidelines can no longer fully meet current research needs. This article introduces and interprets the updated reporting standard for prediction models developed with regression or machine learning methods (TRIPOD+AI) by analyzing its 27 checklist items and comparing it with TRIPOD 2015, helping researchers better understand and apply the updated guideline and thereby improving the reporting quality, transparency, and reproducibility of predictive model research.
The Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) was first published in 2015, providing researchers with minimum reporting recommendations that have been widely recognized and applied[1-2]. However, related reviews indicate that the actual application of this reporting standard is not ideal, with issues such as incorrect statistical methods and lack of research details (e.g., handling of missing data, code availability) increasing readers’ doubts about the accuracy and effectiveness of the models[3-5]. Furthermore, with the rapid development of emerging modeling methods such as artificial intelligence (AI) and machine learning (e.g., deep learning, gradient boosting machines), the demand for predictive models in various medical settings is also increasing, making it difficult for the traditional TRIPOD 2015 guidelines to fully cover the reporting needs of these complex models[6]. Therefore, it is necessary to update the TRIPOD 2015 guidelines to include more detailed specifications to encompass the latest methodological advancements.
This article aims to interpret the updated TRIPOD 2015 guidelines, specifically the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis based on Regression or Machine Learning Methods (TRIPOD+AI)[6], and discuss its usage. TRIPOD+AI not only integrates reporting recommendations for traditional regression models but also includes specific reporting items for machine learning methods, ensuring that different types of predictive models can be reported systematically and transparently. Through this update, TRIPOD+AI is expected to establish a unified reporting standard in the field of predictive model research, promoting transparency and reproducibility in scientific research and enhancing the model’s application value in medical practice.
1. Background and Development of TRIPOD+AI
TRIPOD+AI was developed to address the limitations of the existing TRIPOD reporting guideline and the problem of non-standard reporting in current predictive model research, especially in studies involving AI methods. Supported by the EQUATOR Network, the development of TRIPOD+AI followed the recommended development steps for reporting guidelines, including literature reviews, expert consensus discussions, and multiple rounds of Delphi surveys[6-7].
The initial items of TRIPOD+AI were derived from the TRIPOD 2015 checklist and other related predictive model reporting guidelines. On this basis, the guideline development group also incorporated findings from systematic assessments of machine learning prediction model studies to ensure coverage of current best practices in the field. The checklist was first refined through two rounds of Delphi surveys involving 370 experts from 27 countries, and then finalized at consensus meetings, where 28 experts voted to form the final TRIPOD+AI checklist[6]. In addition, the development group provides a dedicated reporting checklist for journal or conference abstracts to ensure consistency with TRIPOD+AI.
2. TRIPOD+AI Reporting Standards
The TRIPOD+AI reporting standard (https://www.tripod-statement.org/) lists 27 items (Table 1) divided into 7 sections: title and abstract (Items 1-2), introduction (Items 3-4), methods (Items 5-17), open science (Item 18), patient and public involvement (Item 19), results (Items 20-24), and discussion (Items 25-27). Some items contain multiple sub-items, giving 52 checklist sub-items in total, each classified according to whether it applies to model development, model validation, or both.
3. Key Interpretations of TRIPOD+AI Reporting Standards
The TRIPOD+AI reporting standards collaboration group has provided detailed explanations of the checklist[6], and this article selects the following key items for in-depth interpretation (Table 1), especially the newly added items compared to TRIPOD 2015.
1. Title (Item 1): The title should indicate whether the study is “developing” a new model, “validating” an existing model, or “developing + validating” a model. To clarify the applicability of the study, the title can concisely describe the target population of the research, which can be reflected through characteristics such as age, gender, or disease status. Additionally, the title should also reflect the predicted outcome indicators to allow readers to quickly understand the research purpose and potential application scenarios.
2. Abstract (Item 2): To meet the needs for abstract reporting, TRIPOD+AI additionally lists a complete set of abstract reporting standards (Table 2). This includes research objectives, background, population, modeling steps, validation, result reporting, and model interpretation, aiming to present research results clearly and comprehensively in the most concise language.


3. Introduction (Items 3-4): In writing the introduction section of the study, the authors should clearly articulate the clinical need behind model development, the target audience, and its expected clinical value. The introduction typically needs to explain why this model is needed, how it fills gaps in current clinical practice, and analyze the advantages and disadvantages of existing models. Moreover, it is crucial to clearly specify the specific application scenarios of the model, such as preliminary screening, auxiliary diagnosis, or prognosis assessment. Lastly, authors should also discuss whether health disparities among different sociodemographic groups (such as race, gender, or socioeconomic status) might impact the development and application of the predictive model.
4. Data Source (Item 5): Thoroughly describing the data source is fundamental to ensuring the credibility of the research, as different study designs, such as randomized controlled trials, prospective or retrospective cohort studies, routine healthcare records, and registry data, have different advantages and disadvantages. Researchers should fully explain the rationale for selecting specific data sources, particularly emphasizing whether the data can represent the target population, which is crucial for ensuring the effectiveness of the model in practical applications. Additionally, researchers should clarify the time frame for data collection, including the enrollment period, follow-up duration, and end date, as these factors are closely related to the model’s prediction time window.
5. Study Subjects (Item 6): Similar to most clinical studies, predictive model research must also specify inclusion and exclusion criteria for participants. Clearly defining these criteria not only helps ensure the internal validity of the research but also helps readers assess how closely the characteristics of the study sample match the target population, and thus judge the generalizability of the risk assessment results[9]. Furthermore, researchers should report in detail any treatments received by participants during the study. If the data are sourced from randomized controlled trials, researchers should specify how intervention factors were handled during model development or validation, for example whether they were included as predictors in the model or adjusted for as confounding factors in the analysis.
6. Predicted Outcomes (Item 8): Researchers need to thoroughly explain the rationale for selecting research outcomes, discussing outcome indicators from two dimensions: first, the specific form of the outcome, such as death, disease progression, hospitalization, etc.; second, the evaluation time points for the outcomes, such as 6 months, 1 year, or 10 years. Additionally, the specific evaluation methods for outcome indicators and their accessibility across different sociodemographic groups should be discussed to ensure fairness.
If the outcome assessment is subjective, such as imaging assessments or self-reported clinical symptoms, researchers should clarify the qualifications and demographic characteristics of the assessors, such as education, work experience, and hospital level. This helps to comprehensively understand the assessors’ professional capabilities, whether the assessment process is influenced by human factors, and the generalizability and applicability of the results across different groups. If feasible, it is recommended to use third-party blind evaluation or implement blinding to ensure the objectivity of the assessment process and reduce evaluation bias.
7. Predictive Indicators (Item 9): Choosing predictive variables reasonably is essential to ensuring the accuracy of predictive models. For each predictive variable, researchers should explain its definition, measurement methods, measurement time points, measurement units, and quality control strategies (such as blinding) to minimize bias introduced by subjective information. On this basis, researchers should elaborate on the rationale for selecting the initial predictive variables, indicating whether prior literature or existing predictive models were referenced, or whether all available variables in the dataset were simply used. The specific methods for selecting predictive variables should be detailed. Many researchers select variables whose univariable regression P values are less than 0.05 or 0.1, but this approach can be misleading when there is multicollinearity among the predictive variables. LASSO regression, which applies a penalty term to shrink coefficients and select variables, is also commonly used. It is recommended to apply several selection methods and compare the resulting models using measures of model fit, such as the Akaike information criterion or Bayesian information criterion, to determine which method yields the most suitable set of predictors.
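To make this comparison concrete, the following is a minimal sketch, assuming a binary outcome in a pandas Series y and candidate predictors in a DataFrame X (both hypothetical names), of how LASSO selection and a univariable P-value screen could each be run and then compared by the AIC of the resulting multivariable model; it uses scikit-learn and statsmodels and is illustrative rather than a prescribed procedure.

```python
# Illustrative sketch: two predictor-selection strategies compared by AIC.
# X (DataFrame of candidate predictors) and y (0/1 outcome) are placeholders.
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

def lasso_selected(X: pd.DataFrame, y: pd.Series) -> list:
    """Predictors with non-zero coefficients under a cross-validated L1 penalty."""
    Xs = StandardScaler().fit_transform(X)          # LASSO is scale-sensitive
    model = LogisticRegressionCV(penalty="l1", solver="saga", Cs=20,
                                 cv=10, max_iter=5000).fit(Xs, y)
    return list(X.columns[model.coef_[0] != 0])

def univariable_selected(X: pd.DataFrame, y: pd.Series, alpha: float = 0.1) -> list:
    """Predictors passing a univariable logistic-regression P-value screen."""
    keep = []
    for col in X.columns:
        fit = sm.Logit(y, sm.add_constant(X[[col]])).fit(disp=0)
        if fit.pvalues[col] < alpha:
            keep.append(col)
    return keep

def aic_of(X: pd.DataFrame, y: pd.Series, predictors: list) -> float:
    """AIC of the multivariable logistic model built on a given predictor subset."""
    fit = sm.Logit(y, sm.add_constant(X[predictors])).fit(disp=0)
    return fit.aic

# Compare the two candidate subsets and prefer the lower AIC:
# print(aic_of(X, y, lasso_selected(X, y)), aic_of(X, y, univariable_selected(X, y)))
```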
8. Sample Size (Item 10): Researchers should clarify the methods used to determine the sample size for the model development and validation phases, and justify whether the sample size is sufficient to answer the research question and ensure the statistical power of the model. Building on the traditional rule of including at least 10 events per candidate variable, Richard D Riley’s research team proposed a more refined sample size calculation method applicable to both the model development and validation phases, and to continuous or categorical outcome variables[10-12]. This method consists of 4 steps: (1) Estimate the average risk: ensure accurate estimation of the model intercept. A simple approach is to calculate the sample size required to accurately estimate the intercept of the “empty model” (a model without predictors). (2) Control the average prediction error: ensure that the error between predicted and true values is within an acceptable range, for example using the mean absolute prediction error as a measure. (3) Use shrinkage methods (such as penalization or regularization) to reduce the risk of overfitting. It is recommended to target a small amount of shrinkage (≤10%) when calculating the required sample size, adjusting for the number of candidate predictive variables and the anticipated model performance (such as the Cox-Snell R-squared, a goodness-of-fit measure). (4) Ensure that the difference between the apparent performance of the model (performance on the training set) and its true performance (performance on new data) is sufficiently small, which can be assessed using measures such as the Nagelkerke R-squared. The sample size is calculated for each of the above 4 steps and the maximum is taken as the minimum required sample size. The full calculation can be carried out with the “pmsampsize” R package[13]; a simplified sketch of two of the criteria is shown below.
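As an illustration only, the sketch below computes two of these criteria for a binary outcome (the shrinkage-based criterion and the precise-intercept criterion) using hypothetical planning values; the closed-form expressions follow my reading of Riley et al.[10-12], and the “pmsampsize” R package should be used for the complete four-step calculation in practice.

```python
# Illustrative sketch of two of the sample-size criteria for a binary-outcome
# model; all planning values below are hypothetical, and "pmsampsize" in R
# implements the complete four-step calculation.
import math

prevalence = 0.10     # anticipated outcome proportion in the target population
r2_cs = 0.12          # anticipated Cox-Snell R-squared of the final model
n_parameters = 20     # number of candidate predictor parameters
shrinkage = 0.90      # target global shrinkage factor (i.e. <=10% shrinkage)
margin = 0.05         # acceptable absolute error around the overall risk

# Shrinkage criterion: keep expected overfitting within the target shrinkage.
n_shrinkage = n_parameters / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))

# Intercept criterion: estimate the overall outcome risk ("empty model") precisely.
n_intercept = (1.96 / margin) ** 2 * prevalence * (1 - prevalence)

print(f"shrinkage criterion : {math.ceil(n_shrinkage)} participants")
print(f"intercept criterion : {math.ceil(n_intercept)} participants")
print(f"minimum sample size : {math.ceil(max(n_shrinkage, n_intercept))} participants")
```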
9. Statistical Analysis Methods (Item 12): The way data is used directly impacts the effectiveness of model development and validation. Researchers should provide detailed descriptions of how data is used and processed, including whether the data is divided into training and testing sets for model development and validation, and whether statistical requirements for sample size are considered.
To ensure the correct application of predictive variables in the model, researchers should explain how these variables are processed, such as the functional form of the variables, rescaling, transformations, or standardization. If predictive variables are measured on different scales or have very different distributions, standardization should be performed to avoid bias arising from differences in scale.
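As a sketch of how such predictor handling can be made explicit and reproducible, the pipeline below (all column names are hypothetical) declares the standardization, log transformation, and categorical coding in a single object that can be reported and reapplied unchanged to validation data; it uses scikit-learn and is one possible convention, not a required one.

```python
# Illustrative sketch: declaring predictor transformations in one reusable object.
# Column names ("age", "systolic_bp", "creatinine", "smoking_status") are hypothetical.
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

preprocess = ColumnTransformer([
    # continuous predictors measured on different scales: standardize
    ("scaled", StandardScaler(), ["age", "systolic_bp"]),
    # right-skewed biomarker: log-transform, then standardize
    ("log_scaled",
     Pipeline([("log", FunctionTransformer(np.log1p)),
               ("scale", StandardScaler())]),
     ["creatinine"]),
    # categorical predictor: explicit dummy coding
    ("categorical", OneHotEncoder(drop="first"), ["smoking_status"]),
])
# preprocess.fit_transform(development_data) feeds the model; the same fitted
# object is applied, unchanged, to any validation data.
```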
The choice of model type and its construction steps are also crucial to the final prediction results. Researchers should clearly specify the selected model type and explain the rationale for choosing that model. Additionally, the steps for constructing the model should be detailed, and hyperparameter tuning should be performed when necessary to enhance the model’s predictive capability. Internal validation methods (such as ten-fold cross-validation) should be used to assess the stability and reliability of the model.
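The sketch below shows one way this step could be reported in code: a hypothetical gradient boosting model whose hyperparameters are tuned by grid search inside ten-fold cross-validation. The specific algorithm and search grid are illustrative assumptions, not recommendations.

```python
# Illustrative sketch: hyperparameter tuning with ten-fold cross-validation.
# X_train and y_train are placeholders for the development data; the model
# and search grid are illustrative choices only.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

cv_folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=2025)
search = GridSearchCV(
    GradientBoostingClassifier(random_state=2025),
    param_grid={"n_estimators": [100, 300],
                "max_depth": [2, 3],
                "learning_rate": [0.05, 0.1]},
    scoring="roc_auc",
    cv=cv_folds,
)
# search.fit(X_train, y_train)
# Report both the chosen hyperparameters and the cross-validated performance:
# print(search.best_params_, round(search.best_score_, 3))
```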
For multicenter or international studies, researchers should consider the heterogeneity between different populations (such as hospitals or countries) and explain how heterogeneity in the model’s parameter estimates and performance across these settings is handled and quantified.
When evaluating model performance, it is recommended to present the model’s discrimination (such as the receiver operating characteristic curve or C-index), calibration (such as calibration curves), and clinical utility (such as decision curve analysis) in graphical form whenever possible. Additionally, it is suggested to construct multiple predictive models and compare their performance to select the optimal model.
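As a rough sketch of these three dimensions, the function below computes the C-index, decile-based calibration pairs for a calibration plot, and the net benefit at a single decision threshold (a decision curve simply repeats the last step over a range of thresholds); y_true and y_prob are assumed to be out-of-sample observed outcomes and predicted risks.

```python
# Illustrative sketch: discrimination, calibration, and clinical utility from
# out-of-sample predicted risks. y_true (0/1 outcomes) and y_prob (predicted
# risks) are placeholders for held-out or cross-validated data.
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import roc_auc_score

def performance_summary(y_true, y_prob, threshold=0.20):
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    # Discrimination: C-index (area under the ROC curve for a binary outcome)
    c_index = roc_auc_score(y_true, y_prob)
    # Calibration: observed vs. mean predicted risk within risk deciles
    observed, predicted = calibration_curve(y_true, y_prob,
                                            n_bins=10, strategy="quantile")
    # Clinical utility: net benefit at one threshold probability
    treat = y_prob >= threshold
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    n = len(y_true)
    net_benefit = tp / n - (fp / n) * threshold / (1 - threshold)
    return c_index, list(zip(predicted, observed)), net_benefit
```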
Finally, researchers should clearly describe how predicted results are calculated, providing relevant formulas or code to ensure that other researchers can reproduce the results and use the model in practical applications.
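For a regression-based model, for example, “providing the formula” can be as simple as reporting the intercept and every coefficient so that the linear predictor and the predicted risk can be recomputed exactly; a generic logistic-regression form (coefficients β and predictors x are placeholders) is:

```latex
% Generic logistic-regression prediction equation (placeholder symbols).
\[
\mathrm{LP} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k,
\qquad
\hat{p} = \frac{1}{1 + e^{-\mathrm{LP}}}
\]
```

For machine learning models without a closed-form equation, sharing the trained model object or the scoring code serves the same purpose.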
10. Results (Items 20-24): Results reporting should provide relevant details as comprehensively as possible. Researchers should describe the changes in the number of participants at various stages of the study, such as population recruitment, enrollment, inclusion in the study, and follow-up. If the study involves multiple data sources, the number and characteristics of participants in each data source should be reported separately.
During the model validation phase, researchers should describe the distribution of key predictive variables and compare it with that of the development phase data to ensure consistency. When evaluating model performance, estimates with confidence intervals should be reported for key subgroups (such as gender, age, and socioeconomic status) to show how performance differs across populations. Similarly, for multicenter or international studies, model performance in different hospital or country subgroups should also be reported to evaluate potential differences and their impact.
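A minimal sketch of such subgroup reporting, assuming a validation DataFrame df with hypothetical columns “outcome”, “predicted_risk”, and a subgroup label such as “sex”, is given below; it computes the C-index with a simple percentile bootstrap confidence interval in each subgroup.

```python
# Illustrative sketch: C-index with a bootstrap 95% CI within each subgroup.
# df is assumed to hold columns "outcome" (0/1), "predicted_risk", and a
# subgroup column such as "sex"; all names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def subgroup_c_index(df, group_col, n_boot=1000, seed=2025):
    rng = np.random.default_rng(seed)
    rows = []
    for level, sub in df.groupby(group_col):
        c_index = roc_auc_score(sub["outcome"], sub["predicted_risk"])
        boots = []
        for _ in range(n_boot):
            resample = sub.iloc[rng.integers(0, len(sub), len(sub))]
            if resample["outcome"].nunique() == 2:     # AUC needs both classes
                boots.append(roc_auc_score(resample["outcome"],
                                           resample["predicted_risk"]))
        low, high = np.percentile(boots, [2.5, 97.5])
        rows.append({group_col: level, "c_index": c_index,
                     "ci_low": low, "ci_high": high})
    return pd.DataFrame(rows)

# Example: subgroup_c_index(validation_df, "sex")
```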
11. Discussion (Items 25-27): The discussion section should comprehensively address the interpretation of the predictive model, its specific application scenarios, and its limitations. First, a brief summary of the main findings should be provided, especially regarding the model’s performance and practical application effects. At the same time, attention should be paid to fairness in application, exploring whether the model performs inconsistently across different sociodemographic subgroups. Secondly, after clarifying the specific application scenarios of the predictive model, attention should be paid to the practical needs of users, such as whether professional assistance is required or whether data must be entered manually. Finally, the limitations of the predictive model should be specified, focusing on issues such as sample representativeness, sample size, risk of overfitting, and missing data, and discussing their impact on model stability and the generalizability of the results.
Additionally, future research directions for predictive models should be explored in depth. This includes but is not limited to expanding applicable populations, further validating models, and optimizing algorithms. Through deeper research, the practical utility and application value of predictive models can be better enhanced.
12. Fairness (Items 3c, 5a, 7, 8a, 8b, 9c, 12f, 14, 20b, 23a, 25, and 26): In this update, fairness has become a core element throughout the checklist, which was not sufficiently emphasized in TRIPOD 2015. TRIPOD+AI emphasizes the importance of fairness in data collection, method selection, and performance evaluation. For example, in data collection, efforts should be made to ensure that data includes individuals from different ages, genders, races, and health statuses; in results reporting, the performance of the model in key subgroups (such as sociodemographic characteristics) should not be overlooked. Additionally, the newly added patient and public involvement item encourages the inclusion of opinions from various stakeholders, including patients, the public, and clinicians, throughout the model development and application process to ensure fairness in model design.
Conclusion
The release of TRIPOD+AI marks a comprehensive upgrade and replacement of TRIPOD 2015, providing a more comprehensive and modern reporting framework for predictive model research, enabling researchers, reviewers, and journal editors to accurately convey the entire process and results of predictive model research. Although TRIPOD+AI still provides minimum reporting recommendations, it covers predictive model research using any regression or machine learning methods, establishing a new standard for research transparency, reproducibility, and effectiveness, promoting the development and application of clinical predictive model research.

References (omitted)

