Round 1 Review

JMIRx Med

ISSN: 2563-6316

DOI: 10.2196/60428

Peer Review of “Performance Drift in Machine Learning Models for Cardiac Surgery Risk Prediction: Retrospective Analysis”

Anonymous

Edward Meinert

Published 12 June 2024 (e60428)

History dates: 10 May 2024, 10 May 2024

Copyright 2024

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIRx Med, is properly cited. The complete bibliographic information, a link to the original publication on https://med.jmirx.org/, as well as this copyright and license information must be included.

https://preprints.jmir.org/preprint/45973

https://www.medrxiv.org/content/10.1101/2023.01.21.23284795v1

https://med.jmirx.org/2024/1/e60384

https://med.jmirx.org/2024/1/e45973

Keywords: cardiac surgery; artificial intelligence; risk prediction; machine learning; operative mortality; data set drift; performance drift; national data set; adult; data; cardiac; surgery; cardiology; heart; risk; prediction; United Kingdom; mortality; performance; model

This is the peer-review report for “Performance Drift in Machine Learning Models for Cardiac Surgery Risk Prediction: Retrospective Analysis.”

Round 1 Review

General Comments

Overall, I think this is a really interesting paper [1]. Performance drift is a concept I had not come across before, and I can clearly see why it is an important consideration. I also think the authors have done an excellent job of considering a range of aspects, including changes in feature importance, beyond the most obvious measurements.

Specific Comments

Abstract

1. “It has been suggested that using Machine Learning (ML) techniques, a branch of Artificial intelligence (AI), may improve the accuracy of risk prediction.” Improve it over what? Specify what the status quo is with regard to first-principles and data-driven modeling. This statement is also repeated in the first line of the introduction. What is “conventional” about these models?

2. “five ML mortality prediction models”—it should be highlighted that these are novel models that you have developed for this paper.

3. “geometric average results of all metrics”—it is not all metrics, just the 5 that you have calculated. It is better to just say here “a novel metric called the CEM” or something.
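To illustrate the point in comment 3, a geometric average is always taken over a specific, finite set of metrics. The following is a minimal sketch only: the metric names and values are hypothetical placeholders, not the authors' metrics or results, and `composite_score` is an illustrative helper rather than the paper's CEM implementation.

```python
from statistics import geometric_mean


def composite_score(metrics):
    """Geometric mean of the supplied metric values.

    `metrics` maps a metric name to a strictly positive value.
    The result summarizes only the metrics passed in, not "all metrics".
    """
    values = list(metrics.values())
    if not values:
        raise ValueError("at least one metric is required")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return geometric_mean(values)


# Hypothetical placeholder names and values, for illustration only.
example_metrics = {
    "metric_a": 0.80,
    "metric_b": 0.90,
    "metric_c": 0.70,
    "metric_d": 0.60,
    "metric_e": 0.85,
}
print(f"Composite over {len(example_metrics)} metrics: "
      f"{composite_score(example_metrics):.3f}")
```

Because the geometric mean is multiplicative, a single poor metric pulls the composite down more than it would in an arithmetic mean, which is one reason to state exactly which metrics are included.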

Introduction

Why is data set drift a problem? I think you could do more here to highlight how important this is to an audience who might not be dealing with the data themselves and, thus, might not naturally think of examples: for example, changes in treatment guidelines, demographics, new risk factors emerging, or changes in coding practices. You could mention “new” comorbidities such as long COVID.

Methods

1. Could the same individuals appear in both the training and validation set and the holdout set if they had multiple surgeries? If so, this may have introduced some bias into the performance estimates. I do not think you need to redo the analyses, but if you can report the degree of overlap, that would be good. Otherwise, say it was not possible and list it as a limitation.

2. “As a sensitivity analysis, we excluded the True Negative Rate from the performance evaluation, by calculating the F1 score.” This sentence does not quite make sense to me. The F₁-score is based on the sensitivity (true positive rate, ie, recall) and the precision (positive predictive value), right? It does not exclude the true negative rate per se; it just does not use it.
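Both Methods comments can be made concrete with a short sketch. It assumes each surgery record carries a patient identifier (the identifiers and confusion-matrix counts below are hypothetical, not taken from the paper) and shows (a) how overlap between splits could be counted and (b) that the F₁-score is computed from true positives, false positives, and false negatives only, so true negatives never enter the formula.

```python
def patient_overlap(train_ids, holdout_ids):
    """Return the set of patient identifiers present in both splits."""
    return set(train_ids) & set(holdout_ids)


def f1_score(tp, fp, fn, tn=0):
    """F1 from confusion-matrix counts.

    Equivalent to the harmonic mean of precision, tp / (tp + fp),
    and recall, tp / (tp + fn). `tn` is accepted but never used.
    """
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def specificity(tn, fp):
    """True negative rate, shown for contrast: it does depend on tn."""
    return tn / (tn + fp) if (tn + fp) else 0.0


# Hypothetical identifiers: patient "p2" has a surgery in each split.
shared = patient_overlap(["p1", "p2", "p3"], ["p2", "p4"])
print(f"{len(shared)} patient(s) in both splits: {sorted(shared)}")

# Hypothetical counts: changing tn alone leaves F1 unchanged.
print(f1_score(tp=30, fp=10, fn=20, tn=100))
print(f1_score(tp=30, fp=10, fn=20, tn=100_000))
```

In a data set dominated by survivors, this is the practical consequence: the F₁-score is insensitive to the large true negative count, whereas specificity is not.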

Conflicts of Interest

None declared.

References

1. Dong, Sinha, Zhai. Performance drift in machine learning models for cardiac surgery risk prediction: retrospective analysis. JMIRx Med. 2024;5:e45973. doi:10.2196/45973