<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.0 20040830//EN" "journalpublishing.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="2.0" xml:lang="en" article-type="reviewer-report"><front><journal-meta><journal-id journal-id-type="nlm-ta">JMIRx Med</journal-id><journal-id journal-id-type="publisher-id">xmed</journal-id><journal-id journal-id-type="index">34</journal-id><journal-title>JMIRx Med</journal-title><abbrev-journal-title>JMIRx Med</abbrev-journal-title><issn pub-type="epub">2563-6316</issn></journal-meta><article-meta><article-id pub-id-type="publisher-id">60428</article-id><article-id pub-id-type="doi">10.2196/60428</article-id><title-group><article-title>Peer Review of &#x201C;Performance Drift in Machine Learning Models for Cardiac Surgery Risk Prediction: Retrospective Analysis&#x201D;</article-title></title-group><contrib-group><contrib contrib-type="author"><collab>Anonymous</collab></contrib></contrib-group><contrib-group><contrib contrib-type="editor"><name name-style="western"><surname>Meinert</surname><given-names>Edward</given-names></name></contrib></contrib-group><pub-date pub-type="collection"><year>2024</year></pub-date><pub-date pub-type="epub"><day>12</day><month>6</month><year>2024</year></pub-date><volume>5</volume><elocation-id>e60428</elocation-id><history><date date-type="received"><day>10</day><month>05</month><year>2024</year></date><date date-type="accepted"><day>10</day><month>05</month><year>2024</year></date></history><copyright-statement>&#x00A9; Anonymous. Originally published in JMIRx Med (<ext-link ext-link-type="uri" xlink:href="https://med.jmirx.org">https://med.jmirx.org</ext-link>), 12.6.2024. 
</copyright-statement><copyright-year>2024</copyright-year><license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIRx Med, is properly cited. The complete bibliographic information, a link to the original publication on <ext-link ext-link-type="uri" xlink:href="https://med.jmirx.org/">https://med.jmirx.org/</ext-link>, as well as this copyright and license information must be included.</p></license><self-uri xlink:type="simple" xlink:href="https://xmed.jmir.org/2024/1/e60428"/><related-article related-article-type="companion" ext-link-type="doi" xlink:href="10.2196/preprints.45973" xlink:title="Preprint (JMIR Preprints)" xlink:type="simple">https://preprints.jmir.org/preprint/45973</related-article><related-article related-article-type="companion" ext-link-type="doi" xlink:href="10.1101/2023.01.21.23284795" xlink:title="Preprint (MedRxiv)" xlink:type="simple">https://www.medrxiv.org/content/10.1101/2023.01.21.23284795v1</related-article><related-article related-article-type="companion" ext-link-type="doi" xlink:href="10.2196/60384" xlink:title="Authors' Response to Peer-Review Reports" xlink:type="simple">https://med.jmirx.org/2024/1/e60384</related-article><related-article related-article-type="companion" ext-link-type="doi" xlink:href="10.2196/45973" xlink:title="Published Article" xlink:type="simple">https://med.jmirx.org/2024/1/e45973</related-article><kwd-group><kwd>cardiac surgery</kwd><kwd>artificial intelligence</kwd><kwd>risk prediction</kwd><kwd>machine learning</kwd><kwd>operative mortality</kwd><kwd>data set drift</kwd><kwd>performance 
drift</kwd><kwd>national data set</kwd><kwd>adult</kwd><kwd>data</kwd><kwd>cardiac</kwd><kwd>surgery</kwd><kwd>cardiology</kwd><kwd>heart</kwd><kwd>risk</kwd><kwd>prediction</kwd><kwd>United Kingdom</kwd><kwd>mortality</kwd><kwd>performance</kwd><kwd>model</kwd></kwd-group></article-meta></front><body><p><italic>This is the peer-review report for &#x201C;Performance Drift in Machine Learning Models for Cardiac Surgery Risk Prediction: Retrospective Analysis.&#x201D;</italic></p><sec id="s2"><title>Round 1 Review</title><sec id="s1-1"><title>General Comments</title><p>Overall, I think this is a really interesting paper [<xref ref-type="bibr" rid="ref1">1</xref>]. Performance drift is a concept I had not come across before, and I can see very clearly why it is an important consideration. I also think the authors have done an excellent job of considering a host of different aspects, including changes in feature importance, beyond the most obvious measurements.</p></sec><sec id="s1-2"><title>Specific Comments</title><sec id="s1-2-1"><title>Abstract</title><p>1. &#x201C;It has been suggested that using Machine Learning (ML) techniques, a branch of Artificial intelligence (AI), may improve the accuracy of risk prediction.&#x201D; Improve it over what? Specify what the status quo is with regard to first principles and data-driven modeling. This statement is also repeated in the first line of the introduction. What is &#x201C;conventional&#x201D; about these models?</p><p>2. &#x201C;five ML mortality prediction models&#x201D;: it should be highlighted that these are novel models that you have developed for this paper.</p><p>3. &#x201C;geometric average results of all metrics&#x201D;: it is not all metrics, just the 5 that you have calculated. It would be better to simply say here &#x201C;a novel metric called the CEM&#x201D; or similar.</p></sec><sec id="s1-2-2"><title>Introduction</title><p>Why is data set drift a problem? 
I think you could do more here to highlight how important this is to an audience who might not be dealing with the data themselves and, thus, might not naturally think of examples, such as changes in treatment guidelines, shifts in demographics, newly emerging risk factors, or changes in coding practices. You could mention &#x201C;new&#x201D; comorbidities such as long COVID.</p></sec><sec id="s1-2-3"><title>Methods</title><p>1. Could the same individuals appear in both the training and validation sets and the holdout set if they had multiple surgeries? If so, this may have introduced some bias into the performance estimates. I do not think you need to redo the analyses, but it would be good if you could report the degree of overlap. Otherwise, state that this was not possible and list it as a limitation.</p><p>2. &#x201C;As a sensitivity analysis, we excluded the True Negative Rate from the performance evaluation, by calculating the F1 score.&#x201D; This sentence does not quite make sense to me. The <italic>F</italic><sub>1</sub>-score is based on the sensitivity (true positive rate) and the precision (positive predictive value), right? 
It does not exclude the true negative rate per se; it just does not use it.</p></sec></sec></sec></body><back><fn-group><fn fn-type="conflict"><p>None declared.</p></fn></fn-group><ref-list><title>References</title><ref id="ref1"><label>1</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Dong</surname><given-names>T</given-names> </name><name name-style="western"><surname>Sinha</surname><given-names>S</given-names> </name><name name-style="western"><surname>Zhai</surname><given-names>B</given-names> </name><etal/></person-group><article-title>Performance drift in machine learning models for cardiac surgery risk prediction: retrospective analysis</article-title><source>JMIRx Med</source><year>2024</year><volume>5</volume><fpage>e45973</fpage><pub-id pub-id-type="doi">10.2196/45973</pub-id></nlm-citation></ref></ref-list></back></article>