<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.0 20040830//EN" "journalpublishing.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="2.0" xml:lang="en" article-type="reviewer-report"><front><journal-meta><journal-id journal-id-type="nlm-ta">JMIRx Med</journal-id><journal-id journal-id-type="publisher-id">xmed</journal-id><journal-id journal-id-type="index">34</journal-id><journal-title>JMIRx Med</journal-title><abbrev-journal-title>JMIRx Med</abbrev-journal-title><issn pub-type="epub">2563-6316</issn><publisher><publisher-name>JMIR Publications</publisher-name><publisher-loc>Toronto, Canada</publisher-loc></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">v6i1e84175</article-id><article-id pub-id-type="doi">10.2196/84175</article-id><article-categories><subj-group subj-group-type="heading"><subject>Peer-Review Report</subject></subj-group></article-categories><title-group><article-title>Peer Review of &#x201C;Assessing the Limitations of Large Language Models in Clinical Practice Guideline&#x2013;Concordant Treatment Decision-Making on Real-World Data: Retrospective Study&#x201D;</article-title></title-group><contrib-group><contrib contrib-type="author"><name name-style="western"><surname>Singh</surname><given-names>Reenu</given-names></name><xref ref-type="aff" rid="aff1"/></contrib></contrib-group><aff id="aff1"><institution>Indian Institute of Management Mumbai</institution><addr-line>Vihar Lake Rd</addr-line><addr-line>Mumbai</addr-line><country>India</country></aff><contrib-group><contrib contrib-type="editor"><name name-style="western"><surname>Grover</surname><given-names>Abhinav</given-names></name></contrib></contrib-group><pub-date pub-type="collection"><year>2025</year></pub-date><pub-date 
pub-type="epub"><day>3</day><month>11</month><year>2025</year></pub-date><volume>6</volume><elocation-id>e84175</elocation-id><history><date date-type="received"><day>15</day><month>09</month><year>2025</year></date><date date-type="accepted"><day>15</day><month>09</month><year>2025</year></date></history><copyright-statement>&#x00A9; Reenu Singh. Originally published in JMIRx Med (<ext-link ext-link-type="uri" xlink:href="https://med.jmirx.org">https://med.jmirx.org</ext-link>), 3.11.2025. </copyright-statement><copyright-year>2025</copyright-year><license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIRx Med, is properly cited. 
The complete bibliographic information, a link to the original publication on <ext-link ext-link-type="uri" xlink:href="https://med.jmirx.org/">https://med.jmirx.org/</ext-link>, as well as this copyright and license information must be included.</p></license><self-uri xlink:type="simple" xlink:href="https://xmed.jmir.org/2025/1/e84175"/><related-article related-article-type="companion" ext-link-type="doi" xlink:href="10.2196/74899" xlink:title="Preprint (JMIR Preprints)" xlink:type="simple">http://preprints.jmir.org/preprint/74899</related-article><related-article related-article-type="companion" ext-link-type="doi" xlink:href="10.2196/84173" xlink:title="Authors' Response to Peer-Review Reports" xlink:type="simple">https://med.jmirx.org/2025/1/e84173</related-article><related-article related-article-type="companion" ext-link-type="doi" xlink:href="10.2196/74899" xlink:title="Published Article" xlink:type="simple">https://med.jmirx.org/2025/1/e74899</related-article><kwd-group><kwd>large language model</kwd><kwd>foundation model</kwd><kwd>reasoning model</kwd><kwd>treatment decision-making</kwd><kwd>aortic stenosis</kwd><kwd>clinical practice guidelines</kwd><kwd>medical data processing</kwd></kwd-group></article-meta></front><body><p><italic>This is the peer-review report for &#x201C;Assessing the Limitations of Large Language Models in Clinical Practice Guideline&#x2013;Concordant Treatment Decision-Making on Real-World Data: Retrospective Study.&#x201D;</italic></p><sec id="s2"><title>Round 1 Review</title><sec id="s1-1"><title>Specific Comments</title><sec id="s1-1-1"><title>Major Comments</title><p>1. 
To strengthen the discussion of bias in large language models (LLMs) used for clinical decision-making, the study [<xref ref-type="bibr" rid="ref1">1</xref>] should address the following aspects:</p><p>If LLMs are trained predominantly on Western medical literature or on data from specific demographic groups, their recommendations may not generalize to diverse patient populations. If the data used to fine-tune the model lack representation from certain ethnic, gender, or socioeconomic groups, the model may produce recommendations that are not universally applicable. Even with a diverse dataset, biases can arise from the model architecture, reinforcement learning strategies, or human-in-the-loop feedback mechanisms that shape model responses.</p><p>2. What datasets were used? Specify the data source (eg, electronic health records, clinical trial data, or synthetic datasets) and the total number of cases or records used to test the LLMs. If synthetic data were generated, describe the method used to create them. Were diverse age groups, genders, and ethnic backgrounds represented? A lack of diversity in the data can limit the generalizability of the results.</p><p>3. The study&#x2019;s impact could be significantly enhanced by addressing the following challenges. Raw medical reports often include free-text narratives, physician notes, abbreviations, and inconsistencies, which require advanced natural language processing techniques such as entity recognition, text normalization, and standardization. These reports may also contain irrelevant information, redundancies, or nonessential clinical details. Effective preprocessing is essential to filter out unnecessary content while preserving critical medical information. A key consideration is how this preprocessing can be optimized to address these challenges efficiently.</p></sec></sec></sec><sec id="s3"><title>Round 2 Review</title><p>1. 
The authors have addressed the comments satisfactorily.</p></sec></body><back><fn-group><fn fn-type="conflict"><p>None declared.</p></fn></fn-group><glossary><title>Abbreviations</title><def-list><def-item><term id="abb1">LLM</term><def><p>large language model</p></def></def-item></def-list></glossary><ref-list><title>References</title><ref id="ref1"><label>1</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Roeschl</surname><given-names>T</given-names></name><name name-style="western"><surname>Hoffmann</surname><given-names>M</given-names></name><name name-style="western"><surname>Hashemi</surname><given-names>D</given-names></name><etal/></person-group><article-title>Assessing the limitations of large language models in clinical practice guideline&#x2013;concordant treatment decision-making on real-world data: retrospective study</article-title><source>JMIRx Med</source><year>2025</year><volume>6</volume><fpage>e74899</fpage><pub-id pub-id-type="doi">10.2196/74899</pub-id></nlm-citation></ref></ref-list></back></article>