Peer Review of “Finding Potential Adverse Events in the Unstructured Text of Electronic Health Care Records: Development of the Shakespeare Method”

doi:10.2196/31548

Peer-Review Report

Anonymous

Related Articles

Preprint (medRxiv): https://www.medrxiv.org/content/10.1101/2021.01.05.21249239v1
Preprint (JMIR Preprints): https://preprints.jmir.org/preprint/27017
Authors' Response to Peer-Review Reports: https://med.jmirx.org/2021/3/e31568/
Published Article: https://med.jmirx.org/2021/3/e27017/

JMIRx Med 2021;2(3):e31548

This is a peer-review report submitted for the paper “Finding Potential Adverse Events in the Unstructured Text of Electronic Health Care Records: Development of the Shakespeare Method.”

General Comments

This paper [1] describes the “Shakespeare method,” which was designed to discover adverse events (AEs) associated with blood transfusion in unstructured electronic health record (EHR) notes. The authors applied this method to the MIMIC-III data set and appear to have found transfusion AEs (TAEs) and potential TAEs (PTAEs) that were unknown when those EHR notes were written.

Specific Comments

Major Comments

Is there any plan to release all the code/scripts used in this study? The method appears complex, involving multiple steps; without the code, it will be very difficult to reproduce the results.
The manuscript should include more details on how the transfusion and comparison groups were created.
The authors mention that the latent Dirichlet allocation (LDA) method they used for topic modeling requires the number of topics to be selected a priori; in this study, it was set to 45. Some questions:
- How robust is the “Shakespeare method” with respect to this value? If a different value is chosen, will the method find similar topics? Similar notes for manual document review? Similar TAEs/PTAEs?
- How would you determine this value if the method is applied to detect AEs for other treatments?
- A brief introduction to the LDA method should be included in the manuscript.
In the Results section, the authors state, “Despite the inclusion of 1 to 5 grams in the vectorization, the terms that we extracted during classification were unigrams.” That seems to be quite a coincidence; is there an explanation? If only unigrams are used in the bag-of-words representation, will the results be different? Does this mean only unigrams are needed in future applications of this method?
If possible, applying the method to other data sets or to other types of treatment would help clarify how generalizable the method is.
On page 4, in the section “The Shakespeare Method”: “Trim the n-gram vectors in the target group to those that are significant for the target group.” How is the trimming performed? How important is it for the final results?

Minor Comments

In the Abstract section, the authors wrote “We chose the case of transfusion adverse events (TAEs) and potential TAEs (PTAEs) because real dates were obscured in the study data, and new TAE types were becoming recognized during the study data period.” The causal relationship here is a little confusing.
On page 3, the authors wrote, “The Shakespeare method has three parts,” but the bulleted list that follows has five items.
On page 8: “The Shakespeare method would likely generalize to other her notes and possibly other types of medical texts.” An additional “her” is inserted.

The revision addressed my previous concerns. I have no further comments.

Conflicts of Interest

None declared.

References

1. Bright RA, Dowdy K, Rankin SK, Blok SV, Palmer LAM, Bright SJ. Finding Potential Adverse Events in the Unstructured Text of Electronic Health Care Records: Development of the Shakespeare Method. JMIRx Med 2021 Aug 11;2(3):e27017.

Abbreviations

AE: adverse event

EHR: electronic health record

LDA: latent Dirichlet allocation

PTAE: potential transfusion adverse event

TAE: transfusion adverse event

Edited by E Meinert; this is a non–peer-reviewed article. Submitted 24.06.21; accepted 24.06.21; published 11.08.21.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIRx Med, is properly cited. The complete bibliographic information, a link to the original publication on https://med.jmirx.org/, as well as this copyright and license information must be included.
