Preprint (JMIR Preprints): https://preprints.jmir.org/preprint/27017
Authors' Response to Peer-Review Reports: https://med.jmirx.org/2021/3/e31568/
Published Article: https://med.jmirx.org/2021/3/e27017/
This is a peer-review report submitted for the paper “Finding Potential Adverse Events in the Unstructured Text of Electronic Health Care Records: Development of the Shakespeare Method”
Round 1 Review
This study aims to develop a new method to identify attributed and unattributed potential adverse events (AEs) using the unstructured text of electronic health records (EHRs).
- After reading the manuscript, I feel the title does not match the study contents. In particular, the title seems to restate a fact that is already self-evident.
- The core of the so-called Shakespeare method is still the latent Dirichlet allocation (LDA) method; I cannot see that any novel methods have been developed.
- There is no review of the related literature, even though many studies have applied LDA methods to EHR data. To really find any AE in unstructured text, natural language processing (NLP) is indispensable.
- What is the difference between the so-called “Shakespeare method” and LDA topic modeling?
- The manuscript states, “The Shakespeare method has three parts:”, and then lists five steps. What are the three parts?
  - Convert each document into a vector of n-gram frequencies.
  - Create two groups of vectors: target and comparison.
  - Trim the n-gram vectors in the target group to those that are significant for the target group.
  - Apply topic analysis to the trimmed target group vectors.
  - Interpret the original documents with topic scores of interest.
- The description of the method is hard to understand. For example, the manuscript states, “Crucially, events can be described in text but not necessarily attributed to being medical care AEs [14,25,41]; we wanted to develop an unstructured method that would identify them.” What is this unstructured method?
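To make the question about the difference from LDA concrete, the Python sketch below implements only the first three steps quoted above: building n-gram vectors, forming target and comparison groups, and trimming the target vectors. The trimming criterion (a one-sided two-proportion z-test on document frequencies with a threshold of 1.96), the whitespace tokenizer, the use of unigrams and bigrams, and all function names are hypothetical assumptions for illustration; the manuscript's actual choices may differ. Step 4 would pass the trimmed vectors to a standard LDA implementation, which is not shown.

```python
import math
from collections import Counter


def ngrams(text, n):
    """Return the n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def to_vector(doc, n_values=(1, 2)):
    """Step 1: convert one document into a vector (Counter) of n-gram frequencies."""
    counts = Counter()
    for n in n_values:
        counts.update(ngrams(doc, n))
    return counts


def doc_freq(vectors):
    """Count the number of documents in which each n-gram appears at least once."""
    df = Counter()
    for vec in vectors:
        df.update(vec.keys())
    return df


def significant_ngrams(target, comparison, z_min=1.96):
    """Step 3 (assumed criterion): keep n-grams whose document frequency is
    higher in the target group than in the comparison group, judged by a
    one-sided two-proportion z-test."""
    n1, n2 = len(target), len(comparison)
    df1, df2 = doc_freq(target), doc_freq(comparison)
    keep = set()
    for gram, count in df1.items():
        p1, p2 = count / n1, df2[gram] / n2
        pooled = (count + df2[gram]) / (n1 + n2)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        # se is 0 when every document in both groups contains the n-gram.
        if se > 0 and (p1 - p2) / se >= z_min:
            keep.add(gram)
    return keep


def trim(vectors, keep):
    """Step 3, continued: drop every n-gram that is not in the kept set."""
    return [Counter({g: c for g, c in vec.items() if g in keep}) for vec in vectors]


# Step 2: build target and comparison groups (toy notes, not real data).
target = [to_vector("patient developed fever after transfusion")] * 5
comparison = [to_vector("patient stable after surgery")] * 5
keep = significant_ngrams(target, comparison)
trimmed_target = trim(target, keep)
# Step 4 would fit an ordinary LDA topic model to trimmed_target (not shown).
```

In this sketch, everything after trimming is ordinary topic modeling, so whatever distinguishes the pipeline from plain LDA would have to lie in how the two groups are formed and how significance is judged.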
Round 2 Review
This revision provided more details of the Shakespeare method. However, it seems the authors do not quite understand the alternative method: NLP. This may lead to mistaken conclusions. The questions below need reconsideration.
- It is claimed that “Many methods for finding AEs in text rely on predefining possible AEs before searching for prespecified words and phrases or manual labeling (standardization) by investigators.” However, the dictionary method in NLP tools can extract most of the terms included in, for example, the Unified Medical Language System, and the extraction can be limited to the “disorder” semantic group to serve as a potential transfusion AE (PTAE) group.
- The PTAE terms identified through the Shakespeare method are actually a mixture of reasons for transfusion, consequences of the reasons for transfusion, or alternate reasons for PTAEs. The Shakespeare method is not able to identify specific AEs with a causal relationship to transfusion. What, then, is the difference between this method and the NLP dictionary method?
- It is advisable to describe potential use scenarios for the method (eg, will further manual review of the results be needed?).
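The dictionary approach raised in the first Round 2 comment can be sketched in Python as follows. The four-entry lexicon is a hypothetical stand-in for a dictionary derived from the Unified Medical Language System, and the function name extract_terms and its greedy longest-match strategy are illustrative assumptions, not the behavior of any particular NLP tool. Filtering on the "disorder" group mirrors the suggestion of using that semantic group as the PTAE set.

```python
import re

# Hypothetical toy lexicon (term -> semantic group). A real dictionary-based
# tool would draw its terms and semantic groups from the full UMLS.
LEXICON = {
    "fever": "disorder",
    "hypotension": "disorder",
    "acute kidney injury": "disorder",
    "transfusion": "procedure",
}


def extract_terms(text, lexicon, group=None):
    """Greedy longest-match dictionary lookup; optionally keep only one group."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    max_len = max(len(term.split()) for term in lexicon)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first so multiword terms win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lexicon:
                if group is None or lexicon[candidate] == group:
                    found.append(candidate)
                i += n  # skip past the matched term either way
                break
        else:
            i += 1  # no dictionary term starts at this token
    return found


note = "Patient developed fever and acute kidney injury after transfusion."
potential_aes = extract_terms(note, LEXICON, group="disorder")
```

Like the PTAE terms discussed in the second comment, the matched terms carry no information about whether transfusion caused the condition; establishing that would still require further review.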
Conflicts of Interest
Abbreviations
AE: adverse event
EHR: electronic health record
LDA: latent Dirichlet allocation
NLP: natural language processing
PTAE: potential transfusion adverse event
Edited by E Meinert; this is a non–peer-reviewed article. Submitted 24.06.21; accepted 24.06.21; published 11.08.21.
Copyright © Anonymous. Originally published in JMIRx Med (https://med.jmirx.org), 11.08.2021.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIRx Med, is properly cited. The complete bibliographic information, a link to the original publication on https://med.jmirx.org/, as well as this copyright and license information must be included.