Preprint (JMIR Preprints): https://preprints.jmir.org/preprint/27017
Authors' Response to Peer-Review Reports: https://med.jmirx.org/2021/3/e31568/
Published Article: https://med.jmirx.org/2021/3/e27017/
This is a peer-review report submitted for the paper “Finding Potential Adverse Events in the Unstructured Text of Electronic Health Care Records: Development of the Shakespeare Method”
Round 1 Review
This paper described the “Shakespeare method,” which was designed to discover associations between adverse events (AEs) caused by blood transfusion from unstructured electronic health record (EHR) notes. The authors applied this method to the MIMIC-III data set and seemed to be able to find transfusion AEs (TAEs) and potential TAEs (PTAEs) that were unknown when those EHR notes were developed.
- Is there any plan to release all the code/scripts used in this study? The method seems complex and involves multiple steps; it will be very difficult to reproduce the results if the code is not available.
- The manuscript should include more details on how the transfusion and comparison groups were created.
- The authors mentioned that the latent Dirichlet allocation (LDA) method they used in topic modeling requires the number of topics to be selected a priori. In this study, they set it to 45. Some questions:
  - How robust is the “Shakespeare method” with respect to this value? If a different value is chosen, will the method find similar topics? Similar notes for manual document review? Similar TAEs/PTAEs?
  - How would you determine this value if the method is applied to detect AEs for other treatments?
- A brief introduction to the LDA method should be included in the manuscript.
- In the Results section, the authors mentioned “Despite the inclusion of 1 to 5 grams in the vectorization, the terms that we extracted during classification were unigrams.” That seems to be quite a coincidence; is there any explanation? If only unigrams are used in the bag-of-words representation, will the results be different? Does it mean that only unigrams are needed in future applications of this method?
- If possible, applying the method to other data sets or to other types of treatment would help readers understand how generalizable the method is.
- On page 4, section The Shakespeare Method: “Trim the n-gram vectors in the target group to those that are significant for the target group.” How is the trimming performed? How important is it for the final result?
- In the Abstract section, the authors wrote “We chose the case of transfusion adverse events (TAEs) and potential TAEs (PTAEs) because real dates were obscured in the study data, and new TAE types were becoming recognized during the study data period.” The causal relationship here is a little confusing.
- On page 3, the authors wrote, “The Shakespeare method has three parts,” but the following bullet-point list has 5 items.
- On page 8: “The Shakespeare method would likely generalize to other her notes and possibly other types of medical texts.” An additional “her” is inserted.
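To make two of the requests above concrete (unigram-only versus 1- to 5-gram vectorization, and robustness to the number of topics), a minimal Python sketch follows. It is not the authors' code. The function names `ngram_counts` and `best_match_overlap`, the whitespace tokenization, and the toy term lists are all assumptions made purely for illustration; real topic term lists would come from the authors' own LDA runs.

```python
from collections import Counter
from typing import List, Sequence, Set


def ngram_counts(tokens: Sequence[str], max_n: int = 5) -> Counter:
    """Count every n-gram of length 1..max_n in a token sequence."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        # range() is empty when the note is shorter than n tokens
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match_overlap(
    topics_a: Sequence[Sequence[str]],
    topics_b: Sequence[Sequence[str]],
) -> List[float]:
    """For each topic in run A, the highest Jaccard overlap with any topic in run B.

    Each topic is represented by its list of top terms.
    """
    sets_b = [set(t) for t in topics_b]
    return [
        max((jaccard(set(t), s) for s in sets_b), default=0.0)
        for t in topics_a
    ]


if __name__ == "__main__":
    # Toy note (hypothetical text, whitespace tokenization assumed)
    note = "patient developed fever after transfusion".split()
    print(len(ngram_counts(note, max_n=1)), "unigram features")
    print(len(ngram_counts(note, max_n=5)), "features with 1 to 5 grams")

    # Toy top-term lists standing in for two runs with different topic counts
    run_a = [["fever", "chills", "rigors"], ["dyspnea", "edema", "overload"]]
    run_b = [
        ["fever", "chills", "hives"],
        ["dyspnea", "edema", "overload"],
        ["rash", "itching", "hives"],
    ]
    print(best_match_overlap(run_a, run_b))
```

Reporting the distribution of such best-match overlaps for a few alternative topic counts, together with results from a unigram-only vectorization, would be one way to show how sensitive the findings are to these two choices.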
Round 2 Review
The revision addressed my previous concerns. I have no further comments.
Conflicts of Interest
Abbreviations
AE: adverse event
EHR: electronic health record
LDA: latent Dirichlet allocation
PTAE: potential transfusion adverse event
TAE: transfusion adverse event
Edited by E Meinert. This is a non–peer-reviewed article. Submitted 24.06.21; accepted 24.06.21; published 11.08.21.
Copyright
© Anonymous. Originally published in JMIRx Med (https://med.jmirx.org), 11.08.2021.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIRx Med, is properly cited. The complete bibliographic information, a link to the original publication on https://med.jmirx.org/, as well as this copyright and license information must be included.