Introduction: Falls are a leading cause of unintentional injury in the elderly. Electronic health records (EHRs) offer the unique opportunity to develop models that can identify fall events. However, identifying fall events in clinical notes requires advanced natural language processing (NLP) to simultaneously address multiple issues because the word "fall" is a typical homonym.
Methods: We implemented a context-aware language model, Bidirectional Encoder Representations from Transformers (BERT) to identify falls from the EHR text and further fused the BERT model into a hybrid architecture coupled with post-hoc heuristic rules to enhance the performance. The models were evaluated on real world EHR data and were compared to conventional rule-based and deep learning models (CNN and Bi-LSTM). To better understand the ability of each approach to identify falls, we further categorize fall-related concepts (i.e., risk of fall, prevention of fall, homonym) and performed a detailed error analysis.
Results: The hybrid model achieved the highest f1-score on sentence (0.971), document (0.985), and patient (0.954) level. At the sentence level (basic data unit in the model), the hybrid model had 0.954, 1.000, 0.988, and 0.999 in sensitivity, specificity, positive predictive value, and negative predictive value, respectively. The error analysis showed that that machine learning-based approaches demonstrated higher performance than a rule-based approach in challenging cases that required contextual understanding. The context-aware language model (BERT) slightly outperformed the word embedding approach trained on Bi-LSTM. No single model yielded the best performance for all fall-related semantic categories.
Conclusion: A context-aware language model (BERT) was able to identify challenging fall events that requires context understanding in EHR free text. The hybrid model combined with post-hoc rules allowed a custom fix on the BERT outcomes and further improved the performance of fall detection.
Keywords: BERT; EHR; Fall; NLP.
Copyright © 2022 Elsevier B.V. All rights reserved.