Ensembles of NLP Tools for Data Element Extraction from Clinical Notes

AMIA Annu Symp Proc. 2017 Feb 10:2016:1880-1889. eCollection 2016.

Abstract

Natural Language Processing (NLP) is essential for concept extraction from narrative text in electronic health records (EHR). To extract numerous and diverse concepts, such as data elements (i.e., important concepts related to a certain medical condition), a plausible solution is to combine various NLP tools into an ensemble to improve extraction performance. However, it is unclear to what extent ensembles of popular NLP tools improve the extraction of numerous and diverse concepts. Therefore, we built an NLP ensemble pipeline to synergize the strength of popular NLP tools using seven ensemble methods, and to quantify the improvement in performance achieved by ensembles in the extraction of data elements for three very different cohorts. Evaluation results show that the pipeline can improve the performance of NLP tools, but there is high variability depending on the cohort.

MeSH terms

  • Data Collection
  • Electronic Health Records*
  • Humans
  • Information Storage and Retrieval / methods*
  • Natural Language Processing*