Integrating data-driven and knowledge-driven approaches to analyze clinical notes with structured data for sarcopenia detection

Health Informatics J. 2024 Oct-Dec;30(4):14604582241300025. doi: 10.1177/14604582241300025.

Abstract

Background: Sarcopenia often goes undetected in busy clinical practices because muscle measurements are not easily incorporated into routine care. This study addresses that gap by combining unstructured clinical notes with structured data from electronic health records (EHR) to improve sarcopenia detection. Methods: We developed and evaluated four approaches that first extract features from clinical notes and then integrate them with structured data in sarcopenia detection models. Case studies were used to demonstrate how the results can be interpreted and to show important associations between predictors and outcomes. Results: Of 1304 participants, 1055 were controls and 249 met at least one criterion for sarcopenia. The best-performing model, which used both data-driven and knowledge-driven approaches to integrate clinical note features, achieved a higher mean area under the curve (AUC 73.93%; 95% CI, 73.83-74.02) than the baseline model (AUC 71.59%; 95% CI, 71.56-71.61). The case study shows that clinical note predictors such as "cane", "walker", and "unsteady", among others, may contribute to the detection of sarcopenia. Conclusions: Incorporating clinical note features in sarcopenia detection models can identify more patients at risk for sarcopenia, potentially leading to targeted muscle testing and corresponding treatment.
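The general idea of turning note terms into features, merging them with structured fields, and scoring a model by AUC can be illustrated with a minimal sketch. This is not the authors' pipeline: the lexicon contains only the three terms quoted in the abstract, and the function names, the structured fields (`age`, `bmi`), the example note, and the scores are all hypothetical toy values chosen for illustration.

```python
import re

# Assumed knowledge-driven lexicon: only the three terms quoted in the abstract.
MOBILITY_TERMS = ("cane", "walker", "unsteady")


def note_features(note):
    """Return one binary flag per lexicon term (1 if it occurs as a whole word)."""
    text = note.lower()
    return {
        f"note_{term}": int(re.search(rf"\b{re.escape(term)}\b", text) is not None)
        for term in MOBILITY_TERMS
    }


def integrate(structured, note):
    """Merge structured EHR fields with note-derived flags into one feature row."""
    return {**structured, **note_features(note)}


def auc(labels, scores):
    """Rank-based AUC: chance that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy inputs, invented for illustration only.
row = integrate({"age": 78, "bmi": 21.4}, "Pt ambulates with a walker; gait unsteady.")
toy_auc = auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
```

In this sketch, `row` is a single feature vector that any classifier could consume, and `auc` is the pairwise definition of the metric reported in the Results; a real study would fit and evaluate models with an established library rather than hand-written helpers.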

Keywords: electronic health records; feature selection; natural language processing; predictive modeling; sarcopenia.

MeSH terms

  • Aged
  • Aged, 80 and over
  • Electronic Health Records* / statistics & numerical data
  • Female
  • Humans
  • Male
  • Middle Aged
  • Sarcopenia* / diagnosis