Background: Electronic health records (EHRs) have yet to fully capture social determinants of health (SDOH) due to challenges such as nonexistent or inconsistent data capture tools across clinics, lack of time, and the burden of extra steps for the clinician. However, patient clinical notes (unstructured data) may be a better source of patient-related SDOH information.
Objective: It is unclear how accurately EHR data reflect patients' lived experience of SDOH. Manually retrieving SDOH information from clinical notes is time-consuming and not feasible at scale. We leveraged two high-throughput tools to identify SDOH mappings to structured and unstructured patient data: PatientExploreR and Electronic Medical Record Search Engine (EMERSE).
Methods: We included adult patients (≥18 years of age) receiving primary care for their diabetes at the University of California, San Francisco (UCSF), from January 1, 2018, to December 31, 2019. Expert raters developed a corpus of SDOH terms, using the SDOH in the compendium as a knowledge base. These terms served as targets for natural language processing (NLP) text string mapping, which searched for string stems, roots, and syntactic similarities in the clinical notes of patients with diabetes. We applied advanced built-in EMERSE NLP query parsers implemented with JavaCC.
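The text string mapping described in the Methods can be illustrated with a minimal sketch. It is not EMERSE and does not use EMERSE's query parsers; it is a plain Python regular-expression approximation of stem matching. The domain names, stems, and the function names `compile_patterns`, `map_note`, and `count_patients_by_domain` are hypothetical and chosen only for illustration; they are not the study's corpus.

```python
import re
from collections import Counter

# Hypothetical SDOH domains and stems (illustrative only, not the study's corpus).
SDOH_STEMS = {
    "housing insecurity": ["homeless", "evict", "unstable housing"],
    "food insecurity": ["food insecur", "food stamp", "food bank"],
    "employment": ["unemploy", "laid off", "lost job"],
    "social isolation": ["lives alone", "isolat", "lonel"],
}


def compile_patterns(stems_by_domain):
    """Build one case-insensitive regex per domain.

    Each stem must start at a word boundary and may be followed by any
    word characters, so "evict" also matches "evicted" and "eviction".
    """
    patterns = {}
    for domain, stems in stems_by_domain.items():
        alternatives = "|".join(re.escape(stem) for stem in stems)
        patterns[domain] = re.compile(
            r"\b(?:" + alternatives + r")\w*", re.IGNORECASE
        )
    return patterns


def map_note(note, patterns):
    """Return {domain: [matched strings]} for domains found in one note."""
    hits = {}
    for domain, pattern in patterns.items():
        matches = [m.group(0).lower() for m in pattern.finditer(note)]
        if matches:
            hits[domain] = matches
    return hits


def count_patients_by_domain(notes_by_patient, patterns):
    """Count patients with at least one matching note per domain."""
    counts = Counter()
    for notes in notes_by_patient.values():
        domains = set()
        for note in notes:
            domains.update(map_note(note, patterns))
        counts.update(domains)
    return counts


if __name__ == "__main__":
    patterns = compile_patterns(SDOH_STEMS)
    example = "Pt was evicted last month and is now homeless. Lives alone."
    print(map_note(example, patterns))
```

This sketch ignores negation ("denies food insecurity"), misspellings, and context, so matches from an approach like this would still need review by expert raters.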
Results: We included 4283 adult patients receiving primary care for diabetes at UCSF. Our study revealed that SDOH may be more significant in the lives of patients with diabetes than is evident from structured data recorded in EHRs. With the application of EMERSE NLP rules, we uncovered additional information from patient clinical notes on problems related to social connections/isolation, employment, financial insecurity, housing insecurity, food insecurity, education, and stress.
Conclusions: We discovered more patient information related to SDOH in unstructured data than in structured data. The application of this technique and further investment in similar user-friendly tools and infrastructure to extract SDOH information from unstructured data may help to identify the range of social conditions that influence patients' disease experiences and inform clinical decision-making.
Keywords: EHR; NLP; diabetes; diabetes mellitus; diabetic; electronic health record; free text; machine learning; medical informatics applications; natural language processing; search engine; social determinants of health; text string; unstructured data.
© Shivani Mehta, Courtney R Lyles, Anna D Rubinsky, Kathryn E Kemper, Judith Auerbach, Urmimala Sarkar, Laura Gottlieb, William Brown III. Originally published in JMIR Medical Informatics (https://medinform.jmir.org).