Comparing medical history data derived from electronic health records and survey answers in the All of Us Research Program

Lina Sulieman; Robert M Cronin; Robert J Carroll; Karthik Natarajan; Kayla Marginean; Brandy Mapes; Dan Roden; Paul Harris; Andrea Ramirez

doi:10.1093/jamia/ocac046


J Am Med Inform Assoc. 2022 Jun 14;29(7):1131-1141. doi: 10.1093/jamia/ocac046.

Authors

Lina Sulieman¹, Robert M Cronin¹,², Robert J Carroll¹, Karthik Natarajan³, Kayla Marginean⁴, Brandy Mapes⁴, Dan Roden¹,⁵, Paul Harris¹,⁴, Andrea Ramirez⁵,⁶

Affiliations

¹ Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
² Department of Medicine, The Ohio State University, Columbus, Ohio, USA.
³ Department of Biomedical Informatics, Columbia University, New York, New York, USA.
⁴ Vanderbilt Institute of Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
⁵ Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
⁶ Office of Data and Analytics, All of Us Research Program, National Institutes of Health, Bethesda, Maryland, USA.

Abstract

Objective: A participant's medical history is important in clinical research and can be captured from electronic health records (EHRs) and self-reported surveys. Both sources can be incomplete: EHRs because of documentation gaps or lack of interoperability, and surveys because of recall bias or limited health literacy. This analysis compares medical history collected in the All of Us Research Program through both surveys and EHRs.

Materials and methods: The All of Us medical history survey includes a self-report questionnaire that asks about diagnoses of over 150 medical conditions organized into 12 disease categories. In each category, we identified the 3 most and 3 least frequent self-reported diagnoses and retrieved their analogues from EHRs. We calculated agreement scores and extracted participant demographic characteristics for each comparison set.
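The per-category selection step can be sketched as below. This is a minimal illustration, not the authors' pipeline: it assumes a hypothetical input mapping each disease category to per-condition counts of positive self-reports, and the function name most_and_least_frequent, the placeholder category and condition names, and the alphabetical tie-breaking rule are all assumptions introduced for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: category -> {condition -> number of participants
# who self-reported that condition}. Names and counts are placeholders.
SelfReportCounts = Dict[str, Dict[str, int]]


def most_and_least_frequent(
    counts: SelfReportCounts, k: int = 3
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Return the k most and k least frequently self-reported conditions per category.

    Ties are broken alphabetically (an assumption) so the selection is deterministic.
    If a category has fewer than 2k conditions, the two lists can overlap.
    """
    result: Dict[str, Tuple[List[str], List[str]]] = {}
    for category, by_condition in counts.items():
        # Highest counts first; condition name breaks ties.
        most = sorted(by_condition, key=lambda c: (-by_condition[c], c))[:k]
        # Lowest counts first; condition name breaks ties.
        least = sorted(by_condition, key=lambda c: (by_condition[c], c))[:k]
        result[category] = (most, least)
    return result


if __name__ == "__main__":
    toy = {
        "category_a": {
            "cond1": 50, "cond2": 40, "cond3": 30, "cond4": 5,
            "cond5": 3, "cond6": 1, "cond7": 20,
        },
    }
    print(most_and_least_frequent(toy))
```

With the toy counts, category_a yields cond1, cond2, cond3 as the most frequent and cond6, cond5, cond4 as the least frequent; each selected condition would then be matched to its EHR analogue.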

Results: The 4th All of Us dataset release includes data from 314 994 participants, 28.3% of whom completed medical history surveys and 65.5% of whom had EHR data. The Hearing and Vision category within the survey had the highest number of responses but the second-lowest positive agreement with the EHR (0.21). The Infectious Disease category had the lowest positive agreement (0.12). Cancer conditions had the highest positive agreement (0.45) between the 2 data sources.
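The abstract does not define positive agreement. A common definition for two binary sources is the proportion of positive agreement, PA = 2a / (2a + b + c), where a counts participants positive in both sources, b those positive only in the survey, and c those positive only in the EHR. The sketch below assumes this standard formula and assumes both participant sets are drawn from people who have both survey and EHR data. The function name positive_agreement and the 0.0 convention for empty inputs are hypothetical.

```python
from typing import Set


def positive_agreement(survey_positive: Set[str], ehr_positive: Set[str]) -> float:
    """Proportion of positive agreement between survey and EHR (assumed definition).

    a = participants positive in both sources
    b = participants positive in the survey only
    c = participants positive in the EHR only
    PA = 2a / (2a + b + c), equivalently 2|S & E| / (|S| + |E|).
    Returns 0.0 when neither source has any positives (hypothetical convention).
    """
    a = len(survey_positive & ehr_positive)
    b = len(survey_positive - ehr_positive)
    c = len(ehr_positive - survey_positive)
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 0.0


if __name__ == "__main__":
    survey = {"p1", "p2", "p3", "p4"}  # placeholder participant IDs
    ehr = {"p3", "p4", "p5"}
    print(round(positive_agreement(survey, ehr), 3))
```

Here a = 2, b = 2 and c = 1, so PA = 4/7, which prints 0.571. Under this definition, a value such as 0.12 means that most participants flagged positive by either source were flagged by only one of them.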

Discussion and conclusion: Our study quantified the agreement of medical history between 2 sources: EHRs and self-reported surveys. Conditions that are usually undocumented in EHRs had low agreement scores, demonstrating that survey data can supplement EHR data. Disagreement between EHR and survey data can help identify possibly missing records and guide researchers in adjusting for biases.

Keywords: All of Us; electronic health records; medical history; phenotype; survey.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Documentation
Electronic Health Records*
Humans
Information Storage and Retrieval
Population Health*
Surveys and Questionnaires

Grants and funding

U2C OD023196/OD/NIH HHS/United States