Background and objective: Usage of data from electronic health records (EHRs) in clinical research is increasing, but there is little empirical knowledge of the data needed to support multiple types of research these sources support. This study seeks to characterize the types and patterns of data usage from EHRs for clinical research.
Materials and methods: We analyzed the data requirements of over 100 retrospective studies by mapping the selection criteria and study variables to data elements of two standard data dictionaries, one from the healthcare domain and the other from the clinical research domain. We also contacted study authors to validate our results.
Results: The majority of variables mapped to one or to both of the two dictionaries. Studies used an average of 4.46 (range 1-12) data element types in the selection criteria and 6.44 (range 1-15) in the study variables. The most frequently used items (e.g., procedure, condition, medication) are often available in coded form in EHRs. Study criteria were frequently complex, with 49 of 104 studies involving relationships between data elements and 22 of the studies using aggregate operations for data variables. Author responses supported these findings.
Discussion and conclusion: The high proportion of mapped data elements demonstrates the significant potential for clinical data warehousing to facilitate clinical research. Unmapped data elements illustrate the difficulty in developing a complete data dictionary.
Keywords: Data models; Data standards; Queries.
Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.