This is the first paper of a series of two regarding the construction of data quality (DQ) assured repositories for the reuse of information on infant feeding from birth until two years old. This first paper justifies the need for such repositories and describes the design of a process to construct them from Electronic Health Records (EHR). As a result, Part 1 proposes a computational process to obtain quality-assured datasets represented by a canonical structure extracted from raw data from multiple EHR. For this, 13 steps were defined to ensure the harmonization, standardization, completion, de-duplication, and consistency of the dataset content. Moreover, the quality of the input and output data for each of these steps is controlled according to eight DQ dimensions: predictive value, correctness, duplication, consistency, completeness, contextualization, temporal-stability and spatial-stability. The second paper of the series will describe the application of this computational process to construct the first quality-assured repository for the reuse of information on infant feeding in the perinatal period aimed at the monitoring of clinical activities and research.
Keywords: Breastfeeding; Data analysis; Data quality; Electronic health record; Perinatal care.
Copyright © 2015 Elsevier Ltd. All rights reserved.