Purpose: We investigated time trends in validation performance characteristics for six sources of death data available within the Healthcare Integrated Research Database (HIRD) over 8 years.
Methods: We conducted a secondary analysis of a cohort of advanced cancer patients with linked National Death Index (NDI) data identified in the HIRD between 2010 and 2018. Using NDI data as the reference standard, we calculated sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for six sources of death status data and for an algorithm combining data from available sources. Measures were calculated for each study year and included all members who were in the cohort for at least 1 day in that year.
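The four validation measures named in the Methods can be illustrated with a minimal Python sketch. It assumes each cohort member has one boolean death flag from the source under evaluation and one from the reference standard. The names `ValidationMeasures` and `validation_measures` and the example data are hypothetical and not taken from the study, and confidence intervals are omitted.

```python
from dataclasses import dataclass


@dataclass
class ValidationMeasures:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def validation_measures(source_flags, reference_flags):
    """Compare a death data source against a reference standard.

    Both arguments are equal-length sequences of booleans, one entry per
    cohort member; True means the member is recorded as deceased.
    """
    if len(source_flags) != len(reference_flags):
        raise ValueError("inputs must have the same length")

    # Tally the 2x2 table: source result versus reference result.
    tp = fp = fn = tn = 0
    for src, ref in zip(source_flags, reference_flags):
        if src and ref:
            tp += 1
        elif src and not ref:
            fp += 1
        elif not src and ref:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Return NaN rather than raising when a margin of the table is empty.
        return numerator / denominator if denominator else float("nan")

    return ValidationMeasures(
        sensitivity=ratio(tp, tp + fn),  # reference deaths found by the source
        specificity=ratio(tn, tn + fp),  # reference survivors not flagged
        ppv=ratio(tp, tp + fp),          # source deaths confirmed by reference
        npv=ratio(tn, tn + fn),          # source survivors confirmed by reference
    )


# Hypothetical data for eight cohort members in a single study year.
source = [True, True, False, False, True, False, False, False]
reference = [True, True, True, False, False, False, False, False]
result = validation_measures(source, reference)
```

Repeating the call on the subset of members present in each calendar year would give a per-year series comparable in form to the one reported in the Results.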
Results: We identified 27 396 deaths from any source among 40 692 cohort members. Between 2010 and 2018, the sensitivity of the Death Master File (DMF) decreased from 0.77 (95% CI = 0.76, 0.79) to 0.12 (95% CI = 0.11, 0.14). In contrast, the sensitivity of online obituary data increased from 0.43 (95% CI = 0.41, 0.45) in 2012 to 0.71 (95% CI = 0.68, 0.73) in 2018. The sensitivity of the composite algorithm remained above 0.83 throughout the study period. For all sources, PPV was high from 2010 to 2016 and decreased thereafter. Specificity and NPV remained high throughout the study.
Conclusions: The sensitivity of mortality data sources, compared with the NDI, changed substantially between 2010 and 2018. Other validation characteristics were less variable. Combining multiple sources of mortality data may be necessary to achieve adequate performance, particularly for multiyear studies.
Keywords: mortality; observational data; real world data; validation.
© 2024 John Wiley & Sons Ltd.