Purpose: The aim of this study was to assess the impact of missing death data on survival analyses conducted in an oncology EHR-derived database.
Methods: The study was conducted using the Flatiron Health oncology database, with the National Death Index (NDI) as the gold standard. Three analytic frameworks were evaluated in patients with advanced non-small cell lung cancer (aNSCLC): median overall survival (mOS), relative risk estimates conducted within the EHR-derived database, and "external control arm" analyses comparing an experimental group augmented with mortality data from the gold standard to a control group drawn from the EHR-derived database only. The hazard ratios (HRs) obtained within the EHR-derived database (91% sensitivity) and in the external control arm analyses were compared with the results obtained when both groups were augmented with mortality data from the gold standard. These analyses were repeated using simulated lower mortality sensitivities to understand the impact of more extreme levels of missing deaths.
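To illustrate why lower mortality sensitivity can inflate mOS, the following is a minimal simulation sketch, not the study's actual method. It assumes exponentially distributed survival times, a hypothetical true median of 12 months, administrative censoring at 36 months, and that each death is captured independently with probability equal to the sensitivity. As a simplification, a missed death is treated as a censoring at the time of death. The function names `km_median` and `simulate_mos` and all parameter values are illustrative assumptions, and the output does not reproduce the study's results.

```python
import math
import random


def km_median(times, events):
    """Kaplan-Meier median: first time the estimated survival falls to 0.5 or below."""
    # Sort by time; at tied times, process deaths before censorings.
    data = sorted(zip(times, events), key=lambda p: (p[0], -p[1]))
    at_risk = len(data)
    surv = 1.0
    for t, event in data:
        if event:
            surv *= 1.0 - 1.0 / at_risk
            # Small tolerance guards against floating-point rounding at exactly 0.5.
            if surv <= 0.5 + 1e-12:
                return t
        at_risk -= 1
    return None  # median not reached


def simulate_mos(sensitivity, n=20000, true_median=12.0, followup=36.0, seed=0):
    """Estimate mOS (months) when only a fraction `sensitivity` of deaths is recorded."""
    rng = random.Random(seed)
    rate = math.log(2) / true_median  # exponential hazard implied by the true median
    times, events = [], []
    for _ in range(n):
        death = rng.expovariate(rate)
        captured = rng.random() < sensitivity
        if death > followup:
            times.append(followup)  # alive at end of follow-up: censored
            events.append(0)
        elif captured:
            times.append(death)     # death recorded
            events.append(1)
        else:
            times.append(death)     # missed death: assumed censored at death time
            events.append(0)
    return km_median(times, events)


if __name__ == "__main__":
    for s in (1.0, 0.91, 0.75, 0.60):
        print(f"sensitivity={s:.2f}  estimated mOS={simulate_mos(s):.1f} months")
```

Under these assumptions, every missed death is counted as a censored observation, so the estimated survival curve sits above the true curve and the estimated mOS grows as sensitivity falls.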
Results: Bias in mOS ranged from modest (0.6-0.9 months) in the EHR-derived cohort (91% sensitivity) to substantial (3.3-9.7 months) when lower sensitivities were generated through simulation. Overall, across comparative analyses, the HRs for the EHR-derived cohort differed only slightly from the HRs obtained using the gold standard data source. When only one treatment arm was subject to estimation bias, the bias was slightly more pronounced, and it increased substantially when lower sensitivities were simulated.
Conclusions: With high mortality sensitivity, the impact of missing deaths on survival analyses is minimal, with only a modest impact in external control arm applications.
Keywords: lung cancer; missing deaths; overall survival; pharmacoepidemiology; survival analyses.
© 2019 The Authors. Pharmacoepidemiology and Drug Safety Published by John Wiley & Sons Ltd.