Background: Acute lymphoblastic leukemia (ALL) accounts for almost one quarter of pediatric cancers in the United States. Despite cooperative group therapeutic trials, there remains a paucity of large cohort data with which to conduct epidemiologic and comparative effectiveness research.
Research design: We designed a 3-step process that uses International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9) discharge diagnosis codes and chemotherapy exposure data contained in the Pediatric Health Information System administrative database to establish a cohort of children with de novo ALL. This process was validated by chart review at 1 of the pediatric centers.
Results: An ALL cohort of 8733 patients was identified with a sensitivity of 88% [95% confidence interval (CI), 83%-92%] and a positive predictive value of 93% (95% CI, 89%-96%). The 30-day all-cause inpatient case fatality rate using this 3-step process was 0.80% (95% CI, 0.63%-1.01%), significantly lower than the case fatality rate of 1.40% (95% CI, 1.23%-1.60%) obtained when ICD-9 codes alone were used.
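For readers less familiar with the validation metrics reported above, the following is a minimal sketch of how sensitivity and positive predictive value can be computed from chart-review counts, each with a 95% CI. The counts are hypothetical placeholders, not the study's data, and the Wilson score interval is an assumption: the abstract does not state which interval method was used. The function names `wilson_ci` and `validation_metrics` are likewise illustrative.

```python
import math


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    Assumption: the abstract does not name its CI method; Wilson is
    used here only as a common, well-behaved choice.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def validation_metrics(tp, fp, fn):
    """Sensitivity and PPV of a cohort-identification algorithm.

    tp: algorithm-identified patients confirmed by chart review
    fp: algorithm-identified patients not confirmed by chart review
    fn: chart-confirmed patients the algorithm missed
    """
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "ppv": (tp / (tp + fp), wilson_ci(tp, tp + fp)),
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = validation_metrics(tp=90, fp=10, fn=10)
for name, (estimate, (low, high)) in metrics.items():
    print(f"{name}: {estimate:.1%} (95% CI, {low:.1%}-{high:.1%})")
```

Sensitivity uses the chart-confirmed cases as its denominator, whereas positive predictive value uses the algorithm-identified cases, so the two metrics answer different questions about the same cohort.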
Conclusions: This is the first report of the assembly and validation of a cohort of de novo ALL patients from a database representative of free-standing children's hospitals across the United States. Our data demonstrate that the use of ICD-9 codes alone to establish cohorts will lead to substantial patient misclassification and result in biased outcome estimates. Systematic methods beyond ICD-9 codes alone must be applied before analysis to establish accurate cohorts of patients with malignancy. A similar approach should be followed when establishing future cohorts from administrative data.