The handling of missing data in molecular epidemiology studies

Manisha Desai; Jessica Kubo; Denise Esserman; Mary Beth Terry

doi:10.1158/1055-9965.EPI-10-1311

Cancer Epidemiol Biomarkers Prev. 2011 Aug;20(8):1571-9. doi: 10.1158/1055-9965.EPI-10-1311. Epub 2011 Jul 12.

Authors

Manisha Desai¹, Jessica Kubo, Denise Esserman, Mary Beth Terry

Affiliation

¹ Quantitative Sciences Unit, Department of Medicine, Stanford University, Palo Alto, CA 94304, USA. manishad@stanford.edu

Abstract

Molecular epidemiology studies face a missing data problem, as biospecimen or imaging data are often collected on only a proportion of subjects eligible for study. We investigated all molecular epidemiology studies published as Research Articles, Short Communications, or Null Results in Brief in Cancer Epidemiology, Biomarkers & Prevention from January 1, 2009, to March 31, 2010, to characterize the extent to which missing data were present and to elucidate how the issue was addressed. Of 278 molecular epidemiology studies assessed, most (95%) had missing data on a key variable (66%) and/or used availability of data (often, but not always, the biomarker data) as an inclusion criterion for study entry (45%). Despite this, only 10% compared subjects included in the analysis with those excluded from the analysis, and 88% of studies with missing data conducted a complete-case analysis, a method known to yield biased and inefficient estimates when the data are not missing completely at random. Our findings provide evidence that missing data methods are underutilized in molecular epidemiology studies, which may deleteriously affect the interpretation of results. We provide practical guidelines for the analysis and interpretation of molecular epidemiology studies with missing data.
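
The abstract's point about complete-case analysis can be illustrated with a small simulation. The sketch below is not taken from the article; the function name `simulate`, the variables `x` (a fully observed covariate) and `y` (a biomarker), and all numeric settings are assumptions chosen for illustration. It estimates the mean biomarker level from complete cases under two missingness mechanisms: one where the biomarker is missing completely at random, and one where the chance of observing it depends on the covariate.

```python
import random
import statistics


def simulate(n=20000, seed=0):
    """Toy comparison of complete-case means under two missingness mechanisms.

    All settings here are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    full, mcar_cc, dependent_cc = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)              # fully observed covariate
        y = 2.0 + x + rng.gauss(0.0, 1.0)    # biomarker, true mean 2.0
        full.append(y)

        # Missing completely at random: 60% observed regardless of x.
        if rng.random() < 0.6:
            mcar_cc.append(y)

        # Missingness depends on x: low-x subjects are observed far more often.
        p_observed = 0.9 if x < 0 else 0.3
        if rng.random() < p_observed:
            dependent_cc.append(y)

    full_mean = statistics.fmean(full)
    return {
        "full_mean": full_mean,
        "mcar_bias": statistics.fmean(mcar_cc) - full_mean,
        "dependent_bias": statistics.fmean(dependent_cc) - full_mean,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

In this toy setting the complete-case mean stays close to the full-sample mean when missingness is completely at random, and is pulled noticeably downward when missingness depends on the covariate. Both complete-case estimates also use fewer subjects than the full sample, which is the efficiency loss the abstract mentions.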

Publication types

Review

MeSH terms

Bias
Data Collection / methods*
Data Interpretation, Statistical
Epidemiologic Methods*
Female
Humans
Male
Molecular Epidemiology / methods*
Research Design

Grants and funding

P30 CA124435/CA/NCI NIH HHS/United States