Statistical analysis with missing exposure data measured by proxy respondents: a misclassification problem within a missing-data problem

Stat Med. 2014 Nov 10;33(25):4437-52. doi: 10.1002/sim.6238. Epub 2014 Jun 17.

Abstract

In studies of older adults, researchers often recruit proxy respondents, such as relatives or caregivers, when study participants cannot provide self-reports (e.g., because of illness). Proxies are usually only sought to report on behalf of participants with missing self-reports; thus, either a participant self-report or proxy report, but not both, is available for each participant. Furthermore, the missing-data mechanism for participant self-reports is not identifiable and may be nonignorable. When exposures are binary and participant self-reports are conceptualized as the gold standard, substituting error-prone proxy reports for missing participant self-reports may produce biased estimates of outcome means. Researchers can handle this data structure by treating the problem as one of misclassification within the stratum of participants with missing self-reports. Most methods for addressing exposure misclassification require validation data, replicate data, or an assumption of nondifferential misclassification; other methods may result in an exposure misclassification model that is incompatible with the analysis model. We propose a model that makes none of the aforementioned requirements and still preserves model compatibility. Two user-specified tuning parameters encode the exposure misclassification model. Two proposed approaches estimate outcome means standardized for (potentially) high-dimensional covariates using multiple imputation followed by propensity score methods. The first method is parametric and uses maximum likelihood to estimate the exposure misclassification model (i.e., the imputation model) and the propensity score model (i.e., the analysis model); the second method is nonparametric and uses boosted classification and regression trees to estimate both models. We apply both methods to a study of elderly hip fracture patients.

Keywords: exposure misclassification; gerontology; missing data; proxy respondents.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Aged
  • Aged, 80 and over
  • Baltimore
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Female
  • Hip Fractures / physiopathology
  • Humans
  • Male
  • Models, Statistical*
  • Propensity Score*
  • Regression Analysis*
  • Walking / physiology