When evaluating the discriminatory performance of a new modality in a screening setting, a logistical constraint is that the prevalence of the disease of interest is typically very low. Under a standard study design, this implies that large numbers of subjects must be evaluated with the new modality. However, if a predicate modality exists in clinical practice, inclusion in the study of the new modality can be based on the clinical results from the predicate, 'enriching' the study population with diseased subjects. If this enrichment is not accounted for when estimating sensitivity, specificity, and area under the ROC curve, the resulting 'naive' estimates may be substantially biased relative to the expected performance in the intended use population. We derive expressions for the magnitude of this bias in terms of the correlations between modality scores. When such estimates are 'corrected' for the sampling weights using inverse probability weighting, their variances are affected, and we derive analytic expressions for these variances. For a fixed number of diseased subjects, differential sampling increases the variance of the corrected estimates, all other things being equal. However, for a fixed total study size, differential sampling also increases the number of diseased subjects, which decreases the variance of the sensitivity and area under the ROC curve estimates, all other things being equal. The balance of these two effects determines the gain in efficiency from using enrichment with corrected estimates. These principles are illustrated with a simulation study motivated by the Digital Mammographic Imaging Screening Trial, a trial of digital versus screen-film mammography.
Copyright © 2011 John Wiley & Sons, Ltd.
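
As an illustration of the correction described in the abstract, the following minimal sketch contrasts naive and inverse-probability-weighted estimates of sensitivity, specificity, and area under the ROC curve on a small made-up data set. Everything in it is an assumption for illustration only: the sampling probabilities (all predicate-positive subjects enrolled, predicate-negative subjects enrolled with probability 0.25), the toy subject counts and scores, and the function names `ipw_weights`, `weighted_rate`, and `weighted_auc`. The ratio-type weighted estimator shown is one common form of inverse probability weighting and is not necessarily the exact estimator derived in the paper.

```python
# Hypothetical enrichment scheme (assumed, not from the paper): enroll every
# predicate-positive subject, and predicate-negative subjects with prob. 0.25.
SAMPLING_PROB = {True: 1.0, False: 0.25}


def ipw_weights(predicate_pos):
    """Inverse of each subject's sampling probability."""
    return [1.0 / SAMPLING_PROB[p] for p in predicate_pos]


def weighted_rate(indicator, weights):
    """Weighted proportion of subjects whose indicator is True."""
    return sum(w for i, w in zip(indicator, weights) if i) / sum(weights)


def weighted_auc(dis_scores, dis_w, non_scores, non_w):
    """Weighted Mann-Whitney estimate of the area under the ROC curve."""
    total = 0.0
    for x, wx in zip(dis_scores, dis_w):
        for y, wy in zip(non_scores, non_w):
            if x > y:
                total += wx * wy
            elif x == y:
                total += 0.5 * wx * wy
    return total / (sum(dis_w) * sum(non_w))


# Toy enriched sample (made-up values). Diseased subjects: 6 predicate-positive
# (5 test positive on the new modality) and 2 predicate-negative (1 positive).
dis_pred_pos = [True] * 6 + [False] * 2
dis_new_pos = [True] * 5 + [False] + [True, False]

# Non-diseased subjects: 4 predicate-positive (2 test negative on the new
# modality) and 6 predicate-negative (all 6 test negative).
non_pred_pos = [True] * 4 + [False] * 6
non_new_neg = [True, True, False, False] + [True] * 6

# Naive estimates ignore the sampling design (all weights equal to 1).
naive_sens = weighted_rate(dis_new_pos, [1.0] * len(dis_new_pos))  # 6/8
naive_spec = weighted_rate(non_new_neg, [1.0] * len(non_new_neg))  # 8/10

# Corrected estimates weight each subject by 1 / sampling probability.
ipw_sens = weighted_rate(dis_new_pos, ipw_weights(dis_pred_pos))   # 9/14
ipw_spec = weighted_rate(non_new_neg, ipw_weights(non_pred_pos))   # 26/28

# Tiny AUC example with made-up continuous scores: one predicate-positive
# and one predicate-negative subject in each disease group.
naive_auc = weighted_auc([3, 1], [1.0, 1.0], [2, 0], [1.0, 1.0])   # 3/4
ipw_auc = weighted_auc([3, 1], ipw_weights([True, False]),
                       [2, 0], ipw_weights([True, False]))         # 21/25

print(f"sensitivity: naive={naive_sens:.3f}, corrected={ipw_sens:.3f}")
print(f"specificity: naive={naive_spec:.3f}, corrected={ipw_spec:.3f}")
print(f"AUC:         naive={naive_auc:.3f}, corrected={ipw_auc:.3f}")
```

In this toy sample the naive sensitivity exceeds the corrected one and the naive specificity falls below it, which is the direction of bias one would expect when the two modalities' results are positively associated. The size and sign of the difference here depend entirely on the made-up counts.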