Bias due to composite reference standards in diagnostic accuracy studies

Stat Med. 2016 Apr 30;35(9):1454-70. doi: 10.1002/sim.6803. Epub 2015 Nov 10.

Abstract

Composite reference standards (CRSs) have been advocated in diagnostic accuracy studies when no perfect reference standard is available. The rationale is that combining the results of multiple imperfect tests yields a more accurate reference than any single test used in isolation. Focusing on a CRS that classifies subjects as disease positive if at least one component test is positive, we derive algebraic expressions for the sensitivity and specificity of this CRS, for the sensitivity and specificity of a new (index) test evaluated against this CRS, and for the CRS-based prevalence. As a motivating example we use the problem of evaluating a new test for Chlamydia trachomatis, a frequently asymptomatic disease for which no gold-standard test exists. As the number of component tests increases, the sensitivity of this CRS increases at the expense of its specificity, unless all tests have perfect specificity. Because the CRS then misclassifies more disease-negative subjects as positive, it can produce substantially biased accuracy estimates for the index test. The bias depends on disease prevalence and on the accuracy of the CRS. Further, conditional dependence between the CRS and the index test can lead to overestimation of index test accuracy. This commonly used CRS combines the results of multiple imperfect tests in a way that ignores information carried by the individual test results, and it is therefore not guaranteed to improve on a single imperfect reference unless each component test has perfect specificity and the CRS is conditionally independent of the index test. When these conditions are not met, as in the case of C. trachomatis testing, more realistic statistical models should be explored instead of relying on such CRSs.
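To make the "at least one positive" CRS algebra concrete, here is a minimal Python sketch. It assumes that the component tests are conditionally independent of one another, and that the index test is conditionally independent of the CRS, given true disease status. Under those assumptions, CRS sensitivity is 1 - prod(1 - Se_k), CRS specificity is prod(Sp_k), and the CRS-based prevalence is pi*Se_CRS + (1 - pi)*(1 - Sp_CRS). These are standard textbook expressions, not necessarily the exact ones derived in the paper, which also treats conditional dependence. The function names crs_accuracy and apparent_index_accuracy are hypothetical, and the numbers in the usage example are illustrative rather than data from the study.

```python
from math import prod


def crs_accuracy(sens, spec):
    """Se and Sp of an 'any-positive' CRS (hypothetical helper).

    Assumes the component tests are conditionally independent given true
    disease status.
    """
    # A diseased subject is missed only if every component test misses.
    se_crs = 1.0 - prod(1.0 - s for s in sens)
    # A non-diseased subject is CRS-negative only if every component is negative.
    sp_crs = prod(spec)
    return se_crs, sp_crs


def apparent_index_accuracy(prev, se_t, sp_t, se_crs, sp_crs):
    """Index-test Se/Sp as measured against the CRS (hypothetical helper).

    Assumes the index test is conditionally independent of the CRS given
    true disease status. Returns (CRS-based prevalence, apparent Se, apparent Sp).
    """
    # CRS-based prevalence: true positives plus false positives of the CRS.
    prev_crs = prev * se_crs + (1 - prev) * (1 - sp_crs)
    # P(index+, CRS+) and P(index-, CRS-), summed over true disease status.
    both_pos = prev * se_t * se_crs + (1 - prev) * (1 - sp_t) * (1 - sp_crs)
    both_neg = prev * (1 - se_t) * (1 - se_crs) + (1 - prev) * sp_t * sp_crs
    return prev_crs, both_pos / prev_crs, both_neg / (1 - prev_crs)


if __name__ == "__main__":
    # Illustrative values only: 5% prevalence, three components with Se 0.90 and Sp 0.99.
    se_crs, sp_crs = crs_accuracy([0.90] * 3, [0.99] * 3)
    prev_crs, se_app, sp_app = apparent_index_accuracy(0.05, 0.90, 0.99, se_crs, sp_crs)
    print(f"CRS Se={se_crs:.4f} Sp={sp_crs:.4f} prevalence={prev_crs:.4f}")
    print(f"index test apparent Se={se_app:.4f} Sp={sp_app:.4f} (true 0.90, 0.99)")
```

With these illustrative inputs, the CRS reaches a sensitivity of 0.999 but its specificity falls to about 0.970. At 5% prevalence, the resulting CRS false positives pull the index test's apparent sensitivity from 0.90 down to roughly 0.58. Setting every component specificity to 1 removes that sensitivity bias, which is consistent with the perfect-specificity condition stated above.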

Keywords: composite; conditional dependence; imperfect reference; sensitivity; specificity.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bias*
  • Chlamydia Infections / diagnosis
  • Chlamydia trachomatis
  • Diagnosis*
  • Diagnostic Tests, Routine / standards*
  • Diagnostic Tests, Routine / statistics & numerical data
  • Humans
  • Models, Statistical
  • Reference Standards*
  • Reproducibility of Results
  • Sensitivity and Specificity