Background: Hard-to-reach population subgroups are typically investigated using convenience sampling, which may give biased estimates. Combining information from such surveys, a probability survey and clinic surveillance, can potentially minimize the bias. We developed a methodology to estimate the prevalence of undiagnosed HIV infection among men who have sex with men (MSM) in England and Wales aged 16-44 years in 2003, making fuller use of the available data than earlier work.
Methods: We performed a synthesis of three data sources: genitourinary medicine clinic surveillance (11 380 tests), a venue-based convenience survey including anonymous HIV testing (3702 MSM) and a general population sexual behaviour survey (134 MSM). A logistic regression model to predict undiagnosed infection was fitted to the convenience survey data and then applied to the MSMs in the population survey to estimate the prevalence of undiagnosed infection in the general MSM population. This estimate was corrected for selection biases in the convenience survey using clinic surveillance data. A sensitivity analysis addressed uncertainty in our assumptions.
Results: The estimated prevalence of undiagnosed HIV in MSM was 2.4% [95% confidence interval (95% CI 1.7-3.0%)], and between 1.6% (95% CI 1.1-2.0%) and 3.3% (95% CI 2.4-4.1%) depending on assumptions; corresponding to 5500 (3390-7180), 3610 (2180-4740) and 7570 (4790-9840) men, and undiagnosed fractions of 33, 24 and 40%, respectively.
Conclusions: Our estimates are consistent with earlier work that did not make full use of data sources. Reconciling data from multiple sources, including probability-, clinic- and venue-based convenience samples can reduce bias in estimates. This methodology could be applied in other settings to take full advantage of multiple imperfect data sources.