Comparisons between self-report and clinical psychiatric measures have revealed considerable disagreement. It is unsafe to treat these measures as directly equivalent, so a reliable recalibration of one measure in terms of the other would be valuable. We evaluated two methods, multiple imputation incorporating a Bayesian approach and a fully Bayesian method, for recalibrating diagnoses from a self-report survey interview in terms of those from a clinical interview. Data from a two-phase national household survey provided a practical application, and artificial data were used for simulation studies. The most important factors in obtaining a precise and accurate 'clinical' prevalence estimate from self-report data were (a) good agreement between the two diagnostic measures and (b) a sufficiently large set of calibration data, with diagnoses based on both kinds of interview for the same group of subjects. In the case study, calibration data on 612 subjects were sufficient to yield estimates of the total prevalence of anxiety, depression or neurosis with a precision of about ±2%. The limitations of the calibration method demonstrate the need to increase agreement between survey and reference measures by improving lay interviews and their diagnostic algorithms.
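
The general idea of recalibrating a self-report prevalence in terms of a clinical diagnosis can be illustrated with a minimal sketch. This is a simplified stand-in and not the model evaluated in the study: it assumes independent Beta(1, 1) priors on the probability of a clinical diagnosis given each self-report outcome, and it ignores the sampling weights that a two-phase design would normally require. The function name, the data layout and all cell counts are hypothetical; only the calibration total of 612 is taken from the text.

```python
import random
import statistics


def recalibrated_prevalence(calib, survey_pos, survey_neg, draws=4000, seed=0):
    """Draw posterior samples of the 'clinical' prevalence.

    calib maps (self_report, clinical) -> count in the calibration subsample,
    with 1 = case and 0 = non-case (hypothetical layout).
    survey_pos / survey_neg are self-report case / non-case counts
    in the main survey.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        # P(clinical case | self-report case), Beta(1, 1) prior
        p_given_pos = rng.betavariate(1 + calib[(1, 1)], 1 + calib[(1, 0)])
        # P(clinical case | self-report non-case), Beta(1, 1) prior
        p_given_neg = rng.betavariate(1 + calib[(0, 1)], 1 + calib[(0, 0)])
        # Self-report case rate in the main survey, Beta(1, 1) prior
        p_pos = rng.betavariate(1 + survey_pos, 1 + survey_neg)
        # Law of total probability gives the clinical prevalence
        samples.append(p_pos * p_given_pos + (1 - p_pos) * p_given_neg)
    return samples


# Invented cell counts that sum to 612; they are NOT the study's data.
calib = {(1, 1): 60, (1, 0): 90, (0, 1): 40, (0, 0): 422}
samples = recalibrated_prevalence(calib, survey_pos=2000, survey_neg=6000)

cuts = statistics.quantiles(samples, n=40)  # cuts[0] = 2.5%, cuts[-1] = 97.5%
print(f"posterior mean: {statistics.mean(samples):.3f}")
print(f"95% interval:   {cuts[0]:.3f} to {cuts[-1]:.3f}")
```

In a sketch like this, the width of the interval is driven mainly by the calibration cell counts, which is consistent with the point above that agreement between measures and the size of the calibration set govern the precision of the recalibrated estimate.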