Are these data real? Statistical methods for the detection of data fabrication in clinical trials

Sanaa Al-Marzouki; Stephen Evans; Tom Marshall; Ian Roberts

doi:10.1136/bmj.331.7511.267

BMJ. 2005 Jul 30;331(7511):267-70. doi: 10.1136/bmj.331.7511.267.

Authors

Sanaa Al-Marzouki¹, Stephen Evans, Tom Marshall, Ian Roberts

Affiliation

¹ Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London WC1E 7HT.

Abstract

Objectives: To test the application of statistical methods to detect data fabrication in a clinical trial.

Setting: Data from two clinical trials: a trial of a dietary intervention for cardiovascular disease and a trial of a drug intervention for the same problem.

Outcome measures: Baseline comparisons of means and variances of cardiovascular risk factors; digit preference overall and its pattern by group.
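A baseline comparison of this kind can be sketched for a single variable as follows. The abstract does not state which tests were applied, so the variance ratio (an F-type statistic) and Welch's t statistic used here are assumptions, as are the function name `baseline_comparison` and the made-up example values, which are not data from either trial.

```python
from math import sqrt
from statistics import mean, variance


def baseline_comparison(group_a, group_b):
    """Return (variance_ratio, welch_t) for one baseline variable.

    Hypothetical helper: assumes each group has at least two values
    and a non-zero sample variance.
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = variance(group_a), variance(group_b)  # sample variances (n - 1)
    # Larger variance over smaller, so the ratio is always >= 1.
    variance_ratio = max(var_a, var_b) / min(var_a, var_b)
    # Welch's t statistic: difference in means scaled by its standard error,
    # without assuming equal variances.
    welch_t = (mean(group_a) - mean(group_b)) / sqrt(var_a / n_a + var_b / n_b)
    return variance_ratio, welch_t


# Invented illustrative values: same mean, very different spread.
intervention = [120, 122, 118, 121, 119, 120]
control = [110, 135, 98, 142, 105, 130]

ratio, t = baseline_comparison(intervention, control)
print(f"variance ratio = {ratio:.1f}, Welch t = {t:.2f}")
```

In a properly randomised trial, both groups are drawn from the same population at baseline, so a large variance ratio or a large absolute t value across many variables would be unexpected. Turning these statistics into P values requires the F and t distributions, which the standard library does not provide.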

Results: In the dietary intervention trial, variances for 16 of the 22 variables available at baseline were significantly different, and 10 significant differences were seen in means for these variables. Some of these P values were extraordinarily small. Distributions of the final recorded digit were significantly different between the intervention and the control group at baseline for 14/22 variables in the dietary trial. In the drug trial, only five variables were available, and no significant differences between the groups for baseline values in means or variances or digit preference were seen.
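The comparison of final-digit distributions between groups can be illustrated with a Pearson chi-square statistic on a 2 x 10 table of digit counts. This is a minimal sketch under stated assumptions: the abstract does not describe the exact procedure, the values are assumed to be recorded as integers, and the function names and example values are hypothetical.

```python
from collections import Counter


def final_digit_counts(values):
    """Count the final recorded digit (0-9) of integer-valued measurements."""
    counts = Counter(abs(int(v)) % 10 for v in values)
    return [counts.get(d, 0) for d in range(10)]


def digit_chi_square(group_a, group_b):
    """Return (chi-square statistic, degrees of freedom) for final digits by group."""
    rows = [final_digit_counts(group_a), final_digit_counts(group_b)]
    # Digits that never occur in either group carry no information; drop them.
    cols = [d for d in range(10) if rows[0][d] + rows[1][d] > 0]
    total = sum(sum(r) for r in rows)
    stat = 0.0
    for r in rows:
        row_total = sum(r)
        for d in cols:
            col_total = rows[0][d] + rows[1][d]
            expected = row_total * col_total / total
            stat += (r[d] - expected) ** 2 / expected
    return stat, len(cols) - 1


# Invented illustrative values: one group uses every final digit equally,
# the other records only values ending in 0 or 5.
group_a = list(range(100, 120))
group_b = [120, 125] * 10

stat, df = digit_chi_square(group_a, group_b)
print(f"chi-square = {stat:.2f} on {df} degrees of freedom")
```

With all ten digits present the table has 9 degrees of freedom, for which the 5% critical value is about 16.92. Digit preference on its own is common in honestly recorded data; what the comparison looks for is a difference in its pattern between randomised groups, which should not arise if the same people measured both groups in the same way.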

Conclusions: Several statistical features of the data from the dietary trial are so strongly suggestive of data fabrication that no other explanation is likely.

MeSH terms

Adult
Cardiovascular Diseases / prevention & control
Chi-Square Distribution
Clinical Trials as Topic / standards*
Clinical Trials as Topic / statistics & numerical data
Data Collection / standards*
Data Collection / statistics & numerical data
Data Interpretation, Statistical
Diet
Humans
Middle Aged
Multicenter Studies as Topic
Random Allocation
Randomized Controlled Trials as Topic / standards
Scientific Misconduct / statistics & numerical data*