Judging a plethora of p-values: how to contend with the problem of multiple testing--part 10 of a series on evaluation of scientific publications

Dtsch Arztebl Int. 2010 Jan;107(4):50-6. doi: 10.3238/arztebl.2010.0050. Epub 2010 Jan 29.

Abstract

Background: When reading reports of medical research findings, one is usually confronted with p-values. Publications typically contain not just one p-value, but an abundance of them, mostly accompanied by the word "significant." This article is intended to help readers understand the problem of multiple p-values and how to deal with it.
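The size of the problem can be seen from a standard calculation. It assumes k independent tests, each run at significance level α, with every null hypothesis true; these assumptions are illustrative and are not stated in the abstract. Under them, the probability of obtaining at least one "significant" result purely by chance (the familywise error rate) grows quickly with k:

```latex
\mathrm{FWER} = P\bigl(\text{at least one } p_i \le \alpha \,\big|\, H_{0,1},\dots,H_{0,k} \text{ all true}\bigr) = 1 - (1-\alpha)^{k}

% Worked example with \alpha = 0.05:
%   k = 10: 1 - 0.95^{10} \approx 1 - 0.599 = 0.401
%   k = 20: 1 - 0.95^{20} \approx 1 - 0.358 = 0.642
```

With 20 tests, a false "significant" finding is therefore more likely than not, even when no true effect exists.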

Methods: When multiple p-values appear in a single study, this is usually a problem of multiple testing. A number of valid approaches are presented for dealing with the problem. This article is based on classical statistical methods as presented in many textbooks and on selected specialized literature.
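Bonferroni and Holm are two classical corrections of the kind this paragraph refers to. The abstract does not say which procedures the article presents, so what follows is only a minimal Python sketch under that assumption. The function names `bonferroni`, `holm`, and `rejected`, as well as the p-values, are hypothetical and invented for illustration.

```python
"""Minimal sketch of two classical multiple-testing corrections.

Assumption: illustrative implementations of the Bonferroni and Holm
(step-down) procedures; not taken from the article itself.
"""
from typing import List


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni-adjusted p-values: multiply each p by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values: List[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: an adjusted p-value may not be smaller than
        # the adjusted p-value of any smaller raw p-value.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


def rejected(adjusted: List[float], alpha: float = 0.05) -> List[bool]:
    """Flag which hypotheses are rejected at level alpha."""
    return [p <= alpha for p in adjusted]


if __name__ == "__main__":
    # Hypothetical p-values from five tests in one study (invented for illustration).
    raw = [0.008, 0.040, 0.015, 0.002, 0.300]
    print("uncorrected:", rejected(raw))              # [True, True, True, True, False]
    print("Bonferroni :", rejected(bonferroni(raw)))  # [True, False, False, True, False]
    print("Holm       :", rejected(holm(raw)))        # [True, False, True, True, False]
```

Without correction, four of the five tests would be reported as "significant." Both procedures keep the familywise error rate at or below α. Holm's step-down procedure is uniformly at least as powerful as Bonferroni, which is why it retains one more rejection in this example.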

Results: Conclusions from publications with many "significant" results should be judged with caution if the authors have not taken adequate steps to correct for multiple testing. Researchers should define the goal of their study clearly at the outset and, if possible, define a single primary endpoint a priori. If the study is of an exploratory or hypothesis-generating nature, it should be clearly stated that any positive results might be due to chance and will need to be confirmed in further targeted studies.

Conclusions: It is recommended that the word "significant" be used and interpreted with care. Readers should assess articles critically with regard to the problem of multiple testing. Authors should state the number of tests that were performed. Scientific articles should be judged on their scientific merit rather than by the number of times they contain the word "significant."

MeSH terms

  • Clinical Trials as Topic*
  • Confidence Intervals*
  • Data Interpretation, Statistical*
  • Endpoint Determination / methods*
  • Evidence-Based Medicine / methods*
  • Periodicals as Topic*