Illustrations on Using the Distribution of a P-value in High Dimensional Data Analyses

Adv Appl Stat Sci. 2010 Feb;1(2):191-213.

Abstract

Several statistical methods have recently been developed that use the distribution of P-values from multiple hypothesis tests to analyze data from high-dimensional experiments. These methods are only as valid as the P-values derived from the test statistics. If an incorrect distribution is used for a test statistic, the resulting P-value will not be valid, and the distribution of P-values from multiple test statistics could give misleading results. Moreover, even if the correct distribution of a test statistic is used, a distribution of P-values may still give misleading results if the P-values are correlated. A primary focus of this paper is the distribution of a P-value under a null hypothesis, and the test statistic considered is the number of rejected null hypotheses. These two issues are demonstrated using six data examples: two simulated and four from actual microarray experiments. The results provide some insight into how large an effect might be introduced into a distribution of P-values if invalid P-values are computed or if P-values are correlated. Additional illustrations are given of the distribution of a P-value under an alternative hypothesis, and some approaches to modeling it are presented.

Keywords: FDR; correlation; microarray; multiple testing; type I error.
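The correlation issue described in the abstract can be made concrete with a small simulation. When all m null hypotheses are true and the P-values are valid, each P-value is Uniform(0, 1). If the tests are also independent, the number of rejections at level alpha is Binomial(m, alpha). The sketch below is a minimal illustration, not the authors' procedure. The equicorrelated normal model, the values m = 1000, alpha = 0.05 and rho = 0.3, and the function names `two_sided_p` and `rejections_per_experiment` are all illustrative assumptions.

```python
import math
import random
import statistics


def two_sided_p(z):
    """Two-sided P-value for a statistic assumed to be N(0, 1) under the null."""
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def rejections_per_experiment(m, rho, alpha, n_experiments, seed=0):
    """Count rejected nulls (P <= alpha) in each simulated experiment.

    Illustrative model (an assumption): all m nulls are true, and the test
    statistics are equicorrelated standard normals,
        z_i = sqrt(rho) * w + sqrt(1 - rho) * e_i,
    so each z_i is marginally N(0, 1) and each P-value is individually valid.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_experiments):
        w = rng.gauss(0.0, 1.0)  # shared factor that induces correlation rho
        k = 0
        for _ in range(m):
            z = math.sqrt(rho) * w + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            if two_sided_p(z) <= alpha:
                k += 1
        counts.append(k)
    return counts


if __name__ == "__main__":
    m, alpha = 1000, 0.05
    print(f"Binomial reference: mean={m * alpha:.1f}, "
          f"var={m * alpha * (1 - alpha):.1f}")
    for rho in (0.0, 0.3):
        counts = rejections_per_experiment(m, rho, alpha, n_experiments=300)
        print(f"rho={rho}: mean={statistics.mean(counts):.1f}, "
              f"var={statistics.variance(counts):.1f}")
```

In both settings the mean number of rejections should be close to m·alpha = 50, because each marginal P-value is still valid. With rho = 0.3, however, the variance of the count should be far larger than the binomial value m·alpha·(1 − alpha) = 47.5. This shows how correlated P-values can mislead even when each one is individually correct.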