Estimating effect sizes in genome-wide association studies

Behav Genet. 2010 May;40(3):394-403. doi: 10.1007/s10519-009-9321-9. Epub 2010 Jan 6.

Abstract

Knowledge about the proportion of markers without effects (p(0)) and the effect sizes in large-scale genetic studies is important for understanding the basic properties of the data and for applications such as controlling false discoveries and designing adequately powered replication studies. Many p(0) estimators have been proposed. However, high-dimensional data sets typically comprise a large range of effect sizes, and it is unclear whether the estimated p(0) relates to the whole range, including markers with very small effects, or just to the markers with large effects. In this article we develop an estimation procedure that can be used in all scenarios where the test statistic distribution under the alternative can be characterized by a single parameter (e.g., the non-centrality parameter of the non-central chi-square or F distribution). The estimation procedure starts by estimating the largest effect in the data set, then the second largest effect, then the third largest effect, and so on. We stop when the effect sizes become so small that they can no longer be estimated precisely for the given sample size. Once the individual effect sizes are estimated, they can be used to calculate an interpretable estimate of p(0). Thus, by repeatedly estimating a single parameter, our method yields both an interpretable estimate of p(0) and estimates of the effect sizes present in the whole marker set. Simulations suggest that the effects are estimated precisely, with only a small upward bias. The R code that computes the effect estimates is freely downloadable from the website: http://www.people.vcu.edu/~jbukszar/.
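
As an illustration of how estimated effect sizes might feed into the two applications named above, the following Python sketch takes a hypothetical list of estimated non-centrality parameters for a 1-df chi-square test and computes (a) a simple p(0) estimate and (b) the power of a replication study. It is not the authors' estimation procedure or their R code. The function names, the example values, the formula p(0) = 1 - (number of estimated effects) / (number of markers), and the assumption that the non-centrality parameter scales linearly with sample size are all assumptions made for this sketch.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def power_chi2_1df(ncp, z_crit):
    """P(T > z_crit**2) where T is non-central chi-square with 1 df.

    Uses T = (Z + sqrt(ncp))**2 with Z standard normal, so the tail
    probability reduces to two normal tail areas.
    """
    shift = math.sqrt(ncp)
    return (1.0 - normal_cdf(z_crit - shift)) + normal_cdf(-z_crit - shift)


def naive_p0(num_estimated_effects, num_markers):
    """Assumed p(0) estimate: share of markers with no estimated effect."""
    return 1.0 - num_estimated_effects / num_markers


def replication_ncp(ncp, n_discovery, n_replication):
    """Rescale an ncp to a new sample size (assumes ncp is linear in n)."""
    return ncp * n_replication / n_discovery


# Hypothetical inputs, largest effect first; not taken from the article.
estimated_ncps = [40.0, 25.0, 12.0]
num_markers = 100_000
z_crit = 1.959964  # two-sided 5% critical value for a 1-df test

p0_hat = naive_p0(len(estimated_ncps), num_markers)

# Power in a replication sample half the size of the discovery sample.
powers = [
    power_chi2_1df(replication_ncp(ncp, 2000, 1000), z_crit)
    for ncp in estimated_ncps
]

print(f"p(0) estimate: {p0_hat:.5f}")
for ncp, pw in zip(estimated_ncps, powers):
    print(f"discovery ncp = {ncp:5.1f}, replication power = {pw:.3f}")
```

With a non-centrality parameter of zero, `power_chi2_1df` returns the nominal type I error rate, which is a quick sanity check on the formula.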

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Alleles
  • Case-Control Studies
  • Computer Simulation
  • Genome-Wide Association Study*
  • Genomics
  • Humans
  • Likelihood Functions
  • Models, Genetic
  • Models, Statistical*
  • Odds Ratio
  • Regression Analysis
  • Reproducibility of Results
  • Research Design
  • Sample Size