A Scalable Bayesian Method for Integrating Functional Information in Genome-wide Association Studies

Am J Hum Genet. 2017 Sep 7;101(3):404-416. doi: 10.1016/j.ajhg.2017.08.002. Epub 2017 Aug 24.

Abstract

Genome-wide association studies (GWASs) have identified many loci associated with complex traits. However, most loci reside in noncoding regions and have unknown biological functions. Integrative analysis that incorporates known functional information into GWASs can help elucidate the underlying biological mechanisms and prioritize important functional variants. Hence, we develop a flexible Bayesian variable selection model with efficient computational techniques for such integrative analysis. Unlike previous approaches, our method models the effect-size distribution and probability of causality for variants with different annotations and jointly models genome-wide variants to account for linkage disequilibrium (LD), thus prioritizing associations based on the quantified annotation effects and allowing for multiple associated variants per locus. Our method dramatically improves both computational speed and posterior sampling convergence by taking advantage of the block-wise LD structures in human genomes. In simulations, our method accurately quantifies the functional enrichment and is more powerful than alternative methods at prioritizing the true associations, with the power gain especially apparent when multiple associated variants in LD reside in the same locus. We applied our method to an in-depth GWAS of age-related macular degeneration (AMD) with 33,976 individuals and 9,857,286 variants. We find the strongest enrichment for causality among non-synonymous variants (54× more likely to be causal, 1.4× larger effect sizes) and variants in transcription, repressed Polycomb, and enhancer regions, and we identify five additional candidate loci beyond the 32 known AMD risk loci. In conclusion, our method efficiently integrates functional information in GWASs, helping identify functional associated variants and the underlying biology.
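The abstract describes a model in which both the probability of causality and the effect-size variance depend on a variant's annotation, with these annotation-level parameters estimated from the data (the keywords mention EM and MCMC). The sketch below illustrates that idea under simplifying assumptions; it is not the authors' implementation. It assumes a spike-and-slab prior in which a variant in annotation category q is causal with probability pi[q] and, if causal, has effect size drawn from N(0, sigma2[q]). It also assumes an EM-style M-step that sets pi[q] to the mean posterior inclusion probability (PIP) of the variants in category q, treating the PIPs as if they had already come from an MCMC E-step. LD blocks, hyperpriors, and the variance update are omitted, and all function names and inputs are hypothetical. A fold enrichment such as "54× more likely to be causal" can be read here as a ratio pi[q] / pi[baseline].

```python
import random
from collections import defaultdict


def simulate_effects(annotations, pi, sigma2, rng):
    """Draw effect sizes from an annotation-stratified spike-and-slab prior.

    Hypothetical illustration, not the published model:
      annotations: list of category labels, one per variant
      pi[q]:       prior probability that a variant in category q is causal
      sigma2[q]:   prior effect-size variance for causal variants in category q
    """
    betas = []
    for q in annotations:
        if rng.random() < pi[q]:
            # "Slab": causal variant with a normally distributed effect size
            betas.append(rng.gauss(0.0, sigma2[q] ** 0.5))
        else:
            # "Spike": non-causal variant with effect size exactly zero
            betas.append(0.0)
    return betas


def m_step_pi(annotations, pip):
    """Simplified EM update: pi[q] = mean PIP of the variants in category q.

    Assumes the PIPs were estimated beforehand (e.g. by MCMC) and ignores
    any hyperprior on pi[q].
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for q, p in zip(annotations, pip):
        total[q] += p
        count[q] += 1
    return {q: total[q] / count[q] for q in total}


def enrichment(pi, q, baseline):
    """Fold enrichment of causality for category q relative to a baseline category."""
    return pi[q] / pi[baseline]
```

For example, with two "coding" variants whose PIPs are 0.9 and 0.5 and four "other" variants whose PIPs are 0.01, 0.02, 0.03, and 0.02, `m_step_pi` gives pi["coding"] = 0.7 and pi["other"] = 0.02, so `enrichment(pi, "coding", "other")` is about 35. In a full EM-MCMC scheme, the E-step would be rerun with the updated pi[q] and sigma2[q] until the estimates stabilize.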

Keywords: AMD; BVSR; Bayesian variable selection regression; EM; GWAS; MCMC; Markov chain Monte Carlo; age-related macular degeneration; expectation-maximization; functional information; genome-wide association study.

MeSH terms

  • Bayes Theorem*
  • Genetic Markers / genetics
  • Genome-Wide Association Study / methods*
  • Humans
  • Linkage Disequilibrium*
  • Macular Degeneration / genetics*
  • Macular Degeneration / pathology
  • Skin Neoplasms / genetics*
  • Skin Neoplasms / pathology

Substances

  • Genetic Markers