Rare variant association testing for next-generation sequencing data via hierarchical clustering

Hum Hered. 2012;74(3-4):165-71. doi: 10.1159/000346022. Epub 2013 Apr 11.

Abstract

Objectives: A proportion of the genetic susceptibility to complex diseases is thought to be due to low-frequency and rare variants. Next-generation sequencing in large populations facilitates the detection of rare variant associations with disease risk. To achieve adequate power to detect association at low-frequency and rare variants, locus-specific statistical methods are being developed that combine information across variants within a functional unit and test for association with this enriched signal through so-called burden tests.
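As an illustration of the collapsing idea behind burden tests, the following minimal sketch sums each individual's minor-allele counts across the variants in a region and regresses a continuous phenotype on that single score. The function names, the toy genotypes, and the phenotype values are hypothetical and are not taken from the study; real burden tests typically also apply variant weights and covariate adjustment.

```python
from typing import List, Sequence


def burden_scores(genotypes: Sequence[Sequence[int]]) -> List[int]:
    """Collapse each individual's genotypes (0/1/2 minor-allele counts,
    one entry per rare variant) into one unweighted burden score."""
    return [sum(row) for row in genotypes]


def regression_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical toy data: 6 individuals, 3 rare variants in one region.
genotypes = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
phenotype = [0.1, 1.2, 0.9, 2.1, 1.0, -0.1]

scores = burden_scores(genotypes)
slope = regression_slope(scores, phenotype)
print("burden scores:", scores)
print("estimated effect per rare allele:", round(slope, 3))
```

Because all variants are summed into one score, this kind of test implicitly assumes that the variants act in the same direction.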

Methods: We propose a hierarchical clustering approach and a similarity kernel-based association test for continuous phenotypes. This method clusters individuals into groups, within which samples are assumed to be genetically similar, and subsequently tests the group effects among the different clusters.
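The two-stage idea (cluster individuals by genetic similarity, then test for differences in phenotype among clusters) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the distance measure, linkage rule, or number of clusters, so Manhattan distance, average linkage, and two clusters are assumed here. The group-effect step uses a one-way ANOVA F statistic with a permutation p-value as a stand-in, not the similarity kernel-based test that the authors propose. All function names and data are hypothetical.

```python
import random
from typing import List, Sequence


def manhattan(a: Sequence[int], b: Sequence[int]) -> int:
    """Genotype dissimilarity: sum of absolute allele-count differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def average_linkage_clusters(genotypes, n_clusters: int) -> List[List[int]]:
    """Agglomerative clustering: start from singletons and repeatedly merge
    the two clusters with the smallest average pairwise distance."""
    clusters = [[i] for i in range(len(genotypes))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(manhattan(genotypes[i], genotypes[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def f_statistic(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic for a list of phenotype groups."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (between / (len(groups) - 1)) / (within / (len(values) - len(groups)))


def permutation_p_value(clusters, phenotype, n_perm=2000, seed=0) -> float:
    """P-value for the cluster effect, obtained by shuffling phenotypes."""
    rng = random.Random(seed)
    observed = f_statistic([[phenotype[i] for i in c] for c in clusters])
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic([[shuffled[i] for i in c] for c in clusters]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 8 individuals, 5 variants, continuous phenotype.
genotypes = [
    [1, 1, 1, 0, 0], [1, 1, 1, 0, 0], [1, 1, 0, 0, 0], [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 1],
]
phenotype = [2.1, 1.8, 2.4, 2.0, 0.2, -0.1, 0.4, 0.1]

clusters = average_linkage_clusters(genotypes, n_clusters=2)
p_value = permutation_p_value(clusters, phenotype)
print("clusters:", [sorted(c) for c in clusters])
print("permutation p-value:", round(p_value, 4))
```

Testing for any difference among cluster means, rather than regressing on a summed score, does not require the variants to share a direction of effect.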

Results: The power of this approach is comparable to that of collapsing methods when causal variants have the same direction of effect, and significantly higher than that of burden tests when both protective and risk variants are present in the region of interest. Overall, we observe that the Sequence Kernel Association Test (SKAT) is the most powerful approach under the allelic architectures considered.

Conclusions: In our overall comparison, we find the analytical framework within which SKAT operates to yield higher power and to control type I error appropriately.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Genetic Association Studies
  • Genetic Predisposition to Disease*
  • Genetic Variation*
  • Humans
  • Models, Genetic*
  • Models, Statistical*
  • Phenotype
  • Sequence Analysis, DNA