Efficient study design for next generation sequencing

Genet Epidemiol. 2011 May;35(4):269-77. doi: 10.1002/gepi.20575.

Abstract

Next Generation Sequencing represents a powerful tool for detecting genetic variation associated with human disease. Because of the high cost of this technology, it is critical that we develop efficient study designs that consider the trade-off between the number of subjects (n) and the coverage depth (µ). How we divide our resources between the two can greatly impact study success, particularly in pilot studies. We propose a strategy for selecting the optimal combination of n and µ for studies aimed at detecting rare variants and for studies aimed at detecting associations between rare or uncommon variants and disease. For detecting rare variants, we find the optimal coverage depth to be between 2 and 8 reads when using the likelihood ratio test. For association studies, we find the strategy of sequencing all available subjects to be preferable. In deriving these combinations, we provide a detailed analysis describing the distribution of depth across a genome and the depth needed to identify a minor allele in an individual. The optimal coverage depth depends on the aims of the study, and the chosen depth can have a large impact on study success.
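The trade-off between n and µ for rare-variant detection can be illustrated with a toy calculation. The sketch below is not the authors' method: it replaces the likelihood ratio test with a simple threshold rule (call the variant in a subject if at least `min_alt_reads` reads carry the minor allele), assumes per-site depth is Poisson with mean µ, ignores sequencing error, and treats the budget as a fixed total n × µ. All function names and parameters are hypothetical and chosen only for illustration.

```python
import math


def prob_call_carrier(mu, min_alt_reads=2):
    """P(a heterozygous carrier shows >= min_alt_reads minor-allele reads).

    Assumption: depth ~ Poisson(mu) and each read carries the minor allele
    with probability 0.5, so the minor-allele read count is Poisson(mu / 2).
    """
    lam = mu / 2.0
    p_too_few = sum(math.exp(-lam) * lam ** k / math.factorial(k)
                    for k in range(min_alt_reads))
    return 1.0 - p_too_few


def detection_power(n, mu, maf, min_alt_reads=2):
    """P(the variant is called in at least one of n independent subjects).

    Assumption: a subject carries the minor allele with probability
    1 - (1 - maf)^2, and every carrier is treated as heterozygous.
    """
    carrier = 1.0 - (1.0 - maf) ** 2
    per_subject = carrier * prob_call_carrier(mu, min_alt_reads)
    return 1.0 - (1.0 - per_subject) ** n


def best_depth(budget, maf, depths=range(1, 31), min_alt_reads=2):
    """Scan integer depths under a fixed budget (n * mu <= budget).

    Returns (mu, n, power) for the depth with the highest detection power,
    or None if no depth in the grid affords even one subject.
    """
    best = None
    for mu in depths:
        n = budget // mu  # subjects affordable at this depth
        if n == 0:
            continue
        power = detection_power(n, mu, maf, min_alt_reads)
        if best is None or power > best[2]:
            best = (mu, n, power)
    return best
```

For example, `best_depth(3000, 0.005)` scans depths 1 to 30 for a variant with minor allele frequency 0.5% and returns the depth, sample size, and power of the best combination under this toy model. Because the model omits sequencing error and uses a threshold rule rather than the likelihood ratio test, its output should not be read as a reproduction of the paper's results.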

MeSH terms

  • Alleles
  • Genetic Predisposition to Disease
  • Genome, Human
  • Genome-Wide Association Study / methods*
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Polymorphism, Single Nucleotide
  • Research Design*
  • Sample Size
  • Sequence Analysis, DNA*