Optimal haplotype block-free selection of tagging SNPs for genome-wide association studies

Bjarni V Halldórsson; Vineet Bafna; Ross Lippert; Russell Schwartz; Francisco M De La Vega; Andrew G Clark; Sorin Istrail

doi:10.1101/gr.2570004

Genome Res. 2004 Aug;14(8):1633-40. doi: 10.1101/gr.2570004.

Authors

Bjarni V Halldórsson¹, Vineet Bafna, Ross Lippert, Russell Schwartz, Francisco M De La Vega, Andrew G Clark, Sorin Istrail

Affiliation

¹ Celera/Applied Biosystems, Rockville, Maryland 20850, USA.

Abstract

It is widely hoped that the study of sequence variation in the human genome will provide a means of elucidating the genetic component of complex diseases and variable drug responses. A major stumbling block to the successful design and execution of genome-wide disease association studies using single-nucleotide polymorphisms (SNPs) and linkage disequilibrium is the enormous number of SNPs in the human genome. This abundance makes exhaustive genotyping unacceptably costly and poses a challenging problem of statistical inference. Here, we present a new method for optimally selecting minimum informative subsets of SNPs, also known as "tagging" SNPs, that is efficient for genome-wide selection. We contrast this method with published methods, including haplotype block tagging, which groups SNPs into segments of low haplotype diversity and types a subset of the SNPs that can discriminate all common haplotypes within the blocks. Because our method does not rely on a predefined haplotype block structure and makes use of the weaker correlations that occur across neighboring blocks, it can be effectively applied across chromosomal regions with both high and low local linkage disequilibrium. We show that the number of tagging SNPs selected is substantially smaller than that previously reported for block-based approaches and that selecting tagging SNPs optimally can yield two- to threefold savings over selecting random SNPs.
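
As a rough illustration of the block-based tagging idea that the abstract contrasts against (typing a subset of SNPs that can discriminate all haplotypes within a block), the following Python sketch finds a smallest such subset by exhaustive search. It is not the block-free method of the paper, which is not reproduced here. The function names `distinguishes_all` and `minimum_tag_snps`, the 0/1 string encoding of alleles, and the toy haplotypes are all assumptions made for this example, and the search is exponential in the number of SNPs, so it only suits very small blocks.

```python
from itertools import combinations


def distinguishes_all(haplotypes, snp_indices):
    """Return True if the chosen SNP columns give each distinct haplotype
    a unique signature. Haplotypes must be hashable (strings or tuples)."""
    signatures = {tuple(h[i] for i in snp_indices) for h in haplotypes}
    return len(signatures) == len(set(haplotypes))


def minimum_tag_snps(haplotypes):
    """Return a smallest tuple of SNP column indices that discriminates all
    distinct haplotypes, trying subsets in order of increasing size."""
    n_snps = len(haplotypes[0])
    for size in range(n_snps + 1):
        for subset in combinations(range(n_snps), size):
            if distinguishes_all(haplotypes, subset):
                return subset
    # Unreachable in practice: the full SNP set always separates
    # distinct haplotypes.
    return tuple(range(n_snps))


# Hypothetical toy block: four haplotypes over five biallelic SNPs,
# alleles encoded as '0' and '1'. SNPs 0 and 1 are perfectly correlated,
# as are SNPs 3 and 4, so typing all five would be redundant.
haplotypes = ["00000", "00111", "11011", "11100"]
tags = minimum_tag_snps(haplotypes)
print(tags)  # (0, 2)
```

In this toy block, two SNPs suffice to tell the four haplotypes apart, whereas SNPs 0 and 1 together do not, because they carry the same information. A block-free approach such as the one described in the abstract would additionally exploit weaker correlations between SNPs in neighboring regions rather than treating each block in isolation.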

MeSH terms

Algorithms
Chromosomes, Human, Pair 22
Genetic Variation
Haplotypes*
Humans
Linkage Disequilibrium
Models, Genetic
Polymorphism, Single Nucleotide*
Research Design