M(3)-S: a genotype calling method incorporating information from samples with known genotypes

BMC Bioinformatics. 2015 Dec 3;16:403. doi: 10.1186/s12859-015-0824-5.

Abstract

Background: A key challenge in analyzing high-throughput Single Nucleotide Polymorphism (SNP) arrays is the accurate inference of genotypes for SNPs with low minor allele frequencies. A number of calling algorithms have been developed to infer genotypes for common SNPs, but their performance in calling rare SNPs is limited. The existing algorithms can be broadly classified into three categories: population-based methods, SNP-based methods, and hybrids of the two approaches. Although the hybrid approach performs relatively better, calling rare SNPs remains challenging.

Results: We propose a two-stage genotyping procedure, M(3)-S, that uses information from samples with known genotypes for rare SNP calling. This new approach can improve genotyping accuracy by clearly defining the boundaries of genotype clusters from samples with known genotypes, and can increase the call rate by combining the study population with data simulated from the inferred genotype cluster information.
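
The two-stage idea described in the Results can be illustrated with a minimal sketch. This is not the published M(3)-S algorithm: the one-dimensional contrast value per sample, the Gaussian cluster model, the nearest-center assignment rule, and the function names `fit_clusters`, `simulate`, and `call_genotypes` are all assumptions made for illustration only.

```python
import random
import statistics

def fit_clusters(known):
    """Stage 1 (assumed form): estimate mean and standard deviation of a
    1-D contrast value for each genotype from samples with known genotypes."""
    clusters = {}
    for genotype in sorted({g for g, _ in known}):
        values = [x for g, x in known if g == genotype]
        clusters[genotype] = (statistics.mean(values), statistics.stdev(values))
    return clusters

def simulate(clusters, n_per_cluster=50, seed=0):
    """Draw synthetic contrast values from each inferred cluster."""
    rng = random.Random(seed)
    return [(g, rng.gauss(mu, sd))
            for g, (mu, sd) in clusters.items()
            for _ in range(n_per_cluster)]

def call_genotypes(study, simulated, n_iter=5, max_dist=0.3):
    """Stage 2 (assumed form): pool simulated and study data, refine the
    cluster centers, and call each study sample. Returns None (no call)
    when a sample is farther than max_dist from every center."""
    def nearest(x, centers):
        g = min(centers, key=lambda k: abs(x - centers[k]))
        return g if abs(x - centers[g]) <= max_dist else None

    pools = {}
    for g, x in simulated:
        pools.setdefault(g, []).append(x)
    centers = {g: statistics.mean(v) for g, v in pools.items()}
    for _ in range(n_iter):
        calls = [nearest(x, centers) for x in study]
        # Simulated points keep every cluster populated, even a genotype
        # that is absent or nearly absent in the study population.
        centers = {g: statistics.mean(
                       pools[g] + [x for x, c in zip(study, calls) if c == g])
                   for g in pools}
    return [nearest(x, centers) for x in study]

# Hypothetical data: (genotype, contrast) pairs for the reference samples.
known = [("AA", 0.78), ("AA", 0.82), ("AA", 0.80),
         ("AB", 0.02), ("AB", -0.03), ("AB", 0.01),
         ("BB", -0.81), ("BB", -0.79), ("BB", -0.80)]
study = [0.82, 0.05, -0.79, 0.4]
calls = call_genotypes(study, simulate(fit_clusters(known)))
print(calls)
```

In this toy setting the simulated points act as anchors, so a rare genotype cluster stays defined even when the study population contains only one or two carriers, and samples far from every cluster are left uncalled.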

Conclusions: Applications to real data demonstrate that the new approach, M(3)-S, outperforms existing methods in calling rare SNPs.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Gene Frequency / genetics*
  • Genome-Wide Association Study
  • Genotype
  • Genotyping Techniques / methods*
  • Humans
  • Polymorphism, Single Nucleotide / genetics*