Longshot enables accurate variant calling in diploid genomes from single-molecule long read sequencing

Nat Commun. 2019 Oct 11;10(1):4660. doi: 10.1038/s41467-019-12493-y.

Abstract

Whole-genome sequencing using short-read technologies such as Illumina enables the accurate detection of small-scale variants but provides limited information about haplotypes and variants in repetitive regions of the human genome. Single-molecule sequencing (SMS) technologies such as Pacific Biosciences and Oxford Nanopore generate long reads that can potentially address the limitations of short-read sequencing. However, the high error rate of SMS reads makes it challenging to detect small-scale variants in diploid genomes. We introduce a variant calling method, Longshot, which leverages the haplotype information present in SMS reads to accurately detect and phase single-nucleotide variants (SNVs) in diploid genomes. We demonstrate that Longshot achieves very high accuracy for SNV detection using whole-genome Pacific Biosciences data, outperforms existing variant calling methods, and enables variant detection in duplicated regions of the genome that cannot be mapped using short reads.
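
To illustrate the general idea of using haplotype information when genotyping noisy reads, the toy sketch below compares an unphased diploid genotype likelihood with one in which each read has already been assigned to a haplotype. This is not Longshot's algorithm or code: the function names, the uniform per-base error model, and the 15% error rate are all assumptions chosen for illustration.

```python
from math import log

def allele_likelihood(base, allele, err):
    # Assumed error model: a miscalled base is equally likely to be
    # any of the three other nucleotides.
    return 1.0 - err if base == allele else err / 3.0

def unphased_log_likelihoods(bases, ref, alt, err):
    # Each read is treated as coming from either haplotype with probability 0.5.
    result = {}
    for gt in [(ref, ref), (ref, alt), (alt, alt)]:
        ll = 0.0
        for b in bases:
            p = 0.5 * allele_likelihood(b, gt[0], err) \
                + 0.5 * allele_likelihood(b, gt[1], err)
            ll += log(p)
        result[gt] = ll
    return result

def phased_log_likelihoods(bases, haps, ref, alt, err):
    # Each read carries a haplotype label (0 or 1); a genotype is an ordered
    # pair (allele on haplotype 0, allele on haplotype 1).
    result = {}
    for gt in [(ref, ref), (ref, alt), (alt, ref), (alt, alt)]:
        result[gt] = sum(
            log(allele_likelihood(b, gt[h], err)) for b, h in zip(bases, haps)
        )
    return result

def best_genotype(log_likelihoods):
    # Maximum-likelihood genotype (uniform prior assumed).
    return max(log_likelihoods, key=log_likelihoods.get)

# Hypothetical pileup at one site: five reads per haplotype, one error on each.
bases = list("AAAAGGGGGA")
haps = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
ERR = 0.15  # assumed per-base error rate, for illustration only

unphased = unphased_log_likelihoods(bases, "A", "G", ERR)
phased = phased_log_likelihoods(bases, haps, "A", "G", ERR)
print(best_genotype(unphased), best_genotype(phased))
```

In the phased version the haplotype labels do two things: they let each read be scored against a single allele rather than a 50/50 mixture, and they make the resulting call an ordered (phased) genotype. In practice the haplotype assignments themselves have to be inferred from the reads, which this sketch takes as given.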

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Diploidy
  • Genome, Human
  • Haplotypes
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Polymorphism, Single Nucleotide*
  • Repetitive Sequences, Nucleic Acid
  • Sequence Analysis, DNA / methods
  • Software*
  • Whole Genome Sequencing / methods*