Pluribus-Exploring the Limits of Error Correction Using a Suffix Tree

IEEE/ACM Trans Comput Biol Bioinform. 2017 Nov-Dec;14(6):1378-1388. doi: 10.1109/TCBB.2016.2586060. Epub 2016 Jun 29.

Abstract

Next generation sequencing technologies enable efficient and cost-effective genome sequencing. However, sequencing errors increase the complexity of the de novo assembly process, and reduce the quality of the assembled sequences. Many error correction techniques utilizing substring frequencies have been developed to mitigate this effect. In this paper, we present a novel and effective method called Pluribus, for correcting sequencing errors using a generalized suffix trie. Pluribus utilizes multiple manifestations of an error in the trie to accurately identify errors and suggest corrections. We show that Pluribus produces the least number of false positives across a diverse set of real sequencing datasets when compared to other methods. Furthermore, Pluribus can be used in conjunction with other contemporary error correction methods to achieve higher levels of accuracy than either tool alone. These increases in error correction accuracy are also realized in the quality of the contigs that are generated during assembly. We explore, in-depth, the behavior of Pluribus , to explain the observed improvement in accuracy and assembly performance. Pluribus is freely available at http://compbio.

Case: edu/pluribus/.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Computational Biology / methods*
  • Databases, Genetic
  • High-Throughput Nucleotide Sequencing / methods*
  • Sequence Analysis, DNA / methods*