Enhanced de novo assembly of high throughput pyrosequencing data using whole genome mapping

PLoS One. 2013 Apr 17;8(4):e61762. doi: 10.1371/journal.pone.0061762. Print 2013.

Abstract

Despite major advances in next-generation sequencing, assembly of sequencing data, especially data from novel microorganisms or re-emerging pathogens, remains constrained by the lack of suitable reference sequences. De novo assembly is the best approach to achieve an accurate finished sequence, but multiple sequencing platforms or paired-end libraries are often required to achieve full genome coverage. In this study, we demonstrated a method to assemble complete bacterial genome sequences by integrating shotgun Roche 454 pyrosequencing with optical whole genome mapping (WGM). The whole genome restriction map (WGRM) was used as the reference to scaffold de novo assembled sequence contigs through a stepwise process. Large de novo contigs were placed in the correct order and orientation through alignment to the WGRM. De novo contigs that did not align to the WGRM were merged into scaffolds using contig branching structure information. These extended scaffolds were then aligned to the WGRM to identify overlaps to be eliminated, and gaps and mismatches to be resolved with unused contigs. The process was repeated until a sequence with full coverage of, and alignment to, the whole genome map was achieved. Using this method, we achieved 100% WGRM coverage without a paired-end library. We assembled complete sequences for three distinct genetic components of a clinical isolate of Providencia stuartii (a bacterial chromosome, a novel blaNDM-1 plasmid, and a novel bacteriophage) without separately purifying them to homogeneity.
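The first scaffolding step (placing large contigs in order and orientation by aligning them to the WGRM) can be illustrated with a minimal sketch. It assumes the map is a simple linear list of fragment sizes in base pairs, that the restriction site is palindromic (so a contig in reverse orientation simply yields its fragment list reversed), and that a fragment "matches" when sizes agree within a relative tolerance. The BamHI site `GGATCC`, the 5% tolerance, and the function names `digest`, `place_contig`, and `order_contigs` are hypothetical choices for illustration; the abstract does not name the enzyme, the alignment software, or its scoring scheme.

```python
from typing import Dict, List, Optional, Tuple


def digest(seq: str, site: str) -> List[int]:
    """In silico digest: lengths of complete fragments between successive sites.

    Positions are recorded at the start of each site occurrence; because the
    cut offset within the site is the same for every occurrence, it cancels
    out of the distance between successive cuts. The partial fragments at the
    contig ends are discarded because their true length is unknown.
    """
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return [b - a for a, b in zip(positions, positions[1:])]


def fragments_match(a: int, b: int, tol: float) -> bool:
    """Sizes agree if they differ by at most `tol` relative to the larger one."""
    return abs(a - b) <= tol * max(a, b)


def place_contig(contig_frags: List[int], map_frags: List[int],
                 tol: float = 0.05, min_frags: int = 2
                 ) -> Optional[Tuple[int, str]]:
    """Return (map fragment index, strand) of the best ungapped match, or None."""
    if len(contig_frags) < min_frags:
        return None  # too few fragments to place the contig reliably
    best = None
    # Assumes a palindromic site: reverse orientation = reversed fragment list.
    for strand, frags in (("+", contig_frags), ("-", contig_frags[::-1])):
        k = len(frags)
        for i in range(len(map_frags) - k + 1):
            window = map_frags[i:i + k]
            if all(fragments_match(c, m, tol) for c, m in zip(frags, window)):
                score = sum(abs(c - m) for c, m in zip(frags, window))
                if best is None or score < best[0]:
                    best = (score, i, strand)
    return None if best is None else (best[1], best[2])


def order_contigs(contigs: Dict[str, str], map_frags: List[int], site: str,
                  tol: float = 0.05
                  ) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Order aligned contigs along the map; return (ordered, unplaced)."""
    placed, unplaced = [], []
    for name, seq in contigs.items():
        hit = place_contig(digest(seq, site), map_frags, tol)
        if hit is None:
            unplaced.append(name)  # left for the branching-based merge step
        else:
            placed.append((hit[0], name, hit[1]))
    placed.sort()
    return [(name, strand) for _, name, strand in placed], unplaced
```

For example, with a toy map `[500, 100, 200, 300, 150]`, a contig whose complete fragments are `[100, 200]` is placed at map index 1 on the `+` strand, while one yielding `[150, 300]` is placed at index 3 on the `-` strand. Contigs that do not align are returned separately, mirroring the abstract's description of merging unaligned contigs into scaffolds before re-aligning them; that merge step, overlap removal, and circular genomes are not modeled here.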

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Base Sequence
  • Chromosome Mapping / methods*
  • DNA, Bacterial / genetics
  • Genome, Bacterial / genetics*
  • High-Throughput Nucleotide Sequencing / methods*
  • Molecular Sequence Data
  • Operon / genetics
  • Providencia / genetics*
  • RNA, Ribosomal / genetics
  • Sequence Alignment
  • Sequence Analysis, DNA
  • Temperature*

Substances

  • DNA, Bacterial
  • RNA, Ribosomal

Associated data

  • GENBANK/JN687470
  • GENBANK/JX296113

Grants and funding

This study was supported by the United States Army Medical Command Policies 09-050 and 11-035, and was partially funded by grants C0709_12_WR and I0361_12_WR from the Global Emerging Infections Surveillance and Response System, a Division of the Armed Forces Health Surveillance Center. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.