A scalable and memory-efficient algorithm for de novo transcriptome assembly of non-model organisms

BMC Genomics. 2017 May 24;18(Suppl 4):387. doi: 10.1186/s12864-017-3735-1.

Abstract

Background: With the increased availability of de novo assembly algorithms, it is feasible to study entire transcriptomes of non-model organisms. While algorithms specifically designed for transcriptome assembly from high-throughput sequencing data are available, they are very memory-intensive, which limits their application to small data sets with few libraries.

Results: We develop a transcriptome assembly algorithm that recovers alternatively spliced isoforms and expression levels while utilizing as many RNA-Seq libraries as possible that contain hundreds of gigabases of data. New techniques are developed so that the computations can be performed on a computing cluster with a moderate amount of physical memory.

Conclusions: Our strategy minimizes memory consumption while simultaneously obtaining comparable or improved accuracy over existing algorithms. It provides support for incremental updates of assemblies when new libraries become available.

Keywords: Alternative splicing; Gene expression; RNA-Seq; Transcriptome assembly.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Animals
  • Diptera / genetics
  • Drosophila melanogaster / genetics
  • Gene Expression Profiling / methods*
  • Mole Rats / genetics
  • RNA Splicing
  • Sequence Analysis, RNA