Four high-quality draft genome assemblies of the marine heterotrophic nanoflagellate Cafeteria roenbergensis

Sci Data. 2020 Jan 21;7(1):29. doi: 10.1038/s41597-020-0363-4.

Abstract

The heterotrophic stramenopile Cafeteria roenbergensis is a globally distributed marine bacterivorous protist. This unicellular flagellate is host to the giant DNA virus CroV and the virophage mavirus. We sequenced the genomes of four cultured C. roenbergensis strains and generated 23.53 Gb of Illumina MiSeq data (99-282× coverage per strain) and 5.09 Gb of PacBio RSII data (13-45× coverage). Using the Canu assembler and customized curation procedures, we obtained high-quality draft genome assemblies with a total length of 34-36 Mbp per strain and contig N50 lengths of 148 kbp to 464 kbp. The C. roenbergensis genome has a GC content of ~70% and a repeat content of ~28%, and is predicted to contain 7857-8483 protein-coding genes based on a combination of de novo, homology-based and transcriptome-supported annotation. These first high-quality genome assemblies of a bicosoecid fill an important gap in sequenced stramenopile representatives and enable a more detailed evolutionary analysis of heterotrophic protists.
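
For readers unfamiliar with the assembly statistics quoted above, the following is a minimal sketch of how contig N50 and GC content are conventionally computed. It is not the authors' pipeline; the function names `n50` and `gc_content` are hypothetical, and the inputs in the usage lines are toy values, not data from this study.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the contig N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running sum always reaches half the total")


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T).
    Ambiguous symbols such as N are ignored (an assumption of this sketch)."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (seq.count("G") + seq.count("C")) / counted


# Toy usage with made-up values (not from the assemblies described here).
toy_n50 = n50([2, 3, 4, 5, 6])
toy_gc = gc_content("GCNNAT")
```

With real data, the contig lengths and sequences would typically be read from the assembly FASTA file, and the GC fraction would be computed over all contigs combined rather than per sequence.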

Publication types

  • Dataset
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Composition
  • Genome*
  • Molecular Sequence Annotation
  • Sequence Analysis, DNA
  • Stramenopiles / genetics*
  • Transcriptome