Collection and curation of prokaryotic genome assemblies from type strains at NCBI

Int J Syst Evol Microbiol. 2023 Feb;73(1):005707. doi: 10.1099/ijsem.0.005707.

Abstract

The public sequence databases are entrusted with the dual responsibility of providing an accessible archive to all submitters and supporting data reliability and re-use for all users. Genomes from type materials can act as an unambiguous reference for a taxonomic name and play an important role in comparative genomics, especially for taxon verification or reclassification. The National Center for Biotechnology Information (NCBI) collects and curates information on prokaryotic type strains and genomes from type strains. The average nucleotide identity (ANI)-based quality control processes introduced at NCBI to verify the genomes from type strains and improve related sequence records are detailed here. Using the curated genomes from type strains as a reference, the taxonomy of over 1.1 million GenBank genomes was verified, and over 7000 new submissions (before acceptance to GenBank) and over 1800 existing GenBank genomes were reclassified.

Keywords: ANI; GenBank; genome; taxonomy; type material; type strain.

MeSH terms

  • Bacterial Typing Techniques
  • Base Composition
  • DNA, Bacterial / genetics
  • Databases, Nucleic Acid*
  • Fatty Acids* / chemistry
  • Phylogeny
  • RNA, Ribosomal, 16S / genetics
  • Reproducibility of Results
  • Sequence Analysis, DNA

Substances

  • RNA, Ribosomal, 16S
  • DNA, Bacterial
  • Fatty Acids