OrthoDB and BUSCO update: annotation of orthologs with wider sampling of genomes

Nucleic Acids Res. 2024 Nov 13:gkae987. doi: 10.1093/nar/gkae987. Online ahead of print.

Abstract

OrthoDB (https://www.orthodb.org) offers evolutionary and functional annotations of orthologous genes in the widest sampling of eukaryotes, prokaryotes, and viruses, extending experimental gene function knowledge to newly sequenced genomes. We collect gene annotations, delineate hierarchical gene orthology and annotate the orthologous groups (OGs) with functional and evolutionary traits. OrthoDB is the leading resource for species diversity, striving to sample the most diverse and well-researched organisms with the highest quality genomic data. This update expands to include 5827 eukaryotic genomes. We have also added coding DNA sequences (CDSs) and gene loci coordinates. OrthoDB can be browsed, downloaded, or accessed using REST API, SPARQL/RDF and now also via API packages for Python and R Bioconductor. OrthoLoger (https://orthologer.ezlab.org), the tool used for inferring orthologs in OrthoDB, is now available as a Conda package and through BioContainers. ODB-mapper, a component of OrthoLoger, streamlines annotation of genes from newly sequenced genomes with OrthoDB evolutionary and functional descriptors. The benchmarking sets of universal single-copy orthologs (BUSCO), derived from OrthoDB, had correspondingly a major update. The BUSCO tool (https://busco.ezlab.org) has become a standard in genomics, uniquely capable of assessing both eukaryotic and prokaryotic species. It is applicable to gene sets, transcriptomes, genome assemblies and metagenomic bins.