Metagenomic techniques have facilitated the discovery of thousands of viruses, yet because samples are often highly biodiverse, fundamental data on the specific cellular hosts are usually missing. Numerous gastrointestinal viruses linked to human or animal diseases are affected by this, preventing research into their medical or veterinary importance. Here, we developed a computational workflow for the prediction of viral hosts from complex metagenomic datasets. We applied it to seven lineages of gastrointestinal cressdnaviruses using 1,124 metagenomic datasets, predicting hosts of four lineages. The Redondoviridae, strongly associated to human gum disease (periodontitis), were predicted to infect Entamoeba gingivalis, an oral pathogen itself involved in periodontitis. The Kirkoviridae, originally linked to fatal equine disease, were predicted to infect a variety of parabasalid protists, including Dientamoeba fragilis in humans. Two viral lineages observed in human diarrhoeal disease (CRESSV1 and CRESSV19, i.e. pecoviruses and hudisaviruses) were predicted to infect Blastocystis spp. and Endolimax nana respectively, protists responsible for millions of annual human infections. Our prediction approach is adaptable to any virus lineage and requires neither training datasets nor host genome assemblies. Two host predictions (for the Kirkoviridae and CRESSV1 lineages) could be independently confirmed as virus-host relationships using endogenous viral elements identified inside host genomes, while a further prediction (for the Redondoviridae) was strongly supported as a virus-host relationship using a case-control screening experiment of human oral plaques.
Keywords: Redondoviridae; cressdnavirus; host identification; metagenomics; periodontitis; protist.
© The Author(s) 2022. Published by Oxford University Press.