Predicting novel mosquito-associated viruses from metatranscriptomic dark matter

NAR Genom Bioinform. 2024 Jul 2;6(3):lqae077. doi: 10.1093/nargab/lqae077. eCollection 2024 Sep.

Abstract

The exponential growth of metatranscriptomic studies dedicated to arboviral surveillance in mosquitoes has yielded an unprecedented volume of unclassified sequences referred to as the virome dark matter. Mosquito-associated viruses are classified based on their host range into Mosquito-specific viruses (MSV) or Arboviruses. While MSV replication is restricted to mosquito cells, Arboviruses infect both mosquito vectors and vertebrate hosts. We developed the MosViR pipeline designed to identify complex genomic discriminatory patterns for predicting novel MSV or Arboviruses from viral contigs as short as 500 bp. The pipeline combines the predicted probability score from multiple predictive models, ensuring a robust classification with Area Under ROC (AUC) values exceeding 0.99 for test datasets. To assess the practical utility of MosViR in actual cases, we conducted a comprehensive analysis of 24 published mosquito metatranscriptomic datasets. By mining this metatranscriptomic dark matter, we identified 605 novel mosquito-associated viruses, with eight putative novel Arboviruses exhibiting high probability scores. Our findings highlight the limitations of current homology-based identification methods and emphasize the potentially transformative impact of the MosViR pipeline in advancing the classification of mosquito-associated viruses. MosViR offers a powerful and highly accurate tool for arboviral surveillance and for elucidating the complexities of the mosquito RNA virome.