Naturally occurring isolates of baculoviruses, such as the Bombyx mori nucleopolyhedrovirus (BmNPV), usually consist of numerous genetically different haplotypes. Deciphering the different haplotypes of such isolates is hampered by the large size of the dsDNA genome, as well as the short read length of next generation sequencing (NGS) techniques that are widely applied for baculovirus isolate characterization. In this study, we addressed this challenge by combining the accuracy of NGS to determine single nucleotide variants (SNVs) as genetic markers with the long read length of Nanopore sequencing technique. This hybrid approach allowed the comprehensive analysis of genetically homogeneous and heterogeneous isolates of BmNPV. Specifically, this allowed the identification of two putative major haplotypes in the heterogeneous isolate BmNPV-Ja by SNV position linkage. SNV positions, which were determined based on NGS data, were linked by the long Nanopore reads in a Position Weight Matrix. Using a modified Expectation-Maximization algorithm, the Nanopore reads were assigned according to the occurrence of variable SNV positions by machine learning. The cohorts of reads were de novo assembled, which led to the identification of BmNPV haplotypes. The method demonstrated the strength of the combined approach of short- and long-read sequencing techniques to decipher the genetic diversity of baculovirus isolates.
Keywords: Baculoviridae; alphabaculovirus; genotype; long read; machine learning; short read; viral quasispecies.