A Survey of Virus Recombination Uncovers Canonical Features of Artificial Chimeras Generated During Deep Sequencing Library Preparation

G3 (Bethesda). 2018 Mar 28;8(4):1129-1138. doi: 10.1534/g3.117.300468.

Abstract

Chimeric reads can be generated by in vitro recombination during the preparation of high-throughput sequencing libraries. Our attempt to detect biological recombination between the genomes of dengue virus (DENV; +ssRNA genome) and its mosquito host using the Illumina Nextera sequencing library preparation kit revealed that most, if not all, detected host-virus chimeras were artificial. Indeed, these chimeras were no more frequent than those obtained with control RNA from another species (a pillbug), which was never in contact with DENV RNA prior to library preparation. The proportions of the chimera types merely reflected the proportions of the three species among sequencing reads. Chimeras were frequently characterized by the presence of 1-20 bp microhomology between recombining fragments. Within-species chimeras mostly involved fragments in opposite orientations and located less than 100 bp from each other in the parental genome. We found similar features in published datasets using two other viruses: Ebola virus (EBOV; -ssRNA genome) and a herpesvirus (dsDNA genome), both produced with the Illumina Nextera protocol. These canonical features suggest that artificial chimeras are generated by intra-molecular template switching of the DNA polymerase during the PCR step of the Nextera protocol. Finally, a published Illumina dataset using the Flock House virus (FHV; +ssRNA genome) generated with a protocol preventing artificial recombination revealed the presence of 1-10 bp microhomology motifs in FHV-FHV chimeras, but very few recombining fragments were in opposite orientations. Our analysis uncovered sequence features characterizing recombination breakpoints in short-read sequencing datasets, which can be helpful to evaluate the presence and extent of artificial recombination.
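
The breakpoint features described above (microhomology at the junction, opposite orientation, and a distance under 100 bp between fragments) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the 0-based coordinate convention, and the toy sequences are assumptions made for illustration only. The 20 bp cap and the 100 bp threshold are taken from the ranges quoted in the abstract.

```python
def microhomology_length(donor, acceptor, donor_end, acceptor_start, max_len=20):
    """Count junction bases that are identical in both parental sequences.

    Hypothetical helper, 0-based coordinates (an assumption, not the paper's):
      donor_end      -- index of the first donor base NOT present in the chimera
      acceptor_start -- index of the first acceptor base present in the chimera
    Bases that match on either side of the breakpoint could be assigned to
    either parent, so they count as microhomology.
    """
    # Extend rightwards: donor bases past the breakpoint that match the acceptor.
    right = 0
    while (right < max_len
           and donor_end + right < len(donor)
           and acceptor_start + right < len(acceptor)
           and donor[donor_end + right] == acceptor[acceptor_start + right]):
        right += 1

    # Extend leftwards: acceptor bases before the breakpoint that match the donor.
    left = 0
    while (left < max_len - right
           and donor_end - 1 - left >= 0
           and acceptor_start - 1 - left >= 0
           and donor[donor_end - 1 - left] == acceptor[acceptor_start - 1 - left]):
        left += 1

    return left + right


def classify_within_species(pos_a, strand_a, pos_b, strand_b, max_dist=100):
    """Return (opposite_orientation, close_together) for two fragments of a
    within-species chimera mapped to the same parental genome.

    Hypothetical helper; strands are given as '+' or '-'.
    """
    opposite = strand_a != strand_b
    close = abs(pos_a - pos_b) < max_dist
    return opposite, close


# Toy sequences sharing a 3-base motif (CGT) at the junction.
donor = "AAACGTGGG"
acceptor = "TTTCGTCCC"
homology = microhomology_length(donor, acceptor, donor_end=3, acceptor_start=3)
features = classify_within_species(1000, "+", 1050, "-")
```

Because the shared bases can be assigned to either parent, the reported breakpoint position is ambiguous within the motif; the sketch returns the same microhomology length wherever the breakpoint is placed inside it.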

Keywords: Illumina; artificial chimeras; high-throughput sequencing; recombination; virus.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aedes
  • Animals
  • Base Sequence
  • Chimera / genetics*
  • Dengue Virus / genetics*
  • Gene Library*
  • High-Throughput Nucleotide Sequencing / methods*
  • Nodaviridae / genetics
  • Nucleotide Motifs / genetics
  • RNA / genetics
  • Recombination, Genetic*

Substances

  • RNA

Supplementary concepts

  • Flock House virus