Comparative genomic signature representations of the emerging COVID-19 coronavirus and other coronaviruses: High identity and possible recombination between Bat and Pangolin coronaviruses

Genomics. 2020 Nov;112(6):4189-4202. doi: 10.1016/j.ygeno.2020.07.003. Epub 2020 Jul 6.

Abstract

Coronaviruses are responsible for respiratory diseases in animals and humans. The combination of numerical encoding techniques and digital signal processing methods is becoming increasingly important for handling large genomic datasets. In this paper, we propose to analyze the SARS-CoV-2 genomic signature by combining different nucleotide representations with signal processing tools, with the aim of identifying its genetic origin. The SARS-CoV-2 sequence was compared with 21 relevant sequences, including Bat, Yak and Pangolin coronavirus sequences. In addition, we developed a new algorithm to locate nucleotide modifications. The results show that the Bat and Pangolin coronaviruses are the most closely related to SARS-CoV-2, with 96% and 86% identity, respectively, across the whole genome. Within the S gene, the Pangolin sequence shows the highest local nucleotide identity. These findings suggest that SARS-CoV-2 arose through evolution from Bat and Pangolin strains. This study offers new ways to automatically characterize viruses.
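The abstract does not say which nucleotide representations, spectral tools or identity algorithm were used. The following is only a minimal sketch of the general approach, assuming the Voss binary-indicator representation (a common choice in genomic signal processing, not necessarily one used in this study), a naive DFT power spectrum, and sequences that are already aligned to equal length. All function names (`voss_encode`, `power_spectrum`, `percent_identity`, `window_identity`, `locate_modifications`) are hypothetical, and `locate_modifications` is a plain position-by-position comparison, not the authors' new algorithm.

```python
import cmath
from typing import Dict, List, Tuple

BASES = "ACGT"


def voss_encode(seq: str) -> Dict[str, List[int]]:
    """Assumed representation: one 0/1 indicator signal per nucleotide (Voss)."""
    seq = seq.upper()
    return {b: [1 if s == b else 0 for s in seq] for b in BASES}


def power_spectrum(seq: str) -> List[float]:
    """Sum of |DFT|^2 over the four indicator signals (naive O(N^2) DFT)."""
    n = len(seq)
    spectrum = [0.0] * n
    for signal in voss_encode(seq).values():
        for k in range(n):
            coeff = sum(x * cmath.exp(-2j * cmath.pi * k * m / n)
                        for m, x in enumerate(signal))
            spectrum[k] += abs(coeff) ** 2
    return spectrum


def _check_aligned(ref: str, query: str) -> None:
    # Assumption: inputs are pre-aligned; gap symbols compare like any other symbol.
    if len(ref) != len(query) or not ref:
        raise ValueError("sequences must be non-empty and aligned to equal length")


def percent_identity(ref: str, query: str) -> float:
    """Percentage of aligned positions where the two sequences agree."""
    _check_aligned(ref, query)
    matches = sum(1 for a, b in zip(ref.upper(), query.upper()) if a == b)
    return 100.0 * matches / len(ref)


def window_identity(ref: str, query: str, window: int,
                    step: int) -> List[Tuple[int, float]]:
    """Local identity in fixed windows, e.g. to compare regions such as the S gene."""
    _check_aligned(ref, query)
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [(start, percent_identity(ref[start:start + window],
                                     query[start:start + window]))
            for start in range(0, len(ref) - window + 1, step)]


def locate_modifications(ref: str, query: str) -> List[Tuple[int, str, str]]:
    """List (0-based position, reference base, query base) for every mismatch."""
    _check_aligned(ref, query)
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(ref.upper(), query.upper()))
            if a != b]


# Usage on toy, made-up sequences (not real coronavirus data).
ref = "ATGCGTACGTTAGC"
qry = "ATGCGAACGTTGGC"
print(locate_modifications(ref, qry))         # [(5, 'T', 'A'), (11, 'A', 'G')]
print(round(percent_identity(ref, qry), 2))   # 85.71
print(window_identity(ref, qry, window=7, step=7))
```

The naive DFT keeps the sketch dependency-free; for genome-length signals (about 30 kb for SARS-CoV-2) an FFT such as `numpy.fft.fft` would be the practical choice. Real comparisons would also require a proper alignment step before computing identity, since insertions and deletions shift positions.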

Keywords: Bat; COVID19; Genome signature; Pangolin; SARS-CoV-2; Yak.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Chiroptera / virology*
  • Coronavirus / genetics*
  • Genome, Viral / genetics*
  • Genomics / methods
  • Humans
  • Pangolins / virology*
  • Recombination, Genetic*
  • SARS-CoV-2 / genetics*