Inferring Phylogenomic Relationship of Microbes Using Scalable Alignment-Free Methods

Methods Mol Biol. 2021;2242:69-76. doi: 10.1007/978-1-0716-1099-2_5.

Abstract

Inferring phylogenetic relationships among hundreds or thousands of microbial genomes is an increasingly common task. The conventional phylogenetic approach uses multiple sequence alignment to compare gene-by-gene, concatenated multigene, or whole-genome sequences, from which a phylogenetic tree is then inferred. These alignments rest on the implicit assumption of full-length contiguity among homologous sequences. However, common events in microbial genome evolution (e.g., structural rearrangements and genetic recombination) violate this assumption. Moreover, aligning hundreds or thousands of sequences is computationally intensive and does not scale with the rate at which genome data are generated. Alignment-free methods therefore present an attractive alternative. Here we describe a scalable alignment-free strategy to infer phylogenetic relationships from complete genome sequences of bacteria and archaea, based on short subsequences of length k (k-mers). We also describe how this strategy can be extended to infer evolutionary relationships beyond a tree-like structure, to better capture both the vertical and lateral signals of microbial evolution.
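To illustrate the general idea of comparing genomes through their k-mers instead of an alignment, here is a minimal Python sketch. It is not the procedure described in this chapter. The distance (Jaccard distance between sets of k-mers), the tree method (UPGMA), and all function names are illustrative assumptions; other k-mer statistics (for example, D2-type statistics) and tree-building methods (for example, neighbor-joining) are also widely used.

```python
from itertools import combinations


def kmer_set(seq: str, k: int) -> set[str]:
    """Return the set of distinct length-k substrings (k-mers) of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A & B| / |A | B|: 0 for identical k-mer sets, 1 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(genomes: dict[str, str], k: int) -> dict[tuple[str, str], float]:
    """Symmetric pairwise distances between the k-mer sets of all genomes (no alignment)."""
    profiles = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    dist = {}
    for x, y in combinations(profiles, 2):
        d = jaccard_distance(profiles[x], profiles[y])
        dist[(x, y)] = dist[(y, x)] = d
    return dist


def upgma(names: list[str], dist: dict[tuple[str, str], float]) -> str:
    """Repeatedly merge the closest pair of clusters (UPGMA); return a Newick
    topology without branch lengths. Cluster labels are their Newick strings."""
    size = {name: 1 for name in names}
    d = dict(dist)
    while len(size) > 1:
        x, y = min(combinations(size, 2), key=lambda p: d[p])
        merged = f"({x},{y})"
        for z in size:
            if z not in (x, y):
                # Size-weighted average of the distances from the two merged clusters.
                dz = (size[x] * d[(x, z)] + size[y] * d[(y, z)]) / (size[x] + size[y])
                d[(merged, z)] = d[(z, merged)] = dz
        size[merged] = size.pop(x) + size.pop(y)
    (root,) = size
    return root + ";"


if __name__ == "__main__":
    # Toy "genomes" for illustration only.
    genomes = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "GGGCCCTTTA"}
    dist = distance_matrix(genomes, k=3)
    print(upgma(list(genomes), dist))  # prints "(C,(A,B));"
```

For simplicity the sketch ignores reverse complements and uses UPGMA, which assumes a roughly constant rate of evolution across lineages; both are simplifications that a practical pipeline would likely revisit.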

Keywords: Alignment-free; Microbial evolution; Phylogenetic network; Phylogenetic tree; Phylogenetics; Phylogenomics; k-mers.

MeSH terms

  • Archaea / classification
  • Archaea / genetics*
  • Bacteria / classification
  • Bacteria / genetics*
  • DNA, Archaeal / genetics*
  • DNA, Bacterial / genetics*
  • Databases, Genetic
  • Evolution, Molecular
  • Genome, Archaeal*
  • Genome, Bacterial*
  • Genomics*
  • Phylogeny*
  • Research Design
  • Workflow

Substances

  • DNA, Archaeal
  • DNA, Bacterial