Comparing Phylogenetic Approaches to Reconstructing Cell Lineage From Microsatellites With Missing Data

IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2291-2301. doi: 10.1109/TCBB.2020.2992813. Epub 2021 Dec 8.

Abstract

Due to the imperfect fidelity of DNA replication, somatic cells acquire DNA mutations at each division, and these mutations record their lineage history. Microsatellites, tandem repeats of DNA nucleotide motifs, mutate more frequently than other genomic regions. By observing microsatellite lengths in single cells and applying suitable inference procedures, the cell lineage tree of an organism can be reconstructed. Motivated by recent advances in single cell Next Generation Sequencing (NGS) and in the phylogenetic methods used to infer lineage trees, this work investigates which computational approaches best exploit the lineage information found in single cell NGS data. We simulated trees representing cell division with mutating microsatellites, and tested a range of available phylogenetic algorithms to reconstruct cell lineage. We found that distance-based approaches are fast and accurate with fully observed data. However, Maximum Parsimony and the computationally intensive probabilistic methods are more robust to missing data and are therefore better suited to reconstructing cell lineage from NGS datasets. We also investigated how robust reconstruction algorithms are to different tree topologies and mutation generation models. Our results show that the flexibility of Maximum Parsimony and the probabilistic approaches means they can be adapted to allow good reconstruction across a range of biologically relevant scenarios.
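The simulation setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the perfectly binary tree, the stepwise mutation model (each locus gains or loses one repeat unit with a fixed probability per division), the uniformly random missing data, and the mean absolute difference distance are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import random


def simulate_lineage(depth, n_loci, mut_rate, rng):
    """Simulate a perfectly binary division tree and return leaf genotypes.

    Assumed model: every cell divides once per generation, and each
    microsatellite locus changes by +1 or -1 repeat unit with
    probability mut_rate per division (stepwise mutation).
    Leaves 2k and 2k+1 in the returned list are sister cells.
    """
    cells = [[0] * n_loci]  # root genotype: repeat counts relative to the zygote
    for _ in range(depth):
        next_gen = []
        for genotype in cells:
            for _ in range(2):  # two daughters per division
                child = [
                    x + rng.choice((-1, 1)) if rng.random() < mut_rate else x
                    for x in genotype
                ]
                next_gen.append(child)
        cells = next_gen
    return cells


def mask_missing(cells, missing_rate, rng):
    """Replace each observation with None with probability missing_rate."""
    return [
        [None if rng.random() < missing_rate else x for x in genotype]
        for genotype in cells
    ]


def distance(a, b):
    """Mean absolute repeat-count difference over loci observed in both cells.

    Returns None when the two cells share no observed locus.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return sum(abs(x - y) for x, y in shared) / len(shared)


if __name__ == "__main__":
    rng = random.Random(0)
    leaves = simulate_lineage(depth=4, n_loci=200, mut_rate=0.05, rng=rng)
    observed = mask_missing(leaves, missing_rate=0.3, rng=rng)
    print("sisters:", distance(observed[0], observed[1]))
    print("distant cells:", distance(observed[0], observed[-1]))
```

A pairwise distance matrix built this way could be passed to a distance-based method such as neighbour joining. As the missing rate rises, fewer loci are shared between any two cells, so each distance rests on less evidence, which is one intuition for why distance-based reconstruction degrades with missing data.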

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Cell Lineage / genetics*
  • Computational Biology / methods*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Mice
  • Microsatellite Repeats / genetics*
  • Mutation
  • Phylogeny*
  • Sequence Analysis, DNA
  • Single-Cell Analysis