Joint inference of discrete cell types and continuous type-specific variability in single-cell datasets with MMIDAS

Nat Comput Sci. 2024 Sep;4(9):706-722. doi: 10.1038/s43588-024-00683-8. Epub 2024 Sep 23.

Abstract

Reproducible definition and identification of cell types are essential for investigating their biological function and for understanding their relevance in development, disease and evolution. Current approaches either model variability in the data as continuous latent factors and then cluster as a separate step, or apply clustering directly to the data. We show that such approaches can make qualitative errors and fail to identify cell types robustly, particularly when the number of cell types is in the hundreds or even thousands. Here we propose an unsupervised method, Mixture Model Inference with Discrete-coupled AutoencoderS (MMIDAS), which combines a generalized mixture model with a multi-armed deep neural network to jointly infer the discrete cell type and the continuous type-specific variability. Using four recent datasets of brain cells spanning different technologies, species and conditions, we demonstrate that MMIDAS can identify reproducible cell types and infer cell type-dependent continuous variability in both unimodal and multimodal datasets.
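
To make the idea of joint inference with coupled arms more concrete, the following is a minimal sketch and not the MMIDAS implementation. It assumes linear layers, a softmax for the discrete type assignment, a continuous state encoded from the input together with the type assignment, and a squared-distance penalty that pushes the arms toward agreeing on the discrete type. The names `Arm`, `coupled_loss` and `lam`, and all layer sizes, are hypothetical choices made for illustration; none of them come from the abstract.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class Arm:
    """One hypothetical autoencoder arm with linear layers (illustration only)."""

    def __init__(self, n_genes, n_types, n_state, rng):
        scale = 0.1
        # Encoder weights producing logits over discrete types.
        self.W_c = rng.normal(0.0, scale, (n_genes, n_types))
        # Encoder weights producing the continuous state, conditioned on type.
        self.W_s = rng.normal(0.0, scale, (n_genes + n_types, n_state))
        # Decoder weights mapping (type, state) back to expression space.
        self.W_d = rng.normal(0.0, scale, (n_types + n_state, n_genes))

    def forward(self, x):
        c = softmax(x @ self.W_c)                       # soft type assignment
        s = np.concatenate([x, c], axis=1) @ self.W_s   # type-conditioned state
        x_hat = np.concatenate([c, s], axis=1) @ self.W_d
        return c, s, x_hat


def coupled_loss(x, arms, lam=1.0):
    """Sum of per-arm reconstruction errors plus a pairwise penalty on
    disagreement between the arms' discrete assignments (assumed form)."""
    outs = [arm.forward(x) for arm in arms]
    recon = sum(np.mean((x - x_hat) ** 2) for _, _, x_hat in outs)
    coupling = 0.0
    for i in range(len(outs)):
        for j in range(i + 1, len(outs)):
            diff = outs[i][0] - outs[j][0]
            coupling += np.mean(np.sum(diff ** 2, axis=1))
    return recon + lam * coupling, outs


# Toy usage on random data: 8 cells, 20 genes, 5 candidate types, 2 state dims.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 20))
arms = [Arm(20, 5, 2, rng) for _ in range(2)]
loss, outs = coupled_loss(x, arms)
```

In this sketch the discrete assignment `c` and the continuous state `s` are produced in a single forward pass and trained against one objective, which is the contrast with the two-step pipeline described above, where continuous factors are learned first and clustering is applied afterwards. A practical model would use nonlinear layers and a gradient-based optimizer, both omitted here.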

MeSH terms

  • Algorithms
  • Animals
  • Brain / cytology
  • Cluster Analysis
  • Computational Biology / methods
  • Datasets as Topic
  • Humans
  • Neural Networks, Computer*
  • Single-Cell Analysis* / methods