Joint inference of discrete cell types and continuous type-specific variability in single-cell datasets with MMIDAS

Yeganeh Marghi; Rohan Gala; Fahimeh Baftizadeh; Uygar Sümbül

doi:10.1038/s43588-024-00683-8

Nat Comput Sci. 2024 Sep;4(9):706-722. doi: 10.1038/s43588-024-00683-8. Epub 2024 Sep 23.

Authors

Yeganeh Marghi¹, Rohan Gala², Fahimeh Baftizadeh², Uygar Sümbül³,⁴

Affiliations

¹ Allen Institute, Seattle, WA, USA. yeganeh.marghi@alleninstitute.org.
² Allen Institute, Seattle, WA, USA.
³ Allen Institute, Seattle, WA, USA. uygars@alleninstitute.org.
⁴ Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA. uygars@alleninstitute.org.

PMID: 39317764
DOI: 10.1038/s43588-024-00683-8

Abstract

Reproducible definition and identification of cell types is essential to enable investigations into their biological function and to understand their relevance in the context of development, disease and evolution. Current approaches model variability in data as continuous latent factors, followed by clustering as a separate step, or immediately apply clustering on the data. We show that such approaches can suffer from qualitative mistakes in identifying cell types robustly, particularly when the number of such cell types is in the hundreds or even thousands. Here we propose an unsupervised method, Mixture Model Inference with Discrete-coupled AutoencoderS (MMIDAS), which combines a generalized mixture model with a multi-armed deep neural network to jointly infer the discrete type and continuous type-specific variability. Using four recent datasets of brain cells spanning different technologies, species and conditions, we demonstrate that MMIDAS can identify reproducible cell types and infer cell type-dependent continuous variability in both unimodal and multimodal datasets.

MeSH terms

Algorithms
Animals
Brain / cytology
Cluster Analysis
Computational Biology / methods
Datasets as Topic
Humans
Neural Networks, Computer*
Single-Cell Analysis* / methods
