MIMIC: an optimization method to identify cell type-specific marker panel for cell sorting

Brief Bioinform. 2021 Nov 5;22(6):bbab235. doi: 10.1093/bib/bbab235.

Abstract

Multi-omics data allow us to select a small set of informative markers to discriminate specific cell types and study cellular heterogeneity. However, it is often challenging to choose an optimal marker panel from high-dimensional molecular profiles for a large number of cell types. Here, we propose a method called Mixed Integer programming Model to Identify Cell type-specific marker panel (MIMIC). MIMIC maintains the hierarchical topology among different cell types and simultaneously maximizes the specificity of a fixed number of selected markers. MIMIC was benchmarked on the mouse ENCODE RNA-seq dataset, spanning 29 diverse tissues, for 43 surface markers (SMs) and 1345 transcription factors (TFs). MIMIC selects biologically meaningful markers and is robust across different accuracy criteria. It shows advantages over the standard single gene-based approaches and widely used dimensionality reduction methods, such as multidimensional scaling and t-SNE, in both accuracy and biological interpretation. Furthermore, the combination of SMs and TFs achieves better specificity than SMs or TFs alone. Applying MIMIC to a large collection of 641 RNA-seq samples covering 231 cell types identifies a panel of TFs and SMs that reveals the modularity of cell type association networks. Finally, the scalability of MIMIC is demonstrated by selecting enhancer markers from mouse ENCODE data. MIMIC is freely available at https://github.com/MengZou1/MIMIC.
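
To make the idea of choosing a fixed-size marker panel concrete, the following is a minimal toy sketch and not the MIMIC formulation itself. It assumes a small expression matrix (cell types by markers), replaces the mixed integer program with an exhaustive search, and uses the smallest pairwise squared distance between cell types as a stand-in for specificity. The function names `pairwise_sq_dists` and `select_panel` and the toy data are hypothetical.

```python
from itertools import combinations


def pairwise_sq_dists(profiles, markers):
    """Squared Euclidean distance between every pair of cell types,
    computed only on the given marker indices."""
    n = len(profiles)
    return {
        (a, b): sum((profiles[a][m] - profiles[b][m]) ** 2 for m in markers)
        for a, b in combinations(range(n), 2)
    }


def select_panel(profiles, k):
    """Exhaustively pick k marker indices that maximize the smallest
    pairwise distance between cell types (an assumed specificity proxy)."""
    n_markers = len(profiles[0])
    best_panel, best_score = None, float("-inf")
    for panel in combinations(range(n_markers), k):
        score = min(pairwise_sq_dists(profiles, panel).values())
        if score > best_score:  # ties keep the first panel found
            best_panel, best_score = panel, score
    return best_panel, best_score


# Toy data (assumed): rows are cell types, columns are candidate markers.
# The last marker is identical across cell types, so it carries no signal.
profiles = [
    [1.0, 0.0, 0.0, 0.5],  # cell type A
    [0.0, 1.0, 0.0, 0.5],  # cell type B
    [0.0, 0.0, 1.0, 0.5],  # cell type C
]

panel, score = select_panel(profiles, k=2)
print(panel, score)
```

Exhaustive search is only feasible for tiny inputs; with thousands of candidate markers the number of panels grows combinatorially, which is the kind of setting where an integer programming solver becomes the practical choice.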

Keywords: TFs; cell type-specific marker; dimension reduction; hierarchical topology; surface markers.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Biomarkers*
  • Computational Biology* / methods
  • Databases, Genetic
  • Flow Cytometry / methods*
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation
  • Humans
  • Organ Specificity* / genetics
  • Reproducibility of Results
  • Software*

Substances

  • Biomarkers