Two-Tier Mapper, an unbiased topology-based clustering method for enhanced global gene expression analysis

Rachel Jeitziner; Mathieu Carrière; Jacques Rougemont; Steve Oudot; Kathryn Hess; Cathrin Brisken

doi:10.1093/bioinformatics/btz052

Bioinformatics. 2019 Sep 15;35(18):3339-3347. doi: 10.1093/bioinformatics/btz052.

Authors

Rachel Jeitziner¹, Mathieu Carrière², Jacques Rougemont³, Steve Oudot², Kathryn Hess⁴, Cathrin Brisken¹

Affiliations

¹ School of Life Sciences, Swiss Institute for Experimental Cancer Research, Ecole Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland.
² INRIA Saclay, Palaiseau FR-91120, France.
³ DP Physique théorique, Université de Genève, Genève CH-1205, Switzerland.
⁴ Brain and Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland.

PMID: 30753284
DOI: 10.1093/bioinformatics/btz052

Abstract

Motivation: Unbiased clustering methods are needed to analyze growing numbers of complex datasets. Currently available clustering methods often depend on user-set parameters, lack stability, and are not applicable to small datasets. To overcome these shortcomings, we used topological data analysis, an emerging field of mathematics with a wide range of applications that discerns additional features and uncovers hidden insights in datasets.

Results: We have developed a topology-based clustering method called Two-Tier Mapper (TTMap) for enhanced analysis of global gene expression datasets. First, TTMap discerns divergent features in the control group, adjusts for them, and identifies outliers. Second, the deviation of each test sample from the control group in a high-dimensional space is computed, and the test samples are clustered using a new Mapper-based topological algorithm at two levels: a global tier and local tiers. All parameters are either carefully chosen or data-driven, avoiding any user-induced bias. The method is stable, different datasets can be combined for analysis, and significant subgroups can be identified. It outperforms current clustering methods in sensitivity and stability on synthetic and biological datasets, in particular when sample sizes are small. The outcome is not affected by removal of control samples, by choice of normalization, or by subselection of data. TTMap is readily applicable to complex, highly variable biological samples and holds promise for personalized medicine.
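The two-step idea described in the Results (deviation from the control group, then clustering at a global tier and at local tiers) can be illustrated with a toy Python sketch. This is not the TTMap implementation and does not use the Bioconductor package. Every name and choice here is an assumption made for illustration: the function names, the per-gene control mean as the reference, the total absolute deviation as the filter, the non-overlapping equal-size bins standing in for local tiers, the L1 distance, and the fixed threshold eps.

```python
from statistics import mean


def deviation_from_control(test, control):
    """Per-gene deviation of each test sample from the control mean (assumed reference)."""
    n_genes = len(control[0])
    control_mean = [mean(sample[g] for sample in control) for g in range(n_genes)]
    return [[x - m for x, m in zip(sample, control_mean)] for sample in test]


def l1_distance(a, b):
    """Sum of absolute per-gene differences between two deviation vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def connected_components(indices, dev, eps):
    """Single-linkage clusters: samples are linked when their L1 distance is <= eps."""
    remaining = list(indices)
    clusters = []
    while remaining:
        stack = [remaining.pop(0)]
        cluster = []
        while stack:
            i = stack.pop()
            cluster.append(i)
            near = [j for j in remaining if l1_distance(dev[i], dev[j]) <= eps]
            for j in near:
                remaining.remove(j)
            stack.extend(near)
        clusters.append(sorted(cluster))
    return sorted(clusters)


def two_tier_clusters(dev, eps, n_intervals=2):
    """Cluster all samples (global tier), then cluster within bins of the filter (local tiers)."""
    # Hypothetical filter: total absolute deviation of each sample.
    filt = [sum(abs(x) for x in row) for row in dev]
    order = sorted(range(len(dev)), key=lambda i: filt[i])

    tiers = {"global": connected_components(range(len(dev)), dev, eps)}

    # Simplification: equal-size, non-overlapping bins of the sorted filter values.
    size = -(-len(order) // n_intervals)  # ceiling division
    for k in range(n_intervals):
        members = order[k * size:(k + 1) * size]
        if members:
            tiers[f"local_{k}"] = connected_components(members, dev, eps)
    return tiers


if __name__ == "__main__":
    control = [[1.0, 1.0], [3.0, 3.0]]
    test = [[2.0, 2.1], [2.1, 2.0], [6.0, 6.0], [6.1, 6.0]]
    dev = deviation_from_control(test, control)
    print(two_tier_clusters(dev, eps=1.0, n_intervals=2))
```

A standard Mapper construction uses overlapping cover elements and joins clusters that share samples. The sketch omits that step to stay short, so it shows only the filter-then-cluster structure and not the topological graph.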

Availability and implementation: TTMap is supplied as an R package in Bioconductor.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Cluster Analysis
Gene Expression
Gene Expression Profiling*
Sample Size
Software*