A Nonparametric Bayesian Model for Nested Clustering

Methods Mol Biol. 2016;1362:129-41. doi: 10.1007/978-1-4939-3106-4_8.

Abstract

We propose a nonparametric Bayesian model for clustering in which clusters of experimental units are defined by a shared pattern of clustering of another set of experimental units. The proposed model is motivated by the analysis of protein activation data, where we cluster proteins such that all proteins in one cluster give rise to the same clustering of patients. That is, we define clusters of proteins by the way that patients group with respect to the corresponding protein activations. This is in contrast to (almost) all currently available models, which define clusters through shared parameters in the sampling model; these include model-based clustering, Dirichlet process mixtures, and product partition models, among others. We show results for two typical biostatistical inference problems that give rise to clustering.
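To illustrate the nested structure described above, here is a minimal prior-simulation sketch. It assumes that proteins are partitioned by a Pólya urn (Chinese restaurant process) with concentration `alpha_protein`, and that each protein cluster independently draws its own partition of patients from a second Pólya urn with concentration `alpha_patient`. The function names and parameters are hypothetical, and this is not the authors' model or inference algorithm, which the abstract does not fully specify. It only shows the defining property: proteins in the same cluster share one clustering of patients.

```python
import random


def polya_urn_partition(n, alpha, rng):
    """Sample a random partition of n items from a Polya urn (Chinese restaurant process).

    Returns cluster labels 0..K-1, numbered in order of first appearance.
    """
    labels = []
    counts = []  # counts[k] = number of items already in cluster k
    for i in range(n):
        # Item i joins existing cluster k with probability counts[k] / (i + alpha),
        # or opens a new cluster with probability alpha / (i + alpha).
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def nested_partition_prior(n_proteins, n_patients, alpha_protein, alpha_patient, seed=0):
    """Hypothetical nested prior: cluster proteins, then give each protein cluster
    its own (independently drawn) partition of the patients."""
    rng = random.Random(seed)
    protein_labels = polya_urn_partition(n_proteins, alpha_protein, rng)
    n_protein_clusters = max(protein_labels) + 1
    patient_partitions = [
        polya_urn_partition(n_patients, alpha_patient, rng)
        for _ in range(n_protein_clusters)
    ]
    return protein_labels, patient_partitions


if __name__ == "__main__":
    proteins, partitions = nested_partition_prior(6, 5, 1.0, 1.0, seed=1)
    for j, c in enumerate(proteins):
        # Protein j inherits the patient clustering of its protein cluster c.
        print(f"protein {j}: protein cluster {c}, patient clustering {partitions[c]}")
```

By construction, two proteins with the same protein-cluster label are associated with the identical patient partition. In a real analysis the patient partitions would be informed by the observed protein activations rather than drawn independently from the prior.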

Keywords: Dirichlet process; Protein expression; Pólya urn; Random partitions; Reverse phase protein array.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Bayes Theorem*
  • Cluster Analysis*
  • Proteomics / methods*