A Survey on Multi-View Clustering

IEEE Trans Artif Intell. 2021 Apr;2(2):146-168. doi: 10.1109/tai.2021.3065894. Epub 2021 Apr 5.

Abstract

Clustering is a machine learning paradigm that divides sample subjects into a number of groups such that subjects in the same group are more similar to each other than to those in other groups. With advances in information acquisition technologies, samples can frequently be viewed from different angles or in different modalities, generating multi-view data. Multi-view clustering (MVC), which clusters subjects into subgroups using multi-view data, has attracted increasing attention. Although MVC methods have developed rapidly, few surveys have summarized and analyzed the current progress. Therefore, we propose a novel taxonomy of MVC approaches. As with other machine learning methods, we categorize them into generative and discriminative classes. Within the discriminative class, we further split the methods into five groups according to how the views are integrated: Common Eigenvector Matrix, Common Coefficient Matrix, Common Indicator Matrix, Direct Combination, and Combination After Projection. Furthermore, we relate MVC to other topics: multi-view representation, ensemble clustering, multi-task clustering, and multi-view supervised and semi-supervised learning. Several representative real-world applications are elaborated for practitioners. Some benchmark multi-view datasets are introduced, and representative MVC algorithms from each group are empirically evaluated on them. To promote future development of MVC approaches, we point out several open problems that may require further investigation and thorough examination.

Keywords: Multi-view learning; canonical correlation analysis; clustering; data mining; k-means; machine learning; nonnegative matrix factorization; spectral clustering; subspace clustering; survey.