Motivation: The analysis of protein-protein interactions allows for detailed exploration of the cellular machinery. Biochemical purification of protein complexes, followed by identification of their components by mass spectrometry, is currently the method that delivers the most reliable information, although the resulting data sets remain difficult to interpret. Consolidating individual experiments into protein complexes, especially for high-throughput screens, is complicated by numerous contaminants, by the occurrence of proteins in otherwise dissimilar purifications due to functional re-use, and by technical limitations in detection. A non-redundant collection of protein complexes derived from experimental data would be useful for biological interpretation, but manual assembly is tedious and often inconsistent.
Results: Here, we introduce a measure that defines similarity within collections of purifications, and we use it to generate a set of minimally redundant, comprehensive complexes by unsupervised clustering.
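As an illustration only, the general idea of scoring similarity between purifications and then grouping them without supervision can be sketched as follows. The abstract does not specify the similarity measure or the clustering algorithm, so this sketch assumes a plain Jaccard index on protein sets and single-linkage grouping at a hypothetical threshold of 0.5. The function names, bait labels, and protein identifiers are invented for the example and are not taken from the published programs.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two protein sets (an assumed stand-in measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_purifications(purifications, threshold=0.5):
    """Group purifications by single linkage and return merged protein sets.

    Two purifications end up in the same group if they are connected by a
    chain of pairs whose Jaccard similarity is >= threshold (hypothetical
    cut-off). Each group is reported as the union of its proteins.
    """
    names = list(purifications)
    parent = {n: n for n in names}  # union-find forest over purification names

    def find(n):
        # Follow parent links to the group representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for x, y in combinations(names, 2):
        if jaccard(purifications[x], purifications[y]) >= threshold:
            parent[find(x)] = find(y)

    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).update(purifications[n])
    return sorted(sorted(g) for g in groups.values())


# Toy data: four hypothetical purifications keyed by bait name.
purifications = {
    "bait1": {"A", "B", "C"},
    "bait2": {"A", "B", "C", "D"},
    "bait3": {"X", "Y"},
    "bait4": {"X", "Y", "Z"},
}
complexes = cluster_purifications(purifications, threshold=0.5)
print(complexes)
```

In this toy input, bait1 and bait2 overlap strongly and collapse into one candidate complex, as do bait3 and bait4. A real data set would additionally need to handle contaminants and proteins shared between otherwise dissimilar purifications, which this sketch does not attempt.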
Availability: Programs and results are freely available from http://www.bork.embl-heidelberg.de/Docu/purclust/