Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data

Proc Natl Acad Sci U S A. 2004 Mar 2;101(9):2981-6. doi: 10.1073/pnas.0308661100. Epub 2004 Feb 18.

Abstract

The dissection of complex biological systems is a challenging task, made difficult by the size of the underlying molecular network and the heterogeneous nature of the control mechanisms involved. Novel high-throughput techniques are generating massive data sets on various aspects of such systems. Here, we perform analysis of a highly diverse collection of genomewide data sets, including gene expression, protein interactions, growth phenotype data, and transcription factor binding, to reveal the modular organization of the yeast system. By integrating experimental data of heterogeneous sources and types, we are able to perform analysis on a much broader scope than previous studies. At the core of our methodology is the ability to identify modules, namely, groups of genes with statistically significant correlated behavior across diverse data sources. Numerous biological processes are revealed through these modules, which also obey global hierarchical organization. We use the identified modules to study the yeast transcriptional network and predict the function of >800 uncharacterized genes. Our analysis framework, SAMBA (Statistical-Algorithmic Method for Bicluster Analysis), enables the processing of current and future sources of biological information and is readily extendable to experimental techniques and higher organisms.
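The abstract's notion of a module, a group of genes with statistically significant correlated behavior across diverse data sources, can be illustrated with a small sketch. This is not the SAMBA algorithm or its scoring scheme, which the abstract does not specify. It is a toy under stated assumptions: every measurement is reduced to a binary gene-to-property relation, properties from different data types are pooled into one set, and a candidate module is scored by a binomial tail probability against the overall background density. All gene names, property labels, and function names are hypothetical.

```python
from math import comb

# Hypothetical toy data: each gene maps to the set of "properties" it shows,
# pooled from different data types (expression, TF binding, interactions,
# growth phenotypes). Labels are illustrative, not taken from the paper.
gene_properties = {
    "geneA": {"expr:heat_up", "tf:TF1_bound", "pheno:drug_sensitive"},
    "geneB": {"expr:heat_up", "tf:TF1_bound", "pheno:drug_sensitive"},
    "geneC": {"expr:heat_up", "tf:TF1_bound"},
    "geneD": {"ppi:binds_P1"},
    "geneE": {"expr:cold_up"},
    "geneF": {"pheno:drug_sensitive"},
}


def background_density(data):
    """Fraction of all possible gene-property pairs that are present."""
    all_props = set().union(*data.values())
    edges = sum(len(props) for props in data.values())
    return edges / (len(data) * len(all_props))


def binomial_tail(k, n, p):
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def module_p_value(data, genes, props):
    """Score a candidate module (gene set x property set).

    Assumption: gene-property pairs are treated as independent draws at the
    background density. This is a simplification chosen for illustration.
    Returns (pairs present, pairs possible, tail probability).
    """
    n = len(genes) * len(props)
    k = sum(1 for g in genes for q in props if q in data[g])
    return k, n, binomial_tail(k, n, background_density(data))


shared = ["expr:heat_up", "tf:TF1_bound", "pheno:drug_sensitive"]
candidate = module_p_value(gene_properties, ["geneA", "geneB", "geneC"], shared)
control = module_p_value(gene_properties, ["geneD", "geneE", "geneF"], shared)
print("candidate module:", candidate)
print("control gene set:", control)
```

In this toy, the candidate gene set shares nearly all of the chosen properties and receives a far smaller tail probability than the control set. A real analysis would also have to search for candidate modules rather than score hand-picked ones, and correct for the number of candidates tested.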

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / metabolism
  • Genome, Fungal*
  • Glucose / metabolism
  • Lipid Metabolism
  • Models, Genetic*
  • Saccharomyces cerevisiae / genetics*
  • Saccharomyces cerevisiae / metabolism
  • Software
  • Transcription, Genetic

Substances

  • Amino Acids
  • Glucose