Metagene projection for cross-platform, cross-species characterization of global transcriptional states

Pablo Tamayo; Daniel Scanfeld; Benjamin L Ebert; Michael A Gillette; Charles W M Roberts; Jill P Mesirov

doi:10.1073/pnas.0701068104

Proc Natl Acad Sci U S A. 2007 Apr 3;104(14):5959-64. doi: 10.1073/pnas.0701068104. Epub 2007 Mar 27.

Authors

Pablo Tamayo¹, Daniel Scanfeld, Benjamin L Ebert, Michael A Gillette, Charles W M Roberts, Jill P Mesirov

Affiliation

¹ Eli and Edythe L. Broad Institute, Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02141, USA.

Abstract

The high dimensionality of global transcription profiles, the expression levels of 20,000 genes in a much smaller number of samples, presents challenges that affect the sensitivity and general applicability of analysis results. In principle, it would be better to describe the data in terms of a small number of metagenes, positive linear combinations of genes, which could reduce noise while still capturing the invariant biological features of the data. Here, we describe how to accomplish such a reduction in dimension by a metagene projection methodology, which can greatly reduce the number of features used to characterize microarray data. We show, in applications to the analysis of leukemia and lung cancer data sets, how this approach can help assess and interpret similarities and differences between independent data sets, enable cross-platform and cross-species analysis, improve clustering and class prediction, and provide a computational means to detect and remove sample contamination.
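
To make the idea of metagenes as positive linear combinations of genes concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not name a factorization method; the sketch assumes non-negative matrix factorization (NMF) with multiplicative updates, and a projection step that holds the metagene matrix fixed while fitting non-negative coefficients for new samples. The function names (`nmf`, `project`, `sq_error`) and the toy expression matrix are hypothetical and exist only for illustration.

```python
import random


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def nmf(a, k, iters=500, seed=0, eps=1e-9):
    """Approximate a (genes x samples, non-negative) as w @ h with w, h >= 0.

    w (genes x k) holds the metagene definitions; h (k x samples) holds the
    metagene expression of each sample. Assumed method: multiplicative updates.
    """
    rng = random.Random(seed)
    n, m = len(a), len(a[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Update h with w held fixed.
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # Update w with h held fixed.
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def project(w, b, iters=500, eps=1e-9):
    """Express new samples b (genes x samples) in the metagene space of a fixed w."""
    k, m = len(w[0]), len(b[0])
    wt = transpose(w)
    num, wtw = matmul(wt, b), matmul(wt, w)
    h = [[1.0] * m for _ in range(k)]
    for _ in range(iters):
        den = matmul(wtw, h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
    return h


def sq_error(a, w, h):
    """Sum of squared differences between a and its reconstruction w @ h."""
    r = matmul(w, h)
    return sum((x - y) ** 2 for ra, rr in zip(a, r) for x, y in zip(ra, rr))


# Toy data (invented): 6 genes x 4 samples, two groups of co-expressed genes.
model = [[5.0, 5.0, 1.0, 1.0]] * 3 + [[1.0, 1.0, 4.0, 4.0]] * 3
w, h = nmf(model, k=2)

# A new sample, for example from another platform after gene matching,
# is described by 2 metagene values instead of 6 gene values.
new_sample = [[5.0], [5.0], [5.0], [1.0], [1.0], [1.0]]
print(project(w, new_sample))
```

In this sketch, samples from an independent data set are compared in the low-dimensional metagene space rather than gene by gene, which is the kind of reduction the abstract describes. Any mapping of genes across platforms or species would have to happen before the projection and is not shown here.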

Publication types

Comparative Study

MeSH terms

Animals
Cell Line, Tumor
Cluster Analysis
Data Interpretation, Statistical
Disease Models, Animal
Gene Expression Profiling*
Humans
Leukemia / classification
Leukemia / genetics*
Leukemia / pathology
Lung Neoplasms / classification
Lung Neoplasms / genetics*
Lung Neoplasms / pathology
Mice
Mice, Knockout
Models, Genetic*
Oligonucleotide Array Sequence Analysis
Reproducibility of Results
Sensitivity and Specificity
Species Specificity
Transcription, Genetic*

Grants and funding

T32 CA009172/CA/NCI NIH HHS/United States