Purpose: Previous reports using genome-wide gene expression data to classify breast tumors have typically used standard unsupervised or supervised techniques, both of which have known limitations. We hypothesized that novel clinically relevant information could be revealed in these data sets by an alternative analytic approach. Using a recently described algorithm, signature analysis (SA), we analyzed a series of breast tumor gene expression profiles and identified "modules," groups of tightly coexpressed genes that are conditionally linked to particular tumors.
Experimental design and results: SA identified multiple breast cancer modules specifically linked to distinct biological functions. Among them was a novel module, TuM1, whose presence was not readily discernible by conventional clustering techniques. The TuM1 module is expressed in a subset of estrogen receptor (ER)-positive tumors and is significantly enriched in genes involved in apoptosis and cell death. Clinically, TuM1-expressing tumors are associated with low histopathologic grade, and this association is independent of the tumor's ER status. We confirmed the robustness and general applicability of the TuM1 module by demonstrating its association with low tumor grade in multiple independent breast cancer data sets generated using different array technologies. In vitro, the TuM1 module is down-regulated in ER-positive MCF7 cells upon treatment with tamoxifen, suggesting that TuM1 expression may depend on active ER signaling. Initial data also suggest that TuM1 expression may be clinically associated with a patient's response to antihormonal therapy.
Conclusion: Our results suggest that module-based approaches to gene expression data can prove useful in identifying novel, robust, and biologically relevant signatures, even in data sets that have been the subject of substantial prior analysis.