In microarray-based case-control studies of a disease, researchers often try to identify a few diagnostic or prognostic markers among the most significant differentially expressed (DE) genes. However, the DE genes identified for the same disease in different studies typically show very low reproducibility. To address this problem, we evaluated the reproducibility of DE genes across studies and defined robust markers for disease diagnosis using a disease-associated protein-protein interaction (PPI) subnetwork. Using datasets for four cancer types, we found that the most significant DE genes in cancer show consistent up- or down-regulation across datasets. For each cancer type, the 5 (or 10) most significant DE genes extracted separately from different datasets tend to be significantly coexpressed and closely connected in the PPI subnetwork, indicating that they are highly reproducible at the PPI level. Consequently, we were able to build robust subnetwork-based classifiers for cancer diagnosis.
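As an illustration of the kind of cross-study comparison described above, the following minimal Python sketch compares the top-k DE genes from two datasets. It is not the authors' implementation: the function names, the toy scores, and the two formulas (shared genes divided by k, and the fraction of shared genes with the same sign of deregulation) are assumptions made here for illustration, since the abstract does not state the exact definitions used.

```python
from typing import Dict, List, Tuple


def top_k_genes(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k genes with the largest absolute DE score."""
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:k]


def overlap_stats(
    scores_a: Dict[str, float], scores_b: Dict[str, float], k: int
) -> Tuple[float, float]:
    """Compare the top-k DE genes of two studies.

    Returns (overlap, consistency), where
      overlap     = |top_a & top_b| / k            (assumed definition)
      consistency = fraction of shared genes whose scores have the
                    same sign in both studies      (assumed definition)
    """
    top_a = set(top_k_genes(scores_a, k))
    top_b = set(top_k_genes(scores_b, k))
    shared = top_a & top_b
    overlap = len(shared) / k
    if not shared:
        return overlap, 0.0
    same_direction = sum(1 for g in shared if scores_a[g] * scores_b[g] > 0)
    return overlap, same_direction / len(shared)


# Hypothetical signed DE scores (e.g. t-like statistics) from two studies;
# positive means up-regulated in cases, negative means down-regulated.
study_a = {"G1": 5.2, "G2": -4.8, "G3": 3.9, "G4": -0.5, "G5": 0.2}
study_b = {"G1": 4.1, "G2": -3.7, "G3": -0.3, "G4": 3.5, "G5": 0.1}

po, consistency = overlap_stats(study_a, study_b, k=3)
print(f"overlap = {po:.2f}, direction consistency = {consistency:.2f}")
```

In this toy example the two top-3 lists share only two genes, yet both shared genes are deregulated in the same direction, which mirrors the pattern reported above: gene-level overlap can be modest even when the direction of deregulation is consistent.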
Keywords: Cancer; DE; Diagnosis; FDR; Gene expression profiling; PO; POD; PON; PPI; Protein interaction networks; RFE; Reproducibility of biomarkers; SAM; SVM; differentially expressed; false discovery rate; percentage of overlap; percentage of overlap in the PPI network; percentage of overlapping deregulations; protein–protein interaction; recursive feature elimination; significance analysis of microarray; support vector machine.
Copyright © 2013 Elsevier B.V. All rights reserved.