Gene expression profiling offers a valuable opportunity to study multifactorial diseases and to understand the key role of genes in the mechanisms that drive a normal cell toward a cancerous state. Single-gene analysis is insufficient to describe the complex perturbations responsible for cancer onset, progression and invasion. A deeper understanding of the mechanisms of tumorigenesis can be reached by focusing on the deregulation of gene sets or pathways rather than of individual genes. In this perspective, the focus shifts from finding differentially expressed genes to identifying the biological processes, cellular functions and pathways perturbed in the phenotypic conditions. This is done by analyzing the genes co-expressed in a given pathway as a whole, taking into account the possible interactions among them and, more importantly, the correlation of their expression with the phenotypic conditions. We apply two well-known, statistically well-founded methods that analyze gene expression profiles to identify pathways and biological processes deregulated in pathological conditions. In particular, we measure the amount of deregulation and assess the statistical significance of predefined pathways from a curated collection (the Molecular Signatures Database) in a colon cancer data set. We find that pathways strongly involved in various tumor types are also closely connected with colon cancer. Moreover, our experimental results show that pathway analysis of complex diseases can highlight genes that are only weakly associated with the phenotype and may therefore be difficult to detect with classical univariate statistics. Our study shows the importance of using gene sets rather than single genes to understand the main biological processes and pathways involved in colorectal cancer, and our analysis indicates that many of the genes in these pathways are strongly associated with colorectal tumorigenesis.
Keywords: Microarray; colon cancer; gene expression; machine learning; pathway analysis; prediction accuracy.