Inference of combinatorial Boolean rules of synergistic gene sets from cancer microarray datasets

Inho Park; Kwang H Lee; Doheon Lee

doi:10.1093/bioinformatics/btq207

Bioinformatics. 2010 Jun 15;26(12):1506-12. doi: 10.1093/bioinformatics/btq207. Epub 2010 Apr 21.

Authors

Inho Park¹, Kwang H Lee, Doheon Lee

Affiliation

¹ Department of Bio and Brain Engineering, KAIST, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, Republic of Korea.

PMID: 20410052
DOI: 10.1093/bioinformatics/btq207

Abstract

Motivation: Gene set analysis has become an important tool for the functional interpretation of high-throughput gene expression datasets. Moreover, pattern analyses based on inferred gene set activities of individual samples have shown the ability to identify more robust disease signatures than individual gene-based pattern analyses. Although a number of approaches have been proposed for gene set-based pattern analysis, the combinatorial influence of deregulated gene sets on disease phenotype classification has not been studied sufficiently.

Results: We propose a new approach for inferring combinatorial Boolean rules of gene sets, with the aim of better understanding the cancer transcriptome and improving cancer classification. To reduce the search space of possible Boolean rules, we identify small groups of gene sets that synergistically contribute to the classification of samples into their corresponding phenotypic groups (such as normal and cancer). We then measure the significance of the candidate Boolean rules derived from each group of gene sets; the level of significance is based on the class entropy of the samples selected in accordance with the rules. By applying this approach to publicly available prostate cancer datasets, we identified 72 significant Boolean rules. Finally, we discuss several of the identified rules, such as the rule combining glutathione metabolism (down) and prostaglandin synthesis regulation (down), which are consistent with known prostate cancer biology.
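The idea of scoring a Boolean rule by the class entropy of the samples it selects can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how gene set activities are discretized or how significance is computed from the entropy, so the three-state encoding ("up", "down", "none"), the AND-only rule form, the function names, and the toy data below are all assumptions made for illustration.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels.

    Returns 0.0 for an empty list or a list with a single class.
    """
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def select_samples(activities, rule):
    """Indices of samples that satisfy every condition of a conjunctive rule.

    activities: dict mapping gene set name -> list of per-sample states
    rule: dict mapping gene set name -> required state (assumed AND semantics)
    """
    n_samples = len(next(iter(activities.values())))
    return [
        i
        for i in range(n_samples)
        if all(activities[name][i] == state for name, state in rule.items())
    ]


# Hypothetical toy data: six samples, two gene sets, made-up activity states.
activities = {
    "glutathione_metabolism": ["down", "down", "down", "up", "none", "down"],
    "prostaglandin_synthesis_regulation": ["down", "down", "none", "up", "down", "down"],
}
labels = ["cancer", "cancer", "normal", "normal", "normal", "cancer"]

# Rule: glutathione metabolism (down) AND prostaglandin synthesis regulation (down).
rule = {
    "glutathione_metabolism": "down",
    "prostaglandin_synthesis_regulation": "down",
}

selected = select_samples(activities, rule)
selected_labels = [labels[i] for i in selected]

# A rule is informative when the selected samples are purer than the full cohort.
entropy_all = class_entropy(labels)
entropy_selected = class_entropy(selected_labels)
print(selected, entropy_selected, entropy_all)
```

In this sketch, a rule whose selected samples have lower class entropy than the full cohort separates the phenotypes better. A real analysis would also need to assess whether that reduction is statistically significant, for example against rules evaluated on permuted labels, which is again an assumption and not something stated in the abstract.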

Availability: Scripts written in Python and R are available at http://biosoft.kaist.ac.kr/~ihpark/. The refined gene sets and the full list of the identified Boolean rules are provided in the Supplementary Material.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Gene Expression Regulation, Neoplastic
Gene Regulatory Networks
Genes, Neoplasm*
Humans
Male
Neoplasms / genetics*
Oligonucleotide Array Sequence Analysis / methods*
Prostatic Neoplasms / genetics