Hierarchical gene selection and genetic fuzzy system for cancer microarray data classification

PLoS One. 2015 Mar 30;10(3):e0120364. doi: 10.1371/journal.pone.0120364. eCollection 2015.

Abstract

This paper introduces a novel approach to gene selection based on a substantial modification of the analytic hierarchy process (AHP). The modified AHP systematically integrates the outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods, namely the t-test, entropy, the receiver operating characteristic (ROC) curve, the Wilcoxon test and the signal-to-noise ratio, are employed to rank genes. These ranked genes then serve as inputs to the modified AHP. The paper also proposes a method that uses the fuzzy standard additive model (FSAM) for cancer classification based on the genes selected by AHP. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. A genetic algorithm (GA) is incorporated between the unsupervised and supervised training stages to optimize the number of fuzzy rules. The integration of the GA enables FSAM to deal with the high-dimensional, low-sample-size nature of microarray data and thus to enhance the efficiency of classification. Experiments are carried out on numerous microarray datasets. Results demonstrate that AHP-based gene selection outperforms the single ranking methods. Furthermore, the AHP-FSAM combination achieves high accuracy in microarray data classification compared with various competing classifiers. The proposed approach is therefore useful to medical practitioners and clinicians as a decision support system that can be implemented in real medical practice.
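
The idea of ranking genes with several filter criteria and then combining the rankings can be illustrated with a minimal sketch. The abstract does not give the details of the modified AHP, so the code below substitutes a plain mean-rank aggregation as a stand-in; it is not the authors' method. It also uses only three of the five criteria (a Welch t-statistic, the signal-to-noise ratio, and an ROC-based score derived from the Mann-Whitney U statistic). All function names and the synthetic data are hypothetical and chosen for illustration only.

```python
import numpy as np

def welch_t(X, y):
    """Absolute Welch t-statistic per gene (column) for two classes labelled 0 and 1."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)

def signal_to_noise(X, y):
    """Absolute difference of class means divided by the sum of class standard deviations."""
    a, b = X[y == 0], X[y == 1]
    spread = a.std(axis=0, ddof=1) + b.std(axis=0, ddof=1)
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (spread + 1e-12)

def auc_separation(X, y):
    """|AUC - 0.5| per gene, computed from the Mann-Whitney U statistic (ties not averaged)."""
    n0, n1 = np.sum(y == 0), np.sum(y == 1)
    ranks = X.argsort(axis=0).argsort(axis=0) + 1  # 1-based ranks within each gene
    u = ranks[y == 1].sum(axis=0) - n1 * (n1 + 1) / 2
    return np.abs(u / (n0 * n1) - 0.5)

def aggregate_ranks(score_list):
    """Mean rank across criteria (rank 0 = best). A simple stand-in, NOT the modified AHP."""
    ranks = [(-s).argsort().argsort() for s in score_list]
    return np.mean(ranks, axis=0)

def select_genes(X, y, k):
    """Indices of the k genes with the best (lowest) mean rank."""
    scores = [welch_t(X, y), signal_to_noise(X, y), auc_separation(X, y)]
    return np.argsort(aggregate_ranks(scores), kind="stable")[:k]

# Synthetic example (assumed data): 40 samples, 50 genes, genes 3 and 7 made informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 50))
y = np.repeat([0, 1], 20)
X[y == 1, 3] += 5.0
X[y == 1, 7] -= 5.0

selected = select_genes(X, y, k=2)
print(sorted(selected.tolist()))
```

In this sketch every criterion gets equal weight. An AHP-style scheme would instead derive weights from pairwise comparisons, which is the part the abstract attributes to the modified AHP and which is not reproduced here.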

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • DNA, Neoplasm / genetics
  • Data Interpretation, Statistical
  • Databases, Genetic / statistics & numerical data
  • Decision Support Systems, Clinical
  • Fuzzy Logic
  • Gene Expression Profiling / methods*
  • Gene Expression Profiling / statistics & numerical data
  • Humans
  • Leukemia / genetics
  • Lymphoma, Large B-Cell, Diffuse / genetics
  • Models, Genetic
  • Models, Statistical
  • Neoplasms / genetics*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data
  • ROC Curve
  • Supervised Machine Learning
  • Unsupervised Machine Learning

Substances

  • DNA, Neoplasm

Grants and funding

This research is supported by the Australian Research Council (Discovery Grant DP120102112) and the Centre for Intelligent Systems Research (CISR) at Deakin University.