Cancer Classification Utilizing Voting Classifier with Ensemble Feature Selection Method and Transcriptomic Data

Genes (Basel). 2023 Sep 14;14(9):1802. doi: 10.3390/genes14091802.

Abstract

Biomarker-based cancer identification and classification tools are widely used in bioinformatics and machine learning fields. However, the high dimensionality of microarray gene expression data poses a challenge for identifying important genes in cancer diagnosis. Many feature selection algorithms optimize cancer diagnosis by selecting optimal features. This article proposes an ensemble rank-based feature selection method (EFSM) and an ensemble weighted average voting classifier (VT) to overcome this challenge. The EFSM uses a ranking method that aggregates features from individual selection methods to efficiently discover the most relevant and useful features. The VT combines support vector machine, k-nearest neighbor, and decision tree algorithms to create an ensemble model. The proposed method was tested on three benchmark datasets and compared to existing built-in ensemble models. The results show that our model achieved higher accuracy, with 100% for leukaemia, 94.74% for colon cancer, and 94.34% for the 11-tumor dataset. This study concludes by identifying a subset of the most important cancer-causing genes and demonstrating their significance compared to the original data. The proposed approach surpasses existing strategies in accuracy and stability, significantly impacting the development of ML-based gene analysis. It detects vital genes with higher precision and stability than other existing methods.

Keywords: cancer detection; feature selection; gene analysis; gene data; machine learning; voting classifier.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Benchmarking
  • Cluster Analysis
  • Gene Expression Profiling
  • Neoplasms* / diagnosis
  • Neoplasms* / genetics
  • Transcriptome* / genetics

Grants and funding

This research was supported by the Deanship of Scientific Research Large Groups at King Khalid University, Kingdom of Saudi Arabia (RGP.2/219/43).