Supervised classification of array CGH data with HMM-based feature selection

Pac Symp Biocomput. 2009:468-79.

Abstract

Motivation: For various tumour types, detailed knowledge of the molecular mechanisms involved in tumorigenesis is lacking. Detecting copy number variations (CNV) by Comparative Genomic Hybridization (CGH) can help to identify key elements of tumorigenesis. Genome-wide array CGH makes it possible to evaluate CNV at high resolution, but it produces a huge amount of data, which calls for adequate mathematical methods to select and interpret these data carefully.

Results: Two groups of patients differing in cancer subtype were defined in two publicly available array CGH data sets as well as in our own data set on ovarian cancer. Chromosomal regions characterizing each group of patients were identified using recurrent hidden Markov models (HMMs). The differential regions were reduced to a subset of features for classification by integrating different univariate feature selection methods. Weighted Least Squares Support Vector Machines (LS-SVM), a supervised classification method that takes class imbalance in the data sets into account, achieved leave-one-out or 10-fold cross-validation accuracies ranging from 88% to 95.5%.
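As an illustration of the classification step only, the following is a minimal sketch of a class-weighted LS-SVM evaluated by leave-one-out cross-validation. It is not the authors' implementation. The linear kernel, the regularization value gamma, the weighting scheme (each class receives the same total weight) and the synthetic feature matrix are all assumptions made for this example; the abstract does not specify them. The function names are likewise hypothetical.

```python
import numpy as np

def train_weighted_lssvm(X, y, gamma=1.0):
    """Fit a linear-kernel LS-SVM classifier; labels must be +1 / -1."""
    n = len(y)
    # Assumed weighting: samples of the smaller class get larger weights,
    # so that both classes carry the same total weight.
    v = np.where(y == 1, n / (2.0 * np.sum(y == 1)), n / (2.0 * np.sum(y == -1)))
    K = X @ X.T                      # linear kernel (an assumption)
    omega = np.outer(y, y) * K
    # LS-SVM training reduces to one linear system in (b, alpha).
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.diag(1.0 / (gamma * v))
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]           # alpha, b

def predict(X_train, y_train, alpha, b, X_new):
    """Return the predicted label (+1 / -1) for each row of X_new."""
    return np.sign((X_new @ X_train.T) @ (alpha * y_train) + b)

def loo_accuracy(X, y, gamma=1.0):
    """Leave-one-out accuracy: retrain with each sample held out once."""
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        alpha, b = train_weighted_lssvm(X[keep], y[keep], gamma)
        pred = predict(X[keep], y[keep], alpha, b, X[i:i + 1])[0]
        hits += int(pred == y[i])
    return hits / len(y)

# Synthetic stand-in for selected copy-number features (not real data):
# 5 samples in one subtype, 15 in the other, 3 features each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.3, (5, 3)), rng.normal(-2.0, 0.3, (15, 3))])
y = np.concatenate([np.ones(5), -np.ones(15)])

alpha, b = train_weighted_lssvm(X, y)
acc = loo_accuracy(X, y)
print(f"leave-one-out accuracy on synthetic data: {acc:.2f}")
```

In a real analysis the rows of X would hold the features retained after the HMM and univariate selection steps, and gamma (plus any kernel parameter) would be tuned inside the cross-validation loop rather than fixed in advance.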

Conclusion: The combination of recurrent HMMs for the detection of copy number alterations with LS-SVM classifiers offers a novel methodological approach for classification based on copy number alterations. Additionally, this approach narrows down the chromosomal regions needed to classify patients according to cancer subtype.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Artificial Intelligence
  • Biometry
  • Carcinoma, Non-Small-Cell Lung / classification
  • Carcinoma, Non-Small-Cell Lung / genetics
  • Carcinoma, Squamous Cell / classification
  • Carcinoma, Squamous Cell / genetics
  • Cell Line, Tumor
  • Comparative Genomic Hybridization / statistics & numerical data*
  • Databases, Nucleic Acid
  • Female
  • Gene Dosage
  • Genes, BRCA1
  • Genes, p53
  • Humans
  • Least-Squares Analysis
  • Lung Neoplasms / classification
  • Lung Neoplasms / genetics
  • Markov Chains
  • Mouth Neoplasms / classification
  • Mouth Neoplasms / genetics
  • Neoplasms / classification
  • Neoplasms / genetics
  • Ovarian Neoplasms / classification
  • Ovarian Neoplasms / genetics