Supervised classification of array CGH data with HMM-based feature selection

Anneleen Daemen; Olivier Gevaert; Karin Leunen; Eric Legius; Ignace Vergote; Bart De Moor

Pac Symp Biocomput. 2009:468-79.

Authors

Anneleen Daemen¹, Olivier Gevaert, Karin Leunen, Eric Legius, Ignace Vergote, Bart De Moor

Affiliation

¹ Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium. anneleen.daemen@esat.kuleuven.be

PMID: 19209723

Abstract

Motivation: For various tumour types, detailed knowledge of the molecular mechanisms involved in tumorigenesis is lacking. Detecting copy number variations (CNV) with Comparative Genomic Hybridization (CGH) can, however, help to identify key elements in tumorigenesis. Because genome-wide array CGH makes it possible to evaluate CNV at high resolution, it produces a huge amount of data, which calls for adequate mathematical methods to carefully select and interpret these data.

Results: Two groups of patients differing in cancer subtype were defined in two publicly available array CGH data sets as well as in our own data set on ovarian cancer. Chromosomal regions characterizing each group of patients were identified using recurrent hidden Markov models (HMMs). The differential regions were reduced to a subset of features for classification by integrating different univariate feature selection methods. Weighted Least Squares Support Vector Machines (LS-SVM), a supervised classification method that takes class imbalance in the data sets into account, resulted in leave-one-out or 10-fold cross-validation accuracies ranging from 88% to 95.5%.
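
To make the classification step more concrete, the following is a minimal sketch of a weighted LS-SVM classifier evaluated with leave-one-out cross-validation. It is not the authors' implementation. The abstract does not state the weighting scheme, kernel, or regularization value, so the per-sample weights (total sample count divided by twice the class size), the linear kernel, and `gamma=1.0` are assumptions, and the synthetic data from the hypothetical helper `make_toy_data` merely stands in for selected copy number features.

```python
import numpy as np

def class_weights(y):
    # Assumed weighting: n / (2 * class size), so the minority class
    # receives the larger per-sample weight.
    n = len(y)
    n_pos = np.sum(y == 1)
    n_neg = n - n_pos
    return np.where(y == 1, n / (2.0 * n_pos), n / (2.0 * n_neg))

def train_weighted_lssvm(X, y, gamma=1.0):
    # Solve the LS-SVM linear system
    #   [ 0   y^T                          ] [ b     ]   [ 0 ]
    #   [ y   Omega + diag(1/(gamma * v))  ] [ alpha ] = [ 1 ]
    # with Omega_ij = y_i * y_j * K(x_i, x_j) and a linear kernel K.
    n = len(y)
    omega = np.outer(y, y) * (X @ X.T)
    v = class_weights(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.diag(1.0 / (gamma * v))
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]  # bias b, support values alpha

def predict(X_train, y_train, alpha, b, X_new):
    # Decision rule: sign(sum_k alpha_k * y_k * K(x, x_k) + b)
    scores = (X_new @ X_train.T) @ (alpha * y_train) + b
    return np.where(scores >= 0, 1, -1)

def loo_accuracy(X, y, gamma=1.0):
    # Leave-one-out cross-validation: retrain without sample i, test on it.
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b, alpha = train_weighted_lssvm(X[keep], y[keep], gamma)
        pred = predict(X[keep], y[keep], alpha, b, X[i:i + 1])
        hits += int(pred[0] == y[i])
    return hits / len(y)

def make_toy_data(seed=0):
    # Hypothetical unbalanced toy data: 20 samples in one class, 6 in the other.
    rng = np.random.default_rng(seed)
    X_pos = rng.normal(2.0, 1.0, size=(20, 5))
    X_neg = rng.normal(-2.0, 1.0, size=(6, 5))
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(20), -np.ones(6)])
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    print("LOO accuracy:", loo_accuracy(X, y))
```

Because the weights enter only through the diagonal regularization term, training still reduces to solving one linear system per fold, which keeps leave-one-out evaluation cheap for the small sample sizes typical of array CGH studies.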

Conclusion: The combination of recurrent HMMs for the detection of copy number alterations with LS-SVM classifiers offers a novel methodological approach for classification based on copy number alterations. Additionally, this approach limits the chromosomal regions that are necessary to classify patients according to cancer subtype.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Artificial Intelligence
Biometry
Carcinoma, Non-Small-Cell Lung / classification
Carcinoma, Non-Small-Cell Lung / genetics
Carcinoma, Squamous Cell / classification
Carcinoma, Squamous Cell / genetics
Cell Line, Tumor
Comparative Genomic Hybridization / statistics & numerical data*
Databases, Nucleic Acid
Female
Gene Dosage
Genes, BRCA1
Genes, p53
Humans
Least-Squares Analysis
Lung Neoplasms / classification
Lung Neoplasms / genetics
Markov Chains
Mouth Neoplasms / classification
Mouth Neoplasms / genetics
Neoplasms / classification
Neoplasms / genetics
Ovarian Neoplasms / classification
Ovarian Neoplasms / genetics