Machine Learning methods for Quantitative Radiomic Biomarkers

Chintan Parmar; Patrick Grossmann; Johan Bussink; Philippe Lambin; Hugo J W L Aerts

doi:10.1038/srep13087

Sci Rep. 2015 Aug 17:5:13087. doi: 10.1038/srep13087.

Authors

Chintan Parmar^{1,2,3}, Patrick Grossmann^{1,4}, Johan Bussink⁵, Philippe Lambin², Hugo J W L Aerts^{1,6,4}

Affiliations

¹ Departments of Radiation Oncology.
² Radiation Oncology (MAASTRO), Research Institute GROW, Maastricht University, Maastricht, the Netherlands.
³ Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India.
⁴ Department of Biostatistics & Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA.
⁵ Department of Radiation Oncology, Radboud University Medical Center, Nijmegen, the Netherlands.
⁶ Radiology, Dana-Farber Cancer Institute, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.

Abstract

Radiomics extracts and mines a large number of medical imaging features that quantify tumor phenotypic characteristics. Highly accurate and reliable machine-learning approaches can drive the success of radiomic applications in clinical care. In this radiomic study, fourteen feature selection methods and twelve classification methods were examined in terms of their performance and stability for predicting overall survival. A total of 440 radiomic features were extracted from pre-treatment computed tomography (CT) images of 464 lung cancer patients. To ensure an unbiased evaluation of the different machine-learning methods, publicly available implementations with their reported parameter configurations were used. Furthermore, we used two independent radiomic cohorts for training (n = 310 patients) and validation (n = 154 patients). We found that the Wilcoxon test based feature selection method WLCX (stability = 0.84 ± 0.05, AUC = 0.65 ± 0.02) and the random forest classification method RF (RSD = 3.52%, AUC = 0.66 ± 0.03) had the highest prognostic performance with high stability against data perturbation. Our variability analysis indicated that the choice of classification method is the dominant source of performance variation (34.21% of total variance). Identifying optimal machine-learning methods for radiomic applications is a crucial step towards stable and clinically relevant radiomic biomarkers, providing a non-invasive way of quantifying and monitoring tumor-phenotypic characteristics in clinical practice.
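
The quantities named in the abstract can be illustrated with a minimal sketch. It is not the study's implementation, and it rests on several assumptions: features are ranked by the two-sample Wilcoxon rank-sum criterion, which (for fixed class sizes and no ties) gives the same ordering as ranking by the distance of each feature's single-feature AUC from 0.5; RSD is taken to mean relative standard deviation, that is the standard deviation of the AUC divided by its mean, in percent; and the data are synthetic. The function names auc, wilcoxon_rank, rsd and make_synthetic are hypothetical, and the classification step (for example a random forest) is omitted.

```python
import random
import statistics


def auc(scores, labels):
    """AUC as the normalized Mann-Whitney U statistic; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def wilcoxon_rank(features, labels):
    """Return feature indices, strongest first, ordered by |AUC - 0.5|.

    Assumption: with fixed class sizes and no ties this matches the
    ordering given by two-sided Wilcoxon rank-sum p-values.
    """
    n_features = len(features[0])
    strength = []
    for j in range(n_features):
        column = [row[j] for row in features]
        strength.append(abs(auc(column, labels) - 0.5))
    return sorted(range(n_features), key=lambda j: strength[j], reverse=True)


def rsd(values):
    """Relative standard deviation in percent: 100 * sd / mean (assumed definition)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def make_synthetic(n_patients=200, n_features=5, seed=0):
    """Hypothetical toy cohort: only feature 0 is shifted in class 1."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_patients)]
    features = []
    for y in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 2.0 * y  # the single informative feature
        features.append(row)
    return features, labels


if __name__ == "__main__":
    X, y = make_synthetic()
    order = wilcoxon_rank(X, y)
    print("features ranked by Wilcoxon criterion:", order)
    print("AUC of top feature:", round(auc([r[order[0]] for r in X], y), 3))
```

In a stability analysis of the kind the abstract describes, rsd would be applied to the AUC values obtained from repeated resampling of the training cohort; a lower value indicates a more stable classifier.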

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Area Under Curve
Biomarkers, Tumor / metabolism*
Carcinoma, Non-Small-Cell Lung / metabolism
Carcinoma, Non-Small-Cell Lung / mortality
Carcinoma, Non-Small-Cell Lung / pathology*
Humans
Lung Neoplasms / metabolism
Lung Neoplasms / mortality
Lung Neoplasms / pathology*
Machine Learning*
ROC Curve
Survival Analysis
Tomography, X-Ray Computed

Substances

Biomarkers, Tumor
