Feature Importance in Nonlinear Embeddings (FINE): Applications in Digital Pathology

IEEE Trans Med Imaging. 2016 Jan;35(1):76-88. doi: 10.1109/TMI.2015.2456188. Epub 2015 Jul 14.

Abstract

Quantitative histomorphometry (QH) refers to the process of computationally modeling disease appearance on digital pathology images by extracting hundreds of image features and using them to predict disease presence or outcome. Since constructing a robust and interpretable classifier is challenging in a high dimensional feature space, dimensionality reduction (DR) is often implemented prior to classifier construction. However, when DR is performed it can be challenging to quantify the contribution of each of the original features to the final classification result. We have previously presented a method for scoring features based on their importance for classification on an embedding derived via principal components analysis (PCA). Nonlinear DR (NLDR), in contrast, involves the eigen-decomposition of a kernel matrix rather than of the data itself, which compounds the issue of classifier interpretability. In this paper we present feature importance in nonlinear embeddings (FINE), an extension of our PCA-based feature scoring method to kernel PCA (KPCA), as well as to several NLDR algorithms that can be cast as variants of KPCA. FINE is applied to four digital pathology datasets to identify key QH features for predicting the risk of breast and prostate cancer recurrence. Measures of nuclear and glandular architecture and clusteredness were found to play an important role in predicting the likelihood of recurrence of both breast and prostate cancers. Compared to the t-test, Fisher score, and Gini index, FINE was able to identify a stable set of features that provide good classification accuracy on four publicly available datasets from the NIPS 2003 Feature Selection Challenge.
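
The abstract does not state the scoring rule itself, so the following is only an illustrative sketch of the general idea behind scoring original features on a PCA embedding, not the authors' method. It assumes one simple possibility: fit a least-squares linear discriminant in the embedding and project its weights back through the PCA loadings. The function name pca_feature_scores, the least-squares discriminant, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def pca_feature_scores(X, y, n_components=2):
    """Hypothetical feature scoring on a PCA embedding (an assumption,
    not the rule used in the paper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    Xc = X - X.mean(axis=0)                # center each feature
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                # loadings, shape (d, k)
    Z = Xc @ V                             # embedding, shape (n, k)
    # Least-squares linear discriminant in the embedding (targets +1 / -1).
    t = np.where(y == 1, 1.0, -1.0)
    w, *_ = np.linalg.lstsq(Z, t, rcond=None)
    # Back-project the discriminant onto the original feature axes.
    scores = np.abs(V @ w)
    return scores / scores.sum()           # normalize so scores sum to 1

# Synthetic data: only feature 2 carries class signal.
rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 5))
X[:, 2] += 4.0 * y

scores = pca_feature_scores(X, y, n_components=2)
ranking = np.argsort(scores)[::-1]         # feature indices, most important first
```

This back-projection relies on the explicit loading matrix that linear PCA provides. Kernel PCA decomposes a kernel matrix and yields no such loadings in the original feature space, which is the interpretability gap the abstract says FINE is designed to address.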

MeSH terms

  • Algorithms
  • Biopsy
  • Breast Neoplasms / pathology
  • Diagnostic Imaging / methods*
  • Female
  • Histocytochemistry
  • Humans
  • Image Interpretation, Computer-Assisted / methods*
  • Male
  • Nonlinear Dynamics*
  • Pathology
  • Prostatic Neoplasms / pathology
  • Recurrence