Sparse relative risk regression models

Biostatistics. 2020 Apr 1;21(2):e131-e147. doi: 10.1093/biostatistics/kxy060.

Abstract

Clinical studies in which patients are screened for many genomic features are becoming routine. In principle, this holds the promise of finding genomic signatures for a particular disease. In particular, cancer survival is thought to be closely linked to the genomic constitution of the tumor. Discovering such signatures will be useful in the diagnosis of the patient, and may inform treatment decisions and, perhaps, even the development of new treatments. However, genomic data are typically noisy and high-dimensional, with the number of features often exceeding the number of patients included in the study. Regularized survival models have been proposed to deal with such scenarios. These methods typically induce sparsity by means of a coincidental match of the geometry of the convex likelihood and a (near) non-differentiable regularizer. The disadvantages of such methods are that they are typically not invariant to scale changes of the covariates, they struggle with highly correlated covariates, and they leave open the practical problem of choosing the amount of regularization. In this article, we propose an extension of the differential geometric least angle regression (dgLARS) method for sparse inference in relative risk regression models. A software implementation of our method is available on GitHub (https://github.com/LuigiAugugliaro/dgcox).

Keywords: Gene expression data; High-dimensional data; Relative risk regression models; Sparsity; Survival analysis; dgLARS.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Biostatistics / methods*
  • Computer Simulation
  • Humans
  • Models, Statistical*
  • Neoplasms / genetics
  • Neoplasms / mortality
  • Regression Analysis
  • Risk Assessment / methods*
  • Survival Analysis*