Sparse relative risk regression models

Ernst C Wit; Luigi Augugliaro; Hassan Pazira; Javier González; Fentaw Abegaz

doi:10.1093/biostatistics/kxy060

Biostatistics. 2020 Apr 1;21(2):e131-e147. doi: 10.1093/biostatistics/kxy060.

Authors

Ernst C Wit¹, Luigi Augugliaro², Hassan Pazira³, Javier González⁴, Fentaw Abegaz³,⁵

Affiliations

¹ Institute of Computational Science, USI, Via Buffi 13, Lugano, Switzerland.
² Department of Economics, Business and Statistics, University of Palermo, Building 13, Viale delle Scienze, Palermo, Italy.
³ Bernoulli Institute, University of Groningen, Nijenborg 9, AG Groningen, The Netherlands.
⁴ Amazon Research Cambridge, Poseidon House, Castle Park, Cambridge, UK.
⁵ Department of Pediatrics and Systems Biology Centre for Energy Metabolism and Ageing, University of Groningen, University Medical Center Groningen, AD Groningen, The Netherlands.

Abstract

Clinical studies in which patients are routinely screened for many genomic features are becoming increasingly common. In principle, this holds the promise of finding genomic signatures for a particular disease. In particular, cancer survival is thought to be closely linked to the genomic constitution of the tumor. Discovering such signatures will be useful in diagnosing the patient, may inform treatment decisions and, perhaps, may even aid the development of new treatments. However, genomic data are typically noisy and high-dimensional, with the number of features often exceeding the number of patients included in the study. Regularized survival models have been proposed to deal with such scenarios. These methods typically induce sparsity by means of a coincidental match of the geometry of the convex likelihood and a (near) non-convex regularizer. The disadvantages of such methods are that they are typically not invariant to scale changes of the covariates, they struggle with highly correlated covariates, and they face the practical problem of determining the amount of regularization. In this article, we propose an extension of the differential geometric least angle regression (dgLARS) method for sparse inference in relative risk regression models. A software implementation of our method is available on GitHub (https://github.com/LuigiAugugliaro/dgcox).
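
The scale non-invariance mentioned in the abstract can be illustrated with a minimal sketch. It assumes the Cox model (a relative risk model with an exponential link), no tied event times, and an L1 (lasso) penalty as the example regularizer. The function name `cox_partial_loglik` and the simulated data are illustrative assumptions; none of this is taken from the dgcox package or from the article's method.

```python
import numpy as np


def cox_partial_loglik(beta, X, time, event):
    """Cox partial log-likelihood, assuming no tied event times."""
    eta = X @ beta                      # linear predictor per subject
    order = np.argsort(-time)           # decreasing time: risk sets grow along the array
    eta_sorted = eta[order]
    event_sorted = event[order]
    # log of the sum of exp(eta) over each subject's risk set (time >= own time)
    log_risk = np.logaddexp.accumulate(eta_sorted)
    return float(np.sum((eta_sorted - log_risk)[event_sorted == 1]))


# Simulated data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
time = rng.exponential(size=n)
event = rng.integers(0, 2, size=n)      # 1 = event observed, 0 = censored

beta = np.array([0.5, -1.0, 0.0])
scale = np.array([10.0, 1.0, 1.0])      # change the units of the first covariate

# Rescaling a covariate and its coefficient inversely leaves the fit unchanged...
X_rescaled = X * scale
beta_rescaled = beta / scale
ll_original = cox_partial_loglik(beta, X, time, event)
ll_rescaled = cox_partial_loglik(beta_rescaled, X_rescaled, time, event)

# ...but the L1 penalty on the coefficients changes with the units.
l1_original = np.abs(beta).sum()
l1_rescaled = np.abs(beta_rescaled).sum()

print(ll_original, ll_rescaled)         # equal up to floating-point error
print(l1_original, l1_rescaled)         # differ
```

Because the likelihood term is unchanged while the penalty term shrinks, an L1-penalized fit would favor the first covariate more after the unit change. This is one way to see why such estimators depend on how the covariates are scaled.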

Keywords: Gene expression data; High-dimensional data; Relative risk regression models; Sparsity; Survival analysis; dgLARS.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Biostatistics / methods*
Computer Simulation
Humans
Models, Statistical*
Neoplasms / genetics
Neoplasms / mortality
Regression Analysis
Risk Assessment / methods*
Survival Analysis*