Abstract
Cancer is the result of mutagenic processes that can be inferred from tumor genomes by analyzing rate spectra of point mutations, or "mutational signatures". Here we present SparseSignatures, a novel framework to extract signatures from somatic point mutation data. Our approach incorporates a user-specified background signature, employs regularization to reduce noise in non-background signatures, uses cross-validation to identify the number of signatures, and is scalable to large datasets. We show that SparseSignatures outperforms current state-of-the-art methods on simulated data using a variety of standard metrics. We then apply SparseSignatures to whole genome sequences of pancreatic and breast tumors, discovering well-differentiated signatures that are linked to known mutagenic mechanisms and are strongly associated with patient clinical features.
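As a rough illustration of the kind of decomposition described above, the following minimal sketch factorizes a patients-by-categories count matrix into non-negative exposures and signatures, holds the first signature fixed to a user-supplied background, and shrinks the remaining signatures toward zero. The abstract does not specify the factorization or the penalty, so the least-squares objective, the L1 penalty (lam), the proximal-gradient updates, and the function name fit_sparse_signatures are all assumptions made for illustration, not the SparseSignatures implementation. Cross-validation to choose the number of signatures is omitted.

```python
import numpy as np

def fit_sparse_signatures(counts, background, n_signatures, lam=0.1, n_iter=200, seed=0):
    """Toy sketch: counts ~= exposures @ signatures, with row 0 fixed to background.

    Hypothetical helper for illustration only; not the published algorithm.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_categories = counts.shape
    exposures = rng.random((n_samples, n_signatures)) + 0.1
    signatures = rng.random((n_signatures, n_categories)) + 0.1
    signatures[0] = background  # user-specified background, never updated

    for _ in range(n_iter):
        # Exposure step: projected gradient on 0.5 * ||counts - E @ S||^2.
        gram = signatures @ signatures.T
        step = 1.0 / (np.linalg.norm(gram, 2) + 1e-12)
        grad = (exposures @ signatures - counts) @ signatures.T
        exposures = np.maximum(exposures - step * grad, 0.0)

        # Signature step: proximal gradient for an (assumed) L1 penalty plus
        # non-negativity, applied to the non-background rows only.
        gram = exposures.T @ exposures
        step = 1.0 / (np.linalg.norm(gram, 2) + 1e-12)
        grad = exposures.T @ (exposures @ signatures - counts)
        updated = np.maximum(signatures - step * (grad + lam), 0.0)
        signatures[1:] = updated[1:]

    return exposures, signatures

# Simulated example: a flat background plus one signature active in 8 categories.
rng = np.random.default_rng(1)
background = np.full(96, 1 / 96)
true_signature = np.zeros(96)
true_signature[:8] = 1 / 8
true_exposures = rng.random((30, 2)) * 100
counts = true_exposures @ np.vstack([background, true_signature])

exposures, signatures = fit_sparse_signatures(counts, background, n_signatures=2, lam=1.0)
```

Larger values of lam push more entries of the non-background signatures to exactly zero, which is one way the noise reduction mentioned in the abstract could be realized.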
Publication types
- Research Support, N.I.H., Extramural
- Research Support, Non-U.S. Gov't
MeSH terms
- Algorithms
- Biomarkers, Tumor / genetics
- Breast Neoplasms / classification
- Breast Neoplasms / genetics
- Computational Biology
- Computer Simulation
- DNA Mutational Analysis / statistics & numerical data*
- Databases, Genetic / statistics & numerical data
- Female
- Genes, BRCA1
- Genes, BRCA2
- Genome, Human
- Humans
- Neoplasms / genetics*
- Pancreatic Neoplasms / classification
- Pancreatic Neoplasms / genetics
- Point Mutation*
- Software
Grants and funding
This work was supported by an R01 grant to A.S. (NIH/NCI) and gift funding from the BRCA Foundation. A.L. was supported by a Young Investigator Award from the BRCA Foundation. D.R. was partially supported by a Bicocca 2020 Starting Grant and by a Premio Giovani Talenti dell'Università degli Studi di Milano-Bicocca. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.