PINCAGE: probabilistic integration of cancer genomics data for perturbed gene identification and sample classification

Bioinformatics. 2016 May 1;32(9):1353-65. doi: 10.1093/bioinformatics/btv758. Epub 2016 Jan 6.

Abstract

Motivation: Cancer development and progression are driven by a complex pattern of genomic and epigenomic perturbations. Both types of perturbations can affect gene expression levels and disease outcome. Integrative analysis of cancer genomics data may therefore improve detection of perturbed genes and prediction of disease state. As different data types are usually dependent, analysis based on independence assumptions will make inefficient use of the data and potentially lead to false conclusions.

Model: Here, we present PINCAGE (Probabilistic INtegration of CAncer GEnomics data), a method that uses probabilistic integration of cancer genomics data for combined evaluation of RNA-seq gene expression and 450k array DNA methylation measurements of promoters as well as gene bodies. It models the dependence between expression and methylation using modular graphical models, which also allows future inclusion of additional data types.
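The value of modelling the dependence between expression and methylation can be illustrated with a toy sketch. This is not the PINCAGE implementation: it replaces the graphical models with simple smoothed categorical tables over two hypothetical bins (low/high), and every name in it (LEVELS, fit_joint, fit_independent, log_odds) and all the data are assumptions made up for illustration. In the toy data both classes have identical marginals, so only a model of the joint distribution can tell them apart.

```python
import math
from collections import Counter
from itertools import product

LEVELS = (0, 1)  # hypothetical bins: 0 = low, 1 = high


def fit_joint(samples, alpha=1.0):
    """Smoothed joint table P(expr, meth) estimated from (expr, meth) pairs."""
    counts = Counter(samples)
    total = len(samples) + alpha * len(LEVELS) ** 2
    return {cell: (counts[cell] + alpha) / total
            for cell in product(LEVELS, LEVELS)}


def fit_independent(samples, alpha=1.0):
    """Product of smoothed marginals P(expr) * P(meth), ignoring dependence."""
    expr = Counter(e for e, _ in samples)
    meth = Counter(m for _, m in samples)
    total = len(samples) + alpha * len(LEVELS)
    return {(e, m): ((expr[e] + alpha) / total) * ((meth[m] + alpha) / total)
            for e, m in product(LEVELS, LEVELS)}


def log_odds(sample, tumour_model, normal_model):
    """Log-likelihood ratio of tumour versus normal for one sample."""
    return math.log(tumour_model[sample]) - math.log(normal_model[sample])


# Made-up data: in normal tissue high methylation pairs with low expression;
# in tumour tissue that coupling is reversed. Marginals are 50/50 in both classes.
normal = [(1, 0)] * 10 + [(0, 1)] * 10
tumour = [(1, 1)] * 10 + [(0, 0)] * 10

query = (1, 1)  # high expression together with high methylation

joint_score = log_odds(query, fit_joint(tumour), fit_joint(normal))
indep_score = log_odds(query, fit_independent(tumour), fit_independent(normal))

print(f"joint model log-odds:        {joint_score:.3f}")
print(f"independence model log-odds: {indep_score:.3f}")
```

Under these assumptions the joint table assigns the query sample a clearly positive tumour log-odds, while the independence model cannot separate the classes at all, because it only sees the marginals.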

Results: We apply our approach to a Breast Invasive Carcinoma dataset from The Cancer Genome Atlas consortium, which includes 82 adjacent normal and 730 cancer samples. We identify new biomarker candidates of breast cancer development (PTF1A, RABIF, RAG1AP1, TIMM17A, LOC148145) and progression (SERPINE3, ZNF706). PINCAGE discriminates better between normal and tumour tissue and between progressing and non-progressing tumours in comparison with established methods that assume independence between tested data types, especially when using evidence from multiple genes. Our method can be applied to any type of cancer or, more generally, to any genomic disease for which a sufficient amount of molecular data is available.

Availability and implementation: R scripts available at http://moma.ki.au.dk/prj/pincage/

Contact: michal.switnicki@clin.au.dk or jakob.skou@clin.au.dk

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Breast Neoplasms*
  • DNA Methylation
  • Epigenomics
  • Gene Expression Regulation, Neoplastic*
  • Genomics* / methods
  • Humans