Identifying candidate drivers of drug response in heterogeneous cancer by mining high throughput genomics data

BMC Genomics. 2016 Aug 15;17(1):638. doi: 10.1186/s12864-016-2942-5.

Abstract

Background: Advances in technology have made large amounts of high-throughput genomics data of multiple types available. These data have tremendous potential for identifying new, clinically valuable biomarkers to guide the diagnosis, prognosis, and treatment of complex diseases such as cancer. However, integrating, analyzing, and interpreting large, noisy genomics datasets to obtain biologically meaningful results remains highly challenging. Mining genomics datasets with advanced computational methods can help to address these issues.

Results: To facilitate the identification of a short list of biologically meaningful genes as candidate drivers of anti-cancer drug resistance from an enormous amount of heterogeneous data, we employed statistical machine-learning techniques and integrated genomics datasets. We developed a computational method that integrates gene expression, somatic mutation, and copy number aberration data of sensitive and resistant tumors. The method first applies an integrative approach based on module network analysis to identify potential driver genes. This is followed by cross-validation and a comparison of the results of the sensitive and resistant groups to obtain the final list of candidate biomarkers. We applied this method to the ovarian cancer data from The Cancer Genome Atlas (TCGA). The final result contains biologically relevant genes, such as COL11A1, which has been reported as a biomarker of cisplatin resistance in epithelial ovarian carcinoma in several recent studies.
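
The two-group comparison described above can be illustrated with a minimal sketch. This is not the authors' implementation: a simple correlation threshold stands in for module network analysis, the cross-validation step is omitted, the modules are assumed to be precomputed, and all gene names, thresholds, and values are synthetic placeholders chosen only for illustration.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def candidate_drivers(expression, aberrant, modules,
                      min_fraction=0.2, min_abs_corr=0.8):
    """Return genes that are frequently aberrant and track a module's expression.

    expression: gene -> expression values, one per tumor
    aberrant:   gene -> 0/1 flags (somatic mutation or copy number aberration)
    modules:    module name -> member genes (assumed precomputed)
    The thresholds are hypothetical defaults, not values from the paper.
    """
    drivers = set()
    for gene, flags in aberrant.items():
        if mean(flags) < min_fraction:  # too rarely aberrant in this group
            continue
        for members in modules.values():
            targets = [g for g in members if g != gene]
            if not targets:
                continue
            n_tumors = len(expression[gene])
            module_mean = [mean(expression[g][i] for g in targets)
                           for i in range(n_tumors)]
            if abs(pearson(expression[gene], module_mean)) >= min_abs_corr:
                drivers.add(gene)
                break
    return drivers


# Synthetic toy data: five tumors per group, placeholder gene names.
modules = {"M1": ["GENE_A", "GENE_B", "GENE_C"], "M2": ["GENE_E", "GENE_F"]}
shared = {
    "GENE_B": [2, 4, 6, 8, 10],
    "GENE_C": [1, 3, 5, 7, 9],
    "GENE_D": [5, 1, 4, 2, 3],
    "GENE_E": [1, 2, 3, 4, 5],
    "GENE_F": [2, 3, 4, 5, 7],
}
resistant_expr = {**shared, "GENE_A": [1, 2, 3, 4, 5]}
sensitive_expr = {**shared, "GENE_A": [3, 1, 4, 1, 5]}
resistant_aberrant = {"GENE_A": [1, 1, 0, 0, 1],
                      "GENE_D": [1, 0, 1, 0, 0],
                      "GENE_E": [1, 1, 1, 0, 0]}
sensitive_aberrant = {"GENE_A": [1, 0, 0, 1, 0],
                      "GENE_D": [0, 0, 0, 0, 0],
                      "GENE_E": [1, 1, 1, 0, 0]}

resistant_drivers = candidate_drivers(resistant_expr, resistant_aberrant, modules)
sensitive_drivers = candidate_drivers(sensitive_expr, sensitive_aberrant, modules)

# Keep only candidates found in the resistant group but not the sensitive group.
resistant_specific = resistant_drivers - sensitive_drivers
print(sorted(resistant_specific))
```

On this toy data, GENE_E is selected in both groups and is therefore dropped by the comparison, while GENE_A is selected only among the resistant tumors. The set difference is one simple way to express "compare the two groups"; the source does not specify the exact comparison rule.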

Conclusions: The described method yields a short list of aberrant genes that also control the expression of their co-regulated genes. The results suggest that this unbiased, data-driven computational method can identify biologically relevant candidate biomarkers. It can be applied to a wide range of problems that compare two conditions using highly heterogeneous datasets.

Keywords: Copy number aberration; Drug resistant; Gene expression; Gene module; Integrative analysis; Module network analysis; Serous ovarian carcinoma; Somatic mutation.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Antineoplastic Agents / therapeutic use*
  • Biomarkers, Tumor / genetics
  • Biomarkers, Tumor / metabolism
  • Cisplatin / therapeutic use
  • Cluster Analysis
  • Collagen Type XI / genetics
  • Collagen Type XI / metabolism
  • DNA Copy Number Variations
  • Data Mining*
  • Databases, Genetic
  • Drug Resistance, Neoplasm
  • Female
  • Gene Expression Regulation, Neoplastic
  • Genomics
  • Humans
  • Ovarian Neoplasms / drug therapy*
  • Ovarian Neoplasms / genetics
  • Ovarian Neoplasms / pathology

Substances

  • Antineoplastic Agents
  • Biomarkers, Tumor
  • COL11A1 protein, human
  • Collagen Type XI
  • Cisplatin