Cancer classification based on chromatin accessibility profiles with deep adversarial learning model

PLoS Comput Biol. 2020 Nov 9;16(11):e1008405. doi: 10.1371/journal.pcbi.1008405. eCollection 2020 Nov.

Abstract

Given the complexity and diversity of cancer genomics profiles, it is challenging to identify distinct clusters across different cancer types. Numerous analyses have been conducted for this purpose. Still, the methods they use often do not directly support high-dimensional omics data spanning the whole genome (such as ATAC-seq profiles). In this study, we present ClusterATAC, an end-to-end approach based on deep adversarial learning that leverages high-dimensional features and explores the resulting classification. On both an ATAC-seq dataset and an RNA-seq dataset, ClusterATAC achieved excellent performance. Since ATAC-seq data plays a crucial role in the study of the effects of non-coding regions on the molecular classification of cancers, we explore the clustering solution obtained by ClusterATAC on the pan-cancer ATAC dataset. In this solution, more than 70% of the clusters are single-tumor-type-dominant, and the vast majority of the remaining clusters are associated with similar tumor types. We explore the representative non-coding loci and their linked genes for each cluster and verify some of the results through a literature search. These results suggest that a large number of non-coding loci affect the development and progression of cancer through their linked genes, a finding that could potentially advance cancer diagnosis and therapy.
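The "single-tumor-type-dominant" figure quoted above can be illustrated with a small sketch. The abstract does not state how dominance is defined, so the rule used here (one tumor type accounts for more than half of a cluster's samples) is an assumption, as are the function name `dominant_fraction`, the `threshold` parameter, and the toy labels. This is not the authors' code.

```python
from collections import Counter


def dominant_fraction(cluster_labels, tumor_types, threshold=0.5):
    """Fraction of clusters in which one tumor type exceeds `threshold`.

    ASSUMPTION: "dominant" means the most common tumor type makes up
    strictly more than `threshold` of the cluster; the abstract does not
    give the exact criterion.
    """
    if len(cluster_labels) != len(tumor_types):
        raise ValueError("labels and tumor types must have the same length")
    if not cluster_labels:
        raise ValueError("at least one sample is required")

    # Count tumor types within each cluster.
    by_cluster = {}
    for cluster, tumor in zip(cluster_labels, tumor_types):
        by_cluster.setdefault(cluster, Counter())[tumor] += 1

    # For each cluster, record its top tumor type, that type's share,
    # and whether the share passes the assumed dominance threshold.
    summary = {}
    for cluster, counts in by_cluster.items():
        top_type, top_n = counts.most_common(1)[0]
        share = top_n / sum(counts.values())
        summary[cluster] = (top_type, share, share > threshold)

    n_dominant = sum(1 for _, _, is_dom in summary.values() if is_dom)
    return n_dominant / len(summary), summary


# Toy data (hypothetical): three clusters over ten samples.
clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
types = ["BRCA", "BRCA", "LUAD", "COAD", "COAD", "COAD",
         "KIRC", "KIRP", "KIRC", "KIRP"]
frac, summary = dominant_fraction(clusters, types)
```

In this toy input, clusters 0 and 1 pass the assumed threshold, while cluster 2 is split evenly between two kidney tumor types, the kind of mixed cluster the abstract describes as associated with similar tumor types.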

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromatin / genetics
  • Chromatin Immunoprecipitation Sequencing / statistics & numerical data*
  • Computational Biology
  • Databases, Nucleic Acid / statistics & numerical data
  • Deep Learning*
  • Genomics / methods
  • Genomics / statistics & numerical data
  • Humans
  • Multigene Family
  • Neoplasms / classification*
  • Neoplasms / genetics*
  • Normal Distribution
  • Oncogenes
  • RNA-Seq / statistics & numerical data

Substances

  • Chromatin

Grants and funding

DL was supported by the National Major Scientific and Technological Special Project for “Significant New Drugs Development” under Grant No. 2019ZX09201004. ZW was supported by the “Shuguang Program” of the Shanghai Education Development Foundation and Shanghai Municipal Education Commission. HY was supported by the Natural Science Foundation of China under Grant No. 61902126. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.