Prioritizing and characterizing functionally relevant genes across human tissues

PLoS Comput Biol. 2021 Jul 16;17(7):e1009194. doi: 10.1371/journal.pcbi.1009194. eCollection 2021 Jul.

Abstract

Knowledge of genes that are critical to a tissue's function remains difficult to ascertain and presents a major bottleneck to a mechanistic understanding of genotype-phenotype links. Here, we present the first machine learning model, FUGUE, which combines transcriptional and network features to predict tissue-relevant genes across 30 human tissues. FUGUE achieves an average cross-validation auROC of 0.86 and auPRC of 0.50 (expected 0.09). In independent datasets, FUGUE accurately distinguishes tissue- or cell type-specific genes, significantly outperforming the conventional metric based on tissue-specific expression alone. Comparison of tissue-relevant transcription factors across tissues recapitulates their developmental relationships. Interestingly, the tissue-relevant genes cluster on the genome within topologically associated domains and, furthermore, are highly enriched for differentially expressed genes in the corresponding cancer type. We provide the prioritized gene lists for 30 human tissues and open-source software to prioritize genes in a novel context given multi-sample transcriptomic data.
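
To put the reported metrics in context: a classifier that ranks genes at random has an expected auROC of about 0.5 and an expected auPRC roughly equal to the fraction of positive examples. The sketch below illustrates this with simulated labels. It is not the FUGUE code; the function names, the number of simulated genes, and the reading of "expected 0.09" as a positive-class fraction of 0.09 are assumptions made only for illustration.

```python
import random


def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Tied scores receive their average rank, so ties count as half credit.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def average_precision(labels, scores):
    """Average precision, a common step-wise estimate of the auPRC."""
    order = sorted(range(len(labels)), key=lambda k: -scores[k])
    hits = 0
    total = 0.0
    for rank, k in enumerate(order, start=1):
        if labels[k]:
            hits += 1
            total += hits / rank  # precision at each positive's rank
    return total / hits


def random_baseline(n_genes=20000, prevalence=0.09, seed=0):
    """Score simulated genes at random (hypothetical sizes, not the paper's data)."""
    rng = random.Random(seed)
    n_pos = round(n_genes * prevalence)
    labels = [1] * n_pos + [0] * (n_genes - n_pos)
    rng.shuffle(labels)
    scores = [rng.random() for _ in labels]
    return auroc(labels, scores), average_precision(labels, scores)


if __name__ == "__main__":
    roc, prc = random_baseline()
    print(f"random auROC ~ {roc:.2f}, random auPRC ~ {prc:.2f}")
```

Under these assumptions, an auPRC of 0.50 against a baseline of 0.09 would correspond to roughly a five-fold improvement over random ranking, which is why the auPRC is often more informative than the auROC when positives are rare.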

Publication types

  • Research Support, N.I.H., Intramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology
  • Female
  • Gene Expression Regulation, Developmental
  • Gene Regulatory Networks
  • Genetic Association Studies*
  • Genome, Human
  • Genome-Wide Association Study / statistics & numerical data
  • Humans
  • Machine Learning*
  • Male
  • Models, Genetic*
  • Multigene Family
  • Neoplasms / genetics
  • Protein Interaction Maps / genetics
  • Software
  • Tissue Distribution
  • Transcription Factors / genetics
  • Transcription Factors / metabolism
  • Transcriptome

Substances

  • Transcription Factors

Grants and funding

G.S. is supported by the University of Maryland. S.S. is supported in part by a KVPY fellowship awarded by the Department of Science and Technology (DST), Government of India. A.S. and S.H. are supported by the Intramural Research Program of the National Cancer Institute, Center for Cancer Research, NIH. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.