EvoTol: a protein-sequence based evolutionary intolerance framework for disease-gene prioritization

Nucleic Acids Res. 2015 Mar 11;43(5):e33. doi: 10.1093/nar/gku1322. Epub 2014 Dec 29.

Abstract

Methods to interpret personal genome sequences are increasingly required. Here, we report a novel framework (EvoTol) to identify disease-causing genes using patient sequence data from within protein-coding regions. EvoTol quantifies a gene's intolerance to mutation using evolutionary conservation of protein sequences and can incorporate tissue-specific gene expression data. We apply this framework to the analysis of whole-exome sequence data in epilepsy and congenital heart disease, and demonstrate that EvoTol's ability to identify known disease-causing genes is unmatched by competing methods. Application of EvoTol to the human interactome revealed networks enriched for genes intolerant to protein sequence variation, informing novel polygenic contributions to human disease.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence / genetics
  • Computational Biology / methods*
  • Evolution, Molecular*
  • Exome / genetics
  • Genetic Predisposition to Disease / genetics*
  • Heart Defects, Congenital / genetics
  • Humans
  • Mutation
  • Phylogeny
  • Polymorphism, Single Nucleotide
  • Protein Interaction Maps / genetics
  • Proteins / classification
  • Proteins / genetics*
  • Proteins / metabolism
  • Reproducibility of Results
  • Sequence Analysis, DNA / methods

Substances

  • Proteins