Objectives: Genome-wide association studies (GWASs) have revealed many candidate SNPs, but the mechanisms by which these SNPs influence diseases are largely unknown. To decipher the underlying mechanisms, several methods have been developed to predict disease-associated genes by integrating GWAS and eQTL data (e.g., Sherlock and COLOC). A number of studies have also incorporated information from gene networks into GWAS analysis to reprioritize candidate genes.
Methods: Motivated by these two approaches, we developed a statistical framework that integrates information from GWAS, eQTL, and protein-protein interaction (PPI) data to predict disease-associated genes. Our approach is based on a hidden Markov random field (HMRF) model, and we call the resulting computational algorithm GeP-HMRF (a GWAS-eQTL-PPI-based HMRF).
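To make the HMRF idea concrete, the following is a minimal sketch and not the GeP-HMRF algorithm itself. It assumes a two-state hidden label per gene (1 = disease-associated), a Gaussian emission for a hypothetical per-gene evidence score, an Ising-style prior over PPI edges, and inference by iterated conditional modes (ICM). The function name `icm_hmrf` and the parameters `mu`, `sigma`, `beta`, and `h` are illustrative assumptions that do not come from the paper.

```python
import math


def icm_hmrf(scores, edges, mu=(0.0, 3.0), sigma=1.0, beta=1.0, h=-1.0, max_iter=50):
    """Label genes 0/1 with iterated conditional modes on a two-state HMRF.

    scores: hypothetical per-gene evidence scores (one float per gene).
    edges:  PPI edges as pairs of gene indices.
    mu, sigma: assumed Gaussian emission means (state 0, state 1) and shared sd.
    beta: reward for agreeing with each neighbouring gene's label.
    h:    per-gene bias added to state 1 (negative values favour state 0).
    """
    n = len(scores)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def emission_loglik(x, state):
        # Gaussian log-density, dropping the constant shared by both states.
        return -((x - mu[state]) ** 2) / (2.0 * sigma ** 2)

    # Initialise each label from the emission term alone.
    labels = [1 if emission_loglik(x, 1) > emission_loglik(x, 0) else 0 for x in scores]

    for _ in range(max_iter):
        changed = False
        for i in range(n):
            best_state, best_value = labels[i], -math.inf
            for state in (0, 1):
                agree = sum(1 for j in neighbors[i] if labels[j] == state)
                value = emission_loglik(scores[i], state) + beta * agree + h * state
                if value > best_value:
                    best_state, best_value = state, value
            if best_state != labels[i]:
                labels[i] = best_state
                changed = True
        if not changed:
            break  # labels are stable, so ICM has reached a local optimum
    return labels


if __name__ == "__main__":
    # Genes 0-2 form a PPI triangle; gene 3 has the same weak score as gene 2 but no neighbours.
    print(icm_hmrf([3.2, 2.9, 1.2, 1.2], [(0, 1), (0, 2), (1, 2)]))
```

In this toy setting, a gene with a borderline score can be pulled toward the disease-associated state by strongly supported interaction partners, whereas an isolated gene with the same score is not. That is the intuition behind adding network information to GWAS-eQTL evidence.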
Results: We compared the performance of GeP-HMRF with the Sherlock, COLOC, and NetWAS methods on 9 GWAS datasets, using the disease-related genes in the MalaCards database as the standard, and found that GeP-HMRF significantly improves prediction accuracy. We also applied GeP-HMRF to an age-related macular degeneration (AMD) dataset. Among the top 50 genes predicted by GeP-HMRF, 7 are reported by the MalaCards database to be AMD-related, with an enrichment p value of 3.61 × 10⁻¹¹⁹. Among the top 20 genes predicted by GeP-HMRF, CFHR1, CFHR3, HTRA1, and CFH are AMD-related in the MalaCards database, and another 9 genes are supported by the literature.
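An enrichment p value of this kind is commonly obtained from a hypergeometric (one-sided Fisher) test; the abstract does not state which test or background gene set was used, so the sketch below is only an illustration of that standard calculation. The function name and the background and reference-set sizes in the usage example are hypothetical placeholders, not values from the study.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, known, selected, overlap):
    """Return P(X >= overlap) under the hypergeometric null.

    population: number of genes in the background set.
    known:      number of background genes annotated as disease-related.
    selected:   number of top-ranked genes examined.
    overlap:    number of selected genes that are annotated as disease-related.
    """
    upper = min(known, selected)
    total = comb(population, selected)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(known, k) * comb(population - known, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical sizes: 20,000 background genes, 100 annotated genes,
    # top 50 predictions, 7 of which are annotated.
    print(hypergeom_enrichment_pvalue(20000, 100, 50, 7))
```

The result depends strongly on the assumed background size and on how many genes the reference database annotates, so these inputs should be reported alongside any enrichment p value.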
Conclusions: We built a unified statistical model to predict disease-related genes by integrating GWAS, eQTL, and PPI data. Our approach outperforms Sherlock, COLOC, and NetWAS in simulation studies and on 9 GWAS datasets. It can be generalized to incorporate molecular trait data other than eQTL and interaction data other than PPI.
Keywords: Data integration; Disease-associated gene; Hidden Markov random field.
© 2019 S. Karger AG, Basel.