Bayesian analysis of genetic association across tree-structured routine healthcare data in the UK Biobank

Nat Genet. 2017 Sep;49(9):1311-1318. doi: 10.1038/ng.3926. Epub 2017 Jul 31.

Abstract

Genetic discovery from the multitude of phenotypes extractable from routine healthcare data can transform understanding of the human phenome and accelerate progress toward precision medicine. However, a critical question when analyzing high-dimensional and heterogeneous data is how best to interrogate increasingly specific subphenotypes while retaining statistical power to detect genetic associations. Here we develop and employ a new Bayesian analysis framework that exploits the hierarchical structure of diagnosis classifications to analyze genetic variants against UK Biobank disease phenotypes derived from self-reporting and hospital episode statistics. Our method displays a more than 20% increase in power to detect genetic effects over other approaches and identifies new associations between classical human leukocyte antigen (HLA) alleles and common immune-mediated diseases (IMDs). By applying the approach to genetic risk scores (GRSs), we show the extent of genetic sharing among IMDs and expose differences in disease perception or diagnosis with potential clinical implications.
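
As a rough illustration of what a tree-structured diagnosis classification looks like in practice, the sketch below rolls individual-level diagnosis codes up a toy hierarchy, so that each node counts as cases everyone coded at that node or at any of its descendants. This is not the Bayesian model described in the abstract. It shows only the hierarchical bookkeeping such an analysis might start from. The node labels (`root`, `A`, `A1`, `A2`, `B`), the identifiers `id1` to `id4`, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: child node -> parent node (None marks the root).
PARENT = {
    "A1": "A",
    "A2": "A",
    "A": "root",
    "B": "root",
    "root": None,
}


def ancestors(code, parent=PARENT):
    """Return the node itself followed by its ancestors, ordered leaf to root."""
    path = []
    while code is not None:
        path.append(code)
        code = parent[code]
    return path


def propagate_cases(diagnoses, parent=PARENT):
    """Map each node to the set of individuals coded at it or at any descendant."""
    cases = defaultdict(set)
    for person, codes in diagnoses.items():
        for code in codes:
            for node in ancestors(code, parent):
                cases[node].add(person)
    return dict(cases)


# Hypothetical individuals and the nodes at which they were coded.
diagnoses = {
    "id1": ["A1"],
    "id2": ["A2"],
    "id3": ["A1", "A2"],
    "id4": ["B"],
}

cases = propagate_cases(diagnoses)
for node in ["root", "A", "A1", "A2", "B"]:
    print(node, sorted(cases.get(node, set())))
```

Moving from a leaf toward the root, the case sets can only grow. That is the trade-off the abstract names: specific subphenotypes have fewer cases, while broader nodes pool more cases at the cost of specificity.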

MeSH terms

  • Adult
  • Aged
  • Alleles
  • Bayes Theorem*
  • Cluster Analysis
  • Delivery of Health Care / classification
  • Delivery of Health Care / statistics & numerical data*
  • Female
  • Genetic Association Studies / statistics & numerical data*
  • Genetic Predisposition to Disease / genetics
  • Genome-Wide Association Study / statistics & numerical data
  • HLA Antigens / genetics
  • Health Information Systems / statistics & numerical data*
  • Humans
  • International Classification of Diseases / classification
  • International Classification of Diseases / statistics & numerical data
  • Logistic Models
  • Male
  • Middle Aged
  • Polymorphism, Single Nucleotide
  • United Kingdom

Substances

  • HLA Antigens