A Bayesian hierarchical model for identifying significant polygenic effects while controlling for confounding and repeated measures

Stat Appl Genet Mol Biol. 2017 Nov 27;16(5-6):407-419. doi: 10.1515/sagmb-2017-0044.

Abstract

Genomic studies of plants often seek to identify genetic factors associated with desirable traits. The process of evaluating genetic markers one by one (i.e. a marginal analysis) may not identify important polygenic and environmental effects. Further, confounding due to growing conditions/factors and genetic similarities among plant varieties may influence conclusions. When developing new plant varieties to optimize yield or thrive in future adverse conditions (e.g. flood, drought), scientists seek a complete understanding of how the factors influence desirable traits. Motivated by a study design that measures rice yield across different seasons, fields, and plant varieties in Indonesia, we develop a regression method that identifies significant genomic factors, while simultaneously controlling for field factors and genetic similarities in the plant varieties. Our approach develops a Bayesian maximum a posteriori probability (MAP) estimator under a generalized double Pareto shrinkage prior. Through a hierarchical representation of the proposed model, a novel and computationally efficient expectation-maximization (EM) algorithm is developed for variable selection and estimation. The performance of the proposed approach is demonstrated through simulation and is used to analyze rice yields from a pilot study conducted by the Indonesian Center for Rice Research.

Keywords: Bayesian hierarchical models; EM algorithm; MAP estimator; genomic studies; rice science; shrinkage prior.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bayes Theorem*
  • Genomics* / methods
  • Models, Genetic*
  • Multifactorial Inheritance*
  • Oryza / genetics