A Multilevel Bayesian Approach to Improve Effect Size Estimation in Regression Modeling of Metabolomics Data Utilizing Imputation with Uncertainty

Christopher E Gillies; Theodore S Jennaro; Michael A Puskarich; Ruchi Sharma; Kevin R Ward; Xudong Fan; Alan E Jones; Kathleen A Stringer

doi:10.3390/metabo10080319

Metabolites. 2020 Aug 6;10(8):319. doi: 10.3390/metabo10080319.

Authors

Christopher E Gillies^{1,2,3}, Theodore S Jennaro⁴, Michael A Puskarich⁵, Ruchi Sharma⁶, Kevin R Ward^{1,2,3,6}, Xudong Fan^{3,6}, Alan E Jones⁷, Kathleen A Stringer^{2,8,9}

Affiliations

¹ Department of Emergency Medicine, University of Michigan, Ann Arbor, MI 48109, USA.
² Michigan Center for Integrative Research in Critical Care (MCIRCC), University of Michigan, Ann Arbor, MI 48109, USA.
³ Michigan Institute for Data Science (MIDAS), Office of Research, University of Michigan, Ann Arbor, MI 48109, USA.
⁴ Department of Clinical Pharmacy, College of Pharmacy, University of Michigan, Ann Arbor, MI 48109, USA.
⁵ Department of Emergency Medicine, University of Minnesota, Minneapolis, MN 55455, USA.
⁶ Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI 48109, USA.
⁷ Department of Emergency Medicine, University of Mississippi Medical Center, Jackson, MS 39216, USA.
⁸ The NMR Metabolomics Laboratory, Department of Clinical Pharmacy, College of Pharmacy, University of Michigan, Ann Arbor, MI 48109, USA.
⁹ Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, School of Medicine, University of Michigan, Ann Arbor, MI 48109, USA.

Abstract

To ensure scientific reproducibility of metabolomics data, alternative statistical methods are needed. A paradigm shift away from the p-value and toward embracing uncertainty and interval estimation of a metabolite's true effect size may lead to improved study design and greater reproducibility. Multilevel Bayesian models are one such approach, and they offer the added opportunity of incorporating the uncertainty of imputed values when missing data are present. We designed simulations of metabolomics data to compare multilevel Bayesian models with standard logistic regression with corrections for multiple hypothesis testing. Our simulations varied the sample size and the fraction of significant metabolites that truly differed between two outcome groups. We then introduced missingness to further assess model performance. Across simulations, the multilevel Bayesian approach more accurately estimated the effect size of metabolites that were significantly different between groups. Bayesian models also had greater power and mitigated the false discovery rate. In the presence of increased missing data, Bayesian models accurately imputed the true concentration, and incorporating the uncertainty of these estimates improved overall prediction. In summary, our simulations demonstrate that a multilevel Bayesian approach accurately quantifies the estimated effect size of metabolite predictors in regression modeling, particularly in the presence of missing data.

Keywords: Bayesian statistics; hierarchical modeling; imputation; missing values; multiple test corrections; nuclear magnetic resonance spectroscopy.
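
The contrast drawn in the abstract, per-metabolite testing with a multiple-testing correction versus pooling information across metabolites, can be illustrated with a small sketch. This is not the authors' simulation or model. It assumes a per-metabolite mean difference with a normal-approximation z-test in place of logistic regression, Benjamini-Hochberg as the multiple-testing correction, and a simple normal-normal empirical-Bayes shrinkage step as a rough stand-in for a multilevel Bayesian model. All function names, parameter values, and the effect size are hypothetical.

```python
import math
import random


def simulate(n_per_group=50, n_metabolites=100, frac_true=0.1, effect=0.8, seed=0):
    """Simulate standardized concentrations for two outcome groups.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    n_true = int(round(frac_true * n_metabolites))
    true_effects = [effect] * n_true + [0.0] * (n_metabolites - n_true)
    controls = [[rng.gauss(0.0, 1.0) for _ in range(n_per_group)] for _ in true_effects]
    cases = [[rng.gauss(d, 1.0) for _ in range(n_per_group)] for d in true_effects]
    return true_effects, controls, cases


def mean_difference(control, case):
    """Return the estimated group difference and its standard error."""
    n0, n1 = len(control), len(case)
    m0, m1 = sum(control) / n0, sum(case) / n1
    v0 = sum((x - m0) ** 2 for x in control) / (n0 - 1)
    v1 = sum((x - m1) ** 2 for x in case) / (n1 - 1)
    return m1 - m0, math.sqrt(v0 / n0 + v1 / n1)


def two_sided_p(z):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(pvals, alpha=0.05):
    """Return a rejection flag per p-value using the BH step-up rule."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k_max = rank  # largest rank that passes its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= k_max
    return rejected


def shrink(estimates, ses):
    """Partial pooling: pull each estimate toward the overall mean.

    The between-metabolite variance tau^2 is a method-of-moments estimate;
    a full multilevel Bayesian model would infer it with a prior instead.
    """
    m = len(estimates)
    mu = sum(estimates) / m
    var_total = sum((e - mu) ** 2 for e in estimates) / (m - 1)
    mean_se2 = sum(s * s for s in ses) / m
    tau2 = max(var_total - mean_se2, 0.0)
    return [mu + tau2 / (tau2 + s * s) * (e - mu) for e, s in zip(estimates, ses)]


if __name__ == "__main__":
    truth, controls, cases = simulate()
    fits = [mean_difference(c0, c1) for c0, c1 in zip(controls, cases)]
    estimates = [est for est, _ in fits]
    ses = [se for _, se in fits]
    flags = benjamini_hochberg([two_sided_p(e / s) for e, s in fits])
    pooled = shrink(estimates, ses)
    print("metabolites flagged after BH:", sum(flags))
    print("first raw vs shrunken estimate:", estimates[0], pooled[0])
```

Metabolites that pass a significance filter tend to have overestimated effects, and shrinking each estimate toward the group mean is one common way to temper that. The sketch omits missing data and the imputation uncertainty that the abstract highlights.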
