From genome-scale data to models of infectious disease: A Bayesian network-based strategy to drive model development

Math Biosci. 2015 Dec;270(Pt B):156-68. doi: 10.1016/j.mbs.2015.06.006. Epub 2015 Jun 17.

Abstract

High-throughput, genome-scale data present a unique opportunity to link host to pathogen at the molecular level. Forging such connections will help drive the development of mathematical models to better understand and predict both pathogen behavior and the epidemiology of infectious diseases, including malaria. However, the datasets that can help identify these links and models are vast and not amenable to simple, reductionist, univariate analyses. They require data mining to identify the truly important measurements that best describe clinical and molecular observations. Moreover, these datasets typically have relatively few samples because of experimental limitations (particularly in human studies or in vivo animal experiments), which makes data mining extremely difficult. Here, after a brief overview of common strategies for data reduction and for identifying relationships between variables to include in mathematical models, we present a new generalized strategy for performing these data reduction and relationship inference tasks. Our approach emphasizes the importance of robustness when using data to drive model development, particularly with genome-scale, small-sample in vivo data. We identify appropriate feature reduction, combined with data permutation and subsampling strategies, as critical for obtaining increasingly robust results from network inference on high-dimensional, low-observation data.
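The abstract names three ingredients for robust inference from small-sample data: feature reduction, data permutation, and subsampling. The Python sketch below shows one way they can fit together, under loudly stated assumptions: it is not the authors' pipeline, every function name is hypothetical, a top-variance filter stands in for "appropriate feature reduction", and an absolute Pearson correlation threshold stands in for Bayesian network structure learning so that the example needs only the standard library. An edge is kept only if it is recovered in most random subsamples and more often than in datasets whose columns were shuffled independently.

```python
import random
import statistics
from itertools import combinations


def top_variance_features(data, k):
    """Hypothetical feature-reduction step: keep the k columns with largest variance.
    `data` is a list of samples (rows), each a list of feature values (columns)."""
    n_features = len(data[0])
    variances = [statistics.pvariance([row[j] for row in data]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def infer_edges(rows, features, threshold):
    """Stand-in for network inference (NOT a Bayesian network learner):
    link two features whose absolute correlation across `rows` reaches `threshold`."""
    cols = {j: [r[j] for r in rows] for j in features}
    return {(i, j) for i, j in combinations(features, 2)
            if abs(pearson(cols[i], cols[j])) >= threshold}


def edge_frequencies(data, features, threshold, n_iter=100, frac=0.8, seed=0):
    """Fraction of random subsamples (drawn without replacement) recovering each edge."""
    rng = random.Random(seed)
    m = max(2, int(frac * len(data)))
    counts = {}
    for _ in range(n_iter):
        rows = rng.sample(data, m)
        for e in infer_edges(rows, features, threshold):
            counts[e] = counts.get(e, 0) + 1
    return {e: c / n_iter for e, c in counts.items()}


def permuted(data, seed=0):
    """Shuffle each column independently, destroying between-feature dependence."""
    rng = random.Random(seed)
    cols = [list(col) for col in zip(*data)]
    for col in cols:
        rng.shuffle(col)
    return [list(row) for row in zip(*cols)]


def robust_edges(data, k, threshold=0.6, stability=0.8, n_iter=100, n_perm=10, seed=0):
    """Keep edges recovered in at least `stability` of subsamples AND more often than
    the mean frequency of the same edge across `n_perm` column-permuted datasets."""
    features = top_variance_features(data, k)
    observed = edge_frequencies(data, features, threshold, n_iter, seed=seed)
    null = {}
    for p in range(n_perm):
        perm_freq = edge_frequencies(permuted(data, seed + p + 1), features,
                                     threshold, n_iter, seed=seed)
        for e, f in perm_freq.items():
            null[e] = null.get(e, 0.0) + f / n_perm
    return {e: f for e, f in observed.items()
            if f >= stability and f > null.get(e, 0.0)}
```

For example, on a 12-sample table in which feature 1 is a linear function of feature 0, feature 2 is unrelated, and feature 3 is constant, `robust_edges(data, k=3)` should drop feature 3 during reduction and retain the edge `(0, 1)`. Comparing each edge against its own permutation null, rather than a single global cutoff, is one design choice among several; in a real analysis the correlation stand-in would be replaced by a Bayesian network learner and the subsample and permutation counts tuned to the dataset.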

Keywords: Bayesian network inference; Infectious diseases; Large-scale data analysis; Malaria; Model development.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Animals
  • Bayes Theorem*
  • Communicable Diseases*
  • Genomics*
  • Macaca mulatta
  • Models, Theoretical*