An individualized predictor of health and disease using paired reference and target samples

BMC Bioinformatics. 2016 Jan 22:17:47. doi: 10.1186/s12859-016-0889-9.

Abstract

Background: Consider the problem of designing a panel of complex biomarkers to predict a patient's health or disease state when one can pair his or her current test sample, called a target sample, with the patient's previously acquired healthy sample, called a reference sample. As contrasted to a population averaged reference this reference sample is individualized. Automated predictor algorithms that compare and contrast the paired samples to each other could result in a new generation of test panels that compare to a person's healthy reference to enhance predictive accuracy. This paper develops such an individualized predictor and illustrates the added value of including the healthy reference for design of predictive gene expression panels.

Results: The objective is to predict each subject's state of infection, e.g., neither exposed nor infected, exposed but not infected, pre-acute phase of infection, acute phase of infection, post-acute phase of infection. Using gene microarray data collected in a large scale serially sampled respiratory virus challenge study we quantify the diagnostic advantage of pairing a person's baseline reference with his or her target sample. The full study consists of 2886 microarray chips assaying 12,023 genes of 151 human volunteer subjects under 4 different inoculation regimes (HRV, RSV, H1N1, H3N2). We train (with cross-validation) reference-aided sparse multi-class classifier algorithms on this data to show that inclusion of a subject's reference sample can improve prediction accuracy by as much as 14 %, for the H3N2 cohort, and by at least 6 %, for the H1N1 cohort. Remarkably, these gains in accuracy are achieved by using smaller panels of genes, e.g., 39 % fewer for H3N2 and 31 % fewer for H1N1. The biomarkers selected by the predictors fall into two categories: 1) contrasting genes that tend to differentially express between target and reference samples over the population; 2) reinforcement genes that remain constant over the two samples, which function as housekeeping normalization genes. Many of these genes are common to all 4 viruses and their roles in the predictor elucidate the function that they play in differentiating the different states of host immune response.

Conclusions: If one uses a suitable mathematical prediction algorithm, inclusion of a healthy reference in biomarker diagnostic testing can potentially improve accuracy of disease prediction with fewer biomarkers.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Gene Expression
  • Genes, Essential
  • Genetic Markers*
  • Humans
  • Influenza A Virus, H1N1 Subtype
  • Influenza A Virus, H3N2 Subtype
  • Microarray Analysis*
  • Models, Molecular
  • Respiratory Syncytial Viruses
  • Rhinovirus
  • Virus Diseases / diagnosis*

Substances

  • Genetic Markers