A computational framework for complex disease stratification from multiple large-scale datasets

BMC Syst Biol. 2018 May 29;12(1):60. doi: 10.1186/s12918-018-0556-z.

Abstract

Background: Multilevel data integration is becoming a major area of research in systems biology. Within this area, multi-'omics datasets on complex diseases are increasingly available, and there is a need to set standards and good practices for the integrated analysis of biological, clinical and environmental data. We present a framework to plan and generate single- and multi-'omics signatures of disease states.

Methods: The framework is divided into four major steps: dataset subsetting, feature filtering, 'omics-based clustering and biomarker identification.
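The four steps can be illustrated with a minimal, standard-library-only sketch. The abstract does not specify the rules used at each step, so everything here is an assumption for illustration. Subsetting keeps samples measured in every 'omics layer. Filtering keeps the `top_k` most variable features per layer. Clustering uses plain k-means on concatenated z-scored features; this is a swapped-in method, not necessarily the framework's. Biomarkers are ranked by the absolute difference in mean between one cluster and the rest. All function names (`subset_samples`, `filter_features`, `build_matrix`, `kmeans`, `rank_biomarkers`) are hypothetical.

```python
import random
import statistics

# Hypothetical input shape: each 'omics layer maps sample ID -> feature vector.
Layer = dict[str, list[float]]


def subset_samples(layers: dict[str, Layer]) -> list[str]:
    """Step 1, dataset subsetting (assumed rule): keep samples present in every layer."""
    common = set.intersection(*(set(layer) for layer in layers.values()))
    return sorted(common)


def filter_features(layer: Layer, samples: list[str], top_k: int) -> list[int]:
    """Step 2, feature filtering (assumed rule): keep the top_k most variable features."""
    n_features = len(layer[samples[0]])
    variances = [statistics.pvariance([layer[s][j] for s in samples]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:top_k])


def build_matrix(layers: dict[str, Layer], samples: list[str], top_k: int) -> list[list[float]]:
    """Concatenate z-scored retained features from all layers (rows = samples)."""
    columns = []
    for name in sorted(layers):
        layer = layers[name]
        for j in filter_features(layer, samples, top_k):
            col = [layer[s][j] for s in samples]
            mu, sd = statistics.fmean(col), statistics.pstdev(col) or 1.0  # guard zero variance
            columns.append([(v - mu) / sd for v in col])
    return [list(row) for row in zip(*columns)]


def kmeans(rows: list[list[float]], k: int, n_iter: int = 50, seed: int = 0) -> list[int]:
    """Step 3, clustering (swapped-in method): Lloyd's k-means on the integrated matrix."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c, r=r: sum((a - b) ** 2 for a, b in zip(r, centroids[c])))
            for r in rows
        ]
        # Move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def rank_biomarkers(rows: list[list[float]], labels: list[int], cluster: int) -> list[int]:
    """Step 4 (assumed score): rank integrated-matrix columns by |mean in cluster - mean elsewhere|."""
    inside = [r for r, lab in zip(rows, labels) if lab == cluster]
    outside = [r for r, lab in zip(rows, labels) if lab != cluster]
    scores = [abs(statistics.fmean(a) - statistics.fmean(b)) for a, b in zip(zip(*inside), zip(*outside))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


if __name__ == "__main__":
    # Toy data: sample "p5" lacks methylation data, so step 1 drops it.
    toy = {
        "mrna": {"p1": [0.0, 5.0], "p2": [0.1, 5.1], "p3": [10.0, 5.0], "p4": [10.1, 4.9], "p5": [3.0, 3.0]},
        "methylation": {"p1": [0.2], "p2": [0.25], "p3": [0.8], "p4": [0.85]},
    }
    ids = subset_samples(toy)
    matrix = build_matrix(toy, ids, top_k=1)
    print(dict(zip(ids, kmeans(matrix, k=2))))
```

Concatenating features before clustering is only one integration strategy; clustering each layer separately and then combining the partitions is an equally plausible reading of "'omics-based clustering".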

Results: We illustrate the usefulness of this framework by identifying potential patient clusters based on integrated multi-'omics signatures in a publicly available ovarian cystadenocarcinoma dataset. The analysis identified more stable and clinically relevant clusters than previously reported, and enabled the generation of predictive models of patient outcomes.
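The abstract does not state how cluster stability was assessed. One common approach is resampling-based consensus, sketched below under that assumption. Samples are repeatedly subsampled and reclustered. For each pair, the code records the fraction of resamples in which both samples were drawn and landed in the same cluster. A cluster counts as stable when its members keep co-clustering. The names `consensus_matrix` and `cluster_stability` are hypothetical, and `cluster_fn` stands in for whatever clustering method is used.

```python
import random
from itertools import combinations
from typing import Callable, Sequence

Row = Sequence[float]
Clusterer = Callable[[list[Row]], list[int]]  # hypothetical: rows -> cluster labels


def consensus_matrix(rows: list[Row], cluster_fn: Clusterer,
                     n_resamples: int = 100, frac: float = 0.8, seed: int = 0) -> list[list[float]]:
    """For each pair of samples, the fraction of resamples (among those drawing both)
    in which the two samples received the same cluster label."""
    n = len(rows)
    rng = random.Random(seed)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    m = max(2, int(frac * n))  # subsample size
    for _ in range(n_resamples):
        idx = rng.sample(range(n), m)
        labels = cluster_fn([rows[i] for i in idx])
        for (a, la), (b, lb) in combinations(zip(idx, labels), 2):
            sampled[a][b] += 1
            sampled[b][a] += 1
            if la == lb:
                together[a][b] += 1
                together[b][a] += 1
    # Pairs never drawn together get 0.0; the diagonal is left at 0.0 and unused.
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
            for i in range(n)]


def cluster_stability(consensus: list[list[float]], labels: list[int]) -> dict[int, float]:
    """Mean consensus over within-cluster pairs of a full-data clustering (1.0 = fully stable)."""
    scores = {}
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        pairs = list(combinations(members, 2))
        # Singleton clusters have no pairs, so stability is undefined (NaN).
        scores[c] = sum(consensus[i][j] for i, j in pairs) / len(pairs) if pairs else float("nan")
    return scores
```

For example, with a trivial clusterer that labels each sample by the sign of its first value, two well-separated groups give within-group consensus of 1.0 and between-group consensus of 0.0. Real 'omics clusters would typically fall somewhere in between.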

Conclusions: This framework will help health researchers plan and perform multi-'omics big data analyses to generate hypotheses and make sense of their rich, diverse and ever-growing datasets, enabling the implementation of translational P4 (predictive, preventive, personalised and participatory) medicine.

Keywords: Molecular signatures; Stratification; Systems medicine; 'Omics data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers / metabolism
  • Cluster Analysis
  • Disease / genetics*
  • False Positive Reactions
  • Machine Learning
  • Quality Control
  • Systems Biology / methods*

Substances

  • Biomarkers