Gene set enrichment ensemble using fold change data only

J Biomed Inform. 2015 Oct:57:189-203. doi: 10.1016/j.jbi.2015.07.019. Epub 2015 Aug 1.

Abstract

In many biological studies, raw gene expression data are often not published, for reasons such as data privacy and patent rights. Instead, most studies provide only lists of significant genes with fold change values. However, due to variations in data sources and profiling conditions, only a small number of common significant genes can be found among similar studies. Moreover, traditional gene set based analyses of these genes do not take into account the fold change values, which may be important for distinguishing between the different levels of significance of the genes. Human embryonic stem cell derived cardiomyocytes (hESC-CMs) are a good representative of this category. hESC-CMs, with their role as a potentially unlimited source of human heart cells for regenerative medicine, have attracted the attention of biological and medical researchers. Because acquiring data is difficult and expensive, there are only a few related hESC-CM studies, and little hESC-CM gene expression data is available. In view of these challenges, we propose a new Gene Set Enrichment Ensemble (GSEE) approach that performs gene set based analysis on individual studies using only lists of significantly up-regulated genes with fold change data. Our approach provides both explicit and implicit ways to utilize the fold change data, in order to make full use of scarce data. We validate our approach on hESC-CM data and on fetal heart data. Experimental results on significant gene lists from different studies illustrate the effectiveness of the proposed approach.

Keywords: Comparative analysis; Gene Set Enrichment Analysis; Human embryonic stem cell-derived cardiomyocytes.

MeSH terms

  • Cell Differentiation*
  • Embryonic Stem Cells
  • Gene Expression
  • Gene Expression Profiling*
  • Humans
  • Information Dissemination
  • Myocytes, Cardiac*
  • Statistics as Topic*