Random gene sets in predicting survival of patients with hepatocellular carcinoma

J Mol Med (Berl). 2019 Jun;97(6):879-888. doi: 10.1007/s00109-019-01764-2. Epub 2019 Apr 17.

Abstract

Despite multiple publications, molecular signatures predicting the course of hepatocellular carcinoma (HCC) have not yet been integrated into routine clinical decision-making. Given the diversity of published signatures, the optimal number of genes, the best gene combinations, and the benefit of functional associations among genes in prognostic signatures remain to be defined. We investigated a vast number of randomly chosen gene sets (varying in size between 1 and 10,000 genes) to encompass the full range of prognostic gene sets on 242 transcriptomic profiles of patients with HCC. Depending on the selected size, 4.7 to 23.5% of all random gene sets exhibit prognostic potential by separating patient subgroups with significantly different survival. This was further substantiated by investigating gene sets and signaling pathways, which likewise yielded a comparably high number of significantly prognostic gene sets. However, combining multiple random gene sets using "swarm intelligence" resulted in significantly improved predictability for approximately 63% of all patients. In these patients, approx. 70% of all random 50-gene sets resulted in an equal and stable prediction of survival. For all other patients, a reliable prediction seems highly unlikely for any selected gene set. Using a machine learning and independent validation approach, we demonstrated a high reliability of random gene sets and swarm intelligence in HCC prognosis. Ultimately, these findings were validated in two independent patient cohorts and on independent technical platforms (microarray, RNASeq). In conclusion, we demonstrate that using the "swarm intelligence" of multiple gene sets for prognosis prediction may be not only superior but also more robust for predictive purposes.
KEY MESSAGES:

  • Molecular signatures predicting HCC have not yet been integrated into clinical routine
  • Depending on the selected size, 4.7 to 23.5% of all random gene sets exhibit prognostic potential; independent of the technical platform (microarray, RNASeq)
  • Using "swarm intelligence" resulted in a significantly improved predictability
  • In these patients, approx. 70% of all random 50-gene containing gene sets resulted in equal and stable prediction of survival
  • Overall, "swarm intelligence" is superior and more robust for predictive purposes in HCC.
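
The idea of combining many random gene sets into a per-patient vote can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state how patients were grouped or how votes were combined, so everything below is an assumption made for illustration. The function names (`swarm_vote`, `stable_patients`), the median split on a mean z-score, the orientation of groups by mean survival time (which ignores censoring and reuses the same patients), and the 0.7 agreement threshold are all hypothetical choices.

```python
import numpy as np


def swarm_vote(expr, surv_time, n_sets=200, set_size=50, seed=0):
    """Fraction of random gene sets that place each patient in the
    'poor prognosis' group.

    expr      : array (n_patients, n_genes), expression values
    surv_time : array (n_patients,), survival times
    Assumption: grouping is a median split on the mean z-score of the
    sampled genes; the paper's actual grouping method may differ.
    """
    rng = np.random.default_rng(seed)
    n_patients, n_genes = expr.shape

    # Standardise each gene so that no single gene dominates the score.
    z = (expr - expr.mean(axis=0)) / (expr.std(axis=0) + 1e-12)

    votes = np.zeros(n_patients)
    for _ in range(n_sets):
        # Draw one random gene set without replacement.
        genes = rng.choice(n_genes, size=set_size, replace=False)
        score = z[:, genes].mean(axis=1)
        high = score > np.median(score)

        # Assumption: call the group with the shorter mean survival 'poor'.
        # A real analysis would use a censoring-aware test (e.g. log-rank)
        # and orient the groups on a separate training cohort.
        if surv_time[high].mean() < surv_time[~high].mean():
            poor = high
        else:
            poor = ~high
        votes += poor

    return votes / n_sets


def stable_patients(frac_poor, agreement=0.7):
    """Patients for whom at least `agreement` of the gene sets agree.

    The 0.7 default is a hypothetical threshold, not a value defined
    in the abstract.
    """
    return np.maximum(frac_poor, 1.0 - frac_poor) >= agreement
```

Called on an expression matrix and a vector of survival times, `swarm_vote` returns one value between 0 and 1 per patient, and `stable_patients` flags those whose assignment is consistent across gene sets. Patients near 0.5 correspond to the cases the abstract describes as unlikely to be reliably predicted by any single gene set.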

Keywords: Bioinformatics; Gene set; HCC; Liver cancer; Microarray; Profiling; Prognostic; RNA Seq; Random; Signature; Swarm intelligence; Transcriptome.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Carcinoma, Hepatocellular / genetics*
  • Cluster Analysis
  • Cohort Studies
  • Databases, Genetic
  • Gene Expression Regulation, Neoplastic
  • Gene Ontology
  • Genes, Neoplasm*
  • Humans
  • Liver Neoplasms / genetics*
  • Prognosis
  • Reproducibility of Results
  • Signal Transduction / genetics
  • Survival Analysis