Applying graph database technology for analyzing perturbed co-expression networks in cancer

Claire M Simpson; Florian Gnad

doi:10.1093/database/baaa110

Database (Oxford). 2020 Dec 11:2020:baaa110. doi: 10.1093/database/baaa110.

Authors

Claire M Simpson¹, Florian Gnad¹

Affiliation

¹ Department of Bioinformatics and Data Science, Cell Signaling Technology Inc., 3 Trask Lane, Danvers, MA 01923, USA.

Abstract

Graph representations provide an elegant way to capture and analyze complex molecular mechanisms in the cell. Co-expression networks are undirected graphs of transcriptional co-behavior that indicate (co-)regulation, functional modules or even physical interactions between the corresponding gene products. The rapidly growing volume of available RNA sequencing (RNAseq) data fuels the construction of such networks, which, like most other biological data, are usually stored in relational databases. In relational databases, however, inferring linkage through recursive multi-join statements is computationally expensive and complex to design. In contrast, graph databases store and represent complex interconnected data as nodes, edges and properties, making relationships fast and intuitive to query and analyze. Although graph database technologies are moving from a fringe domain into the mainstream, only a few studies have reported their application to biological data. We used the graph database management system Neo4j to store and analyze co-expression networks derived from RNAseq data from The Cancer Genome Atlas. Comparing co-expression in tumors versus healthy tissues across six cancer types revealed significant perturbation that traces back to erroneous or rewired gene regulation. Applying centrality, community detection and pathfinding graph algorithms uncovered the destruction or creation of central nodes, modules and relationships in the co-expression networks of tumors. Given the speed, accuracy and straightforwardness of managing these densely connected networks, we conclude that graph databases are ready to enter the arena of biological data.
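
To make the idea of a perturbed co-expression network concrete, the following is a minimal sketch in plain Python and not the authors' Neo4j pipeline. It assumes Pearson correlation as the co-expression measure and an arbitrary cutoff of 0.8 (the abstract specifies neither), uses hypothetical gene names and made-up expression values, and uses node degree as the simplest stand-in for the centrality algorithms mentioned above.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.8):
    """Undirected gene pairs whose |r| meets the (assumed) threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(expression), 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            edges.add((g1, g2))  # pair is stored in sorted order
    return edges


def degree(edges):
    """Number of co-expression partners per gene (degree centrality, unnormalized)."""
    deg = {}
    for g1, g2 in edges:
        deg[g1] = deg.get(g1, 0) + 1
        deg[g2] = deg.get(g2, 0) + 1
    return deg


# Hypothetical toy data: gene -> expression values across five samples.
normal = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.0, 4.1, 6.0, 8.2, 10.0],
    "GENE_C": [5.0, 1.0, 4.0, 2.0, 3.0],
}
tumor = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [3.0, 1.0, 4.0, 1.0, 5.0],
    "GENE_C": [1.1, 2.0, 2.9, 4.2, 5.0],
}

normal_edges = coexpression_edges(normal)
tumor_edges = coexpression_edges(tumor)

lost_edges = normal_edges - tumor_edges    # relationships destroyed in the tumor
gained_edges = tumor_edges - normal_edges  # relationships created in the tumor

normal_degree = degree(normal_edges)
tumor_degree = degree(tumor_edges)

print("lost:", sorted(lost_edges))
print("gained:", sorted(gained_edges))
```

In this toy data the GENE_A/GENE_B relationship disappears in the tumor while a GENE_A/GENE_C relationship appears, which is the kind of rewiring the comparison of tumor and healthy networks is meant to surface. In a graph database the same gene pairs would be stored as nodes and relationships, so that neighborhood and path queries do not require recursive joins.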

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Database Management Systems*
Databases, Factual
Humans
Neoplasms* / genetics
Technology