Cross-modal embedding integrator for disease-gene/protein association prediction using a multi-head attention mechanism

Pharmacol Res Perspect. 2024 Dec;12(6):e70034. doi: 10.1002/prp2.70034.

Abstract

Knowledge graphs, powerful tools that explicitly transfer knowledge to machines, have significantly advanced the inference of new knowledge. Discovering unknown relationships between diseases and genes/proteins in biomedical knowledge graphs can lead to the identification of disease development mechanisms and new treatment targets. Generating high-quality representations of biomedical entities is essential for successfully predicting disease-gene/protein associations. We developed a computational model that predicts disease-gene/protein associations using the Precision Medicine Knowledge Graph, a biomedical knowledge graph. Embeddings of biomedical entities were generated using two different methods: a large language model (LLM) and a knowledge graph embedding (KGE) algorithm. The LLM utilizes information obtained from massive amounts of text data, whereas the KGE algorithm relies on graph structure. The resulting prediction model, "Cross-Modal Embedding Integrator (CMEI)," integrates the embeddings from these two modalities using a multi-head attention mechanism. CMEI achieved an area under the receiver operating characteristic curve (AUROC) of 0.9662 (± 0.0002) in predicting disease-gene/protein associations. In conclusion, CMEI effectively predicts disease-gene/protein associations and may contribute to the identification of disease development mechanisms and new treatment targets.
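
The abstract does not specify how the two embeddings are combined, so the following is only a minimal sketch of one way a multi-head attention layer could integrate an LLM embedding and a KGE embedding of the same entity. Everything in it is an assumption made for illustration and is not taken from the paper: the name `FusionSketch`, the projection of both modalities into a shared space, the mean pooling, the dot-product scorer, the dimensions, and the random (untrained) weights. It uses only the Python standard library.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Random weights; a real model would learn these during training."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def multi_head_attention(tokens, Wq, Wk, Wv, Wo, num_heads):
    """Scaled dot-product self-attention over a short list of token vectors."""
    d_head = len(tokens[0]) // num_heads
    Q = [matvec(Wq, t) for t in tokens]
    K = [matvec(Wk, t) for t in tokens]
    V = [matvec(Wv, t) for t in tokens]
    outputs = []
    for i in range(len(tokens)):
        concat = []
        for h in range(num_heads):
            lo, hi = h * d_head, (h + 1) * d_head
            # Attention weights of token i over all tokens, for head h.
            scores = [
                sum(q * k for q, k in zip(Q[i][lo:hi], K[j][lo:hi])) / math.sqrt(d_head)
                for j in range(len(tokens))
            ]
            weights = softmax(scores)
            # Weighted sum of the value slices belonging to head h.
            concat.extend(
                sum(w * V[j][lo + c] for j, w in enumerate(weights))
                for c in range(d_head)
            )
        outputs.append(matvec(Wo, concat))  # output projection
    return outputs

class FusionSketch:
    """Hypothetical two-modality fusion; NOT the published CMEI architecture."""

    def __init__(self, d_llm, d_kge, d_model=8, num_heads=2, seed=0):
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        rng = random.Random(seed)
        self.num_heads = num_heads
        # Assumed: each modality is projected into a shared d_model space.
        self.P_llm = random_matrix(d_model, d_llm, rng)
        self.P_kge = random_matrix(d_model, d_kge, rng)
        self.Wq, self.Wk, self.Wv, self.Wo = (
            random_matrix(d_model, d_model, rng) for _ in range(4)
        )

    def fuse(self, llm_vec, kge_vec):
        """Treat the two modality embeddings as a 2-token sequence and attend."""
        tokens = [matvec(self.P_llm, llm_vec), matvec(self.P_kge, kge_vec)]
        attended = multi_head_attention(
            tokens, self.Wq, self.Wk, self.Wv, self.Wo, self.num_heads
        )
        # Assumed pooling: average the two attended tokens into one vector.
        return [sum(col) / len(attended) for col in zip(*attended)]

    def score(self, disease, gene):
        """Association score in [0, 1] from the fused disease and gene vectors."""
        d = self.fuse(*disease)
        g = self.fuse(*gene)
        logit = sum(a * b for a, b in zip(d, g))
        return 0.5 * (1.0 + math.tanh(logit / 2.0))  # overflow-safe sigmoid

# Toy (made-up) embeddings: each entity is an (llm_vector, kge_vector) pair.
model = FusionSketch(d_llm=4, d_kge=3)
disease = ([0.1, 0.2, 0.3, 0.4], [0.5, -0.5, 0.25])
gene = ([0.0, 0.1, 0.0, 0.1], [0.2, 0.2, 0.2])
print(model.score(disease, gene))
```

Because the weights are random, the printed score carries no biological meaning; the sketch only shows how attention lets each modality's representation be reweighted by the other before a pair is scored. In practice the weights would be trained on known associations and evaluated with a metric such as AUROC.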

Keywords: disease; gene; knowledge graph embedding; large language model; multi‐head attention; protein.

MeSH terms

  • Algorithms*
  • Computational Biology / methods
  • Humans
  • Precision Medicine / methods
  • Proteins / genetics
  • ROC Curve

Substances

  • Proteins