PolyAMiner-Bulk is a deep learning-based algorithm that decodes alternative polyadenylation dynamics from bulk RNA-seq data

Cell Rep Methods. 2024 Feb 26;4(2):100707. doi: 10.1016/j.crmeth.2024.100707. Epub 2024 Feb 6.

Abstract

Alternative polyadenylation (APA) is a key post-transcriptional regulatory mechanism; yet its regulation and impact on human diseases remain understudied. Existing APA methods based on bulk RNA sequencing (RNA-seq) predominantly rely on predefined annotations, which severely limits their ability to decode novel tissue- and disease-specific APA changes. Furthermore, they account only for the most proximal and distal cleavage and polyadenylation sites (C/PASs). Deconvoluting overlapping C/PASs and the inherently noisy 3' UTR coverage in bulk RNA-seq data pose additional challenges. To overcome these limitations, we introduce PolyAMiner-Bulk, an attention-based deep learning algorithm that accurately recapitulates C/PAS sequence grammar, resolves overlapping C/PASs, captures non-proximal-to-distal APA changes, and generates visualizations to illustrate APA dynamics. Evaluation on multiple datasets strongly supports the performance of PolyAMiner-Bulk, which accurately identifies more APA changes than other methods. With the growing importance of APA and the abundance of bulk RNA-seq data, PolyAMiner-Bulk establishes a robust paradigm for APA analysis.
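To make the idea of "non-proximal-to-distal APA changes" concrete, the sketch below compares two ways of summarizing read counts at a gene's C/PASs. It is an illustration only, not the PolyAMiner-Bulk algorithm (which uses an attention-based model and its own statistics). The helper names (`usage_fractions`, `distal_ratio`, `usage_shift`) and the read counts are hypothetical. The example assumes counts have already been assigned to each C/PAS and ordered from proximal to distal. A metric that looks only at the two extreme sites can miss a shift toward an intermediate site, while a comparison of the full usage vector detects it.

```python
from typing import Sequence


def usage_fractions(counts: Sequence[float]) -> list[float]:
    """Fraction of reads at each C/PAS (hypothetical helper; sites ordered proximal -> distal)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("gene has no reads at any C/PAS")
    return [c / total for c in counts]


def distal_ratio(counts: Sequence[float]) -> float:
    """Proximal/distal-only view: distal share among the two extreme sites, ignoring intermediate sites."""
    proximal, distal = counts[0], counts[-1]
    if proximal + distal == 0:
        raise ValueError("no reads at the proximal or distal site")
    return distal / (proximal + distal)


def usage_shift(control: Sequence[float], treated: Sequence[float]) -> float:
    """Total variation distance between full C/PAS usage vectors (0 = identical, 1 = disjoint)."""
    if len(control) != len(treated):
        raise ValueError("both conditions must report the same C/PASs")
    p = usage_fractions(control)
    q = usage_fractions(treated)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical gene with three C/PASs: reads move to the intermediate site.
control = [50, 0, 50]
treated = [25, 50, 25]
print(distal_ratio(control), distal_ratio(treated))  # 0.5 0.5: no change seen
print(usage_shift(control, treated))                 # 0.5: large change seen
```

Total variation distance is used here only because it is a simple, bounded way to compare usage vectors of any length. Any method that considers all C/PASs, rather than only the first and last, would reveal the same intermediate-site shift.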

Keywords: CP: Systems biology; alternative polyadenylation (APA); bioinformatics; computational biology; deep learning; gene regulation; large language model (LLM); post-transcriptional regulation.

MeSH terms

  • Algorithms
  • Deep Learning*
  • Humans
  • Polyadenylation* / genetics
  • RNA
  • RNA-Seq
  • Sequence Analysis, RNA / methods

Substances

  • RNA