Transipedia.org: k-mer-based exploration of large RNA sequencing datasets and application to cancer data

Genome Biol. 2024 Oct 10;25(1):266. doi: 10.1186/s13059-024-03413-5.

Abstract

Indexing techniques relying on k-mers have proven effective in searching for RNA sequences across thousands of RNA-seq libraries, but they have not enabled direct RNA quantification. We show here that arbitrary RNA sequences can be quantified in seconds through their decomposition into k-mers, with a precision akin to that of conventional RNA quantification methods. Using an index of the Cancer Cell Line Encyclopedia (CCLE) collection consisting of 1019 RNA-seq samples, we show that k-mer indexing offers a powerful means to reveal non-reference sequences, and variant RNAs induced by specific gene alterations, for instance in splicing factors.
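
The idea of quantifying a sequence through its k-mers can be illustrated with a minimal Python sketch. It is not the Transipedia implementation: the abstract does not specify the index structure or how per-k-mer counts are summarized, so the dictionary `toy_index`, the helper names `kmers` and `quantify`, the choice of mean and median as summaries, and the value k=4 are all assumptions made for illustration. Canonical (strand-independent) k-mers and count normalization are deliberately left out.

```python
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order of occurrence."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def quantify(seq, index, k):
    """Summarize k-mer counts of a query sequence in one sample.

    `index` is a hypothetical in-memory mapping from k-mer to its count
    in that sample; k-mers absent from the index count as 0.
    """
    counts = [index.get(km, 0) for km in kmers(seq, k)]
    if not counts:
        raise ValueError("query sequence is shorter than k")
    return {"mean": sum(counts) / len(counts), "median": median(counts)}


# Toy data (invented for illustration, not taken from any real sample).
toy_index = {"ACGT": 4, "CGTA": 6, "GTAC": 5}
result = quantify("ACGTAC", toy_index, k=4)
```

In this toy setting, a query whose k-mers are all present yields a stable summary, while a query carrying a sequence absent from the sample (for example a variant) produces k-mers with zero counts, which pulls the median down faster than the mean.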

Keywords: Bioinformatics; Non-coding RNA; RNA-processing; RNA-seq; Transcriptomics.

MeSH terms

  • Cell Line, Tumor
  • Humans
  • Neoplasms* / genetics
  • RNA-Seq / methods
  • Sequence Analysis, RNA* / methods
  • Software