BASiCS workflow: a step-by-step analysis of expression variability using single cell RNA sequencing data

F1000Res. 2024 May 7;11:59. doi: 10.12688/f1000research.74416.1. eCollection 2022.

Abstract

Cell-to-cell gene expression variability is an inherent feature of complex biological systems, such as immunity and development. Single-cell RNA sequencing is a powerful tool to quantify this heterogeneity, but it is prone to strong technical noise. In this article, we describe a step-by-step computational workflow that uses the BASiCS Bioconductor package to robustly quantify expression variability within and between known groups of cells (such as experimental conditions or cell types). BASiCS uses an integrated framework for data normalisation, technical noise quantification and downstream analyses, propagating statistical uncertainty across these steps. Within a single seemingly homogeneous cell population, BASiCS can identify highly variable genes that exhibit strong heterogeneity as well as lowly variable genes with stable expression. BASiCS also uses a probabilistic decision rule to identify changes in expression variability between cell populations, whilst avoiding confounding effects related to differences in technical noise or in overall abundance. Using a publicly available dataset, we guide users through a complete pipeline that includes preliminary steps for quality control, as well as data exploration using the scater and scran Bioconductor packages. The workflow is accompanied by a Docker image that ensures the reproducibility of our results.
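The probabilistic decision rule mentioned above can be illustrated with a minimal sketch. It is not the BASiCS implementation (BASiCS is an R/Bioconductor package). It only shows the general idea of a posterior tail-probability rule: given paired posterior draws of a variability parameter for one gene in two cell populations, estimate the probability that the absolute log2 fold change exceeds a tolerance, and flag the gene when that probability passes a cutoff. The function names, the synthetic draws, the tolerance `tau` and the cutoff `prob_threshold` are all assumptions made for illustration.

```python
import math
import random


def tail_probability(draws_a, draws_b, tau):
    """Fraction of paired posterior draws with |log2(a / b)| > tau."""
    hits = sum(
        1 for a, b in zip(draws_a, draws_b)
        if abs(math.log2(a / b)) > tau
    )
    return hits / len(draws_a)


def flag_change(draws_a, draws_b, tau=math.log2(1.5), prob_threshold=0.9):
    """Return (tail probability, flag) for one gene.

    tau and prob_threshold are illustrative values chosen for this sketch.
    """
    p = tail_probability(draws_a, draws_b, tau)
    return p, p > prob_threshold


# Synthetic "posterior draws" (hypothetical, for demonstration only).
rng = random.Random(0)
n_draws = 2000
stable_a = [rng.lognormvariate(0.0, 0.1) for _ in range(n_draws)]
stable_b = [rng.lognormvariate(0.0, 0.1) for _ in range(n_draws)]
shifted_b = [rng.lognormvariate(math.log(4.0), 0.1) for _ in range(n_draws)]

p_stable, flag_stable = flag_change(stable_a, stable_b)
p_shifted, flag_shifted = flag_change(stable_a, shifted_b)
print(f"similar variability:   p={p_stable:.3f}, flagged={flag_stable}")
print(f"different variability: p={p_shifted:.3f}, flagged={flag_shifted}")
```

In this toy setting the gene with similar variability in both populations is not flagged, while the gene whose variability differs roughly four-fold is. In a real analysis the draws would come from the fitted model, and the probability cutoff would typically be calibrated rather than fixed by hand.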

Keywords: Bayesian; bioinformatics; differential expression testing; expression variability; heterogeneity; scRNAseq; single-cell RNA sequencing; transcriptional noise.

MeSH terms

  • Computational Biology / methods
  • Gene Expression Profiling / methods
  • Humans
  • Sequence Analysis, RNA* / methods
  • Single-Cell Analysis* / methods
  • Software
  • Workflow*

Grants and funding

C.A.V. is a Chancellor’s Fellow funded by the University of Edinburgh. A.OC. was funded by the Chancellor’s Fellowship granted to C.A.V. N.E. was funded by the European Molecular Biology Laboratory International PhD Programme. Work by J.C.M. was supported by CRUK (C9545/A29580) and by core support from EMBL.