BASiCS workflow: a step-by-step analysis of expression variability using single cell RNA sequencing data

Alan O'Callaghan; Nils Eling; John C Marioni; Catalina A Vallejos

doi:10.12688/f1000research.74416.1

F1000Res. 2024 May 7:11:59. doi: 10.12688/f1000research.74416.1. eCollection 2022.

Authors

Alan O'Callaghan¹, Nils Eling²,³, John C Marioni⁴,⁵, Catalina A Vallejos¹,⁶

Affiliations

¹ MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK.
² Institute for Molecular Health Sciences, ETH Zürich, Zürich, 8093, Switzerland.
³ Department of Quantitative Biomedicine, University of Zurich, Zürich, CH-8057, Switzerland.
⁴ Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, CB2 0RE, UK.
⁵ European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, CB10 1SD, UK.
⁶ The Alan Turing Institute, London, NW1 2DB, UK.

Abstract

Cell-to-cell gene expression variability is an inherent feature of complex biological systems, such as immunity and development. Single-cell RNA sequencing is a powerful tool to quantify this heterogeneity, but it is prone to strong technical noise. In this article, we describe a step-by-step computational workflow that uses the BASiCS Bioconductor package to robustly quantify expression variability within and between known groups of cells (such as experimental conditions or cell types). BASiCS uses an integrated framework for data normalisation, technical noise quantification and downstream analyses, propagating statistical uncertainty across these steps. Within a single seemingly homogeneous cell population, BASiCS can identify highly variable genes that exhibit strong heterogeneity as well as lowly variable genes with stable expression. BASiCS also uses a probabilistic decision rule to identify changes in expression variability between cell populations, whilst avoiding confounding effects related to differences in technical noise or in overall abundance. Using a publicly available dataset, we guide users through a complete pipeline that includes preliminary steps for quality control, as well as data exploration using the scater and scran Bioconductor packages. The workflow is accompanied by a Docker image that ensures the reproducibility of our results.

Keywords: Bayesian; bioinformatics; differential expression testing; expression variability; heterogeneity; scRNAseq; single-cell RNA sequencing; transcriptional noise.

MeSH terms

Computational Biology / methods
Gene Expression Profiling / methods
Humans
Sequence Analysis, RNA* / methods
Single-Cell Analysis* / methods
Software
Workflow*

Grants and funding

C.A.V. is a Chancellor’s Fellow funded by the University of Edinburgh. A.OC. was funded by the Chancellor’s Fellowship granted to C.A.V. N.E. was funded by the European Molecular Biology Laboratory International PhD Programme. Work by J.C.M. was supported by CRUK (C9545/A29580) and by core support from EMBL.