Separating measurement and expression models clarifies confusion in single-cell RNA sequencing analysis

Abhishek Sarkar; Matthew Stephens

doi:10.1038/s41588-021-00873-4

Nat Genet. 2021 Jun;53(6):770-777. doi: 10.1038/s41588-021-00873-4. Epub 2021 May 24.

Authors

Abhishek Sarkar¹, Matthew Stephens²,³

Affiliations

¹ Department of Human Genetics, University of Chicago, Chicago, IL, USA. aksarkar@uchicago.edu.
² Department of Human Genetics, University of Chicago, Chicago, IL, USA. mstephens@uchicago.edu.
³ Department of Statistics, University of Chicago, Chicago, IL, USA. mstephens@uchicago.edu.

Abstract

The high proportion of zeros in typical single-cell RNA sequencing datasets has led to widespread but inconsistent use of terminology such as dropout and missing data. Here, we argue that much of this terminology is unhelpful and confusing, and outline simple ideas to help to reduce confusion. These include: (1) observed single-cell RNA sequencing counts reflect both true gene expression levels and measurement error, and carefully distinguishing between these contributions helps to clarify thinking; and (2) method development should start with a Poisson measurement model, rather than more complex models, because it is simple and generally consistent with existing data. We outline how several existing methods can be viewed within this framework and highlight how these methods differ in their assumptions about expression variation. We also illustrate how our perspective helps to address questions of biological interest, such as whether messenger RNA expression levels are multimodal among cells.
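The sketch below illustrates how a measurement model and an expression model can be kept separate. It assumes a Poisson measurement model of the form x_i ~ Poisson(s_i λ_i), where s_i is a cell-specific size factor (for example, the total count in cell i) and λ_i is the gene's true relative expression level in that cell. The size factors, the mean expression level and the gamma shape parameter are hypothetical values chosen for illustration, not values from the article. The snippet pairs the same Poisson measurement model with two expression models: one with no variation among cells, and one with gamma-distributed variation. In both cases, zeros arise from Poisson sampling alone, with no separate "dropout" mechanism.

```python
import numpy as np


def simulate_counts(size_factors, expression, rng):
    """Poisson measurement model: x_i ~ Poisson(s_i * lambda_i)."""
    return rng.poisson(size_factors * expression)


def expected_zero_fraction_gamma(size_factors, mean, shape):
    """Zero fraction when lambda_i ~ Gamma(shape, scale=mean/shape).

    Mixing a Poisson over a gamma gives a negative binomial, for which
    P(x_i = 0) = (shape / (shape + s_i * mean)) ** shape.
    """
    return np.mean((shape / (shape + size_factors * mean)) ** shape)


rng = np.random.default_rng(0)
n_cells = 100_000
# Hypothetical size factors (total counts per cell) and expression parameters.
size_factors = rng.uniform(1e4, 5e4, size=n_cells)
mean_expr = 5e-5
shape = 2.0

# Expression model 1: every cell has the same true expression level.
x_point = simulate_counts(size_factors, np.full(n_cells, mean_expr), rng)
zero_point = np.mean(x_point == 0)
zero_point_expected = np.mean(np.exp(-size_factors * mean_expr))

# Expression model 2: true expression varies among cells (gamma distributed).
lam = rng.gamma(shape, mean_expr / shape, size=n_cells)
x_gamma = simulate_counts(size_factors, lam, rng)
zero_gamma = np.mean(x_gamma == 0)
zero_gamma_expected = expected_zero_fraction_gamma(size_factors, mean_expr, shape)

print(f"point-mass model: observed {zero_point:.3f}, expected {zero_point_expected:.3f}")
print(f"gamma model:      observed {zero_gamma:.3f}, expected {zero_gamma_expected:.3f}")
```

Under each expression model, the observed fraction of zeros matches the value predicted by Poisson sampling. Extra variation in true expression (the gamma model) increases the zero fraction further. This is one way to see why attributing zeros to measurement versus expression requires an explicit expression model, rather than a catch-all notion of dropout.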

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Review

MeSH terms

Gene Expression Regulation*
Humans
Models, Genetic
Poisson Distribution
Sequence Analysis, RNA*
Single-Cell Analysis*
Terminology as Topic
