On the identification of differentially-active transcription factors from ATAC-seq data

Felix Ezequiel Gerbaldo; Emanuel Sonder; Vincent Fischer; Selina Frei; Jiayi Wang; Katharina Gapp; Mark D Robinson; Pierre-Luc Germain

doi:10.1371/journal.pcbi.1011971

PLoS Comput Biol. 2024 Oct 23;20(10):e1011971. doi: 10.1371/journal.pcbi.1011971. eCollection 2024 Oct.

Authors

Felix Ezequiel Gerbaldo¹, Emanuel Sonder^{1,2,3,4}, Vincent Fischer⁵, Selina Frei⁵, Jiayi Wang³, Katharina Gapp⁵, Mark D Robinson^{3,4}, Pierre-Luc Germain^{1,3,4}

Affiliations

¹ Computational Neurogenomics, D-HEST Institute for Neurosciences, Zürich, Switzerland.
² Systems Neuroscience, D-HEST Institute for Neurosciences, Zürich, Switzerland.
³ Department of Molecular Life Sciences, University of Zürich, Zürich, Switzerland.
⁴ SIB Swiss Institute of Bioinformatics, University of Zurich, Switzerland.
⁵ Epigenetics and Neuroendocrinology, D-HEST Institute for Neurosciences, Zürich, Switzerland.

Abstract

ATAC-seq has emerged as a rich epigenome profiling technique, and is commonly used to identify Transcription Factors (TFs) underlying given phenomena. A number of methods can be used to identify differentially-active TFs through the accessibility of their DNA-binding motif; however, little is known about the best approaches for doing so. Here we benchmark several such methods using a combination of curated datasets with various forms of short-term perturbations on known TFs, as well as semi-simulations. We include both methods specifically designed for this type of data and some that can be repurposed for it. We also investigate variations to these methods, and identify three particularly promising approaches (a chromVAR-limma workflow with critical adjustments, monaLisa, and a combination of GC smooth quantile normalization and multivariate modeling). We further investigate the specific use of nucleosome-free fragments, the combination of top methods, and the impact of technical variation. Finally, we illustrate the use of the top methods on a novel dataset to characterize the impact on DNA accessibility of TRAnscription Factor TArgeting Chimeras (TRAFTAC), which can deplete TFs (in our case NFkB) at the protein level.
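As a rough illustration of what "differential activity through motif accessibility" can mean in practice, the sketch below scores one motif per sample and compares two groups. It is a toy example under stated assumptions, not the chromVAR-limma workflow, monaLisa, or any other method benchmarked in the paper: the function names (`motif_scores`, `welch_t`), the scoring rule, and the count matrix are all hypothetical, and the sketch does no GC-bias or other technical correction.

```python
import math
from statistics import mean, stdev


def motif_scores(counts, motif_hits):
    """Per-sample motif accessibility score (hypothetical scoring rule).

    counts: list of peaks, each a list of per-sample fragment counts.
    motif_hits: one bool per peak, True if the peak contains the motif.
    Returns, for each sample, the mean log2(CPM + 1) over motif peaks
    minus the mean log2(CPM + 1) over all peaks.
    """
    n_samples = len(counts[0])
    # Library size per sample, used for counts-per-million scaling.
    lib = [sum(peak[j] for peak in counts) for j in range(n_samples)]
    logcpm = [
        [math.log2(peak[j] / lib[j] * 1e6 + 1) for j in range(n_samples)]
        for peak in counts
    ]
    scores = []
    for j in range(n_samples):
        in_motif = [row[j] for row, hit in zip(logcpm, motif_hits) if hit]
        background = [row[j] for row in logcpm]
        scores.append(mean(in_motif) - mean(background))
    return scores


def welch_t(a, b):
    """Welch's t statistic for two groups of per-sample scores."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / math.sqrt(va / len(a) + vb / len(b))


# Made-up data: 4 peaks x 4 samples (2 control, then 2 treated).
counts = [
    [10, 12, 30, 33],  # motif peak, more accessible after treatment
    [8, 9, 25, 27],    # motif peak, more accessible after treatment
    [20, 21, 20, 19],  # background peak
    [15, 14, 15, 16],  # background peak
]
motif_hits = [True, True, False, False]

scores = motif_scores(counts, motif_hits)
t_stat = welch_t(scores[2:], scores[:2])  # treated vs control
print(scores, t_stat)
```

A positive statistic here suggests the motif's peaks gained accessibility in the treated samples relative to the rest of the peaks. Real analyses would model many motifs at once and account for technical variation, which is what the benchmarked methods are designed to do.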

Copyright: © 2024 Gerbaldo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

MeSH terms

Binding Sites / genetics
Chromatin Immunoprecipitation Sequencing* / methods
Computational Biology* / methods
DNA / genetics
DNA / metabolism
Humans
Sequence Analysis, DNA / methods
Transcription Factors* / genetics
Transcription Factors* / metabolism

Substances

Transcription Factors
DNA

Grants and funding

This work was supported by research grants (ETH-25 02-2 to PLG and 23-2 ETH-015 to KG) from the Swiss Federal Institute of Technology (ETH Zurich). The salary of ES is paid by the ETH-25 02-2 grant. The Gapp lab received funding from SNF grant PR00P3_201543 and from the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract number MB22.00037. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.