Statistical modelling of bacterial promoter sequences for regulatory motif discovery with the help of transcriptome data: application to Listeria monocytogenes

Ibrahim Sultan; Vincent Fromion; Sophie Schbath; Pierre Nicolas

doi:10.1098/rsif.2020.0600

J R Soc Interface. 2020 Oct;17(171):20200600. doi: 10.1098/rsif.2020.0600. Epub 2020 Oct 7.

Authors

Ibrahim Sultan¹, Vincent Fromion¹, Sophie Schbath¹, Pierre Nicolas¹

Affiliation

¹ Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France.

Abstract

Automatic de novo identification of the main regulons of a bacterium from genome and transcriptome data remains a challenge. To address this task, we propose a statistical model that can use information on exact positions of the transcription start sites and condition-dependent expression profiles. The central idea of this model is to improve the probabilistic representation of the promoter DNA sequences by incorporating covariates summarizing expression profiles (e.g. coordinates in projection spaces or hierarchical clustering trees). A dedicated trans-dimensional Markov chain Monte Carlo algorithm adjusts the width and palindromic properties of the corresponding position-weight matrices, the number of parameters to describe exact position relative to the transcription start site, and chooses the expression covariates relevant for each motif. All parameters are estimated simultaneously, for many motifs and many expression covariates. The method is applied to a dataset of transcription start sites and expression profiles available for Listeria monocytogenes. The results validate the approach and provide a new global view of the transcription regulatory network of this important pathogen. Remarkably, a previously unreported motif is found in promoter regions of ribosomal protein genes, suggesting a role in the regulation of growth.
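
For readers unfamiliar with position-weight matrices, the following minimal Python sketch illustrates the basic objects the abstract refers to: a matrix of per-position base probabilities, a log-odds score used to locate the best match in a promoter sequence, and a check of the palindromic (reverse-complement symmetric) property. It is not the authors' model, which estimates motif width, palindromic structure, position relative to the transcription start site and expression covariates jointly by trans-dimensional Markov chain Monte Carlo. The function names, the pseudocount, the uniform background and the toy sequences are all assumptions made for illustration.

```python
import math

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def build_pwm(sites, pseudocount=1.0):
    """Estimate per-position base probabilities from aligned sites of equal width.

    The pseudocount is an illustrative smoothing choice, not taken from the paper.
    """
    width = len(sites[0])
    pwm = []
    for j in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm


def log_odds(pwm, window, background=0.25):
    """Log-odds score of one window against a uniform background (an assumption)."""
    return sum(math.log(col[b] / background) for col, b in zip(pwm, window))


def best_hit(pwm, promoter):
    """Return (score, start) of the highest-scoring window in the promoter."""
    width = len(pwm)
    return max(
        (log_odds(pwm, promoter[i:i + width]), i)
        for i in range(len(promoter) - width + 1)
    )


def is_palindromic(pwm, tol=1e-9):
    """True if the matrix equals its own reverse complement, column by column."""
    width = len(pwm)
    return all(
        abs(pwm[j][b] - pwm[width - 1 - j][COMPLEMENT[b]]) <= tol
        for j in range(width)
        for b in BASES
    )


# Toy data (hypothetical, not from the Listeria monocytogenes dataset).
toy_sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
toy_pwm = build_pwm(toy_sites)
score, start = best_hit(toy_pwm, "GGCGTATAATGCGC")
```

In this toy setting `start` is the offset of the best match within the promoter string. A model such as the one described above would additionally express that offset relative to the transcription start site and let expression covariates modulate the probability that a promoter carries the motif.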

Keywords: DNA motifs; Markov chain Monte Carlo; bacteria; transcriptional regulatory network; transcriptomics.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Listeria monocytogenes* / genetics
Markov Chains
Models, Statistical
Promoter Regions, Genetic
Transcriptome