Since the discovery of 5-hydroxymethylcytosine (5hmC) as a prominent DNA modification in mammalian genomes, an open question has been what role this mark plays in gene regulation. 5hmC is hypothesized to function as an intermediate in the demethylation of 5-methylcytosine (5mC) and in the reactivation of silenced promoters and enhancers. Further, weak positive correlations are observed between gene body 5hmC and gene expression. We previously demonstrated that ME-Class is an effective tool for understanding relationships between whole-genome bisulfite sequencing data and expression. In this work, we present ME-Class2, a machine-learning-based tool for integrative analysis of 5mCG, 5hmCG and expression. Using ME-Class2, we analyze whole-genome, single-base-resolution 5mCG and 5hmCG datasets from 20 primary tissue and cell samples to reveal relationships between 5hmCG and expression. Our analysis indicates that conversion of 5mCG to 5hmCG within 2 kb of the transcription start site associates with distinct functions depending on the summed level of 5mCG + 5hmCG. Unchanged levels of 5mCG + 5hmCG (conversion from 5mCG to stable 5hmCG) associate with repression. Meanwhile, decreases in 5mCG + 5hmCG (5hmCG-mediated demethylation) associate with gene activation. Our results demonstrate that ME-Class2 will prove invaluable for interpreting genome-wide 5mC and 5hmC datasets and for guiding mechanistic studies into the function of 5hmCG.
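To make the two patterns above concrete, the following minimal Python sketch labels a transcription start site (TSS) window by how 5mCG and 5hmCG change between two samples. It is an illustration only and is not how ME-Class2 works: ME-Class2 is a machine-learning classifier, whereas this sketch applies fixed cutoffs. The names `TssWindow` and `describe_change` and the thresholds `min_shift` and `tol` are hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass


@dataclass
class TssWindow:
    """Mean modification levels (fractions in [0, 1]) within 2 kb of a TSS.

    Hypothetical container for this illustration; not an ME-Class2 type.
    """
    mcg_a: float   # 5mCG level in sample A
    hmcg_a: float  # 5hmCG level in sample A
    mcg_b: float   # 5mCG level in sample B
    hmcg_b: float  # 5hmCG level in sample B


def describe_change(w: TssWindow, min_shift: float = 0.1, tol: float = 0.05) -> str:
    """Label the A -> B change in a TSS window.

    min_shift and tol are arbitrary illustrative cutoffs (assumptions),
    not values taken from the study.
    """
    d_mcg = w.mcg_b - w.mcg_a
    d_hmcg = w.hmcg_b - w.hmcg_a
    d_total = d_mcg + d_hmcg  # change in summed 5mCG + 5hmCG

    # Only consider windows where 5mCG falls while 5hmCG rises (conversion).
    if not (d_mcg <= -min_shift and d_hmcg >= min_shift):
        return "other"
    if abs(d_total) <= tol:
        # Summed level unchanged: stable 5hmCG, associated with repression.
        return "stable_5hmCG"
    if d_total < -tol:
        # Summed level decreases: demethylation, associated with activation.
        return "demethylation"
    return "other"


if __name__ == "__main__":
    stable = TssWindow(mcg_a=0.8, hmcg_a=0.05, mcg_b=0.5, hmcg_b=0.35)
    demeth = TssWindow(mcg_a=0.8, hmcg_a=0.05, mcg_b=0.2, hmcg_b=0.2)
    print(describe_change(stable))
    print(describe_change(demeth))
```

In this sketch, a window whose 5mCG loss is matched by 5hmCG gain is labeled `stable_5hmCG`, while a window whose summed level drops is labeled `demethylation`. Everything else, including windows with no conversion, falls under `other`.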
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.