The gene regulatory code and grammar remain largely unknown, precluding our ability to link phenotype to genotype in regulatory sequences. Here, using a massively parallel reporter assay (MPRA) of 209,440 sequences, we examine all possible pair and triplet combinations, permutations and orientations of eighteen liver-associated transcription factor binding sites (TFBS). We find that TFBS orientation and order have a major effect on gene regulatory activity. Corroborating these results with genomic analyses, we find clear human promoter TFBS orientation biases and similar TFBS orientation and order transcriptional effects in an MPRA that tested 164,307 liver candidate regulatory elements. Additionally, by adding TFBS orientation to a model that predicts expression from sequence we improve performance by 7.7%. Collectively, our results show that TFBS orientation and order have a significant effect on gene regulatory activity and need to be considered when analyzing the functional effect of variants on the activity of these sequences.
© 2023. The Author(s).