CONREAL: conserved regulatory elements anchored alignment algorithm for identification of transcription factor binding sites by phylogenetic footprinting

Genome Res. 2004 Jan;14(1):170-8. doi: 10.1101/gr.1642804. Epub 2003 Dec 12.

Abstract

Prediction of transcription-factor target sites in promoters remains difficult due to the short length and degeneracy of the target sequences. Although the use of orthologous sequences and phylogenetic footprinting approaches may help in the recognition of conserved and potentially functional sequences, correct alignment of the short transcription-factor binding sites can be problematic for established algorithms, especially when aligning more divergent species. Here, we report a novel phylogenetic footprinting approach, CONREAL, that uses biologically relevant information, that is, potential transcription-factor binding sites as represented by positional weight matrices, to establish anchors between orthologous sequences and to guide promoter sequence alignment. Comparison of the performance of CONREAL with the global alignment programs LAGAN and AVID on a reference data set shows that CONREAL performs equally well for closely related species such as rodents and human, and has clear added value for aligning promoter elements of more divergent species such as human and fish, as it identifies conserved transcription-factor binding sites that are not found by the other methods. CONREAL is accessible via a Web interface at http://conreal.niob.knaw.nl/.
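The anchoring idea described in the abstract can be illustrated with a minimal Python sketch. It is not CONREAL's published implementation: the log-odds conversion, the min-max relative score, the 0.8 threshold, forward-strand-only scanning, and the weighted collinear chaining of same-matrix hits are all illustrative assumptions. The toy matrix for the consensus TGTTTAC uses made-up counts rather than a real positional weight matrix from any database, and the function names (`log_odds_matrix`, `scan`, `chain_anchors`) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def log_odds_matrix(counts: List[Dict[str, int]], pseudocount: float = 0.5,
                    background: float = 0.25) -> List[Dict[str, float]]:
    """Convert per-position base counts into log2-odds scores vs. a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                       for b in BASES})
    return matrix


def scan(seq: str, matrix: List[Dict[str, float]],
         min_rel: float = 0.8) -> List[Tuple[int, float]]:
    """Return (start, relative score) for each window whose min-max normalised score >= min_rel.

    Only the forward strand is scanned (a simplifying assumption).
    """
    width = len(matrix)
    lo = sum(min(col.values()) for col in matrix)
    hi = sum(max(col.values()) for col in matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        raw = sum(col[base] for col, base in zip(matrix, window))
        rel = (raw - lo) / (hi - lo)
        if rel >= min_rel:
            hits.append((start, rel))
    return hits


def chain_anchors(hits_a: List[Tuple[int, float]], hits_b: List[Tuple[int, float]],
                  width: int) -> List[Tuple[int, int]]:
    """Pick the highest-scoring set of (pos_a, pos_b) hit pairs that is collinear
    and non-overlapping in both sequences (simple O(n^2) dynamic programming)."""
    # Sorting by pos_a guarantees every valid predecessor of pair i comes before i.
    pairs = sorted((a, b, sa + sb) for a, sa in hits_a for b, sb in hits_b)
    if not pairs:
        return []
    best = [p[2] for p in pairs]  # best chain score ending at each pair
    prev = [-1] * len(pairs)      # back-pointer to the previous pair in that chain
    for i, (ai, bi, si) in enumerate(pairs):
        for j in range(i):
            aj, bj, _ = pairs[j]
            if aj + width <= ai and bj + width <= bi and best[j] + si > best[i]:
                best[i] = best[j] + si
                prev[i] = j
    i = max(range(len(pairs)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(pairs[i][:2])
        i = prev[i]
    return chain[::-1]


if __name__ == "__main__":
    # Toy counts: 10 observations of the consensus base at each position.
    counts = [{b: (10 if b == c else 0) for b in BASES} for c in "TGTTTAC"]
    pwm = log_odds_matrix(counts)
    hits_a = scan("CCCCTGTTTACCCC", pwm)
    hits_b = scan("GGGGGGTGTTTACGG", pwm)
    print(chain_anchors(hits_a, hits_b, len(pwm)))  # [(4, 6)]
```

In a fuller pipeline, each chained pair would be treated as a fixed anchor and the stretches between consecutive anchors aligned with a conventional method; that step, the reverse strand, and the use of many matrices at once are omitted here.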

Publication types

  • Comparative Study

MeSH terms

  • Algorithms*
  • Animals
  • Binding Sites / genetics
  • Conserved Sequence / genetics*
  • DNA Footprinting / methods*
  • DNA Footprinting / statistics & numerical data
  • DNA-Binding Proteins / genetics
  • Hepatocyte Nuclear Factor 3-beta
  • Humans
  • Internet
  • Mice
  • Nuclear Proteins / genetics
  • Phylogeny*
  • Promoter Regions, Genetic / genetics
  • Rats
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Sequence Alignment* / methods
  • Sequence Alignment* / statistics & numerical data
  • Sequence Homology, Nucleic Acid
  • Takifugu / genetics
  • Transcription Factors / genetics*
  • Transcription Factors / metabolism*
  • Zebrafish / genetics

Substances

  • DNA-Binding Proteins
  • FOXA2 protein, human
  • Foxa2 protein, mouse
  • Foxa2 protein, rat
  • Nuclear Proteins
  • Transcription Factors
  • Hepatocyte Nuclear Factor 3-beta