Regression trees for regulatory element identification

Bioinformatics. 2004 Mar 22;20(5):750-7. doi: 10.1093/bioinformatics/btg480. Epub 2004 Jan 29.

Abstract

Motivation: The transcription of a gene is largely determined by short sequence motifs that serve as binding sites for transcription factors. Recent findings suggest direct relationships between these motifs and gene expression levels. We present a method for identifying regulatory motifs that uses tree-based techniques to recover the relationships between motifs and gene expression levels.

Results: We treat regulatory motifs as predictor variables and gene expression levels as responses, and use a regression tree model to identify the structural relationships between them. The regression tree methodology is extended to handle responses from multiple experiments by modifying the split function. The significance of regulatory elements is determined by analyzing tree structures and using a variable importance measure. When applied to two data sets from the yeast Saccharomyces cerevisiae, the method successfully identifies most of the regulatory motifs that are known to control gene transcription under the given experimental conditions, and suggests several new putative motifs. Analysis of the tree structures also confirms several pairs of motifs that are known to regulate gene transcription in combination.
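The abstract does not specify how the split function is modified, so the following is only an illustrative sketch of one common way to extend a regression-tree split to several experiments. It assumes that each gene is described by binary motif-presence indicators and that a split is scored by the reduction in squared error summed over all experiments. The function names (`node_sse`, `best_motif_split`) and the toy data are hypothetical and are not taken from the paper or its software.

```python
from typing import List, Sequence, Tuple


def node_sse(responses: Sequence[Sequence[float]]) -> float:
    """Squared deviation from the per-experiment mean, summed over experiments.

    ASSUMPTION: this summed criterion is one possible multi-response
    impurity; the paper's actual split function may differ.
    """
    if not responses:
        return 0.0
    n_genes = len(responses[0:]) if False else len(responses)
    n_experiments = len(responses[0])
    total = 0.0
    for j in range(n_experiments):
        mean_j = sum(r[j] for r in responses) / n_genes
        total += sum((r[j] - mean_j) ** 2 for r in responses)
    return total


def best_motif_split(
    motif_present: Sequence[Sequence[int]],
    responses: Sequence[Sequence[float]],
) -> Tuple[int, float]:
    """Return (motif index, error reduction) of the best presence/absence split.

    Returns (-1, 0.0) when no motif reduces the summed squared error.
    """
    parent_sse = node_sse(responses)
    best: Tuple[int, float] = (-1, 0.0)
    for m in range(len(motif_present[0])):
        with_motif: List[Sequence[float]] = []
        without_motif: List[Sequence[float]] = []
        for indicators, expression in zip(motif_present, responses):
            (with_motif if indicators[m] else without_motif).append(expression)
        if not with_motif or not without_motif:
            continue  # a split must leave genes on both sides
        gain = parent_sse - node_sse(with_motif) - node_sse(without_motif)
        if gain > best[1]:
            best = (m, gain)
    return best


# Hypothetical toy data: 4 genes, 2 candidate motifs, 2 experiments.
toy_motifs = [[1, 0], [1, 1], [0, 0], [0, 1]]
toy_expression = [[2.0, 1.0], [2.0, 1.0], [0.0, -1.0], [0.0, -1.0]]
toy_best = best_motif_split(toy_motifs, toy_expression)
```

In this toy example, motif 0 separates the genes into two groups with identical expression profiles, so it is selected as the split, while motif 1 yields no reduction in error. A full tree would apply the same search recursively to each child node.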

Availability: http://if.kaist.ac.kr/~phuong/RegTree

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms*
  • Amino Acid Motifs / genetics
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation / genetics*
  • Genes, Regulator / physiology*
  • Models, Genetic*
  • Models, Statistical
  • Regression Analysis
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Transcription Factors / genetics
  • Transcription Factors / metabolism

Substances

  • Transcription Factors