Boosting the prediction and understanding of DNA-binding domains from sequence

Nucleic Acids Res. 2010 Jun;38(10):3149-58. doi: 10.1093/nar/gkq061. Epub 2010 Feb 15.

Abstract

DNA-binding proteins perform vital functions related to transcription, repair and replication. We have developed a new sequence-based machine learning protocol to identify DNA-binding proteins. We compare our method with an extensive benchmark of previously published structure-based machine learning methods, as well as with a standard sequence alignment technique, BLAST. Furthermore, we elucidate important feature interactions found in a learned model and analyze how specific rules capture general mechanisms that extend across DNA-binding motifs. This analysis is carried out using the malibu machine learning workbench, available at http://proteomics.bioengr.uic.edu/malibu; the corresponding data sets and features are available at http://proteomics.bioengr.uic.edu/dna.

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*
  • DNA-Binding Proteins / chemistry*
  • Protein Structure, Tertiary
  • Sequence Alignment
  • Sequence Analysis, Protein*

Substances

  • DNA-Binding Proteins