Protein structure prediction via combinatorial assembly of sub-structural units

Bioinformatics. 2003:19 Suppl 1:i158-68. doi: 10.1093/bioinformatics/btg1020.

Abstract

Following the hierarchical nature of protein folding, we propose a three-stage scheme for predicting a protein's structure from its sequence. First, the sequence is cut into fragments, each of which is assigned a structure. Second, the assigned structures are combinatorially assembled to form the overall 3D organization. Third, highly ranked predicted arrangements are completed and refined. This work focuses on the second stage of this scheme: the combinatorial assembly. We present CombDock, a combinatorial docking algorithm. CombDock takes an ordered set of protein sub-structures as input and predicts the inter-unit contacts that define their overall organization. We reduce the combinatorial assembly to a graph-theory problem and give a heuristic polynomial-time solution to this computationally hard problem. We applied CombDock to various examples of structural units of two types: protein domains and building blocks, which are relatively stable sub-structures of domains. Moreover, we tested CombDock using increasingly distorted input, where the native structural units were replaced by similarly folded units extracted from homologous proteins and, in the more difficult cases, from globally unrelated proteins. The algorithm is robust, showing low sensitivity to input distortion. This suggests that CombDock is a useful tool in protein structure prediction that may be applied to large target proteins.
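
The abstract does not say how the assembly is cast as a graph problem, so the following is only an illustrative sketch under stated assumptions, not CombDock itself. It assumes each sub-structural unit is a vertex, each candidate pairwise contact is a weighted edge (higher score meaning a more favorable contact), and one assembly corresponds to a spanning tree connecting all units. The function name, the scores, and the greedy Kruskal-style selection are all hypothetical; a real method would also need to handle several candidate placements per pair and reject sterically clashing combinations.

```python
from typing import List, Tuple

# (unit_a, unit_b, score): a hypothetical candidate contact between two units.
Edge = Tuple[int, int, float]


def greedy_spanning_assembly(n_units: int, edges: List[Edge]) -> List[Edge]:
    """Pick a maximum-weight spanning tree over the units (Kruskal-style).

    Assumption: one scored candidate contact per unit pair, no clash checks.
    """
    parent = list(range(n_units))  # union-find forest, one set per unit

    def find(x: int) -> int:
        # Follow parent links to the set representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen: List[Edge] = []
    # Consider the best-scoring contacts first.
    for a, b, score in sorted(edges, key=lambda e: e[2], reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # contact joins two separate partial assemblies
            parent[root_a] = root_b
            chosen.append((a, b, score))
    return chosen


# Illustrative scores only; these are not data from the paper.
example_edges = [(0, 1, 9.0), (1, 2, 7.5), (0, 2, 3.0), (2, 3, 6.0), (1, 3, 2.0)]
assembly = greedy_spanning_assembly(4, example_edges)
total_score = sum(score for _, _, score in assembly)
```

With four units, the selected tree has three contacts. The greedy choice is exact for this simplified spanning-tree model, but the full problem described in the abstract is computationally hard, which is why the authors resort to a heuristic.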

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Algorithms*
  • Binding Sites
  • Combinatorial Chemistry Techniques / methods*
  • Computer Simulation
  • Glucosyltransferases / chemistry
  • Models, Chemical
  • Models, Molecular*
  • Protein Binding
  • Protein Conformation
  • Protein Structure, Tertiary
  • Protein Subunits
  • Proteins / chemistry*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Sequence Homology, Amino Acid

Substances

  • Protein Subunits
  • Proteins
  • Glucosyltransferases
  • cyclomaltodextrin glucanotransferase