Structure motif discovery and mining the PDB

Bioinformatics. 2002 Feb;18(2):362-7. doi: 10.1093/bioinformatics/18.2.362.

Abstract

Motivation: Many of the most interesting functional and evolutionary relationships among proteins are so ancient that they cannot be reliably detected through sequence analysis and are apparent only through a comparison of the tertiary structures. The conserved features can often be described as structural motifs consisting of a few single residues or Secondary Structure (SS) elements. Confidence in such motifs is greatly boosted when they are found in more than a pair of proteins.

Results: We describe an algorithm for the automatic discovery of recurring patterns in protein structures. The patterns consist of individual residues having a defined order along the protein's backbone that come close together in the structure and whose spatial conformations are similar. The residues in a pattern need not be close in the protein's sequence. The work described in this paper builds on an earlier reported algorithm for motif discovery. This paper describes a significant improvement of the algorithm which makes it very efficient. The improved efficiency allows us to use it for doing unsupervised learning of patterns occurring in small subsets in a large set of structures, a non-redundant subset of the Protein Data Bank (PDB) database of all known protein structures.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Motifs
  • Computational Biology
  • Cystine / chemistry
  • Databases, Protein*
  • Molecular Structure
  • Protein Structure, Secondary
  • Proteins / chemistry*
  • Software*

Substances

  • Proteins
  • Cystine