Motivation: Many of the most interesting functional and evolutionary relationships among proteins are so ancient that they cannot be reliably detected through sequence analysis and are apparent only through a comparison of the tertiary structures. The conserved features can often be described as structural motifs consisting of a few single residues or Secondary Structure (SS) elements. Confidence in such motifs is greatly boosted when they are found in more than a pair of proteins.
Results: We describe an algorithm for the automatic discovery of recurring patterns in protein structures. The patterns consist of individual residues having a defined order along the protein's backbone that come close together in the structure and whose spatial conformations are similar. The residues in a pattern need not be close in the protein's sequence. The work described in this paper builds on an earlier reported algorithm for motif discovery. This paper describes a significant improvement of the algorithm which makes it very efficient. The improved efficiency allows us to use it for doing unsupervised learning of patterns occurring in small subsets in a large set of structures, a non-redundant subset of the Protein Data Bank (PDB) database of all known protein structures.