Structure motif discovery and mining the PDB

Inge Jonassen; Ingvar Eidhammer; Darrell Conklin; William R Taylor

doi:10.1093/bioinformatics/18.2.362


Bioinformatics. 2002 Feb;18(2):362-7. doi: 10.1093/bioinformatics/18.2.362.

Authors

Inge Jonassen¹, Ingvar Eidhammer, Darrell Conklin, William R Taylor

Affiliation

¹ Department of Informatics, University of Bergen, HIB, N5020 Bergen, Norway. Inge.Jonassen@ii.uib.no

PMID: 11847094

Abstract

Motivation: Many of the most interesting functional and evolutionary relationships among proteins are so ancient that they cannot be reliably detected through sequence analysis and become apparent only when tertiary structures are compared. The conserved features can often be described as structural motifs consisting of a few individual residues or secondary structure (SS) elements. Confidence in such motifs is greatly increased when they are found in more than two proteins.

Results: We describe an algorithm for the automatic discovery of recurring patterns in protein structures. A pattern consists of individual residues that have a defined order along the protein backbone, come close together in the structure, and have similar spatial conformations in every occurrence. The residues in a pattern need not be close in the protein sequence. This work builds on a motif discovery algorithm we reported earlier; here we describe a significant improvement to that algorithm that makes it very efficient. The improved efficiency allows us to apply it to unsupervised learning of patterns that occur in small subsets of a large set of structures, namely a non-redundant subset of the Protein Data Bank (PDB), the database of all known protein structures.
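The pattern definition above (residues in backbone order, mutually close in space, with similar geometry wherever they occur) can be made concrete with a small brute-force sketch. It is not the authors' algorithm. It assumes each residue is reduced to a single 3D point (for example a C-alpha coordinate), treats "close together" as every pairwise distance being at most a hypothetical cutoff `max_dist`, and scores geometric similarity as the root-mean-square difference between corresponding inter-residue distances, accepted below a hypothetical tolerance `tol`. The names `is_compact`, `distance_matrix_rmsd`, `matches` and `support` are illustrative only, and the exhaustive enumeration of ordered residue subsets is exactly the kind of cost an efficient discovery algorithm has to avoid.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

# Hypothetical representation: one 3D point per residue (e.g. its C-alpha atom),
# listed in backbone (sequence) order.
Point = Tuple[float, float, float]


def is_compact(coords: Sequence[Point], max_dist: float) -> bool:
    """True if every pair of residues lies within max_dist (assumed 'close together' test)."""
    return all(math.dist(a, b) <= max_dist for a, b in combinations(coords, 2))


def distance_matrix_rmsd(p: Sequence[Point], q: Sequence[Point]) -> float:
    """RMS difference between corresponding pairwise distances of two equal-length residue lists.

    Invariant to rotation and translation, but blind to mirror images.
    """
    if len(p) != len(q) or len(p) < 2:
        raise ValueError("need two point lists of equal length >= 2")
    sq = [
        (math.dist(p[i], p[j]) - math.dist(q[i], q[j])) ** 2
        for i, j in combinations(range(len(p)), 2)
    ]
    return math.sqrt(sum(sq) / len(sq))


def matches(pattern: Sequence[Point], structure: Sequence[Point],
            max_dist: float, tol: float) -> bool:
    """Brute force: does some compact, sequence-ordered residue subset resemble the pattern?"""
    # combinations() yields ascending index tuples, so the backbone order of the
    # pattern residues is preserved, while sequence gaps between them are allowed.
    for idx in combinations(range(len(structure)), len(pattern)):
        cand = [structure[i] for i in idx]
        if is_compact(cand, max_dist) and distance_matrix_rmsd(pattern, cand) <= tol:
            return True
    return False


def support(pattern: Sequence[Point], structures: List[Sequence[Point]],
            max_dist: float, tol: float) -> int:
    """Number of structures containing at least one occurrence of the pattern."""
    return sum(matches(pattern, s, max_dist, tol) for s in structures)


# Toy data (made up for illustration; coordinates in angstroms).
pattern = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0)]
# Same geometry, rotated 90 degrees about z and shifted, with an unrelated residue inserted.
structure_a = [(10.0, 0.0, 0.0), (50.0, 50.0, 50.0), (10.0, 3.8, 0.0), (6.2, 0.0, 0.0)]
# Too spread out to be compact.
structure_b = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
# Same points as the pattern, but in reversed backbone order.
structure_c = list(reversed(pattern))

print(support(pattern, [structure_a, structure_b, structure_c], max_dist=8.0, tol=0.5))  # prints 1
```

Comparing distance matrices rather than superimposing coordinates avoids computing a rotation, at the price of not distinguishing a motif from its mirror image. Counting the structures in which a candidate occurs (its support) reflects the point above that a motif found in more than two proteins deserves more confidence.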

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Amino Acid Motifs
Computational Biology
Cystine / chemistry
Databases, Protein*
Molecular Structure
Protein Structure, Secondary
Proteins / chemistry*
Software*

Substances

Proteins
Cystine