Statistical geometry analysis of proteins: implications for inverted structure prediction

Pac Symp Biocomput. 1996:614-23.

Abstract

The topology of folded proteins from a representative dataset of well-defined three-dimensional protein structures is studied using a statistical geometry approach. Amino acid residues in protein chains are represented by their C alpha atoms, which reduces the three-dimensional structure of a protein to a set of points in three-dimensional space. The Delaunay tessellation of a protein structure generates an aggregate of space-filling irregular tetrahedra, or Delaunay simplices. Each simplex objectively defines four nearest-neighbor C alpha atoms, i.e. four nearest-neighbor residues. Statistical analysis of the residue composition of Delaunay simplices reveals nonrandom preferences for certain quadruplets of amino acids. These nonrandom preferences are used to develop a fitness function that evaluates sequence-structure compatibility. Using this fitness function, several tested native proteins score the highest among 100,000 random sequences with average protein amino acid composition. The statistical geometry approach, based solely on first principles, provides a unique means for protein structure analysis and has direct implications for inverted protein structure prediction.
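The scoring idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the form of the fitness function, so the sketch assumes a log-likelihood ratio of observed to expected quadruplet frequency, with the expected frequency taken from a multinomial model of residue composition. The function names (`quadruplet`, `orderings`, `log_likelihoods`, `fitness`, `count_better_random`) are hypothetical, the simplices are assumed to be precomputed as 4-tuples of residue indices (for example from the `simplices` attribute of `scipy.spatial.Delaunay` applied to C alpha coordinates), and scores are derived from a single toy structure rather than from a representative dataset.

```python
import math
import random
from collections import Counter

def quadruplet(sequence, simplex):
    """Order-independent residue composition of one simplex (4 residue indices)."""
    return tuple(sorted(sequence[i] for i in simplex))

def orderings(key):
    """Distinct orderings of a 4-residue multiset: 4! / prod(t_i!)."""
    denom = 1
    for n in Counter(key).values():
        denom *= math.factorial(n)
    return math.factorial(4) // denom

def log_likelihoods(sequence, simplices):
    """ASSUMED score: log10(observed / expected) for each quadruplet type seen."""
    composition = {aa: n / len(sequence) for aa, n in Counter(sequence).items()}
    counts = Counter(quadruplet(sequence, s) for s in simplices)
    total = sum(counts.values())
    scores = {}
    for key, n in counts.items():
        # Expected frequency if residues were placed at random (multinomial model).
        expected = orderings(key) * math.prod(composition[aa] for aa in key)
        scores[key] = math.log10((n / total) / expected)
    return scores

def fitness(sequence, simplices, scores):
    """Sum of quadruplet scores over all simplices; unseen types score 0 (assumption)."""
    return sum(scores.get(quadruplet(sequence, s), 0.0) for s in simplices)

def count_better_random(native, simplices, scores, trials=1000, seed=0):
    """Count random sequences that outscore the native one on the same simplices.

    Simplification: random sequences are drawn from the native composition,
    not from an average protein composition as in the abstract.
    """
    rng = random.Random(seed)
    alphabet = sorted(set(native))
    weights = [native.count(aa) for aa in alphabet]
    native_score = fitness(native, simplices, scores)
    better = 0
    for _ in range(trials):
        candidate = "".join(rng.choices(alphabet, weights=weights, k=len(native)))
        if fitness(candidate, simplices, scores) > native_score:
            better += 1
    return better

# Toy usage: a made-up 6-residue chain with three hand-written simplices.
toy_sequence = "ACDACD"
toy_simplices = [(0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5)]
toy_scores = log_likelihoods(toy_sequence, toy_simplices)
print(fitness(toy_sequence, toy_simplices, toy_scores))
print(count_better_random(toy_sequence, toy_simplices, toy_scores, trials=100))
```

Sorting the residues of each simplex makes the quadruplet type independent of vertex order, which is why the expected frequency includes the multiset ordering factor. A realistic setup would estimate the scores from many structures and evaluate them on a protein left out of that set.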

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Computer Simulation
  • Models, Chemical
  • Models, Molecular*
  • Monte Carlo Method
  • Peptide Library
  • Plant Proteins / chemistry
  • Protein Conformation*
  • Protein Folding*
  • Proteins / chemistry*
  • Reproducibility of Results

Substances

  • Peptide Library
  • Plant Proteins
  • Proteins
  • crambin protein, Crambe abyssinica