ProMate: a structure based prediction program to identify the location of protein-protein binding sites

Hani Neuvirth; Ran Raz; Gideon Schreiber

doi:10.1016/j.jmb.2004.02.040

J Mol Biol. 2004 Apr 16;338(1):181-99. doi: 10.1016/j.jmb.2004.02.040.

Authors

Hani Neuvirth¹, Ran Raz, Gideon Schreiber

Affiliation

¹ Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel.

PMID: 15050833

Abstract

Is the whole protein surface available for interaction with other proteins, or are specific sites pre-assigned according to their biophysical and structural character? And if so, is it possible to predict the location of the binding site from the surface properties? These questions are answered quantitatively by probing the surfaces of proteins using spheres of radius 10 Å on a database (DB) of 57 unique, non-homologous proteins involved in heteromeric, transient protein-protein interactions for which the structures of both the unbound and bound states were determined. In structural terms, we found the binding site to have a preference for beta-sheets and for relatively long non-structured chains, but not for alpha-helices. Chemically, aromatic side-chains show a clear preference for binding sites. While the hydrophobic and polar content of the interface is similar to the rest of the surface, hydrophobic and polar residues tend to cluster in interfaces. In the crystal, the binding site has more bound water molecules surrounding it, and a lower B-factor already in the unbound protein. The same biophysical properties were found to hold for the unbound and bound DBs. All the significant interface properties were combined into ProMate, an interface prediction program. This was followed by an optimization step to choose the best combination of properties, as many of them are correlated. During optimization and prediction, the tested proteins were not used for data collection, to avoid over-fitting. The prediction algorithm is fully automated, and is used to predict the location of potential binding sites on unbound proteins with known structures. The algorithm is able to successfully predict the location of the interface for about 70% of the proteins. The success rate of the predictor was equal whether applied to the unbound DB or to the disjoint bound DB. A prediction is assumed correct if over half of the predicted continuous interface patch is indeed interface.
The ability to predict the location of protein-protein interfaces has far-reaching implications, both for our understanding of the specificity and kinetics of binding and for assisting in the analysis of the proteome.
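
Two ideas in the abstract can be illustrated with a short sketch: gathering the residues that fall inside a 10 Å probe sphere, and the stated success criterion (a prediction counts as correct if over half of the predicted patch is true interface). This is not the ProMate implementation. The function names, the one-point-per-residue representation, and the choice of sphere centre are assumptions made for illustration; the abstract does not say how spheres are placed or how residues are represented.

```python
import math

SPHERE_RADIUS = 10.0  # probe sphere radius in angstroms, as stated in the abstract


def residues_in_sphere(center, residue_coords, radius=SPHERE_RADIUS):
    """Return the ids of residues whose representative point lies in the sphere.

    Assumption: each residue is reduced to a single (x, y, z) point,
    for example its C-alpha atom. The abstract does not specify this.
    """
    return {
        res_id
        for res_id, xyz in residue_coords.items()
        if math.dist(center, xyz) <= radius
    }


def is_correct_prediction(predicted_patch, true_interface):
    """Apply the abstract's criterion: correct if strictly more than half
    of the predicted patch residues belong to the true interface."""
    predicted = set(predicted_patch)
    if not predicted:
        return False  # an empty prediction cannot be counted as correct
    overlap = len(predicted & set(true_interface))
    return overlap / len(predicted) > 0.5
```

For example, with hypothetical coordinates `{"A1": (0, 0, 0), "A2": (3, 4, 0), "A3": (20, 0, 0)}` and a sphere centred at the origin, `residues_in_sphere` keeps `A1` and `A2` and drops `A3`. A predicted patch in which exactly half of the residues are interface does not pass the criterion, since the abstract requires over half.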

MeSH terms

Algorithms*
Binding Sites
Computational Biology / methods*
Databases, Factual
Models, Molecular
Protein Binding
Protein Conformation
Proteins / chemistry*
Proteins / metabolism
Software*
Surface Properties

Substances

Proteins