To advance our ability to predict impacts of the protein scaffold on catalysis, robust classification schemes to define features of proteins that will influence reactivity are needed. One of these features is a protein's metal-binding ability, as metals are critical to catalytic conversion by metalloenzymes. As a step toward realizing this goal, we used convolutional neural networks (CNNs) to enable the classification of a metal cofactor binding pocket within a protein scaffold. CNNs enable images to be classified based on multiple levels of detail in the image, from edges and corners to entire objects, and can provide rapid classification. First, six CNN models were fine-tuned to classify the 20 standard amino acids to choose a performant model for amino acid classification. This model was then trained in two parallel efforts: to classify a 2D image of the environment within a given radius of the central metal binding site, either an Fe ion or a [2Fe-2S] cofactor, with the metal visible (effort 1) or the metal hidden (effort 2). We further used two sub-classifications of the [2Fe-2S] cofactor: (1) a standard [2Fe-2S] cofactor and (2) a Rieske [2Fe-2S] cofactor. The accuracy for the model correctly identifying all three defined features was >95%, despite our perception of the increased challenge of the metalloenzyme identification. This demonstrates that machine learning methodology to classify and distinguish similar metal-binding sites, even in the absence of a visible cofactor, is indeed possible and offers an additional tool for metal-binding site identification in proteins.
Keywords: Rieske; amino acids; convolutional neural network; image classification; iron-sulfur; metal-binding sites; metalloenzyme.
© 2023 Battelle Memorial Institute. Protein Science published by Wiley Periodicals LLC on behalf of The Protein Society.