Coherent Blending of Biophysics-Based Knowledge with Bayesian Neural Networks for Robust Protein Property Prediction

Hunter Nisonoff; Yixin Wang; Jennifer Listgarten

doi:10.1021/acssynbio.3c00217

ACS Synth Biol. 2023 Nov 17;12(11):3242-3251. doi: 10.1021/acssynbio.3c00217. Epub 2023 Oct 27.

Authors

Hunter Nisonoff¹, Yixin Wang², Jennifer Listgarten¹,³

Affiliations

¹ Center for Computational Biology, University of California, Berkeley, Berkeley, California 94720-3220, United States.
² Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109-1107, United States.
³ Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, California 94720-1776, United States.

PMID: 37888887
DOI: 10.1021/acssynbio.3c00217

Abstract

Predicting properties of proteins is of interest for basic biological understanding and protein engineering alike. Increasingly, machine learning (ML) approaches are being used for this task. However, the accuracy of such ML models typically degrades as test proteins stray further from the training data distribution. On the other hand, models that are more data-free, such as biophysics-based models, are typically uniformly accurate over all of the protein space, even if inferior for test points close to the training distribution. Consequently, being able to cohesively blend these two types of information within one model, as appropriate in different parts of the protein space, will improve overall performance. Herein, we tackle just this problem to yield a simple, practical, and scalable approach that can be easily implemented. In particular, we use a Bayesian formulation to integrate biophysical knowledge into neural networks. However, in doing so, a technical challenge arises: Bayesian neural networks (BNNs) enable the user to specify prior information only on the neural network weight parameters, rather than on the function values given to us from a typical biophysics-based model. Consequently, we devise a principled probabilistic method to overcome this challenge. Our approach yields intuitively pleasing results: predictions rely more heavily on the biophysical prior information when the BNN epistemic uncertainty (uncertainty arising from a lack of training data rather than sensor noise) is large and more heavily on the neural network when the epistemic uncertainty is small. We demonstrate this approach on an illustrative synthetic example, on two examples of protein property prediction (fluorescence and binding), and for generality on one small molecule property prediction problem.
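
The qualitative behavior described above (leaning on the biophysical prior where epistemic uncertainty is large, and on the neural network where it is small) can be illustrated with a simple precision-weighted average of two Gaussian estimates. This is only a sketch of that intuition, not the method of the paper, which places the prior through the BNN itself; the function name `blend` and all of its arguments are hypothetical, and the variances are assumed to be known and positive.

```python
import numpy as np


def blend(prior_mean, prior_var, nn_mean, nn_var):
    """Precision-weighted blend of two Gaussian estimates (illustrative only).

    prior_mean, prior_var: prediction and assumed variance of a
        biophysics-based model (hypothetical inputs).
    nn_mean, nn_var: predictive mean and assumed epistemic variance
        of a neural network (hypothetical inputs).
    Returns the blended mean, blended variance, and the weight placed
    on the neural network.
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_var = np.asarray(prior_var, dtype=float)
    nn_mean = np.asarray(nn_mean, dtype=float)
    nn_var = np.asarray(nn_var, dtype=float)

    # Weight on the network shrinks toward 0 as its variance grows.
    w_nn = prior_var / (prior_var + nn_var)
    mean = w_nn * nn_mean + (1.0 - w_nn) * prior_mean
    # Product-of-Gaussians variance: never larger than either input variance.
    var = prior_var * nn_var / (prior_var + nn_var)
    return mean, var, w_nn


# Far from the training data: large network variance, prior dominates.
far_mean, far_var, far_w = blend(0.0, 1.0, 2.0, 100.0)
# Close to the training data: small network variance, network dominates.
near_mean, near_var, near_w = blend(0.0, 1.0, 2.0, 0.01)
```

In this toy setting the blended mean moves smoothly from the prior prediction toward the network prediction as the network's variance falls, which is the pattern the abstract reports for the full Bayesian treatment.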

Keywords: Bayesian methodology; biophysical models; deep learning; machine learning; protein engineering; uncertainty quantification.

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Bayes Theorem
Machine Learning*
Neural Networks, Computer*
Proteins

Substances

Proteins