How protein domain structure changes in response to mutations is not well understood. Some mutations change the structure drastically, while most only result in small changes. To gain an understanding of this, we decompose the relationship between changes in domain sequence and structure using machine learning. We select pairs of evolutionarily related domains with a broad range of evolutionary distances. In contrast to earlier studies, we do not find a strictly linear relationship between sequence and structural changes. We train a random forest regressor that predicts the structural similarity between pairs with an average accuracy of 0.029 lDDT ( local Distance Difference Test) score, and a correlation coefficient of 0.92. Decomposing the feature importance shows that the domain length, or analogously, size is the most important feature. Our model enables assessing deviations in relative structural response, and thus prediction of evolutionary trajectories, in protein domains across evolution.
Keywords: evolutionary distance; mutations; protein evolution; protein structure.
Copyright © 2020 The Authors. Published by Elsevier Ltd.. All rights reserved.