Prediction of DNA-Binding Protein-Drug-Binding Sites Using Residue Interaction Networks and Sequence Feature

Front Bioeng Biotechnol. 2022 Apr 20:10:822392. doi: 10.3389/fbioe.2022.822392. eCollection 2022.

Abstract

Identification of protein-ligand binding sites plays a critical role in drug discovery. However, prediction methods that specifically target drug binding to DNA-binding proteins are still lacking. This study focuses on the binding sites between DNA-binding proteins and drugs. It mines residue interaction network features, which describe both the local and global structural environment of amino acids, and combines them with sequence features. The predictor of DNA-binding protein-drug binding sites is built using an Extreme Gradient Boosting (XGBoost) model with random under-sampling. We found that residue interaction network features can better characterize DNA-binding proteins, and that binding sites with high betweenness and high closeness values are more likely to interact with drugs. The model shows that residue interaction network features can serve as an important quantitative indicator of drug-binding sites, and the method achieves high predictive performance for DNA-binding protein-drug binding sites. This study will help drug discovery research for DNA-binding proteins.
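
To make the two network features named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It computes closeness and betweenness centrality on a toy residue interaction network and then balances the classes by random under-sampling. The five-residue path graph, the labels, the 1:1 class ratio, and all function names are hypothetical choices made for illustration; the abstract does not specify how the network is built or which sampling ratio is used. The XGBoost classifier that would be trained on the balanced data is omitted.

```python
import random
from collections import deque

def closeness(graph):
    """Closeness = (n - 1) / sum of shortest-path distances (connected graph assumed)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        result[source] = (len(graph) - 1) / total if total else 0.0
    return result

def betweenness(graph):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes' algorithm)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: score[v] / 2 for v in graph}  # each unordered pair was counted twice

def random_undersample(features, labels, seed=0):
    """Randomly drop majority-class samples until both classes have equal size (assumed 1:1)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [features[i] for i in keep], [labels[i] for i in keep]

# Hypothetical toy network: residues 1..5 in a chain of contacts 1-2-3-4-5.
toy_graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
close = closeness(toy_graph)
between = betweenness(toy_graph)

# One feature row per residue: [closeness, betweenness].
features = [[close[r], between[r]] for r in sorted(toy_graph)]
labels = [0, 0, 1, 0, 0]  # hypothetical: only residue 3 is a drug-binding site

X_balanced, y_balanced = random_undersample(features, labels, seed=1)
print(features[2], y_balanced)
```

In this toy chain the central residue has the highest value for both measures, which mirrors the abstract's observation that high-betweenness, high-closeness sites are more likely to bind drugs. In a real pipeline these columns would be joined with sequence features before the balanced set is passed to a gradient-boosted classifier.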

Keywords: binding site; extreme gradient boosting; protein–ligand; residue interaction network; sequence.