Transfer learning for drug-target interaction prediction

Alperen Dalkıran; Ahmet Atakan; Ahmet S Rifaioğlu; Maria J Martin; Rengül Çetin Atalay; Aybar C Acar; Tunca Doğan; Volkan Atalay

doi:10.1093/bioinformatics/btad234

Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i103-i110. doi: 10.1093/bioinformatics/btad234.

Authors

Alperen Dalkıran^{1,2}, Ahmet Atakan^{1,3}, Ahmet S Rifaioğlu^{4,5}, Maria J Martin⁶, Rengül Çetin Atalay⁷, Aybar C Acar⁸, Tunca Doğan^{6,9}, Volkan Atalay¹

Affiliations

¹ Department of Computer Engineering, Middle East Technical University, Ankara 06800, Turkey.
² Department of Computer Engineering, Adana Alparslan Türkeş Science and Technology University, Adana 01250, Turkey.
³ Department of Computer Engineering, Erzincan Binali Yıldırım University, Erzincan 24002, Turkey.
⁴ Department of Computer Engineering, Iskenderun Technical University, Hatay 31200, Turkey.
⁵ Faculty of Medicine, Institute for Computational Biomedicine, Heidelberg University and Heidelberg University Hospital, Heidelberg 69120, Germany.
⁶ European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, Hinxton CB10 1SD, United Kingdom.
⁷ Faculty of Pulmonary and Critical Care Medicine, the University of Chicago, Chicago, IL, 60637, United States.
⁸ Cancer Systems Biology Laboratory (Kansil), Middle East Technical University, Ankara 06800, Turkey.
⁹ Department of Computer Engineering, Hacettepe University, Ankara 06800, Turkey.

Abstract

Motivation: Utilizing AI-driven approaches for drug-target interaction (DTI) prediction requires large volumes of training data, which are not available for the majority of target proteins. In this study, we investigate the use of deep transfer learning to predict interactions between drug candidate compounds and understudied target proteins with scarce training data. The idea is to first train a deep neural network classifier on a large, generalized source training dataset, and then to reuse this pre-trained network as the initial configuration for re-training/fine-tuning on a small, specialized target training dataset. To explore this idea, we selected six protein families of critical importance in biomedicine: kinases, G-protein-coupled receptors (GPCRs), ion channels, nuclear receptors, proteases, and transporters. In two independent experiments, the transporter and nuclear receptor families were each set as the target dataset, while the remaining five families were used as the source datasets. Target-family training datasets of several sizes were constructed in a controlled manner to assess the benefit provided by the transfer learning approach.
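The source/target arrangement described in the Motivation can be sketched as follows. This is a minimal illustration, not the authors' code: the six family names and the two target families come from the abstract, but the function names, the nested-prefix subsampling scheme, and the example compound IDs and subset sizes are hypothetical choices made for illustration only.

```python
import random

# The six protein families named in the abstract.
FAMILIES = ["kinases", "GPCRs", "ion channels",
            "nuclear receptors", "proteases", "transporters"]

# Families used as the target in the two independent experiments.
TARGET_FAMILIES = ["transporters", "nuclear receptors"]


def source_target_splits(families=FAMILIES, targets=TARGET_FAMILIES):
    """Yield (target_family, source_families) for each experiment."""
    for target in targets:
        yield target, [f for f in families if f != target]


def size_controlled_subsets(compounds, sizes, seed=0):
    """Hypothetical scheme: nested subsets taken as prefixes of one fixed shuffle.

    Every smaller subset is contained in every larger one, so only the
    amount of target data changes between runs. Sizes larger than the
    available pool are skipped.
    """
    rng = random.Random(seed)
    order = list(compounds)
    rng.shuffle(order)
    return {n: order[:n] for n in sorted(sizes) if n <= len(order)}


if __name__ == "__main__":
    for target, sources in source_target_splits():
        print(f"target={target!r} sources={sources}")
    # Hypothetical compound IDs and subset sizes, for illustration only.
    pool = [f"compound_{i}" for i in range(500)]
    subsets = size_controlled_subsets(pool, sizes=[25, 50, 100, 250])
    print({n: len(s) for n, s in subsets.items()})
```

Taking prefixes of a single shuffle is one way to keep the size comparison controlled: the larger training sets only add compounds and never swap out the ones used in the smaller sets.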

Results: Here, we present a systematic evaluation of our approach, in which a feed-forward neural network is pre-trained on the source training datasets and different modes of transfer learning are then applied from the pre-trained source network to a target dataset. The performance of deep transfer learning is evaluated and compared with that of training the same deep neural network from scratch. We found that when the training dataset contains fewer than 100 compounds, transfer learning outperforms the conventional strategy of training the network from scratch, suggesting that transfer learning is advantageous for predicting binders to understudied targets.
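A small pure-Python sketch may make the pre-train/transfer/scratch comparison concrete. Everything here is an assumption rather than the published model: the one-hidden-layer ReLU network with a sigmoid output, binary cross-entropy trained with plain SGD, the three modes (`scratch`, `fine_tune`, `freeze_hidden`), and the synthetic feature vectors standing in for compound-target data are all hypothetical.

```python
import copy
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_mlp(n_in, n_hidden, seed):
    """Randomly initialise a one-hidden-layer feed-forward network (assumed architecture)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_in)
    return {
        "W1": [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-scale, scale) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(params, x):
    # ReLU hidden layer followed by a single sigmoid output (interaction probability).
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(params["W1"], params["b1"])]
    z = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return hidden, sigmoid(z)


def log_loss(params, data):
    # Mean binary cross-entropy over (features, label) pairs.
    eps = 1e-12
    total = 0.0
    for x, y in data:
        _, p = forward(params, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(data)


def train(params, data, epochs, lr, train_hidden=True):
    """Plain SGD on binary cross-entropy; updates params in place and returns them."""
    for _ in range(epochs):
        for x, y in data:
            hidden, p = forward(params, x)
            g = p - y  # dLoss/dz for a sigmoid output with cross-entropy
            if train_hidden:
                for j, h in enumerate(hidden):
                    if h > 0.0:  # ReLU passes gradient only through active units
                        gh = g * params["w2"][j]  # uses w2 before its update below
                        params["W1"][j] = [w - lr * gh * xi for w, xi in zip(params["W1"][j], x)]
                        params["b1"][j] -= lr * gh
            params["w2"] = [w - lr * g * h for w, h in zip(params["w2"], hidden)]
            params["b2"] -= lr * g
    return params


def transfer(pretrained, target_data, mode, epochs=50, lr=0.05, seed=1):
    """Train on the target set under one of three hypothetical modes."""
    if mode == "scratch":
        # Baseline: same architecture, fresh random weights.
        params = init_mlp(len(pretrained["W1"][0]), len(pretrained["W1"]), seed)
        return train(params, target_data, epochs, lr, train_hidden=True)
    params = copy.deepcopy(pretrained)  # leave the source model untouched
    if mode == "fine_tune":
        return train(params, target_data, epochs, lr, train_hidden=True)
    if mode == "freeze_hidden":
        return train(params, target_data, epochs, lr, train_hidden=False)
    raise ValueError(f"unknown mode: {mode}")


def toy_task(n, n_in, seed, shift=0.0):
    # Synthetic stand-in for compound-target feature vectors with binary labels.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
        data.append((x, 1 if x[0] + x[1] + shift * x[2] > 0 else 0))
    return data


if __name__ == "__main__":
    source = toy_task(200, 8, seed=0)                  # large "source" set
    target_train = toy_task(30, 8, seed=1, shift=0.5)  # small, related "target" set
    target_test = toy_task(200, 8, seed=2, shift=0.5)
    pretrained = train(init_mlp(8, 8, seed=0), source, epochs=20, lr=0.05)
    for mode in ("scratch", "fine_tune", "freeze_hidden"):
        model = transfer(pretrained, target_train, mode)
        print(mode, round(log_loss(model, target_test), 3))
```

Freezing the hidden layer reuses the source representation as a fixed feature extractor and trains only the output unit, leaving fewer parameters to fit to a small target set; full fine-tuning instead starts from the source weights but lets every layer adapt to the target data.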

Availability and implementation: The source code and datasets are available at https://github.com/cansyl/TransferLearning4DTI. Our web-based service containing the ready-to-use pre-trained models is accessible at https://tl4dti.kansil.org.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Machine Learning
Neural Networks, Computer*
Peptide Hydrolases*
Software

Substances

Peptide Hydrolases