Prediction of conotoxin type based on long short-term memory network

Math Biosci Eng. 2021 Aug 9;18(5):6700-6708. doi: 10.3934/mbe.2021332.

Abstract

Wet-lab methods for identifying conotoxin types are complex, inefficient, and costly. To address these problems, this study proposes a method that combines the sequence information of conotoxin peptides with a long short-term memory (LSTM) network to predict the category of a conotoxin. The method takes only the conotoxin peptide sequence as input and uses the character embedding technique from text processing to map the sequence automatically to a feature-vector representation, from which the model extracts features for training and prediction. Experimental results show that the method reaches an accuracy of 0.80 and an AUC of 0.817 on the test set. On the same test set, the KNN algorithm reaches an AUC of only 0.641, indicating that the proposed method can effectively assist in identifying the type of conotoxin.
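
The character-embedding step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the 20-letter amino-acid alphabet, the padding index 0, the values of MAX_LEN and EMBEDDING_DIM, the helper names, and the example peptide string are all assumptions made for illustration, and the randomly initialized table stands in for embedding weights that a real model would learn jointly with the LSTM.

```python
import random

# Assumed vocabulary: the 20 standard amino acids; index 0 is reserved for padding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CHAR_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}

MAX_LEN = 40        # hypothetical fixed input length
EMBEDDING_DIM = 8   # hypothetical embedding size


def encode_sequence(seq, max_len=MAX_LEN):
    """Map a peptide string to a fixed-length list of integer indices."""
    # Residues outside the assumed alphabet (e.g. 'X') fall back to index 0.
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in seq.upper()[:max_len]]
    # Right-pad with 0 so every sequence has the same length.
    return indices + [0] * (max_len - len(indices))


def make_embedding_table(vocab_size, dim, seed=0):
    """Create a randomly initialized lookup table (one vector per index)."""
    rng = random.Random(seed)
    table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
             for _ in range(vocab_size)]
    table[0] = [0.0] * dim  # the padding index maps to the zero vector
    return table


def embed(indices, table):
    """Replace each integer index with its embedding vector."""
    return [table[i] for i in indices]


table = make_embedding_table(len(AMINO_ACIDS) + 1, EMBEDDING_DIM)
peptide = "GCCSDPRCAWRC"  # illustrative string, not taken from the paper's data
encoded = encode_sequence(peptide)
embedded = embed(encoded, table)  # MAX_LEN vectors of size EMBEDDING_DIM
print(encoded[:12])
print(len(embedded), len(embedded[0]))
```

The resulting MAX_LEN x EMBEDDING_DIM matrix is the kind of input an LSTM layer would consume one position at a time. In a deep-learning framework the lookup table would normally be a trainable embedding layer rather than a fixed random list.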

Keywords: LSTM; character embedding; conotoxin; prediction; spirotoxin category.

MeSH terms

  • Algorithms
  • Conotoxins*
  • Memory, Short-Term

Substances

  • Conotoxins