Machine Learning Prediction of the Transmission Function for Protein Sequencing with Graphene Nanoslit

ACS Appl Mater Interfaces. 2022 Nov 23;14(46):51645-51655. doi: 10.1021/acsami.2c13405. Epub 2022 Nov 14.

Abstract

Protein sequencing has rapidly changed the landscape of healthcare and life science by accelerating the growth of diagnostics and personalized medicine for a variety of fatal diseases. Next-generation nanopore/nanoslit sequencing promises single-molecule resolution with chromosome-scale read lengths. However, owing to the inherent complexity, high-throughput sequencing of all 20 amino acids demands different approaches. To accelerate the detection of amino acids, we have developed a general machine learning (ML) method for quick and accurate prediction of the transmission function for amino acid sequencing. Among the ML models tested, XGBoost regression is the most effective for fast prediction of the transmission function, with a very low test root-mean-square error (RMSE ∼0.05). In addition, using a random forest classifier, we classify the neutral amino acids with a prediction accuracy of 100%. Our approach is thus an initial step toward ML prediction of the transmission function and can provide a platform for quick, highly accurate identification of amino acids.
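
The two figures of merit quoted in the abstract, the test RMSE of the regression model and the accuracy of the classifier, follow from their standard definitions: RMSE = sqrt(mean((y_true - y_pred)^2)) and accuracy = (correct predictions) / (total predictions). The minimal sketch below computes both in plain Python. The function names, the toy transmission values, and the amino acid labels are illustrative assumptions and are not data or code from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length numeric sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def accuracy(labels_true, labels_pred):
    """Fraction of predicted labels that match the true labels."""
    if len(labels_true) != len(labels_pred) or len(labels_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(labels_true, labels_pred))
    return correct / len(labels_true)


if __name__ == "__main__":
    # Toy values only (assumed for illustration): "true" vs. predicted
    # transmission at a few energy points, and true vs. predicted labels.
    transmission_true = [0.10, 0.40, 0.80]
    transmission_pred = [0.20, 0.30, 0.90]
    print("RMSE:", rmse(transmission_true, transmission_pred))

    labels_true = ["Gly", "Ala", "Ser", "Val"]
    labels_pred = ["Gly", "Ala", "Ser", "Val"]
    print("accuracy:", accuracy(labels_true, labels_pred))
```

In practice, a model's test RMSE would be computed on held-out data that the model did not see during training; the toy lists above stand in for such a held-out set.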

Keywords: amino acids; machine learning; sensitivity; sequencing; transmission.

MeSH terms

  • Amino Acid Sequence
  • Amino Acids / genetics
  • Graphite*
  • Machine Learning
  • Sequence Analysis, Protein

Substances

  • Graphite
  • Amino Acids