Uncovering Thousands of New Peptides with Sequence-Mask-Search Hybrid De Novo Peptide Sequencing Framework

Mol Cell Proteomics. 2019 Dec;18(12):2478-2491. doi: 10.1074/mcp.TIR119.001656. Epub 2019 Oct 7.

Abstract

Typical analyses of mass spectrometry data only identify amino acid sequences that exist in reference databases. This restricts the possibility of discovering new peptides, such as those that contain uncharacterized mutations or originate from unexpected processing of RNAs and proteins. De novo peptide sequencing approaches address this limitation but often suffer from low accuracy and require extensive validation by experts. Here, we develop SMSNet, a deep learning-based de novo peptide sequencing framework that achieves >95% amino acid accuracy while retaining good identification coverage. Applications of SMSNet to landmark proteomics and peptidomics studies reveal over 10,000 previously uncharacterized HLA antigens and phosphopeptides and, in conjunction with database-search methods, expand the coverage of peptide identification by almost 30%. SMSNet's ability to accurately identify new peptides would make it an invaluable tool for future proteomics and peptidomics studies, including tumor neoantigen discovery, antibody sequencing, and proteome characterization of non-model organisms.

Keywords: De novo sequencing; bioinformatics searching; deep learning; mass spectrometry; peptides; phosphoproteome; software.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Datasets as Topic
  • Deep Learning*
  • HLA Antigens / analysis
  • Humans
  • Peptides / analysis*
  • Phosphopeptides / analysis
  • Sequence Analysis, Protein / methods*
  • Tandem Mass Spectrometry

Substances

  • HLA Antigens
  • Peptides
  • Phosphopeptides