Understanding the Limit of Open Search in the Identification of Peptides With Post-translational Modifications - A Simulation-Based Study

IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2884-2890. doi: 10.1109/TCBB.2020.2991207. Epub 2021 Dec 8.

Abstract

Peptide identification from tandem mass spectrometry data is a fundamental task in computational proteomics. Traditional algorithms perform well on unmodified peptides. However, when peptides carry post-translational modifications (PTMs), these methods do not provide satisfactory results. Open search methods have recently been proposed to identify peptides with PTMs. Although these methods are promising, their identification results vary greatly with the quality of the tandem mass spectra and the number of PTMs in the peptides. This motivates us to systematically study how the performance of open search methods depends on the quality parameters of tandem mass spectrometry data and on the number of PTMs in peptides. In this paper, we propose an analytical model, derived from simulated data, that describes the relationship between the probability of obtaining correct results and both the spectrum quality and the number of PTMs. The proposed model is verified using 1,464,146 real experimental spectra. The consistent trend observed in both the simulated and the real data reveals the conditions necessary for open search methods to be applied effectively. Source code of our study is available at http://bioinformatics.ust.hk/PST.html.
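To make the abstract's central relationship concrete, here is a toy calculation. It is not the authors' analytical model. It rests on two hypothetical simplifications. First, every PTM shifts the mass of any b- or y-ion fragment that contains the modified residue, and only unshifted fragments can be matched. Second, each matchable peak is observed independently with probability `p`, a stand-in for spectrum quality, and a peptide counts as "identified" once at least `min_matches` peaks match. The function names, the 10-residue peptide, the PTM positions, and the threshold are all illustrative assumptions.

```python
from math import comb
from typing import Sequence


def unshifted_fragment_count(length: int, ptm_sites: Sequence[int]) -> int:
    """Count b-ions (b1..b_{n-1}) and y-ions (y1..y_{n-1}) of a peptide
    of `length` residues that contain no modified residue.

    Hypothetical simplification (not from the paper): every PTM shifts
    the mass of any fragment containing it, and only unshifted
    fragments can be matched.
    """
    if length < 2:
        return 0
    if any(s < 0 or s >= length for s in ptm_sites):
        raise ValueError("PTM site outside the peptide")
    first = min(ptm_sites, default=length)  # first modified residue (0-based)
    last = max(ptm_sites, default=-1)       # last modified residue (0-based)
    # b_i spans residues 0..i-1, so it is unshifted when i <= first.
    n_b = min(first, length - 1)
    # y_j spans residues length-j..length-1, so it is unshifted when j <= length-1-last.
    n_y = min(length - 1 - last, length - 1)
    return n_b + n_y


def prob_enough_matches(count: int, p: float, min_matches: int) -> float:
    """Return P(X >= min_matches) for X ~ Binomial(count, p).

    Hypothetical toy model: each matchable fragment peak is observed
    independently with probability p (a stand-in for spectrum quality).
    """
    start = max(min_matches, 0)
    return sum(comb(count, k) * p**k * (1 - p) ** (count - k)
               for k in range(start, count + 1))


if __name__ == "__main__":
    # Hypothetical 10-residue peptide carrying 0, 1, and 2 PTMs.
    for sites in ([], [4], [2, 7]):
        n = unshifted_fragment_count(10, sites)
        for p in (0.3, 0.6, 0.9):
            prob = prob_enough_matches(n, p, min_matches=5)
            print(f"PTMs={len(sites)} matchable={n} p={p} P(id)={prob:.3f}")
```

Under these assumptions, one PTM leaves n - 1 of the 2(n - 1) fragments unshifted, and each further PTM can only shrink that set. Lowering `p` likewise reduces the chance of reaching the match threshold. This is only a qualitative illustration of why both factors matter. The paper's own model and its fitted parameters are not reproduced here.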

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computer Simulation
  • Databases, Protein
  • Peptides / chemistry*
  • Protein Processing, Post-Translational*
  • Proteomics / methods*
  • Tandem Mass Spectrometry

Substances

  • Peptides