High-accuracy prediction of bacterial type III secreted effectors based on position-specific amino acid composition profiles

Bioinformatics. 2011 Mar 15;27(6):777-84. doi: 10.1093/bioinformatics/btr021. Epub 2011 Jan 13.

Abstract

Motivation: Bacterial type III secreted (T3S) effectors are delivered into host cells specifically via type III secretion systems (T3SSs), which play important roles in the interaction between bacteria and their hosts. Previous computational methods for T3S protein prediction have only achieved limited accuracy, and distinct features for effective T3S protein prediction remain to be identified.

Results: In this work, a distinctive N-terminal position-specific amino acid composition (Aac) feature was identified for T3S proteins. A large portion (∼50%) of T3S proteins exhibit distinct position-specific Aac features that tolerate position shifts. A classifier, BPBAac, was developed and trained using a Support Vector Machine (SVM) on Aac features extracted with a Bi-profile Bayes model. The BPBAac model outperformed other implementations in classifying T3S and non-T3S proteins, giving an average sensitivity of ∼90.97% and an average selectivity of ∼97.42% in a 5-fold cross-validation evaluation. The model also remained robust when trained on a small dataset. Because the position-specific Aac feature is commonly found in T3S proteins across different bacterial species, the model is widely applicable. To demonstrate its application, a genome-wide prediction of T3S effector proteins was performed for Ralstonia solanacearum, an important plant pathogenic bacterium, and a number of putative candidates were identified.
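
The feature extraction described above can be illustrated with a minimal Python sketch. It is not the BPBAac implementation (which is distributed as an R package). The function names, the window length, the pseudocount, and the toy sequences are all assumptions made for illustration. The sketch builds one position-specific Aac profile from positive (T3S) training sequences and one from negative (non-T3S) sequences, then encodes a query sequence as the probabilities of its N-terminal residues under both profiles, which is the general idea of a bi-profile encoding. The resulting vector is what a classifier such as an SVM would be trained on.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_profile(seqs, length, pseudocount=1.0):
    """Per-position amino acid frequencies over the first `length` residues.

    The pseudocount is an assumption of this sketch; it keeps unseen
    residues from receiving a probability of exactly zero.
    """
    profile = []
    for pos in range(length):
        # Count standard residues observed at this position.
        counts = Counter(
            s[pos] for s in seqs if len(s) > pos and s[pos] in AMINO_ACIDS
        )
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def bi_profile_features(seq, pos_profile, neg_profile):
    """Encode `seq` as residue probabilities under both profiles.

    Returns a vector of length 2 * window: first the probabilities under
    the positive profile, then those under the negative profile. Positions
    past the end of `seq`, or non-standard residues, are encoded as 0.0.
    """
    length = len(pos_profile)
    feats = []
    for profile in (pos_profile, neg_profile):
        for pos in range(length):
            aa = seq[pos] if pos < len(seq) else None
            feats.append(profile[pos].get(aa, 0.0))
    return feats


# Toy sequences, invented for illustration only.
t3s_train = ["MSKLT", "MSRIT"]
non_t3s_train = ["MAAGV", "MKDEL"]
window = 3  # hypothetical N-terminal window length

pos_prof = position_profile(t3s_train, window)
neg_prof = position_profile(non_t3s_train, window)
features = bi_profile_features("MSKQT", pos_prof, neg_prof)
print(len(features))  # 2 * window values
```

In practice the window would cover a much longer N-terminal region and the profiles would be estimated from curated effector and non-effector sets; the feature vectors would then be passed to an SVM and evaluated by cross-validation, as the abstract describes.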

Availability: An R package of the BPBAac tool is freely downloadable from http://biocomputer.bio.cuhk.edu.hk/softwares/BPBAac.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Bacterial Proteins / chemistry*
  • Bacterial Proteins / classification
  • Bacterial Secretion Systems
  • Bayes Theorem
  • Computational Biology / methods*
  • Genome, Bacterial
  • Models, Statistical
  • Ralstonia solanacearum / chemistry*
  • Sequence Analysis, Protein / methods*
  • Software

Substances

  • Bacterial Proteins
  • Bacterial Secretion Systems