On the beta-binomial model for analysis of spectral count data in label-free tandem mass spectrometry-based proteomics

Bioinformatics. 2010 Feb 1;26(3):363-9. doi: 10.1093/bioinformatics/btp677. Epub 2009 Dec 9.

Abstract

Motivation: Spectral count data generated from label-free tandem mass spectrometry-based proteomic experiments can be used to quantify protein abundances reliably. Comparing spectral count data from different sample groups, such as control and disease, is an essential step in statistical analysis for determining altered protein levels and for biomarker discovery. Fisher's exact test, the G-test, the t-test and the local-pooled-error (LPE) technique are commonly used for differential analysis of spectral count data. However, our initial experiments in two cancer studies show that these methods fail to declare as significant, at the 95% confidence level, a number of protein markers that have been judged to be differential on the basis of the biology of the disease and the spectral count numbers. A shortcoming of these tests is that they do not take into account within- and between-sample variations together. Hence, our aim is to improve upon existing techniques by incorporating both the within- and between-sample variations.

Results: We propose to use the beta-binomial distribution to test the significance of differential protein abundances expressed in spectral counts in label-free mass spectrometry-based proteomics. The beta-binomial test naturally normalizes for total sample count. Experimental results show that the beta-binomial test performs favorably in comparison with other methods on several datasets in terms of both true detection rate and false positive rate. In addition, it can be applied to experiments with one or more replicates, and to multiple-condition comparisons. Finally, we have implemented a software package for parameter estimation of two beta-binomial models and the associated statistical tests.
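To make the role of the beta-binomial distribution concrete, the following is a minimal Python sketch of its probability mass function and its variance. It is not the authors' R package and does not implement their significance test. The function names and the (alpha, beta) parameterization are assumptions chosen for illustration. Here k stands for one protein's spectral count in a sample and n for that sample's total spectral count, which is one way to read the statement that the model normalizes for total sample count.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_logpmf(k, n, alpha, beta):
    """Log-probability of observing k counts for a protein out of n total
    counts in a sample, under a beta-binomial model.

    alpha and beta are the shape parameters of the Beta distribution that
    describes how the protein's proportion varies between samples
    (illustrative parameterization, an assumption of this sketch).
    """
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def betabinom_mean_var(n, alpha, beta):
    """Mean and variance of the beta-binomial count for a sample of size n.

    The variance is the binomial (within-sample) variance n*p*(1-p)
    inflated by a factor that grows with the between-sample correlation rho.
    """
    p = alpha / (alpha + beta)          # expected proportion of the protein
    rho = 1.0 / (alpha + beta + 1.0)    # between-sample (intra-class) correlation
    mean = n * p
    var = n * p * (1.0 - p) * (1.0 + (n - 1) * rho)
    return mean, var


if __name__ == "__main__":
    n, alpha, beta = 10, 2.0, 3.0
    mean, var = betabinom_mean_var(n, alpha, beta)
    binomial_var = n * (alpha / (alpha + beta)) * (1 - alpha / (alpha + beta))
    print("beta-binomial mean:", mean)
    print("beta-binomial variance:", var)
    print("plain binomial variance:", binomial_var)
```

In this sketch the inflation factor 1 + (n - 1) * rho is what separates the beta-binomial from a plain binomial: as alpha + beta grows, rho approaches zero and only within-sample variation remains, while small alpha + beta corresponds to large between-sample variation.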

Availability and implementation: A software package implemented in R is freely available for download at http://www.oncoproteomics.nl/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Protein
  • Models, Statistical*
  • Proteins / chemistry*
  • Proteome / analysis*
  • Proteomics / methods*
  • Sequence Analysis, Protein / methods
  • Tandem Mass Spectrometry / methods*

Substances

  • Proteins
  • Proteome