Mass spectrometry allows direct identification of proteins in large genomes

B Küster; P Mortensen; J S Andersen; M Mann

doi:10.1002/1615-9861(200104)1:5<641::AID-PROT641>3.0.CO;2-R

Proteomics. 2001 May;1(5):641-50. doi: 10.1002/1615-9861(200104)1:5<641::AID-PROT641>3.0.CO;2-R.

Authors

B Küster¹, P Mortensen, J S Andersen, M Mann

Affiliation

¹ Protein Interaction Laboratory (PIL), University of Southern Denmark, Odense M, Denmark. MDS-Proteomics, Odense M, Denmark.

PMID: 11678034

Abstract

Proteome projects seek to provide systematic functional analysis of the genes uncovered by genome sequencing initiatives. Mass spectrometric protein identification is a key requirement in these studies, but to date database searching tools have relied on the availability of protein sequences derived from full-length cDNA, expressed sequence tags or predicted open reading frames (ORFs) from genomic sequences. We demonstrate here that proteins can be identified directly in large genomic databases using peptide sequence tags obtained by tandem mass spectrometry. Against a background of vast amounts of noncoding DNA sequence, identified peptides localize coding sequences (exons) in a confined region of the genome, which contains the cognate gene. The approach does not require prior information about putative ORFs as predicted by computerized gene-finding algorithms. The method scales to the complete human genome and allows the identification, mapping and cloning of any protein for which minimal mass spectrometric information can be obtained, and it assists gene prediction for that protein. Several novel proteins from Arabidopsis thaliana and human have been discovered in this way.
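
The idea of locating peptides directly in untranslated genomic sequence can be illustrated with a minimal Python sketch. It is not the authors' software, and it makes several simplifying assumptions: the peptide sequence tag is reduced to an exact amino acid string (a real tag also carries flanking mass information), the standard genetic code is used, and peptides that span an exon boundary are not found. The function names `six_frames`, `locate_peptides` and `densest_window` are hypothetical and chosen only for this illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# organism-specific alternative code is needed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate DNA codon by codon; unknown codons (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Yield (strand, offset, protein) for all six reading frames."""
    reverse = dna.translate(COMPLEMENT)[::-1]
    for strand, seq in (("+", dna), ("-", reverse)):
        for offset in range(3):
            yield strand, offset, translate(seq[offset:])


def locate_peptides(dna, peptides):
    """Return sorted (peptide, strand, start) hits in forward-strand coordinates.

    Simplification: exact string matching stands in for a sequence tag search.
    """
    hits = []
    n = len(dna)
    for strand, offset, protein in six_frames(dna):
        for pep in peptides:
            pos = protein.find(pep)
            while pos != -1:
                start = offset + 3 * pos
                if strand == "-":
                    # Map the reverse-strand position back to the forward strand.
                    start = n - (start + 3 * len(pep))
                hits.append((pep, strand, start))
                pos = protein.find(pep, pos + 1)
    return sorted(hits, key=lambda h: h[2])


def densest_window(hits, window):
    """Return (start, count) for the window of `window` bases with the most hits."""
    starts = sorted(h[2] for h in hits)
    best = (None, 0)
    lo = 0
    for hi, s in enumerate(starts):
        while s - starts[lo] > window:
            lo += 1
        if hi - lo + 1 > best[1]:
            best = (starts[lo], hi - lo + 1)
    return best
```

Under these assumptions, several peptides from one protein that fall within a single window point to the region likely to contain the cognate gene, while isolated chance matches elsewhere in the noncoding background do not cluster.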

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Sequence
Arabidopsis / chemistry
Arabidopsis / genetics
Arabidopsis Proteins / analysis
Base Sequence
Databases, Genetic
Genes, Plant
Genome*
Genome, Human
Genome, Plant
Humans
Molecular Sequence Data
Peptides / analysis*
Proteins / analysis*
Proteome*
Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization / methods*

Substances

Arabidopsis Proteins
Peptides
Proteins
Proteome