Abstract
A systematic computational analysis of protein sequences containing known nuclear domains led to the identification of 28 novel domain families. This represents a 26% increase in the starting set of 107 known nuclear domain families used for the analysis. Most of the novel domains are present in all major eukaryotic lineages, but 3 are species specific. For about 500 of the 1200 proteins that contain these new domains, nuclear localization could be inferred, and for 700, additional features could be predicted. For example, we identified a new domain, likely to have a role downstream of the unfolded protein response; a nematode-specific signalling domain; and a widespread domain, likely to be a noncatalytic homolog of ubiquitin-conjugating enzymes.
Publication types
- Letter
- Research Support, Non-U.S. Gov't
- Research Support, U.S. Gov't, Non-P.H.S.
- Research Support, U.S. Gov't, P.H.S.
MeSH terms
- Amidohydrolases / classification
- Amidohydrolases / physiology
- Amino Acid Motifs / physiology
- Amino Acid Sequence
- Animals
- Caenorhabditis elegans / chemistry
- Caenorhabditis elegans / physiology
- Caenorhabditis elegans Proteins / physiology
- Cell Nucleus / chemistry*
- Cell Nucleus / physiology*
- Databases, Protein
- Humans
- Molecular Sequence Data
- Multigene Family
- Nuclear Proteins / classification
- Nuclear Proteins / physiology*
- Peptide-N4-(N-acetyl-beta-glucosaminyl) Asparagine Amidase
- Phylogeny
- Protein Structure, Tertiary / physiology
- Species Specificity
Substances
- Caenorhabditis elegans Proteins
- Nuclear Proteins
- Amidohydrolases
- Peptide-N4-(N-acetyl-beta-glucosaminyl) Asparagine Amidase