NCBI Reference Sequence project: update and current status

Nucleic Acids Res. 2003 Jan 1;31(1):34-7. doi: 10.1093/nar/gkg111.

Authors

Kim D Pruitt¹, Tatiana Tatusova, Donna R Maglott

Affiliation

¹ National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A Room 6N605, 8600 Rockville Pike, Bethesda, MD 20894, USA. pruitt@ncbi.nlm.nih.gov

Abstract

The goal of the NCBI Reference Sequence (RefSeq) project is to provide the single best non-redundant and comprehensive collection of naturally occurring biological molecules, representing the central dogma. Nucleotide and protein sequences are explicitly linked on a residue-by-residue basis in this collection. Ideally, all molecule types will be available for each well-studied organism, but the initial database collection pragmatically includes only those molecules and organisms that are most readily identified. Thus, different amounts of information are available for different organisms at any given time. Furthermore, for some organisms, additional intermediate records are provided when the genome sequence is not yet finished. The collection is supplied by NCBI through three distinct pipelines in addition to collaborations with community groups, and it is curated on an ongoing basis. Additional information about the NCBI RefSeq project is available at http://www.ncbi.nih.gov/RefSeq/.

MeSH terms

Alternative Splicing
Animals
Biotechnology*
Databases, Genetic* / standards
Genomics
Humans
Mice
Proteins / analysis
Pseudogenes
RNA / genetics
Rats
United States

Substances

Proteins
RNA