Summary: One of the distinguishing criteria of the SWISS-PROT protein sequence data bank is minimal redundancy. The introduction of TrEMBL as a supplement to SWISS-PROT ensured that the two databases together are comprehensive, but it also introduced some degree of redundancy. We developed a strategy to identify and subsequently remove the redundancy present within and between SWISS-PROT and TrEMBL.
Availability: The tools mentioned in this paper are available on request.