A software system for gene sequence database construction

Conf Proc IEEE Eng Med Biol Soc. 2004:2004:2797-800. doi: 10.1109/IEMBS.2004.1403799.

Abstract

We propose a Web-based software system for constructing sequence databases. As an example application, the system builds a ribosomal RNA gene (rDNA) sequence database to facilitate the study of microbial communities. Construction proceeds in two steps. First, a fast and accurate approximate string-matching algorithm fetches from GenBank the rDNA sequences sandwiched by two given primers; the matching takes into account the distance between the two primers, mismatches, and degeneracy. Second, a homology search algorithm based on the Basic Local Alignment Search Tool (BLAST) extracts rDNA sequences that do not contain the primers; in this step a chaining algorithm is combined with BLAST to obtain global alignments from local alignments. This two-step process yields an rDNA sequence database for a specific taxonomic group. The system can be used in many biological applications.
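The abstract does not give implementation details, so the following Python sketch is only one plausible reading of the two steps, not the authors' code. The first part illustrates primer matching with IUPAC degeneracy codes, a mismatch budget, and a bound on the distance between the two primers. The second part illustrates chaining: picking the highest-scoring set of collinear, non-overlapping local hits, as might be reported by BLAST. All function names, parameters, defaults, and the tuple layout for local hits are assumptions made for illustration. The brute-force scan stands in for the paper's fast matching algorithm, and no gap penalty is applied between chained hits.

```python
# Illustrative sketch; every name and default here is hypothetical.

# IUPAC nucleotide codes: each (possibly degenerate) symbol -> bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a sequence written in IUPAC codes."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_hits(primer, seq, max_mismatches):
    """Start positions where primer matches seq with <= max_mismatches.

    Brute-force scan, O(len(seq) * len(primer)); a stand-in for a faster
    approximate string-matching algorithm.
    """
    hits = []
    for start in range(len(seq) - len(primer) + 1):
        mismatches = 0
        for p, s in zip(primer, seq[start:start + len(primer)]):
            if s not in IUPAC[p]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:  # loop finished without exceeding the mismatch budget
            hits.append(start)
    return hits


def find_amplicons(seq, forward, reverse, max_mismatches=1,
                   min_dist=0, max_dist=2000):
    """Regions between a forward-primer site and a reverse-primer site.

    The reverse primer is assumed to be given 5'->3' on the opposite
    strand, so its site on seq is its reverse complement. Only pairs whose
    inter-primer distance lies in [min_dist, max_dist] are kept.
    """
    rev_site = reverse_complement(reverse)
    regions = []
    for f in primer_hits(forward, seq, max_mismatches):
        inner_start = f + len(forward)
        for r in primer_hits(rev_site, seq, max_mismatches):
            dist = r - inner_start  # distance between the two primers
            if max(min_dist, 0) <= dist <= max_dist:
                regions.append(seq[inner_start:r])
    return regions


def chain_hsps(hsps):
    """Highest-scoring chain of collinear, non-overlapping local hits.

    hsps: list of (q_start, q_end, s_start, s_end, score) tuples with
    half-open coordinates (assumed layout). Simple O(n^2) dynamic program.
    """
    hsps = sorted(hsps)
    if not hsps:
        return []
    best = [h[4] for h in hsps]   # best chain score ending at each hit
    prev = [None] * len(hsps)     # back-pointers for traceback
    for i, (qs, _qe, ss, _se, score) in enumerate(hsps):
        for j in range(i):
            # Hit j may precede hit i only if it ends before i starts
            # in both the query and the subject.
            if hsps[j][1] <= qs and hsps[j][3] <= ss:
                if best[j] + score > best[i]:
                    best[i] = best[j] + score
                    prev[i] = j
    i = max(range(len(hsps)), key=best.__getitem__)
    chain = []
    while i is not None:
        chain.append(hsps[i])
        i = prev[i]
    return chain[::-1]
```

As a usage example under these assumptions, `find_amplicons("TTACGTACGGGGCCCCTTGACAAA", "ACRTAC", "TGTCAA", max_mismatches=0)` returns the region between the two primer sites, where the degenerate code `R` in the forward primer matches the `G` in the sequence. The chained hits from `chain_hsps` would then serve as anchors between which the remaining gaps are aligned to produce a global alignment.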