Space-conserving optimal DNA-protein alignment

Proc IEEE Comput Syst Bioinform Conf. 2004:80-8. doi: 10.1109/csb.2004.1332420.

Abstract

DNA-protein alignment algorithms can be used to discover coding sequences in a genomic sequence, if the corresponding protein derivatives are known. They can also be used to identify potential coding sequences of a newly sequenced genome, by using proteins from related species. Previously known algorithms either solve a simplified formulation, or sacrifice optimality to achieve practical implementation. In this paper, we present a comprehensive formulation of the DNA-protein alignment problem, and an algorithm to compute the optimal alignment in O(mn) time using only four tables of size (m + 1) x (n + 1), where m and n are the lengths of the DNA and protein sequences, respectively. We also developed a Protein and DNA Alignment program PanDA that implements the proposed solution. Experimental results indicate that our algorithm produces high quality alignments.

Publication types

  • Evaluation Study
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Base Sequence
  • DNA / genetics*
  • Data Compression / methods
  • Molecular Sequence Data
  • Proteins / genetics*
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Sequence Analysis, Protein / methods*

Substances

  • Proteins
  • DNA