A new method based on entropy theory for genomic sequence analysis

Acta Biotheor. 2002;50(3):155-65. doi: 10.1023/a:1016587025917.

Abstract

We have refined entropy theory so that the rapidly growing body of nucleic acid and protein sequence data can be interpreted more conveniently. The concept of selection constraint was not introduced; only the analyzed sequences themselves were considered. The refined theory serves as the basis for a method that analyzes non-coding regions (NCRs) as well as coding regions. Positions with maximal entropy might play the most important role in genome function, as opposed to positions with minimal entropy. The method was tested on the well-characterized coding regions of 12 strains of Classical Swine Fever Virus (CSFV) and on the non-coding regions of 20 CSFV strains. It is suitable for analyzing the nucleic acid sequence of a complete genome and for detecting sensitive positions for mutagenesis. As such, the method provides a basis for elucidating the underlying functional mechanisms.
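The abstract does not give the refined entropy formula. As a rough illustration only, the sketch below assumes plain per-position Shannon entropy computed over a multiple alignment of strains, with gap characters counted as an ordinary symbol. It then reports the maximal- and minimal-entropy positions. The function names `column_entropies` and `extreme_positions` are hypothetical; this is not the authors' implementation.

```python
import math
from collections import Counter


def column_entropies(aligned):
    """Shannon entropy (bits) of each column in equal-length aligned sequences.

    Assumption: plain Shannon entropy, not the paper's refined measure;
    gap characters such as '-' are treated like any other symbol.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    n = len(aligned)
    entropies = []
    for i in range(length):
        # Frequency of each symbol observed at position i across all strains.
        counts = Counter(seq[i] for seq in aligned)
        # H = sum p * log2(1/p); a fully conserved column gives 0.0.
        h = sum((c / n) * math.log2(n / c) for c in counts.values())
        entropies.append(h)
    return entropies


def extreme_positions(entropies):
    """Return 0-based indices of the maximal- and minimal-entropy columns."""
    top = max(entropies)
    bottom = min(entropies)
    max_pos = [i for i, h in enumerate(entropies) if h == top]
    min_pos = [i for i, h in enumerate(entropies) if h == bottom]
    return max_pos, min_pos


if __name__ == "__main__":
    # Toy alignment of four hypothetical strains (not CSFV data).
    strains = ["ACGT", "ACGA", "ACTC", "ACGG"]
    ents = column_entropies(strains)
    print(ents)
    print(extreme_positions(ents))  # ([3], [0, 1])
```

In this toy alignment, the first two columns are fully conserved (0 bits) and the last column carries all four bases (2 bits). A per-column measure like this is the simplest way to rank positions by variability; any weighting or correction introduced by the refined theory would replace the entropy line inside the loop.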

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Sequence
  • Bayes Theorem
  • Classical Swine Fever Virus / genetics*
  • Databases, Nucleic Acid
  • Entropy
  • Genome, Viral
  • Mathematical Computing
  • Models, Genetic*
  • Models, Statistical
  • Molecular Sequence Data
  • Ribosomal Proteins / genetics
  • Sequence Analysis, RNA / methods*

Substances

  • Ribosomal Proteins