A stochastic analysis of three viral sequences

Mol Biol Evol. 1992 Jul;9(4):666-77. doi: 10.1093/oxfordjournals.molbev.a040741.

Abstract

This paper analyzes the nucleotide sequences of three viruses: Kunjin, west Nile, and yellow fever. Each virus has one long open reading frame of greater than 10,200 nucleotides that codes for four structural and seven nonstructural genes. The Kunjin and west Nile viruses are the most closely related pair, when assessed on the basis of matches between their nucleotide sequences. As would be expected, the matching is least for bases at third-position codon sites and is greatest for second-position sites. Statistics are presented for the numbers of mismatches that are transitions or transversions. Nucleotide base usage is also reported. To each of the 33 virus-gene segments, nonhomogeneous Markov chain models have been fitted to describe the sequences of nucleotide bases. The models allow for different transition probabilities ("transition" is used in the mathematical sense here) and for different degrees of dependency, at the three sites in the codons. Reasonably satisfactory fits can be obtained for many of the genes by using models that are first order for both first- and second-position sites in the codon but that are second order for third-position sites. One consequence of such a model is that the correlation between one amino acid and the next is limited to the correlation of the last base of the former with the first base of the latter. Other consequences are that the model can (and does) prohibit the occurrence of stop codons within a gene and that subsequences of only first-position bases, or only third-position bases, are also first-order Markov chains. In theory, second-position subsequences may not be Markov chains at all. In practice, the data suggest that each of these subsequences is effectively a zero-order Markov chain, i.e., bases spaced three apart are statistically independent. Stationarity of nucleotide base distributions can be interpreted in either of two ways: (1) spatially along the sites or (2) temporally at each site. These interpretations must often be inconsistent, when the former allows for Markov dependence between adjacent sites whereas the latter assumes independence between sites. The inconsistency can be overcome, for these viruses, if subsequences at different codon positions are analyzed separately.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Base Sequence
  • Chi-Square Distribution
  • Flavivirus / genetics*
  • Models, Genetic
  • RNA, Viral
  • West Nile virus / genetics*
  • Yellow fever virus / genetics*

Substances

  • RNA, Viral