Comparative analysis of codon usage bias and codon context patterns between dipteran and hymenopteran sequenced genomes

PLoS One. 2012;7(8):e43111. doi: 10.1371/journal.pone.0043111. Epub 2012 Aug 17.

Abstract

Background: Codon bias is the non-uniform usage of synonymous codons, whereas codon context generally refers to the sequential pairing of adjacent codons in a gene. Although the genomes of multiple dipteran and hymenopteran insect species have been sequenced, only a few of these species have been analyzed for codon usage bias.
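To make the two definitions concrete, the following is a minimal sketch that counts codon usage and codon context (ordered pairs of adjacent codons) in a single coding sequence. It assumes an in-frame coding sequence given as a plain string; all function names and the toy sequence are hypothetical and are not the authors' pipeline. `flanking_codons` returns the codon immediately 3' of the start codon and the codon immediately 5' of the stop codon, the positions highlighted in the findings.

```python
from collections import Counter


def split_codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def codon_usage(cds: str) -> Counter:
    """Codon usage: how often each codon occurs in the sequence."""
    return Counter(split_codons(cds))


def codon_context(cds: str) -> Counter:
    """Codon context: counts of ordered (5' codon, 3' codon) pairs of adjacent codons."""
    codons = split_codons(cds)
    return Counter(zip(codons, codons[1:]))


def flanking_codons(cds: str) -> tuple[str, str]:
    """Return (codon after the start codon, codon before the stop codon).

    Assumes the sequence begins with a start codon and ends with a stop codon.
    """
    codons = split_codons(cds)
    if len(codons) < 3:
        raise ValueError("need a start codon, at least one sense codon, and a stop codon")
    return codons[1], codons[-2]


# Hypothetical toy ORF: ATG GAA GAG GAA TAA
orf = "ATGGAAGAGGAATAA"
print(codon_usage(orf))
print(codon_context(orf))
print(flanking_codons(orf))
```

Summing these counters over all genes in a genome gives genome-wide codon usage and codon-pair tables of the kind compared across species.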

Methods and principal findings: Here, we use bioinformatics approaches to analyze codon usage bias and codon context patterns genome-wide in 15 dipteran and 7 hymenopteran insect species. Results show that GAG is the most frequent codon in the dipteran species, whereas GAA is the most frequent codon in the hymenopteran species. The data reveal that codons ending in C or G are frequently used in the dipteran genomes, whereas codons ending in A or T are frequently used in the hymenopteran genomes. Synonymous codon usage order (SCUO) varies within genomes in a pattern that appears to be distinct for each species. Based on a comparison of 30 one-to-one orthologous genes among 17 species, the fruit fly Drosophila willistoni shows the least codon usage bias, whereas the honey bee (Apis mellifera) shows the highest. Analysis of codon context patterns in these insects shows that specific codons are frequently used as the 3' context of start codons and as the 5' context of stop codons.
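The abstract does not spell out how SCUO is computed. A minimal sketch, assuming the standard genetic code and the commonly used entropy-based definition: for each amino acid i with n_i > 1 synonymous codons, O_i = (H_max - H_i) / H_max, where H_i is the Shannon entropy of its synonymous codon frequencies and H_max = log2(n_i); SCUO is the average of O_i weighted by each amino acid's share of degenerate codons. The function name `scuo` is hypothetical, and this is not necessarily the authors' exact implementation.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group codons by amino acid; keep only degenerate families
# (drops stop codons '*' and the single-codon Met and Trp).
_families = defaultdict(list)
for _codon, _aa in CODE.items():
    _families[_aa].append(_codon)
SYNONYMOUS = {aa: cs for aa, cs in _families.items() if aa != "*" and len(cs) > 1}


def scuo(cds: str) -> float:
    """Entropy-based synonymous codon usage order of one in-frame coding sequence.

    Returns a value in [0, 1]: 0 = synonymous codons used equally,
    1 = a single codon used for every amino acid. Returns NaN when the
    sequence contains no codons of degenerate amino acids.
    """
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    weighted, total = 0.0, 0
    for codons in SYNONYMOUS.values():
        x = [counts[c] for c in codons]
        n = sum(x)
        if n == 0:
            continue  # amino acid absent: contributes zero weight
        h = -sum((xi / n) * math.log2(xi / n) for xi in x if xi > 0)
        h_max = math.log2(len(codons))
        weighted += n * (h_max - h) / h_max  # O_i weighted by the amino acid's codon count
        total += n
    return weighted / total if total else float("nan")


print(scuo("GAAGAGGAAGAGAAAAAA"))  # Glu used evenly (O = 0), Lys fully biased (O = 1)
```

Computing `scuo` for every gene in a genome gives the within-genome SCUO distribution whose shape the findings describe as species-specific.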

Conclusions: Codon bias patterns are distinct between dipteran and hymenopteran insects. While codon bias is favored by the high GC content of dipteran genomes, the high AT content of genes favors biased usage of synonymous codons in the hymenopteran insects. Codon context patterns also vary among these species, largely according to their phylogeny.
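The conclusions tie codon bias to base composition. One simple way to quantify composition per gene is overall GC content together with GC3, the G+C fraction at third codon positions, which equals the share of codons ending in C or G. This is a minimal sketch assuming an in-frame coding sequence given as a plain string; the function names are hypothetical, and it is not necessarily how the authors measured composition.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return gc / acgt if acgt else float("nan")


def gc3(cds: str) -> float:
    """G+C fraction at third codon positions (share of codons ending in G or C).

    Simplification: all complete codons are counted; GC3s variants instead
    restrict to synonymous sites (excluding Met, Trp and stop codons).
    """
    cds = cds.upper()
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3) if cds[i + 2] in "ACGT"]
    return sum(b in "GC" for b in thirds) / len(thirds) if thirds else float("nan")


# Hypothetical toy coding sequence: GAG GAA AAG
print(gc_content("GAGGAAAAG"), gc3("GAGGAAAAG"))
```

Computing these values per gene and relating them to a per-gene codon bias index is one way to examine the composition effects stated in the conclusions.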

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Base Composition
  • Base Sequence
  • Cluster Analysis
  • Codon / genetics*
  • Computational Biology / methods
  • Diptera / genetics*
  • Genome, Insect / genetics*
  • Hymenoptera / genetics*
  • Logistic Models
  • Molecular Sequence Data
  • Phylogeny
  • Sequence Analysis, DNA
  • Species Specificity

Substances

  • Codon