Genome-wide comparative analysis of simple sequence coding repeats among 25 insect species

Gene. 2012 Aug 10;504(2):226-32. doi: 10.1016/j.gene.2012.05.020. Epub 2012 May 23.

Abstract

We present a detailed genome-scale comparative analysis of simple sequence repeats within protein coding regions among 25 insect genomes. The repetitive sequences in the coding regions primarily represented single codon repeats and codon pair repeats. The CAG triplet is highly repetitive in the coding regions of insect genomes. It is frequently paired with the synonymous codon CAA to code for polyglutamine repeats. The codon pairs that are least repetitive code for polyalanine repeats. The frequency of hexanucleotide and dinucleotide motifs of codon pair repeats is significantly (p < 0.001) different in the Drosophila species compared to the non-Drosophila species. However, the frequency of synonymous and non-synonymous codon pair repeats varies in a correlated manner (r² = 0.79) among all the species. Results further show that perfect and imperfect repeats are significantly associated with the trinucleotide and hexanucleotide coding repeats in most of these insects. However, only select species show a significant association between the numbers of perfect/imperfect hexamers and repeats coding for single amino acid/amino acid pair runs. Our data further suggest that genes containing simple sequence coding repeats may be under negative selection, as they tend to be poorly conserved across species. The sequences of coding repeats of orthologous genes vary according to the known phylogeny among the species. In conclusion, the study shows that simple sequence coding repeats are important features of genome diversity among insects.
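To make the two repeat classes named in the abstract concrete, the following is a minimal Python sketch of how in-frame single codon repeats and codon pair repeats could be located in one coding sequence. It is not the authors' pipeline: the function names, the `min_units` thresholds, and the toy sequence are illustrative assumptions, and only perfect repeats are detected. Synonymy is judged with the standard genetic code.

```python
# Illustrative sketch only; names and thresholds are assumptions, not the paper's method.
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def find_codon_repeats(cds, unit_codons=1, min_units=4):
    """Return (start, motif, copies) for perfect in-frame tandem repeats.

    unit_codons=1 scans for single codon repeats (trinucleotide motifs);
    unit_codons=2 scans for codon pair repeats (hexanucleotide motifs).
    min_units is a hypothetical minimum copy number chosen for illustration.
    """
    unit_len = 3 * unit_codons
    hits = []
    i = 0
    while i + unit_len <= len(cds):
        motif = cds[i:i + unit_len]
        copies = 1
        j = i + unit_len
        # Extend the run while the next unit matches the motif exactly.
        while cds[j:j + unit_len] == motif:
            copies += 1
            j += unit_len
        # A pair made of two identical codons is really a single codon repeat.
        same_codon_pair = unit_codons == 2 and motif[:3] == motif[3:]
        if copies >= min_units and not same_codon_pair:
            hits.append((i, motif, copies))
            i = j          # continue after the run
        else:
            i += 3         # stay in the reading frame
    return hits


def is_synonymous_pair(motif):
    """True if both codons of a hexanucleotide motif encode the same amino acid."""
    return CODON_TABLE[motif[:3]] == CODON_TABLE[motif[3:]]


# Toy coding sequence (made up): a CAG run, then a CAG/CAA codon pair run.
cds = "ATG" + "CAG" * 5 + "GCT" + "CAGCAA" * 3 + "TAA"
single_hits = find_codon_repeats(cds, unit_codons=1, min_units=4)
pair_hits = find_codon_repeats(cds, unit_codons=2, min_units=3)
for start, motif, copies in pair_hits:
    kind = "synonymous" if is_synonymous_pair(motif) else "non-synonymous"
    print(start, motif, copies, kind)
```

In this toy sequence the CAG/CAA pair is classified as synonymous because both codons encode glutamine, which corresponds to the polyglutamine case described above. Imperfect repeats, which the study also considers, would need a mismatch-tolerant comparison in place of the exact string match used here.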

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Animals
  • Codon*
  • Genome, Insect*
  • Insecta / genetics*
  • Microsatellite Repeats / genetics
  • Repetitive Sequences, Nucleic Acid*

Substances

  • Codon