Improved annotation of C. elegans microRNAs by deep sequencing reveals structures associated with processing by Drosha and Dicer

RNA. 2011 Apr;17(4):563-77. doi: 10.1261/rna.2432311. Epub 2011 Feb 9.

Abstract

MicroRNAs (miRNAs) are small regulatory RNAs that are essential in all studied metazoans. Research has focused on the prediction and identification of novel miRNAs, while little has been done to validate, annotate, and characterize identified miRNAs. Using Illumina sequencing, ∼20 million small RNA sequences were obtained from Caenorhabditis elegans. Of the 175 miRNAs listed in the miRBase database, 106 were validated as deriving from a stem-loop precursor with hallmark characteristics of miRNAs. This result suggests that not all sequences identified as miRNAs belong in this category of small RNAs. Our large data set of validated miRNAs facilitated the determination of general sequence and structural characteristics of miRNAs and miRNA precursors. In contrast to previous observations, we did not observe a preference for the 5' nucleotide of the miRNA to be unpaired compared to the 5' nucleotide of the miRNA*, nor a preference for the miRNA to be on either the 5' or 3' arm of the miRNA precursor stem-loop. We observed that steady-state pools of miRNAs have fairly homogeneous termini, especially at their 5' ends. Nearly all mature miRNA-miRNA* duplexes had two-nucleotide 3' overhangs, and there was a preference for a uracil in the first and ninth positions of the mature miRNA. Finally, we observed that specific nucleotides and structural distortions were overrepresented at certain positions adjacent to Drosha and Dicer cleavage sites. Our study offers a comprehensive data set of C. elegans miRNAs and their precursors that significantly decreases the uncertainty associated with the identity of these molecules in existing databases.
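A positional preference such as the reported enrichment of uracil at positions 1 and 9 could be checked by tallying nucleotide frequencies at a given position across a set of mature miRNA sequences. The following is a minimal sketch of that idea, not the authors' pipeline: the function name `position_frequencies` and the toy sequences are hypothetical and are not data from the study.

```python
from collections import Counter


def position_frequencies(sequences, position):
    """Return nucleotide frequencies at a 1-based position.

    Sequences shorter than `position` are skipped. DNA-style "T" is
    reported as "U" so that cDNA reads and RNA sequences are comparable.
    """
    counts = Counter(
        seq[position - 1].upper().replace("T", "U")
        for seq in sequences
        if len(seq) >= position
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {nt: n / total for nt, n in counts.items()}


# Hypothetical toy sequences, invented for illustration only.
toy_mirnas = [
    "UACGGAUCUAGC",
    "UGGCAUACGCUA",
    "ACGUACGUUACG",
    "UUAGCCGAUGCA",
]

first = position_frequencies(toy_mirnas, 1)
ninth = position_frequencies(toy_mirnas, 9)
print("position 1:", first)
print("position 9:", ninth)
```

On real data, the input would presumably be the validated mature miRNA sequences, and the observed frequencies would need to be compared against a suitable background composition before calling any position enriched.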

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Animals
  • Base Sequence
  • Caenorhabditis elegans / genetics*
  • Caenorhabditis elegans Proteins / metabolism*
  • Computational Biology
  • MicroRNAs / chemistry
  • MicroRNAs / genetics*
  • Molecular Sequence Annotation*
  • RNA Processing, Post-Transcriptional*
  • Ribonuclease III / metabolism*
  • Sequence Analysis, RNA
  • Thermodynamics

Substances

  • Caenorhabditis elegans Proteins
  • MicroRNAs
  • Ribonuclease III
  • drsh-1 protein, C elegans