Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences

BMC Res Notes. 2014 Jan 13;7:34. doi: 10.1186/1756-0500-7-34.

Abstract

Background: Small RNAs are important regulators of genome function, yet their prediction in genomes remains a major computational challenge. Statistical analyses of pre-miRNA sequences have indicated that, in contrast to other classes of non-coding RNA, their secondary structure tends to have a minimum free energy (MFE) significantly lower than the MFE values of randomized sequences with the same nucleotide composition. Computing the many MFEs that this comparison requires is, however, too computationally intensive to allow for genome-wide screening.
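The comparison against randomized sequences described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the simple mononucleotide shuffle, and the `toy_energy` stand-in are all assumptions made for illustration. In practice `mfe_fn` would wrap an RNA folding program, and each call would be expensive, which is why this direct approach scales poorly to whole genomes.

```python
import random
from typing import Callable


def shuffled(seq: str, rng: random.Random) -> str:
    """Return a random permutation of seq (same nucleotide composition)."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def empirical_p_value(seq: str,
                      mfe_fn: Callable[[str], float],
                      n_shuffles: int = 1000,
                      seed: int = 0) -> float:
    """Estimate P(MFE of a shuffled sequence <= MFE of seq).

    mfe_fn is called n_shuffles + 1 times, so the cost is dominated
    by the folding step. The +1 terms avoid reporting a P-value of 0.
    """
    rng = random.Random(seed)
    observed = mfe_fn(seq)
    hits = sum(1 for _ in range(n_shuffles)
               if mfe_fn(shuffled(seq, rng)) <= observed)
    return (hits + 1) / (n_shuffles + 1)


# ASSUMPTION: toy stand-in for a real MFE calculation, used only so the
# sketch runs without a folding library. It is NOT a free energy; it
# merely rewards complementary bases at mirrored positions (a crude
# hairpin score), with more pairs giving a lower (more negative) value.
_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"),
          ("G", "U"), ("U", "G")}


def toy_energy(seq: str) -> float:
    half = len(seq) // 2
    return -float(sum((seq[i], seq[-1 - i]) in _PAIRS for i in range(half)))


if __name__ == "__main__":
    hairpin = "GGGGGGGGGGAAAACCCCCCCCCC"  # hypothetical hairpin-like input
    print(empirical_p_value(hairpin, toy_energy, n_shuffles=200))
```

Because every shuffle needs its own fold, the number of MFE evaluations grows as (candidates × shuffles), which is the cost the pre-computation strategy is meant to avoid.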

Results: Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine, by interpolation, the MFE distribution for any given sequence composition. This allows on-the-fly calculation of the normal distribution for any candidate sequence composition.
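A minimal Python sketch of the interpolation idea follows. It assumes, purely for illustration, that the pre-computed results are stored as (GC fraction, mean, standard deviation) rows for one fixed sequence length and that linear interpolation between neighbouring rows is adequate; the abstract does not specify the table layout or the interpolation scheme, and a full treatment would cover the complete nucleotide composition. All names and the numbers in `TOY_TABLE` are hypothetical placeholders, not values from the study.

```python
import math
from bisect import bisect_right
from typing import List, Tuple

Row = Tuple[float, float, float]  # (gc_fraction, mean_mfe, sd_mfe)


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def interpolate_params(gc: float, table: List[Row]) -> Tuple[float, float]:
    """Linearly interpolate (mean, sd) at the given GC fraction.

    table must be sorted by GC fraction; values outside its range are
    clamped to the nearest row.
    """
    xs = [row[0] for row in table]
    if gc <= xs[0]:
        return table[0][1], table[0][2]
    if gc >= xs[-1]:
        return table[-1][1], table[-1][2]
    j = bisect_right(xs, gc)
    x0, m0, s0 = table[j - 1]
    x1, m1, s1 = table[j]
    t = (gc - x0) / (x1 - x0)
    return m0 + t * (m1 - m0), s0 + t * (s1 - s0)


def mfe_p_value(mfe: float, gc: float, table: List[Row]) -> float:
    """Lower-tail P-value: probability that a random sequence of this
    composition has an MFE at least as low as the observed one."""
    mu, sigma = interpolate_params(gc, table)
    return normal_cdf(mfe, mu, sigma)


# ASSUMPTION: made-up placeholder rows used only to exercise the code.
TOY_TABLE: List[Row] = [
    (0.3, -15.0, 4.0),
    (0.5, -25.0, 5.0),
    (0.7, -35.0, 6.0),
]

if __name__ == "__main__":
    print(mfe_p_value(mfe=-40.0, gc=0.5, table=TOY_TABLE))
```

With the table in memory, scoring a candidate costs one fold plus a lookup and an error-function evaluation, instead of one fold per randomized sequence.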

Conclusion: The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this property alone is not sufficiently discriminative to distinguish miRNAs from other sequences, the MFE-based P-value should be among the parameters used to select potential miRNA candidates for experimental verification.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence*
  • Computational Biology / methods*
  • Entropy*
  • Herpesvirus 4, Human / genetics
  • Inverted Repeat Sequences
  • MicroRNAs / chemistry
  • MicroRNAs / genetics*
  • Molecular Sequence Data
  • Normal Distribution
  • Nucleic Acid Conformation

Substances

  • MicroRNAs