MicroRNAs (miRNAs) are post-transcriptional regulators of gene expression. Since the discovery of lin-4, the founding member of the miRNA family, over 360 miRNAs have been identified in Caenorhabditis elegans (C. elegans). Predicting and validating their targets is essential for elucidating the regulatory functions of these miRNAs. In C. elegans, crosslinking immunoprecipitation (CLIP) has been used successfully to identify target mRNA sequences bound by the Argonaute protein ALG-1. In addition, reliable annotation of 3' untranslated regions (3' UTRs) and developmental stage-specific expression profiles for both miRNAs and 3' UTR isoforms are available. Using these data, we developed statistical models and bioinformatics tools for both transcriptome-scale and developmental stage-specific prediction of miRNA binding sites in C. elegans 3' UTRs. In a performance evaluation by cross-validation on the ALG-1 CLIP data, the models offered major improvements over established algorithms for predicting both seed sites and seedless sites. In particular, our top-ranked predictions have a substantially higher true positive rate, suggesting a much higher likelihood of positive experimental validation. A gene ontology analysis of the stage-specific predictions suggests that miRNAs are involved in the dynamic regulation of biological functions during C. elegans development. Specifically, miRNAs preferentially target genes related to development, cell cycle, trafficking, and cell signaling processes. A database of both transcriptome-scale and stage-specific predictions, together with software implementing the prediction models, is available through the Sfold web server at http://sfold.wadsworth.org.
Keywords: GO analysis; developmental stage; microRNA; prediction; target binding site.