CAP-miRSeq: a comprehensive analysis pipeline for microRNA sequencing data

BMC Genomics. 2014 Jun 3;15(1):423. doi: 10.1186/1471-2164-15-423.

Abstract

Background: miRNAs play a key role in normal physiology and various diseases. miRNA profiling through next-generation sequencing (miRNA-seq) has become the main platform for biological research and biomarker discovery. However, analyzing miRNA sequencing data is challenging because it requires substantial computational resources and bioinformatics expertise. Several web-based analytical tools have been developed, but they are limited to processing one sample or a pair of samples at a time and are not suitable for large-scale studies. Limited flexibility and reliability are also common problems with these web applications.

Results: We developed a Comprehensive Analysis Pipeline for microRNA Sequencing data (CAP-miRSeq) that integrates read pre-processing, alignment, mature/precursor/novel miRNA detection and quantification, data visualization, variant detection in miRNA coding regions, and more flexible differential expression analysis between experimental conditions. Depending on their computational infrastructure, users can install the package locally or deploy it in the Amazon Cloud, and can run samples sequentially or, for large numbers of samples, in parallel to speed up analysis. In either case, summary and expression reports for all samples are generated for easier quality assessment and downstream analyses. Using well-characterized data, we demonstrated the pipeline's superior performance, flexibility, and practical use in research and biomarker discovery.
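
The choice between sequential and parallel sample processing can be illustrated with a minimal sketch. It is not CAP-miRSeq's actual code: the function names `process_sample` and `run_samples`, the sample identifiers, and the thread-pool approach are all assumptions made for illustration, and the per-sample work is replaced by a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def process_sample(sample_id):
    # Hypothetical stand-in for the per-sample steps (pre-processing,
    # alignment, quantification); it only returns a status record here.
    return {"sample": sample_id, "status": "done"}


def run_samples(sample_ids, parallel=False, max_workers=4):
    # Sequential mode: handle one sample after another.
    if not parallel:
        return [process_sample(s) for s in sample_ids]
    # Parallel mode: spread samples over a pool of workers.
    # pool.map returns results in the same order as the input.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_sample, sample_ids))


# Hypothetical sample identifiers.
samples = ["sample_A", "sample_B", "sample_C"]
sequential = run_samples(samples)
parallel = run_samples(samples, parallel=True)

# Collect per-sample results into one summary covering all samples.
summary = {record["sample"]: record["status"] for record in parallel}
```

Both modes yield the same per-sample records in the same order, so a combined summary can be built the same way whichever mode is used.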

Conclusions: CAP-miRSeq is a powerful and flexible tool for processing and analyzing miRNA-seq data, scaling from a few to hundreds of samples. The results are presented in a convenient form for investigators or analysts to conduct further investigation and discovery.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Carcinoma, Renal Cell / genetics
  • Computational Biology / methods*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Internet
  • MCF-7 Cells
  • MicroRNAs / genetics*
  • Reproducibility of Results
  • Sequence Analysis, RNA / methods*
  • Software
  • User-Computer Interface

Substances

  • MicroRNAs