Accuracy of allele frequency estimation using pooled RNA-Seq

Mol Ecol Resour. 2014 Mar;14(2):381-92. doi: 10.1111/1755-0998.12186. Epub 2013 Nov 16.

Abstract

For nonmodel organisms, genome-wide information that describes functionally relevant variation may be obtained by RNA-Seq following de novo transcriptome assembly. While sequencing has become relatively inexpensive, preparing a large number of sequencing libraries remains prohibitively expensive for population genetic analyses of nonmodel species. Pooling samples may then be an attractive alternative. To test whether pooled RNA-Seq accurately predicts true allele frequencies, we analysed the liver transcriptomes of 10 bank voles. Each sample was sequenced both as an individually barcoded library and as part of a pool. Equal amounts of total RNA from each vole were pooled prior to mRNA selection and library construction. Reads were mapped onto the de novo assembled reference transcriptome. High-quality genotypes for individual voles, determined for 23,682 SNPs, provided information on 'true' allele frequencies; allele frequencies estimated from the pool were then compared with these values. 'True' frequencies and those estimated from the pool were highly correlated. Mean relative estimation error was 21% and did not depend on expression level. However, we also observed that interindividual variation in gene expression and allele-specific expression had a minor effect on the accuracy of allele frequency estimation. Moreover, we observed a strong negative relationship between minor allele frequency and relative estimation error. Our results indicate that pooled RNA-Seq achieves accuracy comparable to that of pooled genome resequencing, but variation in expression level between individuals should be assessed and accounted for. This should help account for differences in accuracy between transcripts whose expression level is conserved among individuals and those whose expression level varies.
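The comparison described in the abstract can be illustrated with a short sketch. It computes a 'true' allele frequency from individual diploid genotypes, a pooled estimate from read counts, their correlation, and a relative estimation error. The abstract does not define the error metric, so this sketch assumes |estimate − truth| / truth, evaluated for the minor allele. It also assumes every individual contributes equally to the pooled reads, which the paper notes is violated when expression varies between individuals. All function names and SNP data are hypothetical. Because the error is divided by the true minor allele frequency, the same absolute deviation yields a larger relative error at low frequencies, consistent with the reported negative relationship.

```python
"""Sketch: comparing pooled RNA-Seq allele frequencies with 'true' frequencies
from individually genotyped samples. All data below are hypothetical."""
from math import sqrt
from statistics import mean


def true_alt_freq(genotypes):
    # Diploid genotypes coded as the number of alternative alleles (0, 1 or 2).
    return sum(genotypes) / (2 * len(genotypes))


def pooled_freq(alt_reads, total_reads):
    # Pool estimate: fraction of reads carrying the alternative allele.
    # Assumes equal per-individual contribution (ignores expression differences).
    return alt_reads / total_reads


def relative_error(true_alt, pooled_alt):
    # Assumed metric: re-express both frequencies for the minor allele
    # (as defined by the true data), then |estimate - truth| / truth.
    if true_alt > 0.5:
        true_alt, pooled_alt = 1 - true_alt, 1 - pooled_alt
    if true_alt == 0:
        raise ValueError("monomorphic SNP: relative error is undefined")
    return abs(pooled_alt - true_alt) / true_alt


def pearson_r(xs, ys):
    # Pearson correlation between true and pooled frequencies.
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical SNPs in a pool of 10 diploid individuals:
# (individual genotypes, alt reads in pool, total reads in pool)
snps = [
    ([0, 1, 0, 0, 2, 1, 0, 0, 0, 0], 30, 120),   # true alt freq 0.20
    ([1, 1, 1, 0, 1, 2, 1, 0, 1, 1], 44, 100),   # true alt freq 0.45
    ([2, 2, 1, 2, 2, 1, 2, 2, 2, 1], 170, 200),  # true alt freq 0.85
]

true_p = [true_alt_freq(g) for g, _, _ in snps]
pool_p = [pooled_freq(a, n) for _, a, n in snps]
errors = [relative_error(t, p) for t, p in zip(true_p, pool_p)]

print("r =", round(pearson_r(true_p, pool_p), 3))
print("mean relative error =", round(mean(errors), 3))
```

In real data, the read counts would come from reads mapped onto the reference transcriptome at each SNP. SNPs that are monomorphic among the genotyped individuals are excluded here because the relative error is undefined for them.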

Keywords: RNA-Seq; accuracy estimation; bank vole; pool; transcriptome.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Arvicolinae / genetics
  • Computational Biology / methods
  • Gene Frequency*
  • Genetics, Population / methods*
  • Liver
  • Sequence Analysis, RNA / methods*
  • Transcriptome*