Plants produce a vast array of specialized metabolites, many of which are used as pharmaceuticals, flavors, fragrances, and other high-value fine chemicals. However, most of these compounds occur in non-model plants for which genomic sequence information is not yet available. The production of a large amount of nucleotide sequence data using next-generation technologies is now relatively fast and cost-effective, especially when using the latest Roche-454 and Illumina sequencers with enhanced base-calling accuracy. To investigate specialized metabolite biosynthesis in non-model plants we have established a data-mining framework, employing next-generation sequencing and computational algorithms, to construct and analyze the transcriptomes of 75 non-model plants that produce compounds of interest for biotechnological applications. After sequence assembly an extensive annotation approach was applied to assign functional information to over 800,000 putative transcripts. The annotation is based on direct searches against public databases, including RefSeq and InterPro. Gene Ontology (GO), Enzyme Commission (EC) annotations and associated Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway maps are also collected. As a proof-of-concept, the selection of biosynthetic gene candidates associated with six specialized metabolic pathways is described. A web-based BLAST server has been established to allow public access to assembled transcriptome databases for all 75 plant species of the PhytoMetaSyn Project (www.phytometasyn.ca).
Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.