Validation of picogram- and femtogram-input DNA libraries for microscale metagenomics

PeerJ. 2016 Sep 22;4:e2486. doi: 10.7717/peerj.2486. eCollection 2016.

Abstract

High-throughput sequencing libraries are typically limited by the requirement for nanograms to micrograms of input DNA. This bottleneck impedes the microscale analysis of ecosystems and the exploration of low biomass samples. Current methods for amplifying environmental DNA to bypass this bottleneck introduce considerable bias into metagenomic profiles. Here we describe and validate a simple modification of the Illumina Nextera XT DNA library preparation kit that allows creation of shotgun libraries from sub-nanogram amounts of input DNA. Community composition was reproducible down to 100 fg of input DNA based on analysis of a mock community comprising 54 phylogenetically diverse Bacteria and Archaea. The main technical issues with the low input libraries were a greater potential for contamination, limited DNA complexity, which has a direct effect on assembly and binning, and an associated higher percentage of read duplicates. We recommend a lower limit of 1 pg (∼100-1,000 microbial cells) to ensure community composition fidelity, and the inclusion of negative controls to identify reagent-specific contaminants. When we applied the approach to marine surface water, we observed pronounced differences between bacterial community profiles of microliter volume samples, which we attribute to biological variation. This result is consistent with expected microscale patchiness in marine communities. We thus envision that our benchmarked, slightly modified low input DNA protocol will be beneficial for microscale and low biomass metagenomics.
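
The stated equivalence of 1 pg to roughly 100-1,000 microbial cells can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes an average mass of about 650 g/mol per double-stranded base pair, genome sizes between 1 and 10 Mbp, and one genome copy per cell. The function names are hypothetical.

```python
# Rough conversion between DNA input mass and number of microbial genomes.
# Assumptions (not from the abstract): ~650 g/mol per base pair,
# one genome copy per cell, genome sizes of 1-10 Mbp.

AVOGADRO = 6.02214076e23      # molecules per mole
BP_MASS_G_PER_MOL = 650.0     # assumed average mass of one base pair
FG_PER_G = 1e15               # femtograms per gram


def genome_mass_fg(genome_size_bp: float) -> float:
    """Mass of a single genome copy in femtograms."""
    return genome_size_bp * BP_MASS_G_PER_MOL / AVOGADRO * FG_PER_G


def cells_for_input(input_fg: float, genome_size_bp: float) -> float:
    """Approximate number of cells whose DNA adds up to input_fg."""
    return input_fg / genome_mass_fg(genome_size_bp)


if __name__ == "__main__":
    for label, input_fg in (("1 pg", 1000.0), ("100 fg", 100.0)):
        for size_bp in (1e6, 5e6, 10e6):
            cells = cells_for_input(input_fg, size_bp)
            print(f"{label} input, {size_bp / 1e6:.0f} Mbp genome: ~{cells:.0f} cells")
```

Under these assumptions a 1 Mbp genome weighs about 1 fg and a 10 Mbp genome about 11 fg, so 1 pg corresponds to somewhere between roughly 90 and 930 cells, in line with the range quoted in the abstract. By the same arithmetic, a 100 fg input holds only tens of genome copies.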

Keywords: 100 fg; Illumina; Low biomass; Low input DNA library; Low volume; Marine microheterogeneity; Microscale metagenomics; Nextera XT; Picogram; Reagent contamination.

Grants and funding

This work was primarily funded by the Gordon and Betty Moore Foundation (Grant ID: GBMF3801). PH was also supported by an Australian Research Council Laureate Fellowship (FL150100038), and BJW and GWT by the Genomic Science Program of the United States Department of Energy Office of Biological and Environmental Research, grant DE-SC0004632. BJW was also supported by Australian Research Council Discovery Early Career Research Award #DE160100248. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.