The evolution of venoms is the story of how toxins arise and of the processes that generate and maintain their diversity. For animal venoms these processes include recruitment for expression in the venom gland, neofunctionalization, paralogous expansions, and functional divergence. The systematic study of these processes requires the reliable identification of the venom components involved in antagonistic interactions. High-throughput sequencing has the potential of uncovering the entire set of toxins in a given organism, yet the existence of non-venom toxin paralogs and the misleading effects of partial census of the molecular diversity of toxins make necessary to collect complementary evidence to distinguish true toxins from their non-venom paralogs. Here, we analyzed the whole genomes of two scorpions, one spider and one snake, aiming at the identification of the full repertoires of genes encoding toxin-like proteins. We classified the entire set of protein-coding genes into paralogous groups and monotypic genes, identified genes encoding toxin-like proteins based on known toxin families, and quantified their expression in both venom-glands and pooled tissues. Our results confirm that genes encoding toxin-like proteins are part of multigene families, and that these families arise by recruitment events from non-toxin genes followed by limited expansions of the toxin-like protein coding genes. We also show that failing to account for sequence similarity with non-toxin proteins has a considerable misleading effect that can be greatly reduced by comparative transcriptomics. Our study overall contributes to the understanding of the evolutionary dynamics of proteins involved in antagonistic interactions.
© The Author 2016. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology. All rights reserved. For permissions please email: journals.permissions@oup.com.