Rock, paper, scissors: harnessing complementarity in ortholog detection methods improves comparative genomic inference

M Cyrus Maher; Ryan D Hernandez

doi:10.1534/g3.115.017095


G3 (Bethesda). 2015 Feb 23;5(4):629-38. doi: 10.1534/g3.115.017095.

Authors

M Cyrus Maher¹, Ryan D Hernandez²

Affiliations

¹ Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California.
² Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California; Institute for Human Genetics, University of California, San Francisco, San Francisco, California; Institute for Quantitative Biosciences (QB3), University of California, San Francisco, San Francisco, California 94158. ryan.hernandez@ucsf.edu.

Abstract

Ortholog detection (OD) is a linchpin of most statistical methods in comparative genomics. This task involves accurately identifying genes across species that descend from a common ancestral sequence. OD methods comprise a wide variety of approaches, each with its own benefits and costs under a variety of evolutionary and practical scenarios. In this article, we examine the proteomes of ten mammals by using four methodologically distinct, rigorously filtered OD methods. In head-to-head comparisons, we find that these algorithms significantly outperform one another for 38–45% of the genes analyzed. We leverage this high complementarity through the development of MOSAIC (Multiple Orthologous Sequence Analysis and Integration by Cluster optimization), the first tool for integrating methodologically diverse OD methods. Relative to the four methods examined, MOSAIC more than quintuples the number of alignments in which all species are present, while maintaining or improving functional-, phylogenetic-, and sequence identity-based measures of ortholog quality. Further, this improvement in alignment quality yields more confidently aligned sites and higher levels of overall conservation, while also detecting up to 180% more positively selected sites. We close by highlighting a MOSAIC-specific positively selected site near the active site of TPSAB1, an enzyme linked to asthma, heart disease, and irritable bowel disease. MOSAIC alignments, source code, and full documentation are available at http://pythonhosted.org/bio-MOSAIC.
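The abstract does not spell out how MOSAIC combines its input methods, so the following is only a hypothetical toy sketch of the general idea of integrating ortholog calls from several OD methods; it is not MOSAIC's published algorithm or API. It assumes each method reports at most one candidate ortholog per species for a given query protein. For every species it pools the candidates from all methods and keeps the one most similar to the query, using `difflib.SequenceMatcher.ratio()` as a crude stand-in for alignment-based sequence identity. The function names, method names, and sequences are all made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity (assumption)."""
    return SequenceMatcher(None, a, b).ratio()


def integrate_calls(reference: str, calls: dict) -> dict:
    """Hypothetical integration of ortholog calls from several OD methods.

    reference: the query protein sequence (e.g. the human gene).
    calls: {method_name: {species: candidate_sequence}}.
    Returns {species: (chosen_sequence, sorted list of methods that proposed it)}.
    """
    # Pool every distinct candidate per species, remembering which methods proposed it.
    candidates = {}
    for method, by_species in calls.items():
        for species, seq in by_species.items():
            candidates.setdefault(species, {}).setdefault(seq, []).append(method)

    # For each species, keep the pooled candidate most similar to the query.
    merged = {}
    for species, seqs in candidates.items():
        best = max(seqs, key=lambda s: similarity(reference, s))
        merged[species] = (best, sorted(seqs[best]))
    return merged


def is_complete(species_calls: dict, required) -> bool:
    """True if every required species has an ortholog in the call set."""
    return set(required) <= set(species_calls)


# Toy example: each hypothetical method misses one species.
reference = "MKTAYIAKQR"
calls = {
    "methodA": {"mouse": "MKTAYIAKQR", "dog": "MKTAYLAKQR"},
    "methodB": {"mouse": "MKSAYIAKQK", "cow": "MKTAYIAKHR"},
    "methodC": {"dog": "MKTWYLAKHR", "cow": "MKTAYIAKHR"},
}
species = ["mouse", "dog", "cow"]
merged = integrate_calls(reference, calls)

for name, call_set in calls.items():
    print(name, "complete:", is_complete(call_set, species))
print("merged complete:", is_complete(merged, species))
```

In this toy case no single method yields a complete set of species, but the pooled calls do, which mirrors in miniature why combining complementary methods can increase the number of alignments in which all species are present.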

Keywords: comparative genomics; multiple sequence alignment; open source software; ortholog detection; positive selection.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Animals
Evolution, Molecular
Genomics / methods*
Humans
Internet
Sequence Alignment
User-Computer Interface*
