Genotyping inversions and tandem duplications

Jana Ebler; Alexander Schönhuth; Tobias Marschall

doi:10.1093/bioinformatics/btx020

Bioinformatics. 2017 Dec 15;33(24):4015-4023. doi: 10.1093/bioinformatics/btx020.

Authors

Jana Ebler¹, Alexander Schönhuth², Tobias Marschall¹,³

Affiliations

¹ Center for Bioinformatics, Saarland University, Saarbrücken, Germany.
² Life Sciences Group, Centrum Wiskunde and Informatica, Amsterdam, The Netherlands.
³ Department for Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany.

PMID: 28169394
DOI: 10.1093/bioinformatics/btx020

Abstract

Motivation: Next Generation Sequencing (NGS) has enabled studying structural genomic variants (SVs) such as duplications and inversions in large cohorts. SVs have been shown to play important roles in multiple diseases, including cancer. As costs for NGS continue to decline and variant databases become ever more complete, genotyping SVs from NGS data is becoming steadily more relevant, which stands in stark contrast to the lack of tools for doing so.

Results: We introduce a novel statistical approach, called DIGTYPER (Duplication and Inversion GenoTYPER), which computes genotype likelihoods for a given inversion or duplication and reports the maximum likelihood genotype. In contrast to purely coverage-based approaches, DIGTYPER uses breakpoint-spanning read pairs as well as split alignments for genotyping, which also enables genotyping of small events. We tested our approach on simulated and on real data and compared the genotype predictions to those made by DELLY, which discovers SVs and computes genotypes, and SVTyper, a genotyping program used to genotype variants detected by LUMPY. DIGTYPER compares favorably, especially for duplications (of all lengths) and for shorter inversions (up to 300 bp). In contrast to DELLY, our approach can genotype SVs from databases without having to rediscover them.
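
The idea of computing genotype likelihoods and reporting the maximum likelihood genotype can be illustrated with a generic diploid model. The sketch below is not DIGTYPER's actual model, which the abstract does not specify. It assumes that each read pair or split alignment near a breakpoint has already been reduced to two hypothetical numbers, the probability of observing it under the reference allele and under the variant allele, and that reads are independent. The names `ALT_FRACTION`, `genotype_log_likelihoods`, and `call_genotype` are illustrative only.

```python
import math

# Assumed fraction of reads drawn from the variant allele under each
# diploid genotype (0/0 = absent, 0/1 = heterozygous, 1/1 = homozygous).
ALT_FRACTION = {"0/0": 0.0, "0/1": 0.5, "1/1": 1.0}


def genotype_log_likelihoods(reads):
    """Return the log-likelihood of each genotype.

    `reads` is a list of (p_ref, p_alt) tuples: the hypothetical probability
    of observing the read given the reference allele and given the variant
    allele. Both values must be strictly positive so the logarithm is defined.
    """
    log_likelihoods = {}
    for genotype, alt_fraction in ALT_FRACTION.items():
        total = 0.0
        for p_ref, p_alt in reads:
            # Mixture over the two alleles the read could stem from.
            total += math.log((1.0 - alt_fraction) * p_ref + alt_fraction * p_alt)
        log_likelihoods[genotype] = total
    return log_likelihoods


def call_genotype(reads):
    """Report the genotype with the highest likelihood."""
    log_likelihoods = genotype_log_likelihoods(reads)
    return max(log_likelihoods, key=log_likelihoods.get)


# Illustrative evidence: five reads favouring the variant allele and five
# favouring the reference allele.
variant_read = (0.01, 0.99)
reference_read = (0.99, 0.01)
mixed_evidence = [variant_read] * 5 + [reference_read] * 5
print(call_genotype(mixed_evidence))
```

With evidence split evenly between the two alleles, the heterozygous genotype receives the highest likelihood in this toy model. Working in log space avoids numerical underflow when many reads are multiplied together.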

Availability and implementation: https://bitbucket.org/jana_ebler/digtyper.git.

Contact: t.marschall@mpi-inf.mpg.de.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

Chromosome Duplication*
Chromosome Inversion*
Databases, Nucleic Acid
Genomic Structural Variation*
Genotype
Genotyping Techniques / methods*
High-Throughput Nucleotide Sequencing
Humans
Sequence Analysis, DNA
Sequence Deletion
Software