BOA: A partitioned view of genome assembly

Xiaojing An; Priyanka Ghosh; Patrick Keppler; Sureyya Emre Kurt; Sriram Krishnamoorthy; Ponnuswamy Sadayappan; Aravind Sukumaran Rajam; Ümit V Çatalyürek; Ananth Kalyanaraman

doi:10.1016/j.isci.2022.105273

iScience. 2022 Oct 8;25(11):105273. doi: 10.1016/j.isci.2022.105273. eCollection 2022 Nov 18.

Authors

Xiaojing An¹, Priyanka Ghosh², Patrick Keppler³, Sureyya Emre Kurt⁴, Sriram Krishnamoorthy⁵, Ponnuswamy Sadayappan⁴, Aravind Sukumaran Rajam³, Ümit V Çatalyürek¹,⁶, Ananth Kalyanaraman³

Affiliations

¹ School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.
² National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
³ School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA.
⁴ School of Computing, University of Utah, Salt Lake City, UT 84112, USA.
⁵ Google, Mountain View, CA 94043, USA.
⁶ Amazon Web Services, Seattle, WA 98109, USA.

Abstract

De novo genome assembly is a fundamental problem in computational molecular biology that aims to reconstruct an unknown genome sequence from a set of short DNA sequences (or reads) obtained from the genome. The relative ordering of the reads along the target genome is not known a priori, which is one of the main contributors to the complexity of the assembly process. In this article, with the dual objective of improving assembly quality and exposing a high degree of parallelism, we present a partitioning-based approach. Our framework, BOA (bucket-order-assemble), uses bucketing alongside graph- and hypergraph-based partitioning techniques to produce a partial ordering of the reads. This partial ordering enables us to divide the read set into disjoint blocks that can be independently assembled in parallel using any state-of-the-art serial assembler of choice. Experimental results show that BOA improves both the overall assembly quality and performance.

Keywords: Algorithms; Bioinformatics; Genomics; High-performance computing in bioinformatics.
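
Illustrative sketch: The bucket, partition, and parallel-assemble flow described in the abstract can be pictured with a small Python example. This is only a toy under stated assumptions, not the BOA implementation. The bucketing rule (the lexicographically smallest k-mer of each read), the greedy load-balancing step that stands in for graph and hypergraph partitioning, and every function name here (`min_kmer`, `bucket_reads`, `partition_buckets`, `assemble_block`, `bucket_order_assemble`) are hypothetical choices made for illustration. The `assemble_block` placeholder merely sorts its reads; in practice a serial assembler would be invoked on each block.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def min_kmer(read, k):
    """Return the lexicographically smallest k-mer of a read.

    Assumption: this simple minimizer is only an illustrative bucketing key.
    Reads no longer than k are used whole.
    """
    if len(read) <= k:
        return read
    return min(read[i:i + k] for i in range(len(read) - k + 1))


def bucket_reads(reads, k):
    """Bucket step: group read indices that share the same key."""
    buckets = defaultdict(list)
    for idx, read in enumerate(reads):
        buckets[min_kmer(read, k)].append(idx)
    return buckets


def partition_buckets(buckets, num_blocks):
    """Stand-in for the partitioning step (NOT graph/hypergraph partitioning).

    Greedily places the largest buckets into the currently lightest block,
    so every read index lands in exactly one block.
    """
    blocks = [[] for _ in range(num_blocks)]
    for key in sorted(buckets, key=lambda b: (-len(buckets[b]), b)):
        lightest = min(range(num_blocks), key=lambda i: len(blocks[i]))
        blocks[lightest].extend(buckets[key])
    return blocks


def assemble_block(block_reads):
    """Placeholder for a serial assembler; it only returns the reads sorted."""
    return sorted(block_reads)


def bucket_order_assemble(reads, k=4, num_blocks=2):
    """Run the three steps and process the disjoint blocks in parallel."""
    buckets = bucket_reads(reads, k)
    blocks = partition_buckets(buckets, num_blocks)
    block_reads = [[reads[i] for i in block] for block in blocks]
    with ThreadPoolExecutor(max_workers=num_blocks) as pool:
        results = list(pool.map(assemble_block, block_reads))
    return blocks, results


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "CGTACG", "TTGACA", "TGACAT"]
    blocks, results = bucket_order_assemble(toy_reads, k=4, num_blocks=2)
    print(blocks)
    print(results)
```

Because the blocks are disjoint, each call to `assemble_block` touches its own reads only, which is what allows the blocks to be processed independently.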