Gutenberg Open Science: Accelerating bioinformatics applications on CUDA-enabled multi-GPU systems

Please use this identifier to cite or link to this item: http://doi.org/10.25358/openscience-9634

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Schmidt, Bertil	-
dc.contributor.advisor	Hildebrandt, Andreas	-
dc.contributor.author	Kobus, Robin	-
dc.date.accessioned	2023-11-16T07:45:13Z	-
dc.date.available	2023-11-16T07:45:13Z	-
dc.date.issued	2023	-
dc.identifier.uri	https://openscience.ub.uni-mainz.de/handle/20.500.12030/9652	-
dc.description.abstract	A wide range of bioinformatics applications have to deal with a continuously growing amount of data generated by high-throughput sequencing techniques. Exclusively CPU-based workstations fail to keep up with the task. Instead of employing dozens of CPU cluster nodes to increase the computational power, massively parallel accelerators like modern CUDA-enabled GPUs can be used to achieve higher throughput and reduce execution times. However, the memory capacity of such devices is often limited. Efficient parallelization and data distribution are essential to accelerate performance-critical components of bioinformatics pipelines like read classification and read mapping. In this thesis we analyze and optimize tasks common to many GPU-based applications in the context of bioinformatics. We study sequence processing, construction and querying of k-mer-based hash tables, segmented sort as well as multi-GPU communication. With these methods we accelerate suffix array construction and metagenomic read classification on CUDA-enabled GPUs by overcoming the aforementioned challenges. By leveraging multiple GPUs, we extend the limited memory available from a single GPU to allow for the construction of larger indices. Our communication library, called Gossip, introduces optimized scatter, gather and all-to-all patterns for multi-GPU systems. Gossip's all-to-all communication pattern is successfully applied to suffix array construction, accelerating it to run in 3.44 s for a full-length human genome on an 8-GPU server, which is faster than the previously reported 4.8 s achieved by employing 1600 cores on 100 nodes of a CPU-based HPC cluster. Furthermore, we introduce MetaCache-GPU -- an ultra-fast metagenomic short read classifier specifically tailored to fit the characteristics of CUDA-enabled accelerators. 
Our approach employs a novel hash table variant featuring efficient minhash fingerprinting of reads for locality-sensitive hashing and their rapid insertion using warp-aggregated operations. Our performance evaluation shows that MetaCache-GPU is able to build large reference databases in a matter of seconds, enabling instantaneous operability, while popular CPU-based tools such as Kraken2 require over an hour for index construction on the same data. In the light of an ever-growing number of reference genomes, MetaCache-GPU is the first metagenomic classifier that makes analysis pipelines with on-demand composition of large-scale reference genome sets practical. Although many sub-problems in this thesis are optimized in a specific application context, they also apply to other bioinformatics problems like k-mer counting, sequence alignment and assembly, which would benefit from GPU acceleration. In addition to the insights from this work, we make our source code publicly available to allow for easier adaptation of our methods to related problems.	en_GB
dc.language.iso	eng	de
dc.rights	CC BY-SA	*
dc.rights.uri	https://creativecommons.org/licenses/by-sa/4.0/	*
dc.subject.ddc	004 Informatik	de_DE
dc.subject.ddc	004 Data processing	en_GB
dc.title	Accelerating bioinformatics applications on CUDA-enabled multi-GPU systems	en_GB
dc.type	Dissertation	de
dc.identifier.urn	urn:nbn:de:hebis:77-openscience-a2a40a9f-0eaa-4a33-828c-7db1382d40d90	-
dc.identifier.doi	http://doi.org/10.25358/openscience-9634	-
jgu.type.dinitype	doctoralThesis	en_GB
jgu.type.version	Original work	de
jgu.type.resource	Text	de
jgu.date.accepted	2023-10-16	-
jgu.description.extent	viii, 160 Seiten ; Illustrationen, Diagramme	de
jgu.organisation.department	FB 08 Physik, Mathematik u. Informatik	de
jgu.organisation.number	7940	-
jgu.organisation.name	Johannes Gutenberg-Universität Mainz	-
jgu.rights.accessrights	openAccess	-
jgu.organisation.place	Mainz	-
jgu.subject.ddccode	004	de
jgu.organisation.ror	https://ror.org/023b0x485	-
Appears in collections:	JGU-Publikationen

Files in This Item:

	File	Description	Size	Format
	accelerating_bioinformatics_a-20231022144007790.pdf		4.22 MB	Adobe PDF
