Please use this identifier to cite or link to this item: http://doi.org/10.25358/openscience-9322
Authors: Vef, Marc-André
Advisor: Brinkmann, André
Title: New techniques for tracing and designing HPC storage systems
Online publication date: 9-Aug-2023
Year of first publication: 2023
Language: English
Abstract: In the domain of high-performance computing (HPC), large (distributed) parallel file systems form the storage backbone of the HPC systems used to solve complex computational problems. Such HPC systems can include hundreds of thousands of compute nodes backed by hundreds of petabytes of available storage. However, in recent years, file system access patterns have changed as a growing number of data-intensive applications have started using HPC systems to process massive amounts of experimental data. Their access patterns differ from those of traditional HPC workloads, which mainly performed sequential I/O (input/output) operations on large files and for which parallel file systems were initially optimized. Since the file system is shared within the entire HPC system, the access patterns of a single data-intensive application can interfere with other users of the HPC system, resulting in reduced I/O throughput, prolonged I/O latencies, and long waiting times. Various storage solutions have been proposed to mitigate such issues, including dedicated hardware-based solutions and new software-based I/O interfaces. Nevertheless, these methods can be costly or, depending on the application, challenging to support, particularly if they involve adapting an application's I/O layer to new I/O APIs and semantics. In this dissertation, we will discuss three topics in the realm of these new challenges, focusing on system analytics in parallel file systems, burst buffer file systems, and object store file systems. First, we will provide a detailed investigation and offer insights into the difficulties developers face when aiming to understand the behavior of the GPFS parallel file system. We will propose a new analysis framework, FlexTrace, alleviating various limitations and overheads of existing mechanisms. Next, we will design, implement, and evaluate two novel distributed file systems, GekkoFS and DelveFS, leveraging existing compute node-local storage devices that often remain unused.
The GekkoFS distributed burst buffer file system offers a temporary and highly scalable I/O system that is optimized for the data-intensive HPC applications described above. GekkoFS can be started ad hoc in seconds, provides the standard I/O interface, and can easily be used exclusively by a single application. Therefore, challenging access patterns are handled by GekkoFS instead of the HPC system's parallel file system. By redefining file system protocols and semantics, the burst buffer file system can considerably outperform parallel file systems if given enough node-local storage resources, while still supporting most HPC applications. Lastly, the DelveFS event-driven semantic file system focuses on providing the standard I/O interface for data-intensive HPC applications that need to access scientific data which is only available via the custom APIs of object stores, e.g., OpenIO or Amazon's S3. DelveFS builds on new mechanisms of object stores to provide users with a new interactive way of accessing data, allowing them to define their own views on object store containers that sometimes contain millions of objects. By offering new techniques, DelveFS provides I/O throughput similar to that of an object store's native I/O interface while accelerating certain I/O operations that are generally challenging for object stores to perform efficiently.
DDC: 004 Computer science, data processing
Institution: Johannes Gutenberg-Universität Mainz
Department: FB 08 Physik, Mathematik u. Informatik
Place: Mainz
ROR: https://ror.org/023b0x485
DOI: http://doi.org/10.25358/openscience-9322
URN: urn:nbn:de:hebis:77-openscience-ad7b4dbd-c507-455c-b55a-6d411bc00aa77
Version: Original work
Publication type: Dissertation
License: In Copyright
Information on rights of use: http://rightsstatements.org/vocab/InC/1.0/
Extent: xix, 173 pages; illustrations, diagrams
Appears in collections: JGU-Publikationen

Files in This Item:
  File: new_techniques_for_tracing_an-20230730222449615.pdf
  Size: 2.69 MB
  Format: Adobe PDF