Modern methods in Bayesian probabilistic modeling and their applications
Abstract
A fundamental task in all fields of science is to learn from observations. However,
this task is usually hindered in two ways. First, direct observation of the
phenomenon may be challenging or impossible, requiring a model of the phenomenon
and a statistical approach to separate desired from undesired information. Second,
the number of observations may be small, so the resulting uncertainty must be taken
into account.
As a remedy, the field of Bayesian modeling provides a canonical framework for
statistical inference from data and prior knowledge that also quantifies the
uncertainty of the results. In this approach, probability distributions serve as
carriers of information and are transformed accordingly. However, in
most cases the required computations can only be performed numerically.
In this work we contribute to Bayesian modeling in several ways. First, we present
the paraNUTS algorithm for parallelized inference, which is formulated in the
map-reduce paradigm and achieves considerable speed-ups without significant loss of
inference quality. Next, we present TuringOnline.jl, a software package for inference
in online settings that also achieves speed-ups while largely retaining inference
quality. Moreover, we present an application of Bayesian modeling to
surface topography analysis that yielded actionable findings for the field, helping to
ensure reproducible results in future studies. Finally, we contribute to nowcasting
of infection numbers with our CorCast system, which provides the unified
treatment of data and models that practical application requires.
Although targeted at the SARS-CoV-2 pandemic, the system is designed to be easily
adapted to other epidemiological modeling tasks.
Additionally, the appendices cover research that is not related to Bayesian
modeling: first, a scalable and flexible approach to signal classification in raw
mass spectrometry data using locality-sensitive hashing, and second, a machine learning
approach to a classification task in the field of surface topography analysis.