Machine learning for the elucidation of multiphase processes and systems
Reuse License
Description of rights: InC-1.0
Abstract
Atmospheric chemistry governs many processes relevant to air quality, climate, and human health. It includes, among others, multiphase interactions between gases and condensed phases, which affect the formation, transformation, and removal of atmospheric constituents. Understanding multiphase chemistry is therefore essential for accurately describing atmospheric composition and its impacts. In this context, atmospheric aerosols play a particularly important role. They serve as cloud nuclei, transport reactive species, provide reaction surfaces, and affect radiative forcing, the key driver of global warming. Among them, organic aerosols represent an abundant fraction, yet their formation and transformation remain poorly understood, contributing substantially to uncertainty in climate and health modelling. The immense diversity of precursor compounds for organic aerosols, coupled with complex environmental conditions and multiphase chemistry, poses a major challenge for laboratory and modelling studies alike. At the same time, advances in measurement and monitoring technologies generate large volumes of atmospheric data. Data-centric methods such as machine learning offer a powerful opportunity to use these data to advance the understanding of atmospheric processes and improve parameterizations in Earth system models. This thesis explores and advances machine learning applications in multiphase chemistry, including compound property prediction, model acceleration, uncertainty quantification, experiment design, and multiscale modelling. Its focal points can be summarized as follows:
- Advancement of quantitative structure-activity relationship (QSAR) models with novel artificial neural network architectures. This includes the application of pattern-detecting convolutional neural networks to one-hot encoded simplified molecular input line entry system (SMILES) representations of molecular structures to estimate the reduction potentials of atmospherically relevant quinones. Reduction potentials determine the quinones' reactivity, and therefore their ability to undergo redox cycling, which leads to the catalytic production of reactive oxygen species (ROS). ROS are likely associated with adverse health effects of air pollution, as they react with biomolecules in the lungs of exposed humans. Trained convolutional neural network models permit the rapid screening of atmospherically relevant quinones that pose an elevated risk of adverse health effects, without the need for expensive measurements. Another advancement of QSAR methods is the application of graph convolutional neural networks to graph representations of molecules to estimate their saturation vapor pressure. Vapor pressures determine the partitioning equilibrium of atmospheric compounds between the condensed and gas phases. Their accurate determination is highly relevant in various fields of atmospheric science, as it critically affects the formation and growth of secondary organic aerosols, which account for a substantial mass fraction of tropospheric aerosols. A novel group contribution-assisted, adaptive-depth graph convolutional neural network architecture outperforms existing methods for vapor pressure prediction, even when trained on relatively few experimental data.
- Acceleration of kinetic multilayer process models for mass transport and chemical reactions in aerosols through machine learning surrogate models. While differential equation models are capable of accurately simulating the growth and aging of atmospheric aerosols, their computational expense often restricts their application in inverse or global atmospheric modelling. Artificial neural network and polynomial chaos expansion surrogate models can be trained on sampling data from differential equation models to reproduce their output. The trained surrogate models show high accuracy in comparison with their reference models and are faster by multiple orders of magnitude. They were successfully used to aid complex differential equation models in inverse modelling tasks, notably reducing the computational cost.
- Development of the Numerical Compass, a computational method for automated uncertainty quantification of process models and experiment design. The Numerical Compass uses a fit ensemble, i.e., an ensemble of plausible model solutions consistent with experimental data, to identify combinations of laboratory parameters for experiments that lead to near-optimal model constraints. This enables researchers to take a more targeted approach when designing experiments by addressing the major sources of uncertainty in parametric models. The method was thoroughly tested in simulations of the ozonolysis of oleic acid.
- Application of parametric equations and neural network models to the modelling of gas exchange rates of biological soil crusts, in order to advance the calculation of the net primary production and long-term carbon balance of dryland ecosystems. Biological soil crusts, communities of photoautotrophic and heterotrophic organisms, form a common ecological feature in dryland areas across the globe and are an important factor in carbon balance and cycling. Neural network and parametric equation models trained on laboratory data are capable of describing carbon dioxide (CO2) gas exchange rates of various types of biological soil crusts based on the prevailing environmental conditions. The models are accurate and versatile, and are applied in a follow-up study to investigate the effect of ambient CO2 concentrations on biological soil crusts.
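
To make the input representation named in the first focal point more concrete, the following is a minimal sketch of one-hot encoding a SMILES string at the character level, as it might be prepared for a convolutional neural network. It is an illustration under stated assumptions, not the thesis's implementation: the vocabulary `VOCAB`, the padding length `max_len`, and the function name `one_hot_smiles` are hypothetical, and the example molecule (1,4-benzoquinone) is chosen only because it is a simple quinone.

```python
import numpy as np

# Hypothetical character vocabulary (assumption); a real model would derive
# it from the training set and would need multi-character tokens such as "Cl".
VOCAB = ["C", "c", "O", "=", "(", ")", "1", "2"]
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(VOCAB)}


def one_hot_smiles(smiles: str, max_len: int = 24) -> np.ndarray:
    """Return a (max_len, len(VOCAB)) one-hot matrix, zero-padded at the end."""
    if len(smiles) > max_len:
        raise ValueError("SMILES string is longer than max_len")
    encoded = np.zeros((max_len, len(VOCAB)), dtype=np.float32)
    for position, ch in enumerate(smiles):
        # Raises KeyError if the character is not in the vocabulary.
        encoded[position, CHAR_TO_INDEX[ch]] = 1.0
    return encoded


# 1,4-benzoquinone written as a SMILES string (16 characters).
matrix = one_hot_smiles("O=C1C=CC(=O)C=C1")
```

Each row of `matrix` marks which vocabulary character occupies that position in the string, and rows beyond the end of the string stay zero. A one-dimensional convolution along the position axis could then pick up recurring substructure patterns such as `C=O`.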