Unsupervised identification of metastable molecular conformations with deep learning methods

Lemcke, Simon

doi:https://doi.org/10.25358/openscience-13426

Files

unsupervised_identification_o-20251020103232148396.pdf (18.47 MB)

Date issued

2025

Reuse License

Description of rights: CC-BY-4.0

Item

Dissertation

Open Access

Abstract

The growth of compute power over the last decades, best described by Moore’s empirical law, has made it possible to establish simulation as the third pillar of science, between the longstanding pillars of theory and experiment. Investigating systems ‘in silico’ has since become a widespread approach to research, enabling numerical insights on scales not accessible to theory and experiment. In recent years, artificial intelligence and machine learning, specifically deep learning, have emerged as key technologies of the information age, fueled by the abundant availability of computation and data. In this thesis, we show in two case studies that a deep learning approach to dimensionality reduction, called EncoderMap, is able to find better, more descriptive collective variables in the same number of dimensions than established linear methods. In the main chapter, we concern ourselves with improving the analysis of simulation data by incorporating this deep learning method. A simulation can be considered an experiment conducted on a computer that creates a large amount of raw data, from which insights can only be extracted in a nontrivial manner. This analysis follows an elaborate modeling pipeline, which consists of multiple steps and algorithms. One of these crucial steps is dimensionality reduction, in which high-dimensional data is mapped into a lower-dimensional space, retaining as much of the important information as possible and aiming to find descriptive collective variables fit for modeling. We show with a well-studied small peptide, deca-alanine, that the aforementioned deep autoencoder architecture with an additional distance metric (EncoderMap) makes it possible to find collective variables that are at least as good as those of an established linear method (TICA) in the same number of dimensions.
Connecting results obtained by simulation back to experiment is done by identifying metastable states, long-lived structural conformations that are accessible to experiment. We compare these dimensionality reduction methods in their ability to find expressive collective variables that allow these metastable states to be identified. Lastly, as EncoderMap does not make use of the time-series character of the data and works on structure alone, our results hint at potential applications in combination with algorithms that can harvest unordered data quickly, e.g. Monte Carlo simulations.

URI

https://openscience.ub.uni-mainz.de/handle/20.500.12030/13447

Collections

JGU-Hochschulschriften
