Unsupervised identification of metastable molecular conformations with deep learning methods
Reuse License
Description of rights: CC-BY-4.0
Abstract
The rise of computing power over the last decades, best described by Moore's empirical
law, has made it possible to establish simulation as the third pillar of science,
alongside the longstanding pillars of theory and experiment. Investigating systems
in silico has since become a widespread approach to research, enabling numerical
insights on scales not accessible to theory and experiment. In recent years, artificial
intelligence and machine learning, specifically deep learning, have emerged as key
technologies of the information age, fueled by the abundant availability
of computation and data.
In this thesis, we show in two case studies that a deep learning approach to
dimensionality reduction, called EncoderMap, is able to find better, more descriptive
collective variables in the same number of dimensions than established linear methods.
In the main chapter, we concern ourselves with improving the analysis of simulation
data by incorporating this deep learning method. A simulation can be considered
an experiment conducted on a computer: it produces a large amount of raw data from
which insights can only be extracted in a nontrivial manner. This analysis follows an
elaborate modeling pipeline, which consists of multiple steps and algorithms. One of
these crucial steps is dimensionality reduction, in which high-dimensional data is
mapped into a lower-dimensional space, retaining as much of the important information
as possible and aiming to find descriptive collective variables fit for modeling.
We show with a well-studied small peptide, deca-alanine, that the aforementioned
deep autoencoder architecture with an additional distance metric, EncoderMap,
finds collective variables that are at least as good as those of an established linear
method, TICA, in the same number of dimensions. Results obtained by simulation
are connected back to experiment by identifying metastable states, that is,
long-lived structural conformations that are accessible to experiment. We compare these
dimensionality reduction methods in their ability to find expressive collective
variables from which these metastable states can be identified. Lastly, as EncoderMap does
not make use of the time-series character of the data and works on structure alone,
our results hint at potential applications in combination with algorithms that
can harvest unordered data quickly, e.g. Monte Carlo simulations.
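
The abstract does not spell out the form of the additional distance metric. As a rough illustration only, the sketch below assumes the sketch-map-style sigmoid transform of pairwise distances that published descriptions of EncoderMap use, and compares the transformed distances of a high-dimensional point set with those of a low-dimensional embedding. The function names, the toy coordinates, and the parameter values are hypothetical, and the autoencoder itself is left out entirely.

```python
import math
from itertools import combinations


def sigmoid(r, sigma, a, b):
    """Sketch-map-style sigmoid of a distance r.

    Equals 0 at r = 0, 0.5 at r = sigma, and approaches 1 for large r.
    The exponents a and b control the steepness on either side of sigma.
    """
    return 1.0 - (1.0 + (2.0 ** (a / b) - 1.0) * (r / sigma) ** a) ** (-b / a)


def distance_cost(high, low, high_params, low_params):
    """Mean squared mismatch between sigmoid-transformed pairwise distances.

    high, low: sequences of points (tuples of floats), in matching order.
    high_params, low_params: (sigma, a, b) for each space (assumed values).
    """
    pairs = list(combinations(range(len(high)), 2))
    total = 0.0
    for i, j in pairs:
        d_high = math.dist(high[i], high[j])
        d_low = math.dist(low[i], low[j])
        diff = sigmoid(d_high, *high_params) - sigmoid(d_low, *low_params)
        total += diff ** 2
    return total / len(pairs)


# Toy data (hypothetical): four points in 3D and a candidate 2D embedding.
high_dim = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
low_dim = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (4.0, 4.0)]

# Illustrative parameter choices, not values taken from the thesis.
high_params = (4.5, 12, 6)
low_params = (1.0, 2, 6)

print(distance_cost(high_dim, low_dim, high_params, low_params))
```

Because such a term depends only on pairwise distances between structures, it needs no time ordering of the samples, which is consistent with the remark above about unordered data.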
