Information Flow in Deep ReLU Networks
Reuse License: CC-BY-SA-4.0
Abstract
Deep learning has proven its effectiveness in large parts of the scientific world. Even large-scale applications, especially text-to-image or text-to-text models with billions of parameters, consist at their core of simple linear algebra, stacked and separated by non-linear functions. One such non-linear activation function, the Rectified Linear Unit (ReLU), is defined as the maximum of its argument and zero, effectively discretizing the input space into one of two cases: greater than or smaller than zero. These two mechanisms, a continuous basis (linear algebra) and a discrete choice (ReLU), appear sufficient to induce representations capable of tackling tasks such as autonomous driving or passing the Turing Test.
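Written out, the ReLU non-linearity and the binary case distinction it induces take the following form (a standard formulation, stated here for concreteness rather than quoted from the thesis):

\[
\operatorname{ReLU}(x) = \max(x, 0), \qquad a(x) = \mathbb{1}[x > 0] \in \{0, 1\},
\]

where a(x) is the discrete active/inactive state of a single unit; collecting these states over all units of a network yields its activation pattern.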
This thesis explores how information propagates during the training of deep ReLU networks, moving beyond the perspective of a purely continuous optimization process. By switching back and forth between these two views, a continuous and a discrete interpretation of the very same process, it addresses different instances of the same underlying question: How does information flow from the dataset, via the learning scheme, through a deep network? One way to answer this question is to observe which discrete decisions a deep network implicitly makes during training and inference. This leads to one of the key contributions of this work: examining activation patterns and their changes during training, which enables the analysis of architectural and optimization choices within a unified model of the training process. Building on these insights, the thesis introduces ActCooLR, a proof-of-concept learning rate scheduler based on the proposed transition model of activation pattern changes. A second way to approach the question is to adaptively augment the optimization process with additional discrete decisions, using a stochastic number system during training, and to monitor how the optimization copes with this increased difficulty.
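As an illustration of the kind of quantity involved, the sketch below computes a layer's activation pattern and the fraction of pattern entries that flip after a weight update. It is a minimal, assumed NumPy setup; the names activation_pattern and the random stand-in for a gradient step are purely illustrative and do not reproduce the thesis's actual transition model or the ActCooLR scheduler.

import numpy as np

# Illustrative only: one hidden ReLU layer; the "activation pattern" is the
# binary mask of which units are active (pre-activation > 0) for each input.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 8))          # a small batch of inputs
W = rng.normal(size=(8, 16)) * 0.1    # hidden-layer weights
b = np.zeros(16)                      # hidden-layer biases

def activation_pattern(X, W, b):
    """Return the binary mask of ReLU units that fire for each input."""
    return (X @ W + b) > 0.0

# Compare patterns before and after a (stand-in) parameter update.
pattern_before = activation_pattern(X, W, b)
W_updated = W + rng.normal(size=W.shape) * 0.01   # placeholder for a gradient step
pattern_after = activation_pattern(X, W_updated, b)

# Fraction of pattern entries that changed, i.e. units whose discrete
# active/inactive decision flipped between the two steps.
transitions = np.mean(pattern_before != pattern_after)
print(f"fraction of activation-pattern entries that changed: {transitions:.4f}")

Tracking such a transition rate over the course of training is one simple way to make the discrete side of the optimization process observable alongside the continuous loss curve.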
