On the loss landscape of deep neural networks

dc.contributor.advisor: Wand, Michael
dc.contributor.author: Mehmeti-Göpel, Christian Heinrich Xhemal Ali
dc.date.accessioned: 2025-01-30T10:45:53Z
dc.date.available: 2025-01-30T10:45:53Z
dc.date.issued: 2025
dc.description.abstract: As a non-convex optimization problem, the training of deep neural networks remains poorly understood, and its success critically depends on the exact network architecture used. While the number of new network architectures proposed in the last decade is staggering, only a handful of common patterns have emerged that are shared by most successful architectures. First, using the smoothness of the optimization landscape as a heuristic for trainability, we investigate which network components render training difficult and how these patterns help alleviate such difficulties. We find that while deep stacks of nonlinear layers give networks their expressivity, they also make the optimization landscape increasingly rough as network depth grows. Building on prior work, we quantify this effect and show that, for networks at initialization, its strength depends on the smoothness of the nonlinearity used. We then demonstrate how residual connections and multi-path architectures reduce high frequencies in the optimization landscape, resulting in increased trainability. Second, we find that normalization layers combined with an adequate warm-up scheme compensate for the increasing roughness in lower layers by dynamically re-scaling the layer-wise gradients. We prove that in a properly normalized network, all layer-wise effective learning speeds align over time, compensating for even exponentially exploding gradients at initialization. Finally, we conduct an empirical study to determine the nonlinear depth a network needs to generalize effectively on common deep learning tasks. Surprisingly, we find that a shallow network extracted after training significantly outperforms a comparable shallow network trained from scratch, although their expressivity is exactly the same. We also observe that ensembles of both shallow and deep paths outperform comparable networks composed only of deep paths, even when extracted after training. Using these insights, we aim to gain a deeper understanding of how to design deep neural networks with high trainability and strong generalization properties.
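To make the first point of the abstract concrete, the following is a minimal illustrative sketch, not taken from the dissertation: the toy models, the 1-D slice construction, and the roughness proxy are all assumptions chosen for illustration. It samples the loss along a random direction in parameter space for a plain deep ReLU MLP and a residual variant at initialization, and reports the mean absolute second difference of the sampled losses as a crude proxy for high-frequency roughness of the loss landscape.

# Illustrative sketch only (assumption, not the dissertation's method):
# compare loss-landscape roughness along a random 1-D parameter slice
# for a plain deep MLP vs. a residual MLP at initialization.
import torch
import torch.nn as nn

torch.manual_seed(0)

class PlainMLP(nn.Module):
    def __init__(self, width=64, depth=16):
        super().__init__()
        layers = []
        for _ in range(depth):
            layers += [nn.Linear(width, width), nn.ReLU()]
        self.body = nn.Sequential(*layers)
        self.head = nn.Linear(width, 1)

    def forward(self, x):
        return self.head(self.body(x))

class ResidualMLP(nn.Module):
    def __init__(self, width=64, depth=16):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(width, width), nn.ReLU()) for _ in range(depth)
        )
        self.head = nn.Linear(width, 1)

    def forward(self, x):
        for block in self.blocks:
            x = x + block(x)  # skip connection around each nonlinear block
        return self.head(x)

def slice_roughness(model, x, y, steps=200, radius=0.5):
    # Sample the loss along a random, globally normalized direction in
    # parameter space and return the mean absolute second difference of the
    # sampled losses: a crude stand-in for high-frequency roughness.
    loss_fn = nn.MSELoss()
    params = list(model.parameters())
    direction = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((d ** 2).sum() for d in direction))
    direction = [d / norm for d in direction]
    base = [p.detach().clone() for p in params]
    losses = []
    with torch.no_grad():
        for t in torch.linspace(-radius, radius, steps):
            for p, b, d in zip(params, base, direction):
                p.copy_(b + t * d)
            losses.append(loss_fn(model(x), y).item())
        for p, b in zip(params, base):  # restore the original weights
            p.copy_(b)
    losses = torch.tensor(losses)
    return (losses[:-2] - 2 * losses[1:-1] + losses[2:]).abs().mean().item()

x = torch.randn(256, 64)
y = torch.randn(256, 1)
for name, model in [("plain", PlainMLP()), ("residual", ResidualMLP())]:
    print(name, slice_roughness(model, x, y))

Under this setup one would expect the plain deep stack to yield a noticeably larger roughness value than the residual variant, in line with the abstract's claim that skip connections reduce high frequencies in the optimization landscape; the exact numbers depend on depth, width, and the random slice chosen.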
dc.identifier.doi: http://doi.org/10.25358/openscience-11249
dc.identifier.uri: https://openscience.ub.uni-mainz.de/handle/20.500.12030/11270
dc.identifier.urn: urn:nbn:de:hebis:77-openscience-44649e00-7cb1-4a2a-95b6-24cdf6b9f4c06
dc.language.iso: eng
dc.rights: CC-BY-SA-4.0
dc.rights.uri: https://creativecommons.org/licenses/by-sa/4.0/
dc.subject.ddc: 004 Informatik [de_DE]
dc.subject.ddc: 004 Data processing [en_GB]
dc.title: On the loss landscape of deep neural networks
dc.type: Dissertation
jgu.date.accepted: 2025-01-15
jgu.description.extent: x, 114, 2 pages ; illustrations, diagrams
jgu.organisation.department: FB 08 Physik, Mathematik u. Informatik
jgu.organisation.name: Johannes Gutenberg-Universität Mainz
jgu.organisation.number: 7940
jgu.organisation.place: Mainz
jgu.organisation.ror: https://ror.org/023b0x485
jgu.organisation.year: 2024
jgu.rights.accessrights: openAccess
jgu.subject.ddccode: 004
jgu.type.dinitype: PhDThesis
jgu.type.resource: Text
jgu.type.version: Original work

Files

Original bundle

Name: on_the_loss_landscape_of_deep-20250117153052220.pdf
Size: 9.81 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.57 KB
Format: Item-specific license agreed to upon submission