On the loss landscape of deep neural networks

dc.contributor.advisor: Wand, Michael
dc.contributor.author: Mehmeti-Göpel, Christian Heinrich Xhemal Ali
dc.date.accessioned: 2025-01-30T10:45:53Z
dc.date.available: 2025-01-30T10:45:53Z
dc.date.issued: 2025
dc.description.abstract: As a non-convex optimization problem, the training of deep neural networks remains poorly understood, and its success critically depends on the exact network architecture used. While the number of new network architectures proposed in the last decade is staggering, only a handful of common patterns have emerged that are shared by most successful architectures. First, using the smoothness of the optimization landscape as a heuristic for trainability, we investigate which network components render training difficult and how these patterns help alleviate such difficulties. We find that while deep stacks of nonlinear layers give networks their expressivity, they also make the optimization landscape increasingly rough as network depth grows. Building on prior work, we quantify this effect and show that, for networks at initialization, its strength depends on the smoothness of the nonlinearity used. We then demonstrate how residual connections and multi-path architectures reduce high frequencies in the optimization landscape, resulting in increased trainability. Second, we find that normalization layers combined with an adequate warm-up scheme compensate for the increasing roughness in lower layers by dynamically re-scaling the layer-wise gradients. We prove that in a properly normalized network, all layer-wise effective learning speeds align over time, compensating for even exponentially exploding gradients at initialization. Finally, we conduct an empirical study to determine the nonlinear depth a network needs to generalize effectively on common deep learning tasks. Surprisingly, we find that a shallow network extracted after training significantly outperforms a comparable shallow network trained from scratch, although their expressivity is exactly the same. We also observe that ensembles of both shallow and deep paths outperform comparable networks composed only of deep paths, even when extracted after training. Using these insights, we aim to gain a deeper understanding of how to design deep neural networks with high trainability and strong generalization properties.
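To make the first point of the abstract concrete, the following is a minimal illustrative sketch, not taken from the dissertation: the toy models, the 1-D slice construction, and the roughness proxy are all assumptions chosen for illustration. It samples the loss along a random direction in parameter space for a plain deep ReLU MLP and a residual variant at initialization, and reports the mean absolute second difference of the sampled losses as a crude proxy for high-frequency roughness of the loss landscape.

# Illustrative sketch only (assumption, not the dissertation's method):
# compare loss-landscape roughness along a random 1-D parameter slice
# for a plain deep MLP vs. a residual MLP at initialization.
import torch
import torch.nn as nn

torch.manual_seed(0)

class PlainMLP(nn.Module):
    def __init__(self, width=64, depth=16):
        super().__init__()
        layers = []
        for _ in range(depth):
            layers += [nn.Linear(width, width), nn.ReLU()]
        self.body = nn.Sequential(*layers)
        self.head = nn.Linear(width, 1)

    def forward(self, x):
        return self.head(self.body(x))

class ResidualMLP(nn.Module):
    def __init__(self, width=64, depth=16):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(width, width), nn.ReLU()) for _ in range(depth)
        )
        self.head = nn.Linear(width, 1)

    def forward(self, x):
        for block in self.blocks:
            x = x + block(x)  # skip connection around each nonlinear block
        return self.head(x)

def slice_roughness(model, x, y, steps=200, radius=0.5):
    # Sample the loss along a random, globally normalized direction in
    # parameter space and return the mean absolute second difference of the
    # sampled losses: a crude stand-in for high-frequency roughness.
    loss_fn = nn.MSELoss()
    params = list(model.parameters())
    direction = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((d ** 2).sum() for d in direction))
    direction = [d / norm for d in direction]
    base = [p.detach().clone() for p in params]
    losses = []
    with torch.no_grad():
        for t in torch.linspace(-radius, radius, steps):
            for p, b, d in zip(params, base, direction):
                p.copy_(b + t * d)
            losses.append(loss_fn(model(x), y).item())
        for p, b in zip(params, base):  # restore the original weights
            p.copy_(b)
    losses = torch.tensor(losses)
    return (losses[:-2] - 2 * losses[1:-1] + losses[2:]).abs().mean().item()

x = torch.randn(256, 64)
y = torch.randn(256, 1)
for name, model in [("plain", PlainMLP()), ("residual", ResidualMLP())]:
    print(name, slice_roughness(model, x, y))

Under this setup one would expect the plain deep stack to yield a noticeably larger roughness value than the residual variant, in line with the abstract's claim that skip connections reduce high frequencies in the optimization landscape; the exact numbers depend on depth, width, and the random slice chosen.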
dc.identifier.doi: http://doi.org/10.25358/openscience-11249
dc.identifier.uri: https://openscience.ub.uni-mainz.de/handle/20.500.12030/11270
dc.identifier.urn: urn:nbn:de:hebis:77-openscience-44649e00-7cb1-4a2a-95b6-24cdf6b9f4c06
dc.language.iso: eng
dc.rights: CC-BY-SA-4.0
dc.rights.uri: https://creativecommons.org/licenses/by-sa/4.0/
dc.subject.ddc: 004 Informatik [de_DE]
dc.subject.ddc: 004 Data processing [en_GB]
dc.title: On the loss landscape of deep neural networks
dc.type: Dissertation
jgu.date.accepted: 2025-01-15
jgu.description.extent: x, 114, 2 pages ; illustrations, diagrams
jgu.organisation.department: FB 08 Physik, Mathematik u. Informatik
jgu.organisation.name: Johannes Gutenberg-Universität Mainz
jgu.organisation.number: 7940
jgu.organisation.place: Mainz
jgu.organisation.ror: https://ror.org/023b0x485
jgu.organisation.year: 2024
jgu.rights.accessrights: openAccess
jgu.subject.ddccode: 004
jgu.type.dinitype: PhDThesis
jgu.type.resource: Text
jgu.type.version: Original work

Files

Original bundle

Name: on_the_loss_landscape_of_deep-20250117153052220.pdf
Size: 9.81 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.57 KB
Format: Item-specific license agreed to upon submission