Summary of the talk on 15.04.2021 (17:00)

Math Machine Learning seminar MPI MIS + UCLA

Stanislav Fort (Stanford University)
Neural Network Loss Landscapes in High Dimensions: Theory Meets Practice

Deep neural networks trained with gradient descent have been extremely successful at learning solutions to a broad suite of difficult problems across a wide range of domains such as vision, gameplay, and natural language, many of which had previously been considered to require intelligence. Despite their tremendous success, we still do not have a detailed, predictive understanding of how these systems work. In this talk, I will focus on recent efforts to understand the structure of deep neural network loss landscapes and how gradient descent navigates them during training. In particular, I will discuss a phenomenological approach to modelling their large-scale structure [1], the role of their nonlinear nature in the early phases of training [2], and its effects on ensembling and calibration [3,4].

[1] Stanislav Fort and Stanislaw Jastrzebski. "Large Scale Structure of Neural Network Loss Landscapes." Advances in Neural Information Processing Systems 32 (NeurIPS 2019). arXiv 1906.04724

[2] Stanislav Fort et al. "Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel." NeurIPS 2020. arXiv 2010.15110

[3] Stanislav Fort, Huiyi Hu, Balaji Lakshminarayanan. "Deep Ensembles: A Loss Landscape Perspective." arXiv 1912.02757

[4] Marton Havasi et al. "Training independent subnetworks for robust prediction." ICLR 2021. arXiv 2010.06610


17.04.2021, 02:30