
Talk

The Geometry of Neural Nets' Parameter Spaces Under Reparametrization

  • Agustinus Kristiadi (Vector Institute)
Live Stream

Abstract

Model reparametrization, which follows the change-of-variable rule of calculus (not to be confused with weight-space symmetry), is a popular way to improve the training of neural nets, e.g. in WeightNorm. But it can also be problematic, since it induces inconsistencies in, e.g., Hessian-based flatness measures, optimization trajectories, and modes of probability densities. This complicates downstream analyses: for instance, one cannot definitively relate flatness to generalization, since an arbitrary reparametrization changes their relationship. In this talk, I will present a study of the invariance of neural nets under reparametrization from the perspective of Riemannian geometry. From this point of view, invariance is an inherent property of any neural net, provided one explicitly represents the metric and uses the correct associated transformation rules. This matters because, although the metric is always present, it is often implicitly assumed to be the identity, dropped from the notation, and then lost under reparametrization. I will discuss the implications for measuring the flatness of minima, for optimization, and for probability-density maximization. As a bonus, I will also give a teaser of our other recent work, which exploits the geometry of preconditioning matrices to develop an inverse-free, structured, KFAC-like second-order optimization method for very large, modern neural nets such as transformers. Being inverse-free, the resulting method is numerically stable in low precision and also memory efficient.
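To make the invariance argument above concrete, here is a minimal sketch of the transformation rules involved, in notation of my own choosing (reparametrization map φ, Jacobian J, loss L, metric G); it is illustrative and not quoted from the talk or the underlying paper.

```latex
% Sketch under assumed notation: a reparametrization \theta = \varphi(\psi)
% with invertible Jacobian J = \partial\varphi/\partial\psi, reparametrized
% loss \tilde{L}(\psi) = L(\varphi(\psi)), and a metric G on parameter space.
\begin{align*}
  \tilde{H}(\psi_*) &= J^\top H(\theta_*)\, J
    && \text{Hessian at a minimum: } \det\tilde{H} = (\det J)^2 \det H,
       \text{ so flatness is not invariant,} \\
  \tilde{G}(\psi) &= J^\top G\bigl(\varphi(\psi)\bigr)\, J
    && \text{metric transformed by the same change-of-variable rule,} \\
  \det\bigl(\tilde{G}^{-1}\tilde{H}\bigr) &= \det\bigl(G^{-1}H\bigr)
    && \text{invariant, since } \tilde{G}^{-1}\tilde{H} = J^{-1}\bigl(G^{-1}H\bigr)J
       \text{ is a similarity transform.}
\end{align*}
```

Under the same bookkeeping, the Riemannian gradient G⁻¹∇L and the volume element √(det G) also transform consistently, which is the sense in which optimization trajectories and density modes can be made parametrization-independent; again, this is a sketch of the general principle rather than the talk's exact formulation.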

Links

  • Seminar: Math Machine Learning seminar MPI MIS + UCLA (5/9/24, 6/13/24)
  • Venue: MPI for Mathematics in the Sciences (Live Stream)
  • Contact: Katharina Matschke, MPI for Mathematics in the Sciences (via mail)
