Conference on Mathematics of Machine Learning

Abstracts for the talks

Yasaman Bahri
Google Brain, USA
Dynamics & Phase Transitions in Wide, Deep Neural Networks


Peter Bartlett
University of California, Berkeley, USA
Benign overfitting


Youness Boutaib
RWTH Aachen University, Germany
Path classification by stochastic linear RNNs


Benjamin Fehrman
University of Oxford, United Kingdom
Convergence rates for stochastic gradient descent algorithms in non-convex loss landscapes


Minh Hà Quang
RIKEN-AIP, Japan
Regularized information geometric and optimal transport distances between covariance operators and Gaussian processes


Andreas Habring
University of Graz, Austria
A Generative Variational Model for Inverse Problems in Imaging


Hanyuan Hang
University of Twente (Enschede), Netherlands
A Combination of Ensemble Methods for Large-Scale Regression


Matthias Hein
University of Tübingen, Germany
Towards neural networks which know when they don't know


Stefanie Jegelka
Massachusetts Institute of Technology, USA
Learning and Generalization in Graph Neural Networks


Sebastian Kassing
WWU Münster, Germany
Convergence of Stochastic Gradient Descent for Analytic Target Functions


Kathlén Kohn
KTH - Royal Institute of Technology, Sweden
The Geometry of Linear Convolutional Networks


Kirandeep Kour
Max Planck Institute for Dynamics of Complex Technical Systems (Magdeburg), Germany
A Low-rank Support Tensor Network


Florent Krzakala
EPFL (Lausanne), Switzerland
Generalization & Overparametrization in Machine Learning: Rigorous Insights from Simple Models
The increasing dimensionality of data in the modern machine learning era presents new challenges and opportunities. The high-dimensional setting allows one to use powerful asymptotic methods from probability theory and statistical physics to obtain precise characterizations of the generalization error and of the benefits of overparametrization. I will present and review some recent works in this direction and discuss what they teach us in the broader context of generalization, double descent, and overparametrization in modern machine learning problems.

Věra Kůrková
Czech Academy of Sciences, Czech Republic
Some implications of high-dimensional geometry for neurocomputing


Soon Hoe Lim
Nordita, KTH Royal Institute of Technology and Stockholm University, Sweden
Noisy Recurrent Neural Networks


Matthias Löffler
ETH Zürich (Zurich), Switzerland
AdaBoost and robust one-bit compressed sensing


Luigi Malagò
Transylvanian Institute of Neuroscience (TINS) (Cluj-Napoca), Romania
Lagrangian and Hamiltonian Mechanics for Probabilities on the Statistical Bundle

We provide an Information-Geometric formulation of Classical Mechanics on the Riemannian manifold of probability distributions, which is an affine manifold endowed with a dually-flat connection. In a non-parametric formalism, we consider the full set of positive probability functions on a finite sample space, and we provide a specific expression for the tangent and cotangent spaces over the statistical manifold, in terms of a Hilbert bundle structure that we call the Statistical Bundle. In this setting, we compute velocities and accelerations of a one-dimensional statistical model using the canonical dual pair of parallel transports and define a coherent formalism for Lagrangian and Hamiltonian mechanics on the bundle. Finally, in a series of examples, we show how our formalism provides a consistent framework for accelerated natural gradient dynamics on the probability simplex, paving the way for direct applications in optimization, game theory and neural networks. This work is based on a joint collaboration with Goffredo Chirco and Giovanni Pistone (arxiv.org/abs/2009.09431).
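
For orientation only (an illustrative baseline, not taken from the paper): the plain, non-accelerated natural gradient flow of a smooth objective f on the interior of the probability simplex, with respect to the Fisher-Rao metric, reads

\[
g_p(u,v) = \sum_{i=1}^n \frac{u_i v_i}{p_i}, \qquad u,v \in T_p\Delta = \Big\{ w \in \mathbb{R}^n : \sum_i w_i = 0 \Big\},
\]
\[
\dot p_i = -\, p_i \Big( \partial_i f(p) - \sum_{j=1}^n p_j\, \partial_j f(p) \Big), \qquad i = 1, \dots, n .
\]

The accelerated dynamics discussed in the talk are formulated on the statistical bundle rather than through this first-order flow.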

Oxana Manita
Eindhoven University of Technology, Netherlands
Dropout regularization viewed from the large deviations perspective
Dropout regularization for training neural networks turns out to be very successful in practical applications. The empirical explanation of this success is based on reducing co-adaptation of features during training. Moreover, practitioners observe that 'training with dropout converges not faster, but to a better local minimum'. However, there is hardly any mathematical understanding of these statements. In this talk I give a mathematical interpretation of the last statement, discuss a continuous-time model of training with dropout, and explain why it 'converges to a better local minimum' than conventional training.
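
For illustration only (not part of the talk): a minimal sketch of discrete-time training with dropout, in which a fresh Bernoulli mask multiplies the hidden activations at every SGD step; all sizes and hyperparameters below are placeholders.

import numpy as np

rng = np.random.default_rng(0)

# Toy data and a one-hidden-layer ReLU network (illustrative only).
X = rng.normal(size=(256, 10))
y = np.sin(X[:, :1])

W1 = rng.normal(size=(10, 64)) / np.sqrt(10)
W2 = rng.normal(size=(64, 1)) / np.sqrt(64)

p_keep = 0.8   # dropout keep probability
lr = 1e-2

for step in range(1000):
    idx = rng.choice(len(X), size=32, replace=False)
    xb, yb = X[idx], y[idx]

    h = np.maximum(xb @ W1, 0.0)                            # hidden activations
    mask = rng.binomial(1, p_keep, size=h.shape) / p_keep   # fresh Bernoulli mask each step
    h_drop = h * mask                                       # dropout: randomly silence units
    pred = h_drop @ W2

    # Gradients of the mean squared error through the masked network.
    grad_pred = 2.0 * (pred - yb) / len(xb)
    grad_W2 = h_drop.T @ grad_pred
    grad_h = (grad_pred @ W2.T) * mask * (h > 0)
    grad_W1 = xb.T @ grad_h

    W1 -= lr * grad_W1
    W2 -= lr * grad_W2

Conventional training corresponds to p_keep = 1, i.e. the mask is identically one.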

Daniel McKenzie
University of California, Los Angeles, USA
Learning to predict equilibria from data using fixed point networks


Jochen Merker
HTWK Leipzig, Germany
Complexity-reduced data models beyond the classical bias-variance trade-off


Guido Montúfar
Max Planck Institute for Mathematics in the Sciences (Leipzig), Germany, and UCLA, USA
Implicit bias of gradient descent for mean squared error regression with wide neural networks
We investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. For univariate regression, the solution of training a width-n shallow ReLU network is within n^(-1/2) of the function which fits the training data and whose difference from the initial function has the smallest 2-norm of the second derivative weighted by a curvature penalty that depends on the probability distribution used to initialize the network parameters. We compute the curvature penalty function explicitly for various common initialization procedures. For multivariate regression we show an analogous result, whereby the second derivative is replaced by the Radon transform of a fractional Laplacian. For initialization schemes that yield a constant penalty function, the solutions are polyharmonic splines. Moreover, we obtain results for different activations and show that the training trajectories are captured by trajectories of smoothing splines with decreasing regularization strength. This is joint work with Hui Jin (arxiv.org/abs/2006.07356).
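
In schematic form (a transcription of the statement above, not the paper's exact formulation), the univariate result identifies the trained function, up to O(n^(-1/2)), with the solution of

\[
\min_{f} \int \varrho(x)\, \big( f''(x) - f_0''(x) \big)^2 \, dx
\quad \text{subject to} \quad f(x_i) = y_i \ \text{ for all training pairs } (x_i, y_i),
\]

where f_0 is the function represented by the network at initialization and the curvature penalty \varrho is determined by the parameter-initialization distribution; its explicit form for common initializations is given in the paper.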

Michael Murray
University of Oxford (Oxford, UK), United Kingdom
Activation Function Design for Deep Networks: Linearity and Effective Initialisation


Burim Ramosaj
TU Dortmund, Germany
Interpretable Machines - Constructing valid Prediction Intervals with Random Forest


Luca Ratti
University of Genoa (Genova), Italy
Learning the optimal regularizer for linear inverse problems


Michael Schmischke
Chemnitz University of Technology, Germany
High-Dimensional Explainable ANOVA Approximation


Stefano Soatto
University of California, Los Angeles, USA
The Information in Optimal Representations


Yuguang Wang
Max Planck Institute for Mathematics in the Sciences (Leipzig), Germany
How framelets enhance graph neural networks



Date and Location

August 04 - 07, 2021 (previously planned for February 22 - 25, 2021)
Center for Interdisciplinary Research (ZiF), Bielefeld University
Methoden 1
33615 Bielefeld

Scientific Organizers

Benjamin Gess, MPI for Mathematics in the Sciences & Universität Bielefeld

Guido Montúfar, MPI for Mathematics in the Sciences & UCLA

Nihat Ay
Hamburg University of Technology
Institute of Data Science Foundations
