Generalization and Stability in Interpolating Neural Networks

  • Hossein Taheri (UCSB)

Abstract

Neural networks are renowned for their ability to memorize datasets, often achieving near-zero training loss via gradient descent. Despite this, they also generalize remarkably well to new data. This paper studies the generalization behavior of neural networks trained with the logistic loss through the lens of algorithmic stability. We focus on the neural tangent regime, in which the network weights move only a constant distance from initialization while the training loss is driven to its minimum. Our main finding is that, under NTK-separability, optimal test loss bounds are achievable provided the network width is at least poly-logarithmic in the number of training samples. This departs from existing stability-based generalization results, which typically require polynomial width and yield suboptimal rates, and underscores the significance of our approach. Moreover, our analysis gives improved generalization bounds and width lower bounds compared to prior works based on alternative methods such as uniform convergence via Rademacher complexity. The key to this improvement is leveraging Hessian information about the objective along the gradient descent iterates: we show that sufficiently wide neural networks trained with the logistic loss satisfy an approximate quasi-convexity property along the gradient descent path. To demonstrate the practical implications of these findings, we specialize the analysis to an XOR dataset, for which we present refined width conditions.
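To make the setting concrete, here is a minimal sketch (not the paper's own experiments): gradient descent with the logistic loss on a wide two-layer ReLU network in a standard NTK-style parameterization, run on an XOR-style dataset. The width, step size, iteration count, and data generation below are illustrative assumptions; the point is that the training loss becomes small while the weights stay within a roughly constant distance of their initialization.

```python
# Minimal sketch of the setting described in the abstract (illustrative
# hyperparameters, not the paper's experiments).
import numpy as np

rng = np.random.default_rng(0)

# XOR-style data: inputs in {-1, +1}^2, label = product of the coordinates.
n = 64
X = rng.choice([-1.0, 1.0], size=(n, 2))
y = X[:, 0] * X[:, 1]  # labels in {-1, +1}

# Wide two-layer ReLU network with a fixed random output layer, in the
# NTK parameterization f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j . x).
m = 2048                              # hidden width (illustrative)
W0 = rng.normal(size=(m, 2))          # first-layer weights at initialization
a = rng.choice([-1.0, 1.0], size=m)   # fixed +-1 output weights

def forward(W):
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)  # shape (n,)

def logistic_loss(W):
    return np.mean(np.log1p(np.exp(-y * forward(W))))

W = W0.copy()
eta = 1.0  # step size (illustrative)
for _ in range(2000):
    z = forward(W)
    g = -y / (1.0 + np.exp(y * z))      # d(loss_i)/d(z_i)
    act = (X @ W.T > 0).astype(float)   # ReLU derivative, shape (n, m)
    # dL/dW_j = (1/n) sum_i g_i * a_j * 1{w_j . x_i > 0} * x_i / sqrt(m)
    grad = (act * g[:, None] * a[None, :]).T @ X / (n * np.sqrt(m))
    W -= eta * grad

print(f"final training loss: {logistic_loss(W):.4f}")
print(f"distance from init:  {np.linalg.norm(W - W0):.3f}")
```

Printing the distance ||W - W0|| alongside the final training loss makes the constant-distance-from-initialization behavior of the neural tangent regime directly observable; increasing the width m should leave that distance essentially unchanged while the loss still decreases.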

Biography: Hossein Taheri received the B.Sc. degree in electrical engineering and mathematics from Sharif University of Technology, Tehran, Iran, in 2018. He is currently pursuing the Ph.D. degree in electrical and computer engineering under the guidance of Christos Thrampoulidis at the University of California, Santa Barbara. His main research areas are statistical learning and optimization.


Math Machine Learning seminar MPI MIS + UCLA
