- Lecturers: Nihat Ay, Juan Pablo Vigneaux
- Prerequisites: Linear algebra, elementary probability theory, basic notions from functional analysis
Statistical learning theory (SLT) is a powerful framework for studying the generalisation ability of learning machines (their performance on new data). Its generalisation bounds depend on capacity measures that quantify the expressivity of the underlying architecture (how many different functions it can represent).
Recent case studies have shown that the high performance of state-of-the-art learning machines, particularly deep neural networks, cannot be explained within the framework of classical SLT. On the one hand, such systems work extremely well in over-parametrised regimes, where the classical bounds, which depend strongly on the number of parameters, become uninformative. On the other hand, some evidence suggests that convergence and generalisation should be controlled by bounds that depend on some norm of the optimal solution rather than on the capacity of the architecture. How well this norm is controlled is believed to depend strongly on the optimisation algorithm used.