
Mathematical Machine Learning
Head: Guido Montúfar
Phone: +49 (0) 341 - 9959 - 880
Fax: +49 (0) 341 - 9959 - 658
Address: Inselstr. 22, 04103 Leipzig
DLT Project
Deep Learning Theory: Geometric Analysis of Capacity, Optimization, and Generalization for Improving Learning in Deep Neural Networks.
Deep Learning is one of the most vibrant areas of contemporary machine learning and one of the most promising approaches to Artificial Intelligence. It drives the latest systems for image, text, and audio processing, as well as a growing number of new technologies. The goal of this project is to make progress on key open problems in Deep Learning, specifically the capacity, optimization, and regularization of these algorithms. The idea is to consolidate a theoretical basis that pins down the inner workings behind the present success of Deep Learning and makes it more widely applicable, in particular in situations with limited data and in challenging reinforcement learning problems. The approach is based on the geometry of neural networks and exploits innovative mathematics, drawing on information geometry and algebraic statistics. This timely proposal holds promise to streamline the progress of Deep Learning into new frontiers.
Main Objective. The main objective of this project is to make progress on key open problems in DL regarding the capacity, optimization, and regularization of these algorithms, and to consolidate a theoretical basis that builds on the current performance of DL and makes it more widely applicable, in particular in situations with limited data and in challenging reinforcement learning problems. The approach is based on the geometry of neural networks and draws on information geometry and algebraic statistics.
1. Capacity
Provide useful descriptions of the types of functions that can and cannot be learned by deep neural networks and structured probability models used in DL practice, and compare these with the possible targets in relevant problems. Questions include the dimension, interpretable classes, approximation errors, relative representational power, and implicit descriptions of the representable functions.
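As a concrete handle on capacity, the following minimal numpy sketch (the layer widths, random weights, and sampling scheme are illustrative choices, not part of the project description) estimates how many linear regions a small ReLU network induces along a line through input space, by tracking changes of the hidden activation pattern; region counts of this kind are one way to quantify relative representational power.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small fully connected ReLU network: 2 -> 16 -> 16 -> 1.
sizes = [2, 16, 16, 1]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]

def activation_pattern(x):
    """Return the on/off pattern of all hidden ReLUs at input x.

    Two inputs with the same pattern lie in the same linear region
    of the piecewise linear function computed by the network.
    """
    pattern = []
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        pre = W @ h + b
        pattern.append(pre > 0)
        h = np.maximum(pre, 0.0)
    return tuple(np.concatenate(pattern))

# Walk along a random line through input space and count how often
# the activation pattern changes: each change marks a new linear
# region crossed by the line.
direction = rng.standard_normal(2)
ts = np.linspace(-5.0, 5.0, 20000)
patterns = [activation_pattern(t * direction) for t in ts]
regions = 1 + sum(p != q for p, q in zip(patterns, patterns[1:]))
print(f"linear regions crossed along the line: {regions}")
```

Counting regions this way makes the depth-versus-width comparison tangible: deeper networks of comparable size typically cross many more regions along such a line, in line with known region-counting bounds.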
2. Optimization
Provide useful descriptions of the optimization landscapes that appear in DL, covering the role of initialization, unsupervised pretraining, local optima, and saddle points. This relates to the effective capacity of a model, which depends on the ability of the optimization procedure to explore the parameter space, and to the design of mechanisms that ease local optimization in deep neural networks.
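As a toy illustration of such landscapes (the two-parameter example, step size, and initialization are ours, for illustration only), the loss of a depth-2 linear network with scalar weights, f(u, v) = (uv - 1)^2, has a strict saddle at the origin: gradient descent started nearby stalls on a long plateau before escaping, which is the qualitative behavior that makes initialization and saddle structure central to the analysis.

```python
import numpy as np

# Squared loss of a depth-2 linear "network" with scalar weights,
# f(u, v) = (u * v - 1)^2: the origin is a critical point.
def loss(w):
    u, v = w
    return (u * v - 1.0) ** 2

def grad(w):
    u, v = w
    r = 2.0 * (u * v - 1.0)
    return np.array([r * v, r * u])

# Hessian at the origin: eigenvalues +2 and -2, so (0, 0) is a
# strict saddle rather than a local minimum.
hessian_at_origin = np.array([[0.0, -2.0], [-2.0, 0.0]])
print("eigenvalues at (0,0):", np.linalg.eigvalsh(hessian_at_origin))

# Gradient descent initialized near the saddle: a long plateau with
# loss close to 1, followed by a quick drop once the iterate escapes.
w = np.array([1e-6, 1e-6])
lr = 0.05
for step in range(301):
    if step % 30 == 0:
        print(f"step {step:3d}  loss {loss(w):.6f}")
    w = w - lr * grad(w)
```

Plateaus of this kind mirror what is observed when training deep networks in practice, which is why the geometry of saddles, and not local minima alone, shapes the optimization landscape.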
3. Generalization
Elucidate the role of unsupervised training as a tool for regularization and how it can contribute to better generalization performance. Besides the compositionality hypothesis underlying DL, what other sensible general priors are there, and how can we incorporate them into the learning system?
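To illustrate this regularization effect in the simplest possible setting (the data model, dimensions, and the use of PCA as a linear stand-in for unsupervised pretraining are our assumptions, not the project's models), the sketch below fits a classifier on very few labels, once on the raw high-dimensional inputs and once on features learned from unlabeled data alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Inputs live near a 5-dimensional subspace of R^100; the labels
# depend only on the latent coordinates.
d_latent, d_input = 5, 100
mixing = rng.standard_normal((d_input, d_latent))
w_true = rng.standard_normal(d_latent)

def make_data(n, noise=0.5):
    z = rng.standard_normal((n, d_latent))
    x = z @ mixing.T + noise * rng.standard_normal((n, d_input))
    y = (z @ w_true > 0).astype(float)
    return x, y

x_unlabeled, _ = make_data(5000)    # plentiful unlabeled inputs
x_train, y_train = make_data(30)    # scarce labeled examples
x_test, y_test = make_data(2000)

# Unsupervised step: estimate the latent subspace from unlabeled
# data only (PCA via SVD, a linear stand-in for pretraining).
x_mean = x_unlabeled.mean(axis=0)
_, _, vt = np.linalg.svd(x_unlabeled - x_mean, full_matrices=False)
encoder = vt[:d_latent].T           # projection onto top components

def fit_logistic(x, y, steps=2000, lr=0.1):
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(x @ w, -30.0, 30.0)))
        w -= lr * x.T @ (p - y) / len(y)
    return w

def accuracy(w, x, y):
    return np.mean(((x @ w) > 0) == y)

w_raw = fit_logistic(x_train - x_mean, y_train)              # labels only
w_pre = fit_logistic((x_train - x_mean) @ encoder, y_train)  # + pretraining

print("raw inputs:         ", accuracy(w_raw, x_test - x_mean, y_test))
print("pretrained features:", accuracy(w_pre, (x_test - x_mean) @ encoder, y_test))
```

The unlabeled data narrows the hypothesis space before a single label is seen, which is precisely the sense in which unsupervised training can act as a regularizer.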