Deep Learning Theory Kickoff Meeting

MPI für Mathematik in den Naturwissenschaften Leipzig

E1 05 (Leibniz-Saal)

conference

27.03.19 - 29.03.19

This meeting aims to discuss mathematical topics in machine learning and deep learning and to kick off the ERC project Deep Learning Theory at MPI MIS.

Deep Learning Theory: Geometric Analysis of Capacity, Optimization, and Generalization for Improving Learning in Deep Neural Networks.

Deep Learning is one of the most vibrant areas of contemporary machine learning and one of the most promising approaches to Artificial Intelligence. Deep Learning drives the latest systems for image, text, and audio processing, as well as an increasing number of new technologies. The goal of this project is to advance on key open problems in Deep Learning, specifically regarding the capacity, optimization, and regularization of these algorithms. The idea is to consolidate a theoretical basis that allows us to pin down the inner workings of the present success of Deep Learning and make it more widely applicable, in particular in situations with limited data and challenging problems in reinforcement learning. The approach is based on the geometry of neural networks and exploits innovative mathematics, drawing on information geometry and algebraic statistics. This is a quite timely and unique proposal which holds promise to vastly streamline the progress of Deep Learning into new frontiers.

Please see also the poster overview under Poster Session & Coffee & Tea in the program.

Speakers

Michael Arbel

Gatsby Computational Neuroscience Unit, University College London

Nihat Ay

Max Planck Institute for Mathematics in the Sciences

Pradeep Banerjee

Max Planck Institute for Mathematics in the Sciences

Eliana Duarte

Max Planck Institute for Mathematics in the Sciences

Yonatan Dukler

UCLA, Department of Mathematics

Asja Fischer

Ruhr-Universität Bochum

Tim Genewein

DeepMind London

Frederik Künstner

École Polytechnique Fédérale de Lausanne

Wuchen Li

UCLA, Department of Mathematics

Luigi Malagò

Romanian Institute of Science and Technology - RIST, Cluj-Napoca

Grégoire Montavon

Machine Learning, Technische Universität Berlin

Razvan Pascanu

DeepMind London

Johannes Rauh

Max Planck Institute for Mathematics in the Sciences

Nico Scherf

Max Planck Institute for Human Cognitive and Brain Sciences

Ingo Steinwart

Universität Stuttgart

Maurice Weiler

Machine Learning Lab, University of Amsterdam

Program

08:30 - 09:00	Registration / Coffee & Tea
09:00 - 09:15	Bernd Sturmfels (Max Planck Institute for Mathematics in the Sciences)
Welcome Address
09:15 - 09:45	Guido Montúfar (Max Planck Institute for Mathematics in the Sciences)
Introduction Deep Learning Theory
09:45 - 10:00	Coffee & Tea
10:00 - 11:00	Nihat Ay (Max Planck Institute for Mathematics in the Sciences)
On the Natural Gradient for Deep Learning
The natural gradient method is one of the most prominent information-geometric methods within the field of machine learning. It was proposed by Amari in 1998 and uses the Fisher-Rao metric as the Riemannian metric for the definition of a gradient within optimisation tasks. Since then, it has proved to be extremely efficient in the context of neural networks, reinforcement learning, and robotics. In recent years, attempts have been made to apply the natural gradient method to the training of deep neural networks. However, due to the huge number of parameters of such networks, the method is currently not directly applicable in this context. In my presentation, I outline ways to simplify the natural gradient for deep learning. The corresponding simplifications are related to the locality of learning associated with the underlying network structure.
11:00 - 12:00	Tim Genewein (DeepMind London)
Neural Network Compression - model-capacity and parameter redundancy of neural networks
Modern deep neural networks were recently shown to have surprisingly high capacity for memorization of random labels. On the other hand, it is well known in the field of neural network compression that networks trained on classification tasks with non-random labels often have significant parameter redundancy which can be effectively "compressed". Understanding this discrepancy from a theoretical viewpoint is an important open question. The aim of this talk is to introduce some modern neural network compression methods, in particular Bayesian approaches to neural network compression. The latter have some interesting theoretical properties which are also observed in practice, for instance effective capacity regularization during training, thus effectively removing the potential to fit large sets of randomly labelled data points.
12:00 - 13:00	Lunch Buffet at the Institute
13:00 - 14:00	Ingo Steinwart (Universität Stuttgart)
A Sober Look at Neural Network Initializations
Initializing the weights and the biases is a key part of the training process of a neural network. Unlike the subsequent optimization phase, however, the initialization phase has received only limited attention in the literature. In the first part of the talk, I will discuss some consequences of commonly used initialization strategies for vanilla DNNs with ReLU activations. Based on these insights, I will then introduce an alternative initialization strategy, and finally I will present some large-scale experiments assessing the quality of the new initialization strategy.
14:00 - 15:00	Grégoire Montavon (Machine Learning, Technische Universität Berlin)
Explaining the Decisions of Deep Neural Networks
ML models such as deep neural networks (DNNs) are capable of producing complex real-world predictions. In order to get insight into the workings of the model and verify that the model is not overfitting the data, it is often desirable to explain its predictions. For linear and mildly nonlinear models, simple techniques based on Taylor expansions can be used; for highly nonlinear DNN models, however, the task of explanation becomes more difficult. In this talk, we first discuss some motivations for explaining predictions, and specific challenges in producing them. We then introduce the layer-wise relevance propagation (LRP) technique, which explains by reverse-propagating the prediction in the network through a set of engineered propagation rules. The reverse propagation procedure can be interpreted as a ‘deep Taylor decomposition’ where the explanation is the outcome of a sequence of Taylor expansions performed at each layer of the DNN model.
15:00 - 15:30	Coffee & Tea
15:30 - 16:30	Frederik Künstner (École Polytechnique Fédérale de Lausanne)
Limitations of the Empirical Fisher Approximation
Natural gradient descent, which preconditions a gradient descent update with the Fisher information matrix of the underlying statistical model, has recently received attention as a way to capture partial second-order information. Several works have advocated an approximation known as the empirical Fisher, drawing connections between approximate second-order methods and heuristics like Adam. We caution against this argument by discussing the limitations of the empirical Fisher, showing that, unlike the Fisher, it does not generally capture second-order information. We further argue that the conditions under which the empirical Fisher approaches the Fisher (and the Hessian) are unlikely to be met in practice, and that the pathologies of the empirical Fisher can have undesirable effects. This leaves open the question as to why methods based on the empirical Fisher have been shown to outperform gradient descent in some settings. As a step towards understanding this effect, we show that methods based on the empirical Fisher can be interpreted as a way to adapt the descent direction to the variance of the gradients.
16:30 - 18:00	Poster Session & Coffee & Tea

3264 Conics in a Second
Paul Breiding
Max Planck Institute for Mathematics in the Sciences (Leipzig), Germany
Joint work: Paul Breiding, Bernd Sturmfels and Sascha Timme. In 1848 Jakob Steiner asked "How many conics are tangent to five conics?" In 2019 we ask "Which conics are tangent to your five conics?" The answer is at juliahomotopycontinuation.org/do-it-yourself/

Reconstructions by Variational AutoEncoders as a Defense Strategy against Adversarial Examples
Petru Hlihor
Romanian Institute of Science and Technology (Cluj-Napoca), Romania
Adversarial examples for classification tasks are inputs provided to a machine learning model that are specifically designed to produce a wrong classification. They are usually obtained by malicious perturbations of a sample in a dataset, which are difficult even for a human to recognize. In this poster we study the use of Variational AutoEncoders to preprocess images before classification, as a strategy to defend against adversarial examples. In our preliminary experiments we show that by reconstructing images with a Variational AutoEncoder, the accuracy of the classifier improves significantly, even against some of the most powerful attacks in the literature. As opposed to regular autoencoders, previously proposed in the literature as a defense mechanism, the presence of a stochastic layer plays a key role in the defense, which is not trivial for an attacker to circumvent.

Multiagent Deep Reinforcement Learning for Market Making
Pankaj Kumar
Copenhagen Business School (Frederiksberg), Denmark
Market making is a high-frequency trading strategy in which an agent provides liquidity by simultaneously quoting a bid (buy) price and an ask (sell) price on an asset. Market makers reap profits in the form of the spread between the quoted buy and sell prices. Due to the complexity of inventory risk, counterparties to trades, and information asymmetry, the understanding of market making algorithms remains relatively unexplored by academics. A sizable body of literature, in particular on single-agent deep reinforcement learning (DRL), has studied the problems of optimal execution and prediction markets. The success of such single-agent DRL methods can be attributed to the use of experience replay memories, which allow Deep Q-Networks (DQNs) to be trained efficiently by sampling stored state transitions. However, utmost care is required in multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become obsolete when agents update their policies in parallel. Motivated by this, in this talk I will introduce a novel reformulation of the MA-DRL simulation framework for market making, which allows many agents to interact without failure. Using a simple image-like reformulation of the multi-agent state, innovative multi-agent training, and agent ambiguity, a convolutional neural network approximating the Q-value function is used to learn distributed multi-agent policies. This approach alleviates the convergence, non-stationary training, and scalability issues encountered in the literature for multi-agent systems. In addition, in each simulation the market maker agents successfully reproduce stylized facts of historical trade data.

Interventional Markov Equivalence for Mixed Graph Models
Liam Solus
KTH Royal Institute of Technology (Stockholm), Sweden
We will discuss the problem of characterizing Markov equivalence of graphical models under general interventions. Recently, Yang et al. (2018) gave a graphical characterization of interventional Markov equivalence for DAG models that relates to the global Markov properties of DAGs. Based on this, we extend the notion of interventional Markov equivalence using global Markov properties of loopless mixed graphs and generalize their graphical characterization to ancestral graphs. On the other hand, we also extend the notion of interventional Markov equivalence via modifications of factors of distributions Markov to acyclic directed mixed graphs. We prove these two generalizations coincide at their intersection, i.e., for directed ancestral graphs. This yields a graphical characterization of interventional Markov equivalence for causal models that incorporate latent confounders and selection variables under assumptions on the intervention targets that are reasonable for biological applications.

Learning Latent Representations for Audio Signals through Variational Autoencoders
Csongor-Huba Varady
Romanian Institute of Science and Technology (Cluj-Napoca), Romania
In this short paper we are interested in exploring generative models for audio signals, with a particular focus on signal reconstruction and learning explainable latent representations. Similarly to the work of Engel et al., we consider generative models characterized by a WaveNet decoder, which outputs an autoregressive model conditioned on the past signal as well as on the latent representation. The main contribution of our work is the proposal of an architecture based on Variational AutoEncoders, which allows us to define an approximate posterior able to explicitly capture the time dependence of the latent encoding. Moreover, the possibility of introducing variational bounds for the training of the model could lead to disentangled representations for audio signals, and thus to latent encodings that are easier to interpret.
18:00 - 18:30	Discussions
19:00 - 22:00	Dinner (together in Thueringer Hof, Burgstraße 19, 04109 Leipzig), departure from MPI main entrance at 18:30, 20 min walk

09:00 - 10:00	Coffee & Tea
10:00 - 11:00	Razvan Pascanu (DeepMind London)
Looking at data efficiency in RL
Deep Reinforcement Learning (DRL), while providing some impressive results (e.g. on Atari, Go, etc.), is notoriously data inefficient. This is partially due to the function approximators used (deep networks) but also due to the weak learning signal (based on observing rewards). This talk will focus on the potential role transfer learning could play in DRL for improving data efficiency. In particular, the core of the talk will be centered around the different uses of the KL-regularized RL formulation explored in recent works (e.g. https://arxiv.org/abs/1707.04175, https://arxiv.org/abs/1806.01780, https://openreview.net/forum?id=S1lqMn05Ym). Time permitting, I will extend the discussion to some work-in-progress observations about the learning dynamics of neural networks (particularly in RL) and how to exploit the piecewise structure of the neural network (particularly the folding of the space) to efficiently learn generative models.
11:00 - 12:00	Maurice Weiler (Machine Learning Lab, University of Amsterdam)
Gauge Equivariant Convolutional Networks
The idea of equivariance to symmetry transformations provides one of the first theoretically grounded principles for neural network architecture design. Equivariant networks have shown excellent performance and data efficiency on vision and medical imaging problems that exhibit symmetries. We extend this principle beyond global symmetries to local gauge transformations, thereby enabling the development of equivariant convolutional networks on general manifolds. We show that gauge equivariant convolutional networks give a unified description of equivariant and geometric deep learning by deriving a wide range of models as special cases of our theory. To illustrate our theory on a simple example and highlight the interplay between local and global symmetries, we discuss an implementation for signals defined on the icosahedron, which provides a reasonable approximation of spherical signals. We evaluate the Icosahedral CNN on omnidirectional image segmentation and climate pattern segmentation, and find that it outperforms previous methods.
12:00 - 13:00	Lunch Buffet at the Institute
13:00 - 14:00	Johannes Rauh (Max Planck Institute for Mathematics in the Sciences)
Synergy, redundancy and unique information
New information measures are needed to analyze how information is distributed over a complex system (such as a deep neural network). In 2010, Williams and Beer presented the idea of a general information decomposition framework to organize such measures. So far, the framework is missing a generally accepted realization. The talk discusses the current status of the Williams and Beer program.
14:00 - 15:00	Michael Arbel (Gatsby Computational Neuroscience Unit, University College London)
Kernel Distances for Deep Generative Models
Generative adversarial networks (GANs) achieve state-of-the-art performance for generating high-quality images. Key to GAN performance is the critic, which learns to discriminate between real and artificially generated images. Various divergence families have been proposed for such critics, including f-divergences (the f-GAN family) and integral probability metrics (the Wasserstein and MMD GANs). In recent GAN training approaches, these critic divergence measures have been learned using gradient regularisation strategies, which have contributed significantly to their success. In this talk, we will introduce and analyze a data-adaptive gradient penalty as a critic regularizer for the MMD GAN. We propose a method to constrain the gradient analytically and relate it to the weak continuity of a distributional loss functional. We also demonstrate experimentally that such a regularized functional improves on the existing state-of-the-art methods for unsupervised image generation on CelebA and ImageNet. Based on joint work with Dougal Sutherland, Mikołaj Bińkowski, and Arthur Gretton.
15:00 - 16:00	Poster Session & Coffee & Tea
16:00 - 17:00	Yonatan Dukler (UCLA, Department of Mathematics)
Wasserstein of Wasserstein Loss for Learning Generative Models
In this talk we investigate the use of the Wasserstein ground metric in generative models. The Wasserstein distance serves as a loss function for unsupervised learning which depends on the choice of a ground metric on sample space. We propose to use a Wasserstein distance as the ground metric on the sample space of images. This ground metric is known as an effective distance for image retrieval, since it correlates with human perception. We derive the Wasserstein ground metric on image space and define a Riemannian Wasserstein gradient penalty to be used in the Wasserstein Generative Adversarial Network (WGAN) framework. The new gradient penalty is computed efficiently via convolutions on the L^2 (Euclidean) gradients with negligible additional computational cost. The new formulation is more robust to the natural variability of images and provides for a more continuous discriminator in sample space.
17:00 - 18:00	Poster Session & Coffee & Tea
19:00 - 22:00	Dinner (individually)

09:00 - 10:00	Coffee & Tea
10:00 - 11:00	Nico Scherf (Max Planck Institute for Human Cognitive and Brain Sciences)
On Open Problems for Deep Learning in Biomedical Image Analysis
Deep Learning has thoroughly transformed the field of computer vision within the past years. Many standard problems such as image restoration, segmentation or registration, which were based on quite different modelling and optimisation approaches (e.g. PDEs, Markov Random Fields, Random Forests, ...), can now be solved within the framework of Deep Neural Networks with astonishing accuracy and speed (at prediction time). One important advantage of Deep Learning is its ability to capture the often complex statistical dependencies in image data and leverage this information for improving prediction, regression, or classification, given enough annotated data. However, in the biomedical domain, one major limitation is the scarcity of suitably annotated data, which rules out a lot of solutions from the computer vision domain. Here, approaches such as manifold learning, generative models, or using deep networks as structural priors are promising directions for weakly supervised or unsupervised learning in biomedical imaging. Another important aspect, in particular for medical image analysis, is the interpretability (or the lack thereof) of the fitted model. In this talk I am going to present a selection of problems in biomedical image analysis that would greatly benefit from Deep Learning approaches but lack the typically required amount of annotated data. I will focus on examples from high-resolution in-vivo MRI imaging of brain structure, microscopic analysis of anatomical microstructure of the human cortex, and large-scale live microscopy for stem cell biology and developmental biology.
11:00 - 12:00	Wuchen Li (UCLA, Department of Mathematics)
Wasserstein Information Geometry
Optimal transport (the Wasserstein metric) nowadays plays an important role in data science. In this talk, we briefly review its development and applications in machine learning. In particular, we will focus on its induced differential structure. We will introduce the Wasserstein natural gradient in parametric models. The metric tensor in probability density space is pulled back to the one on parameter space. We derive the Wasserstein gradient flows and proximal operator in parameter space. We demonstrate that the Wasserstein natural gradient works efficiently in several statistical machine learning problems, including Boltzmann machines, generative adversarial networks (GANs), and variational Bayesian statistics.
12:00 - 13:00	Lunch Buffet at the Institute
13:00 - 14:00	Eliana Duarte (Max Planck Institute for Mathematics in the Sciences) Discrete Statistical Models with Rational Maximum Likelihood Estimator A discrete statistical model is a subset of a probability simplex. Its maximum likelihood estimator (MLE) is a retraction from that simplex onto the model. We characterize all models for which this retraction is a rational function. This is a contribution via real algebraic geometry which rests on results due to Huh and Kapranov on Horn uniformization. We present an algorithm for constructing models with rational MLE, and we demonstrate it on a range of instances. Our focus lies on models like Bayesian networks, decomposable graphical models, and staged trees. Video (720p) Video (1080p)
14:00 - 15:00	Pradeep Banerjee (Max Planck Institute for Mathematics in the Sciences) The Blackwell Information Bottleneck I will talk about a new bottleneck method for learning data representations based on channel deficiency, rather than the more traditional information sufficiency. A variational upper bound allows us to implement this method efficiently. The bound itself is bounded above by the variational information bottleneck objective, and the two methods coincide in the regime of single-shot Monte Carlo approximations. The notion of deficiency provides a principled way of approximating complicated channels by relatively simpler ones. Deficiencies have a rich heritage in the theory of comparison of statistical experiments and have an operational interpretation in terms of the optimal risk gap of decision problems. Experiments demonstrate that the deficiency bottleneck can provide advantages in terms of minimal sufficiency as measured by information bottleneck curves, while retaining a good test performance in a classification task. I will also talk about an unsupervised generalization and relation to variational autoencoders. Finally, I discuss the utility of our method in estimating a quantity called the unique information which quantifies a deviation from the Blackwell order. (Joint work with Guido Montufar, Departments of Mathematics and Statistics, UCLA) Video (720p) Video (1080p)
15:00 - 15:30	Coffee & Tea
15:30 - 16:30	Luigi Malagò (Romanian Institute of Science and Technology - RIST, Cluj-Napoca) On the Information Geometry of Word Embeddings Word embeddings are a set of techniques commonly used in natural language processing to map the words of a dictionary to a real vector space. Such a mapping is commonly learned through the contexts of the words in a text corpus, by estimating a set of conditional probability distributions - of a context word given the central word - for each word of the dictionary. These conditional probability distributions form a Riemannian statistical manifold, where word analogies can be computed through the comparison between vectors in the tangent bundle of the manifold. In this presentation we introduce a geometric framework for the study of word embeddings in the general setting of Information Geometry, and we show how the choice of the geometry makes it possible to define different expressions for word similarities and word analogies. The presentation is based on joint work with Riccardo Volpi.
16:30 - 17:00	Concluding Remarks

Participants

Nader Aldoj

Charité - Universitätsmedizin Berlin

Hector Andrade Loarca

Technische Universität Berlin

Michael Arbel

Gatsby Computational Neuroscience Unit, University College London

Nihat Ay

Max Planck Institute for Mathematics in the Sciences

Pradeep Banerjee

Max Planck Institute for Mathematics in the Sciences

Paul Breiding

Max Planck Institute for Mathematics in the Sciences

Felicia Burtscher

Technische Universität Berlin

Goffredo Chirco

Max Planck Institute for Gravitational Physics, Albert Einstein Institute Potsdam

Florio M. Ciaglia

Max Planck Institute for Mathematics in the Sciences

Claus Diem

Universität Leipzig

Eliana Duarte

Max Planck Institute for Mathematics in the Sciences

Yonatan Dukler

UCLA, Department of Mathematics

Christoph Eikemeier

Max Planck Institute for Mathematics in the Sciences

Domenico Felice

Max Planck Institute for Mathematics in the Sciences

Diogo R. Ferreira

IST, University of Lisbon

Asja Fischer

Ruhr-Universität Bochum

Jan Gairing

Ludwig-Maximilians-Universität München

Tim Genewein

DeepMind London

Maximilian Gerwien

University of Applied Sciences, Leipzig

Alex Goeßmann

Technische Universität Berlin

Volker Göhler

TU Bergakademie Freiberg

Christiane Görgen

Max Planck Institute for Mathematics in the Sciences

Paul Görlach

Max Planck Institute for Mathematics in the Sciences

Gaëtan Hadjeres

Sony Computer Science Laboratories, Paris

Petru Hlihor

Romanian Institute of Science and Technology

Danijela Horak

AIG

Andreas Kofler

Charité - Universitätsmedizin Berlin

Pankaj Kumar

Copenhagen Business School

Frederik Künstner

École Polytechnique Fédérale de Lausanne

Christian Lehn

Chemnitz University of Technology

Wuchen Li

UCLA, Department of Mathematics

Luigi Malagò

Romanian Institute of Science and Technology - RIST, Cluj-Napoca

Orlando Marigliano

Max Planck Institute for Mathematics in the Sciences

Jörg Martin

Physikalisch Technische Bundesanstalt

Grégoire Montavon

Machine Learning, Technische Universität Berlin

Guido Montúfar

Max Planck Institute for Mathematics in the Sciences

Johannes Müller

Albert Ludwig University of Freiburg

Dominik Otto

Fraunhofer IZI

Katerina Papagiannouli

Humboldt-Universität zu Berlin

Razvan Pascanu

DeepMind London

Kornelius Podranski

Max Planck Institute for Human Cognitive and Brain Sciences

Johannes Rauh

Max Planck Institute for Mathematics in the Sciences

Yue Ren

Max Planck Institute for Mathematics in the Sciences

Upasana Roy

University of Leipzig

Nico Scherf

Max Planck Institute for Human Cognitive and Brain Sciences

Ekkehard Schnoor

RWTH Aachen

Martin Skrodzki

Freie Universität Berlin

Liam Solus

KTH Royal Institute of Technology

Ingo Steinwart

Universität Stuttgart

Bernd Sturmfels

Max Planck Institute for Mathematics in the Sciences

Omri Tal

Max Planck Institute for Mathematics in the Sciences

Konstantin Thierbach

Max Planck Institute for Human Cognitive and Brain Sciences

Tat Dat Tran

Max Planck Institute for Mathematics in the Sciences

Csongor-Huba Varady

Romanian Institute of Science and Technology

Nathaniel Virgo

Earth-Life Science Institute (ELSI), Tokyo

Julien Vitay

Technische Universität Chemnitz

Christian Wald

Charité - Universitätsmedizin Berlin

Maurice Weiler

Machine Learning Lab, University of Amsterdam

Felix Weiske

University of Applied Sciences, Leipzig

Scientific Organizers

Guido Montúfar

Max Planck Institute for Mathematics in the Sciences

Administrative Contact

Valeria Hünniger

Max-Planck-Institut für Mathematik in den Naturwissenschaften