# GAMM AG Workshop Computational and Mathematical Methods in Data Science

## Abstracts for the talks

**Marco Cuturi ***Google Brain & Institut Polytechnique de Paris***Supervised Quantile Normalization for Matrix Factorization using Optimal Transport**

We present in this recent work (https://arxiv.org/pdf/2002.03229.pdf) a recent application of our framework to carry out "soft" sorting and ranking using regularized OT. We expand this framework to include "soft" quantile normalization operators that can be differentiated efficiently, and apply it to the problem of dimensionality reduction: we ask how features can be normalized with a target distribution of quantiles to recover "easy to factorize" matrices. We provide algorithms to do this, as well as empirical evidence of recovery.

**Axel Klawonn ***University of Cologne***Machine learning in adaptive domain decomposition methods**

The convergence rate of domain decomposition methods is generally determined by the eigenvalues of the preconditioned system. For second-order elliptic partial differential equations, coefficient discontinuities with a large contrast can lead to a deterioration of the convergence rate. A remedy can be obtained by enhancing the coarse space with elements, which are often called constraints, that are computed by solving small eigenvalue problems on portions of the interface of the domain decomposition, i.e., edges in two dimensions or faces and edges in three dimensions. In the present work, without restriction of generality, the focus is on two dimensions. In general, it is difficult to predict where these constraints have to be computed, i.e., on which edges. Here, a machine learning based strategy using neural networks is suggested to predict the geometric location of these edges in a preprocessing step. This reduces the number of eigenvalue problems that have to be solved during the iteration. Numerical experiments for model problems and realistic microsections using regular decompositions as well as those from graph partitioners are provided, showing very promising results.

This is joint work with Alexander Heinlein, Martin Lanser, and Janine Weber.

**Kathlén Kohn ***KTH Stockholm***Invariant theory and scaling algorithms for maximum likelihood estimation**

The task of fitting data to a model is fundamental in statistics. For this, a widespread approach is finding a maximum likelihood estimate (MLE), where one maximizes the likelihood of observing the data as we range over the model. For two common statistical settings (log-linear models and Gaussian transformation families), we show that this approach is equivalent to a capacity problem in invariant theory: finding a point of minimal norm in an orbit under a corresponding group action. The existence of the MLE can then be characterized by stability notions under the action. This dictionary between invariant theory and statistics has already led to the solution of long-standing questions concerning the MLE of matrix normal models. Moreover, algorithms from statistics can be used in invariant theory, and vice versa.

This talk is based on joint work with Carlos Améndola, Philipp Reichenbach and Anna Seigal.

**Daniel Kressner ***EPF Lausanne***Randomized trace estimates for indefinite matrices with an application to determinants**

The need for estimating the determinant of a large-scale symmetric positive definite (SPD) matrix A features prominently in several machine learning tasks, most notably in maximum likelihood estimation for Gaussian process regression. Because of log(det(A)) = trace( log(A) ), estimating the log-determinant is equivalent to estimating the trace of the matrix logarithm. This simple relation allows for the application of Hutchinson's randomized trace estimator, which approximates the trace of a symmetric matrix B by computing the average of x'*B*x for many samples of a random vector x. When estimating determinants, two major obstacles arise: (1) The matrix B=log(A) can be indefinite even when A is SPD. This complicates the analysis; nearly all existing tail bounds only apply to SPD matrices. (2) The exact evaluation of x'*log(A)*x requires the computation of the matrix logarithm, which is expensive. Recent work by Ubaru, Chen, and Saad considers the use of Rademacher random vectors and addresses (1) by shifting the matrix and (2) by using Lanczos quadrature. In this talk, we will explain why the tail bounds obtained from shifting are overly pessimistic. A new bound is presented, which reduces the required number of samples by up to a factor n, where n is the size of the matrix. We will also extend these results to Gaussian random vectors, which requires some careful consideration for the Lanczos quadrature because of the unboundedness of such vectors.

This talk is based on joint work with Alice Cortinovis.

**Gitta Kutyniok ***TU Berlin***The Mathematics of Deep Learning: Can we Open the Black Box of Deep Neural Networks?**

Despite the outstanding success of deep neural networks in real-world applications, most of the related research is empirically driven and a comprehensive mathematical foundation is still missing. Regarding deep learning as a statistical learning problem, the necessary theory can be divided into the research directions of expressivity, learning, and generalization. Recently, the new direction of interpretability became important as well.

In this talk, we will provide an introduction into those four research foci. We will then delve a bit deeper into the area of expressivity, namely the approximation capacity of neural network architectures as one of the most developed mathematical theories so far, and discuss some recent work. Finally, we will provide a survey about the novel and highly relevant area of interpretability, which aims at developing an understanding how a given network reaches decisions, and discuss the very first mathematically founded approach to this problem.

**Guido Montúfar ***Max Planck Institute for Mathematics in the Sciences***Computing the Unique Information**

Given a pair of predictor variables and a response variable, how much information do the predictors have about the response, and how is this information distributed between unique, redundant, and synergistic components? Recent work has proposed to quantify the unique component of the decomposition as the minimum value of the conditional mutual information over a constrained set of information channels. We present an efficient iterative divergence minimization algorithm to solve this optimization problem with convergence guarantees and evaluate its performance against other techniques.

This is joint work with Pradeep Kr. Banerjee and Johannes Rauh.

**Christoph Schnörr ***Heidelberg University***Assignment Flows for Data Labeling and Pattern Formation on Graphs**

Assignment flows are smooth dynamical systems for labeling data on graphs. After introducing the basic approach, I explain a recent extension to unsupervised scenarios, self-assignment flows, that combine computation of prototypes for data coding and data labeling using these prototypes. A brief outlook on parameter learning and the problem to understand their function conclude the talk.

**Ingo Steinwart ***University of Stuttgart***Density-Based Cluster Analysis**

A central, initial task in data science is cluster analysis, where the goal is to find clusters in unlabeled data. One widely accepted definition of clusters has its roots in a paper by Carmichael et al., where clusters are described to be densely populated areas in the input space that are separated by less populated areas. The mathematical translation of this idea usually assumes that the data is generated by some unknown probability measure that has a density with respect to the Lebesgue measure. Given a threshold level, the clusters are then defined to be the connected components of the density level set. However, choosing this threshold and possible width parameters of a density estimator, which is left to the user, is a notoriously difficult problem, typically only addressed by heuristics.

In this talk, I show how a simple algorithm based on a density estimator can find the smallest level for which there are more than one connected component in the level set. Unlike other cluster algorithms this approach is fully adaptive in the sense that it does not require the user to guess crucial hyper-parameters. Finally, I present some numerical illustrations.

**Joel Tropp ***California Institute of Technology***SketchySVD**

This talk asserts that randomized linear algebra is a natural tool for on-the-fly compression of data matrices that arise from large-scale scientific simulations and data collection. The technical contribution consists in a new algorithm for constructing an accurate low-rank approximation of a huge matrix from streaming data. Among other applications, we show how the SVD of a large-scale sea surface temperature dataset exposes features of the global climate.

## Date and Location

**September 10 - 11, 2020**

Max Planck Institute for Mathematics in the Sciences

Virtual event - Videobroadcast

## Scientific Organizers

**Max von Renesse**

Leipzig University**André Uschmajew**

MPI for Mathematics in the Sciences

## Administrative Contact

**Valeria Hünniger**

MPI for Mathematics in the Sciences

Contact by Email