# TAGS - Linking Topology to Algebraic Geometry and Statistics

## Abstracts for the talks

**Ulrich Bauer**, *Technical University of Munich*: **Persistent homology and the stability theorem**

Persistent homology, the homology of a filtration of topological spaces, is described up to isomorphism by the persistence diagram (barcode), which encodes the structure of its indecomposable summands. The stability theorem, a fundamental theorem in the theory of persistence, states that small changes in the input data lead only to small changes in the persistence barcode. We will explore a constructive algebraic approach to this result that admits a high level of generality.

**Magnus Botnan**, *Technical University of Munich*: **Multidimensional Persistence and Clustering**

It is widely known that multidimensional persistent homology is "hard". But what does this really mean? In the first part of the talk I will discuss this from the point of view of the representation theory of quivers. Next I will show how multidimensional clustering gives rise to multidimensional persistence modules of a particularly appealing form. Given their simplicity, it is natural to wonder if we are able to carry out persistence computations in the setting of clustering as in 1-D. I will discuss recent work with Ulrich Bauer, Steffen Oppermann and Johan Steen, which shows that this restriction alleviates the aforementioned "hardness" in only very special cases. The last part of the talk will concern how much of this complexity we see in persistence modules coming from data.

**Herbert Edelsbrunner**, *Institute of Science and Technology Austria*: **The multi-cover persistence of Euclidean balls**

Given a locally finite set X in R^d and a positive radius r, the k-fold cover of X at radius r consists of all points that have k or more points of X within distance r. The order-k Voronoi diagram decomposes the k-fold cover into convex regions, and we use the nerve of this decomposition to compute homology and persistence. In particular, the persistence in depth is interesting from a geometric as well as an algorithmic viewpoint. The main tool in understanding its structure is a rhomboid tiling in R^(d+1) that combines the nerves for all values of k into one. We mention a straightforward consequence, namely that the cells in the nerve are generically not simplicial, unless k=1 or d=1,2. (Joint work with Georg Osang.)
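The definition of the k-fold cover can be tested pointwise; the following minimal sketch (function and variable names are illustrative, and this is not the nerve-based machinery of the talk) checks whether a query point has at least k sample points within distance r:

```python
import math

def in_k_fold_cover(y, X, r, k):
    """True if y lies in the k-fold cover of X at radius r,
    i.e. at least k points of X are within distance r of y."""
    return sum(1 for x in X if math.dist(y, x) <= r) >= k

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(in_k_fold_cover((0.0, 0.0), X, 1.0, 3))  # True: all 3 points within r=1
print(in_k_fold_cover((2.0, 2.0), X, 1.0, 1))  # False: no point within r=1
```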

**Heather Harrington**, *University of Oxford*: **Computational algebraic geometry for topological data analysis**

I will present how computational algebra can be used to study problems arising in topological data analysis. I will motivate each of these problems using applications. First I will present how we use numerical algebraic geometry to sample points on real algebraic varieties. Time permitting, I will propose interpretable invariants for multi-parameter persistent homology using computational commutative algebra.

**Kathryn Hess**, *École polytechnique fédérale de Lausanne*: **The many roles of topology in neuroscience**

I will present an overview of the many and varied applications of topology to neuroscience on which we have been working at the EPFL over the past three years.

**Stephan Huckemann**, *University of Göttingen*: **Some Statistical Challenges Coming with Non-Euclidean Data**

A wide range of statistical applications rely on the central limit theorem (CLT), which assures, for random vectors with finite second moments, asymptotic square-root-n normality of the mean. For deviates taking values in a Riemannian manifold, the celebrated CLT of Bhattacharya and Patrangenaru (2005) extends this result to Fréchet means of the intrinsic distance, under a collection of additional assumptions.
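The Fréchet mean of the intrinsic distance can be illustrated in the simplest curved space, the circle; the brute-force grid search below (an illustrative sketch, not a method from the talk) minimizes the sum of squared arc-length distances:

```python
import math

def geodesic_dist(a, b):
    """Arc-length (intrinsic) distance between two angles on the unit circle."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def frechet_mean(angles, grid=10000):
    """Grid-search minimizer of the Fréchet function
    F(c) = sum_i d(c, a_i)^2 over the circle."""
    best, best_cost = 0.0, float("inf")
    for i in range(grid):
        c = 2 * math.pi * i / grid
        cost = sum(geodesic_dist(c, a) ** 2 for a in angles)
        if cost < best_cost:
            best, best_cost = c, cost
    return best

# For two nearby angles the Fréchet mean is their midpoint.
print(frechet_mean([0.1, 0.3]))  # close to 0.2
```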

In the sequel, the Bhattacharya-Patrangenaru CLT has been extended to Fréchet means of more general distances, e.g. Procrustes distances on shape spaces, to "distances" between data and a data descriptor (a geodesic, say), and even to nested sequences of random descriptors, such as principal nested spheres.

In particular, the latter is a generalization of the classical asymptotics of principal components by Anderson (1963). Still, for all of these general scenarios, analogs of the above-mentioned additional assumptions are required to hold true. Meanwhile, the geometric meaning of these assumptions has been studied in exemplary cases: the distribution near the cut locus of the mean may be responsible for rates slower than square-root-n with non-normal limiting distributions, a phenomenon we call smeariness. Asymptotic non-normality may also be observed on non-complete manifolds, even without cut loci. For non-manifold spaces allowing only for a Riemannian stratification, for example BHV phylogenetic tree spaces, "infinitely fast" rates have been observed, making straightforward statistical testing impossible; this phenomenon is called stickiness.

**Vladimir Itskov**, *The Pennsylvania State University*: **Directed complexes, non-linear rank and convex sensing**

Convex sensing, i.e. sensing by quasiconvex functions, naturally arises in neuroscience and other contexts. One example of a convex sensing problem is measuring the underlying dimension of data. A related problem is computing the "nonlinear rank", i.e. the minimal rank of a matrix modulo the action of the group of row-wise nonlinear monotone transformations. While an exact algorithm for computing the nonlinear rank is unknown, it can be efficiently estimated using topological methods, as it is closely related to the geometry and topology of hyperplane arrangements.
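To make the definition concrete, here is a toy illustration (names and data are illustrative, not from the talk) of how a row-wise monotone transformation can lower the ordinary rank, so the matrix below has rank 2 but nonlinear rank 1:

```python
import math

def matrix_rank(M, tol=1e-9):
    """Rank of a small matrix via Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    rows, cols = len(M), len(M[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) < tol:
            continue  # no usable pivot in this column
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rank + 1, rows):
            f = M[r][c] / M[rank][c]
            for j in range(c, cols):
                M[r][j] -= f * M[rank][j]
        rank += 1
    return rank

# Row 0 of M is the entrywise square of row 1, so applying the monotone
# map x -> sqrt(x) to row 0 collapses the rank from 2 to 1.
M = [[1.0, 4.0, 9.0], [1.0, 2.0, 3.0]]
T = [[math.sqrt(x) for x in M[0]], M[1]]
print(matrix_rank(M), matrix_rank(T))  # 2 1
```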

A natural tool for convex sensing problems is the directed complex, which captures much of the relevant geometric information. For example, the nonlinear rank, as well as other geometric properties of the data, can be estimated from the homology of an associated directed complex. I will present recent results and conjectures about the directed complexes associated to some convex sensing problems.

**Jürgen Jost**, *Max Planck Institute for Mathematics in the Sciences*: **Ricci curvature as a tool for analyzing networks and other complexes**

In network science, concepts and quantities are often introduced in a rather ad hoc manner. I shall use analogies with another geometric discipline, Riemannian geometry, to develop network analysis tools in a systematic manner.

At the heart of Riemannian geometry are notions of curvature. In particular, in recent decades, fundamental aspects of Ricci curvature have been discovered and explored. In fact, there are two aspects of Ricci curvature that turn out to be useful in metric geometry. One is about the average dispersion of geodesics. The other is about comparing the distance between two points with the transportation cost between their local neighborhoods.

As I shall explain, both are useful for the theoretical investigation of networks and other complexes as well as for the analysis of empirical network data.

**Pierre Lairez**, *INRIA Saclay Île-de-France*: **Numerical computation of the homology of basic semialgebraic sets**

Semialgebraic sets (that is, subsets of Euclidean spaces defined by polynomial equations and inequalities) come in a wide variety of shapes. This raises the problem of describing a given specimen, from basic features like the number of connected components, to finer ones, like homology.

Following a well-known approach, we compute the homology using approximations by point clouds. We show how to ensure correctness and give complexity bounds in terms of the conditioning of the input equations. A probabilistic analysis shows that outside of a vanishingly small subset of the input space, the complexity is single exponential with respect to the number of variables. This improves upon the previously-known double exponential complexity bounds.

**Michael Lesnick**, *Princeton University*: **Quantifying Genetic Innovation: Mathematical Foundations for the Topological Study of Reticulate Evolution**

In 2013, Chan, Carlsson, and Rabadan introduced a topological approach to the study of genetic recombination, based on persistent homology. In this work, we develop theoretical foundations for this approach. First, we clarify the motivation for the topological approach by presenting a novel formulation of the underlying inference problem. Specifically, we introduce and study the novelty profile, a simple, stable statistic of an evolutionary history which not only counts recombination events but also quantifies how recombination creates genetic diversity. We propose that the (hitherto implicit) goal of the topological approach to recombination is the estimation of novelty profiles.

We then develop a theory for the estimation of novelty profiles using barcodes. We focus on a low-recombination regime, where the ancestry of a population-genetic sample can be described using directed acyclic graphs called galled trees, which differ from trees only by isolated topological defects. We show that in this regime, under a complete sampling assumption, the 1st persistence barcode of evolutionary data yields a lower bound on the novelty profile, and hence on the number of historical recombination events. For i>1, the ith barcode is empty. In addition, we use the stability of persistent homology to extend these results to ones which hold for an arbitrary subsample of an arbitrary evolutionary history. We establish these results as corollaries of new mathematical results about the topology of certain Vietoris-Rips filtrations, which may be of independent interest.

Joint work with Raul Rabadan and Daniel Rosenbloom.

**Pascal Massart**, *Université de Paris-Sud*: **Model selection and Geometry**

The idea of selecting a model via penalizing a log-likelihood type criterion is an old idea in statistics. It goes back to the early seventies with the pioneering works of Mallows and Akaike.

One can find many consistency results in the literature for such criteria. These results rely on limit theorems from probability theory and are asymptotic in the sense that one deals with a given number of models and the number of observations tends to infinity. This setting turns out to be irrelevant for many applications (especially for high-dimensional data). Taking as a central illustrative example the Gaussian white noise framework, we shall show that it is possible to develop a non-asymptotic theory for model selection. This theory deeply relies on the concentration of measure phenomenon, which is known to be of a geometric nature. Conversely, some important issues of geometrical inference can be solved by using the model selection approach. We shall illustrate this point by focusing on a specific example: the use of model selection for simplicial approximation.

**Anthea Monod**, *Columbia University*: **Tropical Sufficient Statistics for Persistent Homology**

Topological data analysis has been shown to be an innovation in studying genetic reassortment events in RNA viruses --- a key factor in studying viral mutations, such as those in HIV. In this talk, I will show how a sufficient statistic for persistence barcodes, an important invariant in topological data analysis, constructed using tropical geometry, allows classical probability distributions to be assumed on the tropicalized representation of barcodes. This makes a variety of parametric statistical inference methods amenable to barcodes, all while maintaining their initial interpretations. More specifically, exponential family distributions may be assumed and likelihood functions for persistent homology may be constructed. We conceptually demonstrate sufficiency and illustrate its utility in dimensions 0 and 1, with a rigorous parametric statistical analysis of inter- and intra-subtype reassortment events in both HIV and avian influenza. This is joint work with Sara Kalisnik Verovsek, Juan Angel Patino-Galindo and Lorin Crawford, and supported by the US NIH under project number R01GM117591.

**Luke Oeding**, *Auburn University*: **Higher Order Partial Least Squares and an Application to Neuroscience**

Partial least squares (PLS) is a method to discover a functional dependence between two sets of variables X and Y. PLS attempts to maximize the covariance between X and Y by projecting both onto new subspaces. Higher order partial least squares (HOPLS) comes into play when the sets of variables have additional tensorial structure. Simultaneous optimization of subspace projections may be obtained by a multilinear singular value decomposition (MSVD). I'll review PLS and SVD, and explain their higher order counterparts. Finally I'll describe recent work with G. Deshpande, A. Cichocki, D. Rangaprakash, and X.P. Hu where we propose to use HOPLS and tensor decompositions to discover latent linkages between EEG and fMRI signals from the brain, and ultimately use this to drive Brain-Computer Interfaces (BCIs) with the low size, weight, and power of EEG, but with the accuracy of fMRI.
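The first PLS component, the pair of directions maximizing the covariance between the projected blocks, corresponds to the leading singular pair of the cross-covariance matrix. The following pure-Python sketch (an illustration under that standard formulation, not the authors' HOPLS code) finds the X-side direction by power iteration:

```python
import math

def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def pls_first_direction(X, Y, iters=500):
    """First PLS direction on the X side: the leading left singular
    vector of the cross-covariance C = Xc^T Yc (columns centered),
    obtained by power iteration on the symmetric matrix C C^T."""
    n, p, q = len(X), len(X[0]), len(Y[0])
    cx = [sum(row[j] for row in X) / n for j in range(p)]
    cy = [sum(row[j] for row in Y) / n for j in range(q)]
    Xc = [[row[j] - cx[j] for j in range(p)] for row in X]
    Yc = [[row[j] - cy[j] for j in range(q)] for row in Y]
    C = [[sum(Xc[i][a] * Yc[i][b] for i in range(n)) for b in range(q)]
         for a in range(p)]
    CCt = [[sum(C[a][k] * C[b][k] for k in range(q)) for b in range(p)]
           for a in range(p)]
    u = [1.0] * p
    for _ in range(iters):
        u = matvec(CCt, u)
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    return u

# Toy data: Y's first column tracks X's first column, so the leading
# covariance direction concentrates on the first X-coordinate.
X = [[1.0, 0.3], [2.0, -0.1], [3.0, 0.2], [4.0, 0.0]]
Y = [[1.1, 0.0], [2.0, 0.1], [2.9, -0.1], [4.1, 0.0]]
u = pls_first_direction(X, Y)
print(u)  # unit vector dominated by its first entry
```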

**Wolfgang Polonik**, *University of California at Davis*: **Topological data analysis and statistics**

This talk will address various aspects of TDA from a more statistical perspective. It will include a brief survey of relevant approaches for statistical inference, and a critical discussion of some challenges involved in (i) the development of approaches for statistical inference; (ii) the derivation of statistically meaningful theory; and (iii) more practical challenges arising in scientific applications of TDA methods.

**Primoz Skraba**, *University of Primorska*: **New Results for the Stability of Persistence Diagrams**

Stability is certainly one of the most important aspects of persistence, and there are numerous results. Most of them are stated in terms of the bottleneck distance between persistence diagrams. While applicable in general settings, it is a sup-norm, and using it to prove convergence can be difficult. In this talk, I will discuss some new results concentrating on a bound on the p-Wasserstein distance between persistence diagrams. The main result states that the p-norm between two functions on a simplicial complex is an upper bound on the p-Wasserstein distance between the corresponding persistence diagrams. I will discuss some other related results as well as applications.

**Bernd Sturmfels**, *Max Planck Institute for Mathematics in the Sciences*: **Learning Algebraic Varieties from Samples**

Data sets often have an inherent geometric structure, revealed by approximate algebraic relations of low degree. We discuss a range of methods for studying such samples. One aim is to enhance the toolkit of manifold learning and applied topology with the power of algebraic geometry. This is an ongoing joint project with Paul Breiding, Sara Kalisnik and Madeleine Weinstein.

**André Uschmajew**, *Max Planck Institute for Mathematics in the Sciences*: **Topological problems in low-rank optimization**

In low-rank optimization one seeks to minimize functions on manifolds or varieties of low-rank matrices or tensors. This has useful applications in signal and image processing, or in high-dimensional scientific computing. Due to their parametrization by multilinear maps, sets of low-rank matrices and tensors exhibit a rich geometric structure, which sometimes makes it possible to go beyond generic results in the analysis of corresponding nonlinear optimization methods on these sets. This concerns for example necessary first-order optimality conditions or classification of critical points. In this talk we discuss some of these questions.

**Francesco Vaccarino**, *Politecnico di Torino*: **A persistent variation: data field theory via sheaves of incidence algebras**

We will extend persistent (co-)homology by considering diagrams of incidence algebras of finite posets. By using results of Gerstenhaber et al., we will show the link between diagram cohomology, Hochschild cohomology, and multidimensional persistence, connecting seemingly unrelated research areas such as topological data analysis, noncommutative geometry, and topological field theory (via the work of Cattaneo, Mnev, and Reshetikhin on cellular field theory).

**Max Wardetzky**, *University of Göttingen*: **Some statistical challenges of topological inference in the 1D case**

In this talk I will present some thoughts on the challenges of inferring topological features from noisy data in perhaps the simplest setting: estimating the number of modes (i.e., local maxima) of a 1D signal. Already this simple setup illustrates some of the statistical obstacles that arise when trying to infer topological information. A particular focus will be on inference without presmoothing the data. In particular, I will discuss some aspects of persistent topology in the 1D setting and present a somewhat alternative approach based on the Kolmogorov norm instead of the sup norm. I will highlight some of the advantages and disadvantages of these approaches from a statistical perspective.
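Persistence of modes in the 1D setting can be computed with the standard union-find sweep over superlevel sets; this sketch (a generic illustration, not the talk's Kolmogorov-norm approach) pairs each local maximum with the height at which its component merges into a higher-born one:

```python
def mode_persistence(signal):
    """0-dimensional persistence of superlevel sets of a 1D signal.
    Each local maximum is born at its height and dies at the merge
    height; the global maximum is paired with -infinity."""
    n = len(signal)
    order = sorted(range(n), key=lambda i: -signal[i])
    parent = [None] * n  # None means not yet in the superlevel set

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in order:          # sweep the threshold downward
        parent[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # the component with the lower birth dies here
                    lo, hi = sorted((ri, rj), key=lambda r: signal[r])
                    pairs.append((signal[lo], signal[i]))
                    parent[lo] = hi
    pairs.append((signal[find(order[0])], float("-inf")))
    return sorted(pairs, key=lambda p: p[1] - p[0])  # most persistent first

# Noisy signal with two prominent modes (heights 2 and 3) plus small bumps;
# only the two dominant (birth, death) pairs have large persistence.
sig = [0.0, 2.0, 0.5, 0.6, 0.4, 3.0, 0.1]
print(mode_persistence(sig)[:2])
```

Thresholding the birth-death gap then gives an estimate of the number of modes that is robust to the small bumps.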

**Jun Zhang**, *University of Michigan*: **Statistical Manifold and Entropy-Based Inference**

Information Geometry is the differential geometric study of the manifold of probability models, and promises to be a unifying geometric framework for investigating statistical inference, information theory, machine learning, etc. Instead of using a metric for measuring distances on such manifolds, these applications often use "divergence functions" for measuring the proximity of two points (functions that impose neither symmetry nor the triangle inequality), for instance the Kullback-Leibler divergence, Bregman divergences, f-divergences, etc. Divergence functions are tied to generalized entropy (for instance, Tsallis entropy, Rényi entropy, phi-entropy) and cross-entropy functions widely used in machine learning and the information sciences.

It turns out that divergence functions enjoy pleasant geometric properties: they induce what is called a "statistical structure" on a manifold M, a Riemannian metric g together with a pair of torsion-free affine connections D, D*, such that D and D* are both Codazzi coupled to g while being conjugate to each other. Divergence functions also induce a natural symplectic structure on the product manifold MxM, for which M with its statistical structure is a Lagrange submanifold. We recently characterized the holomorphicity of D, D* in the (para-)Hermitian setting, and showed that statistical structures (with torsion-free D, D*) can be enhanced to Kähler or para-Kähler manifolds. The surprisingly rich geometric structures and properties of a statistical manifold open up the intriguing possibility of geometrizing statistical inference, information, and machine learning in string-theoretic languages.
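The asymmetry that distinguishes a divergence from a metric is easy to see in the discrete Kullback-Leibler case; a minimal sketch:

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) of discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
# A divergence need not be symmetric: D(p||q) and D(q||p) differ,
# although both vanish exactly when the two distributions coincide.
print(kl_divergence(p, q))  # about 0.511
print(kl_divergence(q, p))  # about 0.368
```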

## Date and Location

**February 19 - 23, 2018**

Max Planck Institute for Mathematics in the Sciences

Inselstraße 22

04103 Leipzig

Germany


## Scientific Organizers

**Christiane Görgen**, MPI für Mathematik in den Naturwissenschaften

**Sara Kališnik Verovšek**, MPI für Mathematik in den Naturwissenschaften

**Vlada Limic**, Université de Strasbourg & CNRS, Paris

## Administrative Contact

**Saskia Gutzschebauch**

MPI für Mathematik in den Naturwissenschaften


Phone: (+49)-341-9959-752