TAGS - Linking Topology to Algebraic Geometry and Statistics

Abstracts for the talks

Ulrich Bauer
Technical University of Munich
Persistent homology and the stability theorem

 slides 

Persistent homology, the homology of a filtration of topological spaces, is described up to isomorphism by the persistence diagram (barcode), which encodes the structure of its indecomposable summands. The stability theorem, a fundamental theorem in the theory of persistence, states that small changes in the input data lead only to small changes in the persistence barcode. We will explore a constructive algebraic approach to this result that admits a high level of generality.
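
In its classical form, for the sublevel-set filtrations of two functions f and g on a common space (the textbook statement; the talk works in greater generality), the stability theorem reads
\[
d_B\bigl(\mathrm{Dgm}(f),\,\mathrm{Dgm}(g)\bigr) \;\le\; \lVert f - g\rVert_\infty ,
\]
where d_B denotes the bottleneck distance between persistence diagrams.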

Magnus Botnan
Technical University of Munich
Multidimensional Persistence and Clustering

 slides 

It is widely known that multidimensional persistent homology is "hard". But what does this really mean? In the first part of the talk I will discuss this from the point of view of the representation theory of quivers. Next I will show how multidimensional clustering gives rise to multidimensional persistence modules of a particularly appealing form. Given their simplicity, it is natural to wonder whether persistence computations can be carried out in the clustering setting just as in 1-D. I will discuss recent work with Ulrich Bauer, Steffen Oppermann and Johan Steen, which shows that this restriction alleviates the aforementioned "hardness" only in very special cases. The last part of the talk will concern how much of this complexity we see in persistence modules coming from data.

Herbert Edelsbrunner
Institute of Science and Technology Austria
The multi-cover persistence of Euclidean balls
Given a locally finite set X in R^d and a positive radius r, the k-fold cover of X at radius r consists of all points that have k or more points of X within distance r. The order-k Voronoi diagram decomposes the k-fold cover into convex regions, and we use the nerve of this decomposition to compute homology and persistence. In particular, the persistence in depth is interesting from a geometric as well as an algorithmic viewpoint. The main tool in understanding its structure is a rhomboid tiling in R^(d+1) that combines the nerves for all values of k into one. We mention a straightforward consequence, namely that the cells in the nerve are generically not simplicial, unless k=1 or d=1,2. (Joint work with Georg Osang.)
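
As a concrete reading of the definition above, the following minimal Python sketch tests whether a query point lies in the k-fold cover (a direct transcription of the definition; the function name and toy data are mine, not from the talk):

import numpy as np

def in_k_fold_cover(y, X, r, k):
    # y belongs to the k-fold cover of X at radius r iff at least
    # k points of X lie within distance r of y.
    dists = np.linalg.norm(X - y, axis=1)
    return np.count_nonzero(dists <= r) >= k

# toy usage in the plane
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(in_k_fold_cover(np.array([0.2, 0.2]), X, r=1.0, k=2))  # True: all three points lie within distance 1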

Heather Harrington
University of Oxford
Computational algebraic geometry for topological data analysis
I will present how computational algebra can be used to study problems arising in topological data analysis. I will motivate each of these problems using applications. First I will present how we use numerical algebraic geometry to sample points on real algebraic varieties. Time permitting, I will propose interpretable invariants for multi-parameter persistent homology using computational commutative algebra.

Kathryn Hess
École polytechnique fédérale de Lausanne
The many roles of topology in neuroscience

 slides 

I will present an overview of the many and varied applications of topology to neuroscience on which we have been working at the EPFL over the past three years.

Stephan Huckemann
University of Göttingen
Some Statistical Challenges Coming with Non-Euclidean Data

 slides 

A wide range of statistical applications rely on the central limit theorem (CLT), which ensures, for random vectors with finite second moments, asymptotic square-root-n normality of the sample mean. For deviates taking values in a Riemannian manifold, the celebrated CLT of Bhattacharya and Patrangenaru (2005) extends this result to Fréchet means of the intrinsic distance, under a collection of additional assumptions.
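
Here the Fréchet mean of observations X_1, ..., X_n in a metric space (M, d) is (a standard definition, recalled for orientation)
\[
\hat\mu_n \;\in\; \operatorname*{arg\,min}_{p \in M} \; \frac{1}{n}\sum_{i=1}^{n} d^2(p, X_i),
\]
and the Bhattacharya-Patrangenaru CLT asserts that, in a chart around the population Fréchet mean \mu, \sqrt{n}(\hat\mu_n - \mu) has a normal limit under the additional assumptions alluded to above.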

In the sequel, the Bhattacharya-Patrangenaru CLT has been extended to Fréchet means of more general distances, e.g. Procrustes distances on shape spaces, to “distances” between data and a data descriptor (a geodesic, say), and even to nested sequences of random descriptors, such as principal nested spheres.

In particular, the latter is a generalization of the classical asymptotics of principal components by Anderson (1963). Still, for all of these general scenarios, analogs of the above-mentioned additional assumptions are required to hold. Meanwhile, the geometric meaning of these assumptions has been studied in exemplary cases: the distribution near the cut locus of the mean may be responsible for rates slower than square-root-n with non-normal limiting distributions, a phenomenon we call smeariness. Asymptotic non-normality may also be observed on non-complete manifolds, even without cut loci. For non-manifold spaces allowing only for a Riemannian stratification, for example BHV phylogenetic tree spaces, “infinitely fast” rates have been observed, making straightforward statistical testing impossible; this phenomenon is called stickiness.

Vladimir Itskov
The Pennsylvania State University
Directed complexes, non-linear rank and convex sensing

 slides 

Convex sensing, i.e. sensing by quasiconvex functions, naturally arises in neuroscience and other contexts. One example of a convex sensing problem is measuring the underlying dimension of data. A related problem is computing the "nonlinear rank", i.e. the minimal rank of a matrix modulo the action by the group of row-wise nonlinear monotone transformations. While an exact algorithm for computing the nonlinear rank is unknown, it can be efficiently estimated using topological methods, as it is closely related to the geometry and topology of hyperplane arrangements.

A natural tool for convex sensing problems is the directed complex, which captures much of the relevant geometric information. For example, the nonlinear rank, as well as other geometric properties of the data, can be estimated from the homology of an associated directed complex. I will present recent results and conjectures about the directed complexes associated to some convex sensing problems.

Jürgen Jost
Max Planck Institute for Mathematics in the Sciences
Ricci curvature as a tool for analyzing networks and other complexes
In network science, concepts and quantities are often introduced in a rather ad hoc manner. I shall use analogies with another geometric discipline, Riemannian geometry, to develop network analysis tools in a systematic manner.
At the heart of Riemannian geometry are notions of curvature. In particular, in recent decades, fundamental aspects of Ricci curvature have been discovered and explored. In fact, there are two aspects of Ricci curvature that turn out to be useful in metric geometry. One is about the average dispersion of geodesics. The other is about comparing the distance between two points with the transportation cost between their local neighborhoods.
As I shall explain, both are useful for the theoretical investigation of networks and other complexes as well as for the analysis of empirical network data.
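
One widely used instance of the second notion is Ollivier's coarse Ricci curvature (a standard definition, included here for orientation; the talk may use a different normalization): for two nodes x, y of a graph with local probability measures \mu_x, \mu_y (e.g. uniform on the neighbors),
\[
\kappa(x,y) \;=\; 1 - \frac{W_1(\mu_x,\mu_y)}{d(x,y)},
\]
where W_1 is the optimal-transport (Wasserstein) distance and d the graph distance.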

Pierre Lairez
INRIA Saclay Île-de-France
Numerical computation of the homology of basic semialgebraic sets

 slides 

Semialgebraic sets (that is, subsets of Euclidean spaces defined by polynomial equations and inequalities) come in a wide variety of shapes. This raises the problem of describing a given specimen, from basic features like the number of connected components to finer ones, like homology.
Following a well-known approach, we compute the homology using approximations by point clouds. We show how to ensure correctness and give complexity bounds in terms of the conditioning of the input equations. A probabilistic analysis shows that, outside of a vanishingly small subset of the input space, the complexity is single exponential with respect to the number of variables. This improves upon the previously known double exponential complexity bounds.

Michael Lesnick
Princeton University
Quantifying Genetic Innovation: Mathematical Foundations for the Topological Study of Reticulate Evolution
In 2013, Chan, Carlsson, and Rabadan introduced a topological approach to the study of genetic recombination, based on persistent homology. In this work, we develop theoretical foundations for this approach. First, we clarify the motivation for the topological approach by presenting a novel formulation of the underlying inference problem. Specifically, we introduce and study the novelty profile, a simple, stable statistic of an evolutionary history which not only counts recombination events but also quantifies how recombination creates genetic diversity. We propose that the (hitherto implicit) goal of the topological approach to recombination is the estimation of novelty profiles.

We then develop a theory for the estimation of novelty profiles using barcodes. We focus on a low-recombination regime, where the ancestry of a population-genetic sample can be described using directed acyclic graphs called galled trees, which differ from trees only by isolated topological defects. We show that in this regime, under a complete sampling assumption, the 1st persistence barcode of evolutionary data yields a lower bound on the novelty profile, and hence on the number of historical recombination events. For i>1, the ith barcode is empty. In addition, we use the stability of persistent homology to extend these results to ones which hold for an arbitrary subsample of an arbitrary evolutionary history. We establish these results as corollaries of new mathematical results about the topology of certain Vietoris-Rips filtrations, which may be of independent interest.

Joint work with Raul Rabadan and Daniel Rosenbloom.

Pascal Massart
Université de Paris-Sud
Model selection and Geometry

 slides 

The idea of selecting a model via penalizing a log-likelihood type criterion is an old idea in statistics. It goes back to the early seventies with the pioneering works of Mallows and Akaike.

One can find many consistency results in the literature for such criteria. These results rely on limit theorems from probability theory and are asymptotic in the sense that one deals with a given number of models while the number of observations tends to infinity. This setting turns out to be irrelevant for many applications (especially for high-dimensional data). Taking as a central illustrative example the Gaussian white noise framework, we shall show that it is possible to develop a non-asymptotic theory for model selection. This theory relies deeply on the concentration of measure phenomenon, which is known to be of a geometric nature. Conversely, some important issues of geometrical inference can be solved by using the model selection approach. We shall illustrate this point by focusing on a specific example: the use of model selection for simplicial approximation.
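
For orientation, the classical penalized criteria alluded to above take the form (standard formulation, not specific to the talk)
\[
\widehat{m} \;\in\; \operatorname*{arg\,min}_{m \in \mathcal{M}} \Bigl\{ -\log \widehat{L}_m + \mathrm{pen}(m) \Bigr\},
\]
where \widehat{L}_m is the maximized likelihood of model m; Akaike's criterion corresponds to \mathrm{pen}(m) = D_m and the Bayesian information criterion to \mathrm{pen}(m) = \tfrac{D_m}{2}\log n, with D_m the number of parameters of model m.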

Anthea Monod
Columbia University
Tropical Sufficient Statistics for Persistent Homology
Topological data analysis has proved to be an innovative tool for studying genetic reassortment events in RNA viruses, a key factor in studying viral mutations, such as those in HIV. In this talk, I will show how a sufficient statistic for persistence barcodes, an important invariant in topological data analysis, constructed using tropical geometry, allows classical probability distributions to be assumed on the tropicalized representation of barcodes. This makes a variety of parametric statistical inference methods amenable to barcodes, all while maintaining their initial interpretations. More specifically, exponential family distributions may be assumed and likelihood functions for persistent homology may be constructed. We conceptually demonstrate sufficiency and illustrate its utility in dimensions 0 and 1, with a rigorous parametric statistical analysis of inter- and intra-subtype reassortment events in both HIV and avian influenza. This is joint work with Sara Kalisnik Verovsek, Juan Angel Patino-Galindo and Lorin Crawford, and supported by the US NIH under project number R01GM117591.

Luke Oeding
Auburn University
Higher Order Partial Least Squares and an Application to Neuroscience

 slides 

Partial least squares (PLS) is a method to discover a functional dependence between two sets of variables X and Y. PLS attempts to maximize the covariance between X and Y by projecting both onto new subspaces. Higher order partial least squares (HOPLS) comes into play when the sets of variables have additional tensorial structure. Simultaneous optimization of subspace projections may be obtained by a multilinear singular value decomposition (MSVD). I'll review PLS and SVD, and explain their higher order counterparts. Finally, I'll describe recent work with G. Deshpande, A. Cichocki, D. Rangaprakash, and X.P. Hu, where we propose to use HOPLS and tensor decompositions to discover latent linkages between EEG and fMRI signals from the brain, and ultimately to use this to drive brain-computer interfaces (BCIs) with the low size, weight and power of EEG, but with the accuracy of fMRI.
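
A minimal sketch of the first PLS direction pair, computed from the SVD of the cross-covariance matrix (one common variant of PLS, assuming column-centered data; the function name and toy data are mine, not from the talk):

import numpy as np

def pls_first_directions(X, Y):
    # The leading singular vectors of the cross-covariance matrix X^T Y
    # maximize Cov(X w, Y c) over unit-norm weight vectors w and c.
    C = X.T @ Y
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, 0], Vt[0, :], s[0]

# toy usage: Y depends on a 2-dimensional subspace of X
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5)); X -= X.mean(axis=0)
Y = X[:, :2] @ rng.standard_normal((2, 3)) + 0.1 * rng.standard_normal((100, 3))
Y -= Y.mean(axis=0)
w, c, cov = pls_first_directions(X, Y)

Higher order PLS replaces this matrix SVD by a multilinear (higher order) SVD of the cross-covariance tensor when X and Y carry tensorial structure, as described in the abstract.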

Wolfgang Polonik
University of California at Davis
Topological data analysis and statistics
This talk will address various aspects of TDA from a more statistical perspective. It will include a brief survey of relevant approaches for statistical inference, and a critical discussion of some challenges involved in (i) the development of approaches for statistical inference; (ii) the derivation of statistically meaningful theory; and (iii) more practical challenges arising in scientific applications of TDA methods.

Primoz Skraba
University of Primorska
New Results for the Stability of Persistence Diagrams

 slides 

Stability is certainly one of the most important aspects of persistence, and there are numerous results in this direction. Most of them are stated in terms of the bottleneck distance between persistence diagrams. While applicable in general settings, the bottleneck distance is a sup-norm, and using it to prove convergence can be difficult. In this talk, I will discuss some new results concentrating on a bound on the p-Wasserstein distance between persistence diagrams. The main result states that the p-norm between two functions on a simplicial complex is an upper bound on the p-Wasserstein distance between the corresponding persistence diagrams. I will discuss some other related results as well as applications.
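
In symbols, the main result quoted above reads (my transcription of the statement in the abstract): for two functions f, g on a simplicial complex,
\[
W_p\bigl(\mathrm{Dgm}(f),\,\mathrm{Dgm}(g)\bigr) \;\le\; \lVert f - g\rVert_p .
\]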

Bernd Sturmfels
Max Planck Institute for Mathematics in the Sciences
Learning Algebraic Varieties from Samples

 slides 

Data sets often have an inherent geometric structure, revealed by approximate algebraic relations of low degree. We discuss a range of methods for studying such samples. One aim is to enhance the toolkit of manifold learning and applied topology with the power of algebraic geometry. This is an ongoing joint project with Paul Breiding, Sara Kalisnik and Madeleine Weinstein.
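
One basic ingredient of this approach can be sketched in a few lines of Python: approximate low-degree algebraic relations among the sample coordinates appear as small singular values of a monomial evaluation (multivariate Vandermonde) matrix. This is a minimal sketch under my own conventions, not the authors' implementation, and it assumes more sample points than monomials.

import numpy as np
from itertools import combinations_with_replacement

def monomial_matrix(points, degree):
    # Evaluate every monomial of total degree <= degree at each sample point.
    n, d = points.shape
    exps = []
    for total in range(degree + 1):
        for combo in combinations_with_replacement(range(d), total):
            e = np.zeros(d, dtype=int)
            for i in combo:
                e[i] += 1
            exps.append(e)
    E = np.array(exps)                                            # (number of monomials, d)
    return np.prod(points[:, None, :] ** E[None, :, :], axis=2)  # (n, number of monomials)

def approximate_relations(points, degree, tol=1e-8):
    # Right singular vectors with tiny singular values give coefficient vectors
    # of polynomials that nearly vanish on the sample.
    M = monomial_matrix(points, degree)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[s < tol * s[0]]

# toy usage: points on the unit circle satisfy x^2 + y^2 - 1 = 0
t = np.linspace(0, 2 * np.pi, 50, endpoint=False)
pts = np.column_stack([np.cos(t), np.sin(t)])
relations = approximate_relations(pts, degree=2)   # one relation, up to scaling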

André Uschmajew
Max Planck Institute for Mathematics in the Sciences
Topological problems in low-rank optimization
In low-rank optimization one seeks to minimize functions on manifolds or varieties of low-rank matrices or tensors. This has useful applications in signal and image processing, or in high-dimensional scientific computing. Due to their parametrization by multilinear maps, sets of low-rank matrices and tensors exhibit a rich geometric structure, which sometimes makes it possible to go beyond generic results in the analysis of corresponding nonlinear optimization methods on these sets. This concerns for example necessary first-order optimality conditions or classification of critical points. In this talk we discuss some of these questions.

Francesco Vaccarino
Politecnico di Torino
A persistent variation: data field theory via sheaves of incidence algebras
We will extend persistent (co-)homology by considering diagrams of incidence algebras of finite posets. Using results of Gerstenhaber et al., we will show the link between diagram cohomology, Hochschild cohomology, and multidimensional persistence, connecting seemingly unrelated research areas such as topological data analysis, noncommutative geometry, and topological field theory (via the work of Cattaneo, Mnev, and Reshetikhin on cellular field theory).

Max Wardetzky
University of Göttingen
Some statistical challenges of topological inference in the 1D case

 slides 

In this talk I will present some thoughts on the challenges of inferring topological features from noisy data in perhaps the simplest setting: estimating the number of modes (i.e., local maxima) of a 1D signal. Already this simple setup illustrates some of the statistical obstacles that arise when trying to infer topological information. A particular focus will be on inference without presmoothing the data. In particular, I will discuss some aspects of persistent topology in the 1D setting and present a somewhat alternative approach based on the Kolmogorov norm instead of the sup norm. I will highlight some of the advantages and disadvantages of these approaches from a statistical perspective.
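
To make the 1D persistence computation mentioned above concrete, here is a minimal Python sketch that pairs each local maximum with the merge level at which its component disappears (a standard union-find sweep, not the speaker's method; names and toy data are mine):

import numpy as np

def persistence_of_maxima(y):
    # Superlevel-set filtration of a 1D signal: sweep the samples from the
    # highest to the lowest value, merging neighboring components.
    y = np.asarray(y, dtype=float)
    n = len(y)
    order = np.argsort(-y)
    root = np.full(n, -1)          # union-find parent; -1 = not processed yet
    peak = {}                      # component root -> height of its maximum
    bars = []                      # (birth height, death height), one per local maximum

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for i in order:
        left = i - 1 if i > 0 and root[i - 1] != -1 else None
        right = i + 1 if i < n - 1 and root[i + 1] != -1 else None
        if left is None and right is None:
            root[i] = i            # a new component is born at a local maximum
            peak[i] = y[i]
        elif left is None or right is None:
            root[i] = find(left if right is None else right)
        else:
            rl, rr = find(left), find(right)
            if rl == rr:
                root[i] = rl
            else:                  # two components merge: the lower maximum dies here
                low, high = (rl, rr) if peak[rl] < peak[rr] else (rr, rl)
                bars.append((peak[low], y[i]))
                root[low] = high
                root[i] = high
    survivor = find(order[0])
    bars.append((peak[survivor], y.min()))   # the global maximum never dies
    return bars

# toy usage: two genuine modes plus small noise
t = np.linspace(0, 4 * np.pi, 400)
signal = np.sin(t) + 0.1 * np.random.default_rng(1).standard_normal(400)
bars = persistence_of_maxima(signal)
modes = [b for b in bars if b[0] - b[1] > 0.5]   # keep only bars with large persistence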

Jun Zhang
University of Michigan
Statistical Manifold and Entropy-Based Inference
Information Geometry is the differential geometric study of the manifold of probability models, and promises to be a unifying geometric framework for investigating statistical inference, information theory, machine learning, etc. Instead of using a metric for measuring distances on such manifolds, these applications often use “divergence functions” for measuring the proximity of two points (functions that do not impose symmetry or the triangle inequality), for instance the Kullback-Leibler divergence, Bregman divergences, f-divergences, etc. Divergence functions are tied to generalized entropy (for instance Tsallis entropy, Rényi entropy, phi-entropy) and cross-entropy functions widely used in machine learning and the information sciences.

It turns out that divergence functions enjoy pleasant geometric properties: they induce what is called a “statistical structure” on a manifold M, namely a Riemannian metric g together with a pair of torsion-free affine connections D, D*, such that D and D* are both Codazzi coupled to g while being conjugate to each other. Divergence functions also induce a natural symplectic structure on the product manifold M x M, for which M with its statistical structure is a Lagrangian submanifold. We recently characterized holomorphicity of D, D* in the (para-)Hermitian setting, and showed that statistical structures (with torsion-free D, D*) can be enhanced to Kähler or para-Kähler manifolds. The surprisingly rich geometric structures and properties of a statistical manifold open up the intriguing possibility of geometrizing statistical inference, information, and machine learning in string-theoretic language.
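
As a standard example of such a divergence (recalled for orientation, not specific to the talk), the Bregman divergence generated by a strictly convex function \phi is
\[
D_\phi(p, q) \;=\; \phi(p) - \phi(q) - \langle \nabla\phi(q),\, p - q\rangle ,
\]
which is nonnegative but in general neither symmetric nor a metric; choosing \phi to be the negative Shannon entropy (the sum of p_i \log p_i) recovers the Kullback-Leibler divergence on probability vectors.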

 

Date and Location

February 19 - 23, 2018
Max Planck Institute for Mathematics in the Sciences
Inselstraße 22
04103 Leipzig
Germany
see travel instructions

Scientific Organizers

Christiane Görgen
MPI für Mathematik in den Naturwissenschaften

Sara Kališnik Verovšek
MPI für Mathematik in den Naturwissenschaften

Vlada Limic
Université de Strasbourg & CNRS, Paris

Administrative Contact

Saskia Gutzschebauch
MPI für Mathematik in den Naturwissenschaften
Contact by Email

Phone: (+49)-341-9959-752
