The Geometry and Topology in Machine Learning (GTML) workshop brings together two rapidly evolving fields central to modern machine learning. Geometry and topology provide essential methods for describing data structure and frameworks for analyzing, unifying, and generalizing machine learning techniques to new settings.
The workshop will feature 10 keynote talks and 20 presentations by leading experts. By merging the Workshop on Geometry in Machine Learning (GaML) and the Workshop on Topological Methods in Data Analysis (TMDA), GTML creates a platform to foster collaboration and explore the interplay between geometry, topology, and machine learning.
We introduce a class of algebraic varieties naturally associated with ReLU neural networks, arising from the piecewise linear structure of their outputs across activation regions in input space, and the piecewise multilinear structure in parameter space. By analyzing the rank constraints on the network outputs within each activation region, we derive polynomial equations that characterize the functions representable by the network. We further investigate conditions under which these varieties attain their expected dimension, providing insight into the expressive and structural properties of ReLU networks. This is joint work with Yulia Alexandr.
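Schematically (my notation, not necessarily the authors'): on the activation region indexed by a sign pattern $\sigma = (\sigma_1, \dots, \sigma_{L-1})$, a ReLU network computes an affine map whose linear part is a product of masked weight matrices,
\[
f_\theta(x) = W_L D_{\sigma_{L-1}} W_{L-1} \cdots D_{\sigma_1} W_1\, x + b_\sigma, \qquad D_{\sigma_i} = \operatorname{diag}(\sigma_i), \ \sigma_i \in \{0,1\}^{n_i},
\]
so the rank of the linear part is bounded by the smallest effective layer width along the product, and the vanishing of the corresponding minors yields determinantal polynomial constraints of the kind described above.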
In this talk, we present recent results on the derivation of effective models for the training dynamics of Riemannian stochastic gradient descent (SGD) in limits of small learning rates or large, shallow networks. The focus lies on developing effective limiting models that also capture the fluctuations inherent in Riemannian SGD. This will lead to novel concepts of stochastic modified flows and distribution-dependent modified flows. The advantage of these limiting models is that they match the SGD dynamics to higher order and recover the correct multi-point distributions. This is joint work with Vitalii Konarovskyi and Sebastian Kassing.
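For intuition, in the Euclidean case a first-order stochastic modified flow takes the schematic form (my summary of the general idea, not the precise Riemannian statement of the talk):
\[
\mathrm{d}X_t = -\nabla\Big( L(X_t) + \tfrac{\eta}{4}\,\big\|\nabla L(X_t)\big\|^2 \Big)\,\mathrm{d}t + \sqrt{\eta}\;\Sigma(X_t)^{1/2}\,\mathrm{d}W_t,
\]
where $\eta$ is the learning rate and $\Sigma$ the covariance of the gradient noise; the $O(\eta)$ drift correction together with the noise term is what allows such limiting models to match SGD to higher order and to capture the correct fluctuations.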
In this talk, we study the mean-field regime of residual network architectures, which are key to modern deep learning models. We show that the landscape of the loss function for the standard quadratic loss is made "nicer" by using such an architecture. More precisely, in the time-continuous limit and in two different over-parametrized regimes, we prove that the loss function satisfies a local Polyak-Lojasiewicz inequality, thereby guaranteeing that any critical point is a global minimum, as well as a local convergence result. If time permits, we also present a recent result on feature learning in the context of a shallow network. This is joint work with Raphaël Barboni and Gabriel Peyré.
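For reference, a local Polyak-Lojasiewicz inequality on a neighbourhood $U$ asserts (in generic notation)
\[
\big\|\nabla L(\theta)\big\|^2 \;\ge\; \mu \,\big( L(\theta) - \inf L \big) \qquad \text{for all } \theta \in U, \ \text{for some } \mu > 0,
\]
which rules out spurious critical points in $U$ and yields linear convergence of the gradient flow for as long as it remains in $U$.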
In this talk, we will present some applications of finite-dimensional and infinite-dimensional geometry to Artificial Intelligence. The growing need to process geometric data such as curves, surfaces or fibered structures in a resolution-independent way that is invariant to shape-preserving transformations gives rise to mathematical questions both in pure and applied differential geometry, but also in numerical analysis. To illustrate this, applications of shape analysis to medical imaging and to the temporal alignment of signals will be given. On the other hand, the extraction of geometric information from point clouds formed by data sets is an area where differential geometry meets probability, as in dimension reduction or manifold learning. Some of the challenges in these areas will be mentioned.
Despite the tremendous success of artificial intelligence (AI) in science, engineering, and technology in the past decade, its explainability and generalizability have been a major concern. The solution to these challenges holds the future of AI. Topological deep learning (TDL), a new frontier in rational learning introduced by us in 2017, offers interpretable and generalizable AI approaches. TDL utilizes topological data analysis (TDA), which was originally rooted in persistent homology, an algebraic topology technique for point cloud data. Recently, much effort has been devoted to generalizing TDA to combinatorial spectral theory, differential topology, and geometric topology to tackle data on graphs, differentiable manifolds, and curves embedded in 3-space, respectively (see arXiv:2507.19504 for a review). These approaches reduce dimensionality, simplify geometric complexity, capture high-order interactions, and provide interpretable AI models in a manner that cannot be achieved through other mathematical, statistical, and physical methodologies. I will discuss compelling examples and applications that consistently demonstrate the advantages of TDL over competing methods.
A large driver contributing to the undeniable success of deep-learning models is their ability to synthesise task-specific features from data. For a long time, the predominant belief was that 'given enough data, all features can be learned.' However, as large language models are hitting diminishing returns in output quality while requiring an ever-increasing amount of training data and compute, new approaches are required. One promising avenue is to focus more on modelling, in particular on the development of novel inductive biases such as invariances that cannot be readily gleaned from the data. This approach is particularly useful for data sets that model real-world phenomena, as well as for applications where data are scarce. Given their dual nature, geometry and topology provide a rich source of potential inductive biases. In this talk, I will present novel advances in harnessing multi-scale geometrical-topological characteristics of data. A special focus will be on how geometry and topology can improve representation learning tasks. Underscoring the generality of a hybrid geometrical-topological perspective, I will furthermore showcase applications from a diverse set of data domains, including point clouds, graphs, and higher-order combinatorial complexes.
We introduce copresheaf topological neural networks (CTNNs), a powerful and unifying framework that encapsulates a wide spectrum of deep learning architectures, designed to operate on structured data including images, point clouds, graphs, meshes, and topological manifolds. While deep learning has profoundly impacted domains ranging from digital assistants to autonomous systems, the principled design of neural architectures tailored to specific tasks and data types remains one of the field's most persistent open challenges. CTNNs address this gap by grounding model design in the language of copresheaves, a concept from algebraic topology that generalizes and subsumes most practical deep learning models in use today. This abstract yet constructive formulation yields a rich design space from which theoretically sound and practically effective solutions can be derived to tackle core challenges in representation learning: long-range dependencies, oversmoothing, heterophily, and non-Euclidean domains. Our empirical results on structured data benchmarks demonstrate that CTNNs consistently outperform conventional baselines, particularly in tasks requiring hierarchical or localized sensitivity. These results underscore CTNNs as a principled, multi-scale foundation for the next generation of deep learning architectures.
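For orientation, the underlying notion is standard (my paraphrase, not the paper's exact formalism): a copresheaf on the face poset of a complex assigns a feature space to every cell and a linear map to every incidence, compatibly with composition,
\[
\sigma \mapsto F(\sigma), \qquad (\sigma \le \tau) \mapsto F_{\sigma\le\tau}\colon F(\sigma) \to F(\tau), \qquad F_{\tau\le\rho} \circ F_{\sigma\le\tau} = F_{\sigma\le\rho},
\]
and message passing is carried out along these maps, so different choices of the spaces and maps recover different architectures.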
When physical properties of molecules are being modeled with machine learning, it is desirable to incorporate SO(3)-covariance. While such models based on low body order features are not complete, we formulate and prove general completeness properties for higher order methods and show that 6k – 5 of these features are enough for up to k atoms. We also find that the Clebsch–Gordan operations commonly used in these methods can be replaced by matrix multiplications without sacrificing completeness, lowering the scaling from $O(l^6)$ to $O(l^3)$ in the degree of the features. We apply this to quantum chemistry, but the proposed methods are generally applicable for problems involving three-dimensional point configurations. (https://pubs.acs.org/doi/full/10.1021/acs.jpclett.4c)
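In generic notation, the covariance in question requires a feature map $\phi$ on point configurations to satisfy
\[
\phi(R x_1, \dots, R x_k) \;=\; D(R)\,\phi(x_1, \dots, x_k) \qquad \text{for all } R \in SO(3),
\]
where $D$ is a representation of $SO(3)$ built from Wigner matrices; completeness then asks that $\phi$ separate configurations up to rotation.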
Many of the services we use daily, both online and offline, rely on the processing of massive amounts of data in the background. Given the large scale of the usually distributed (cloud) IT infrastructure underlying this, we aim to minimize the employed computing resources, while maximizing the processing performance. This turns out to be a challenging optimization problem, especially for complex distributed infrastructures and dynamically changing amounts of data to be processed. In this talk, I will explain how we tackle this challenge with reinforcement learning. I will first explain how the infrastructure can be modeled as a graph, connecting different data stores by individual stochastic data processing steps. Based on this, we built a simulation, on which a reinforcement learning agent is trained to dynamically manage the computing resources. I will show first results of the efficiency and performance of the agent compared to a heuristic approach, both based on the simulation as well as tests on real IT infrastructure.
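As a rough illustration of the kind of setup described (a minimal sketch with hypothetical names and numbers, not the authors' simulator): data stores are chained by stochastic processing steps, and an agent chooses how many workers to allocate to each step.

```python
import random

class PipelineEnv:
    """Toy model of data stores connected by stochastic processing steps.

    State: data volume queued at each store. Action: workers per step.
    Reward: processed volume minus resource cost. All names and numbers
    are hypothetical, for illustration only.
    """

    def __init__(self, n_steps=3, mean_arrival=5.0, cost_per_worker=0.1):
        self.n_steps = n_steps
        self.mean_arrival = mean_arrival
        self.cost_per_worker = cost_per_worker
        self.queues = [0.0] * (n_steps + 1)  # store i feeds processing step i

    def step(self, workers):
        self.queues[0] += random.expovariate(1.0 / self.mean_arrival)  # new data arrives
        processed = 0.0
        for i, w in enumerate(workers):
            # Each worker contributes a stochastic throughput this time step.
            capacity = sum(random.uniform(0.5, 1.5) for _ in range(w))
            moved = min(self.queues[i], capacity)
            self.queues[i] -= moved
            self.queues[i + 1] += moved
            processed += moved
        return list(self.queues), processed - self.cost_per_worker * sum(workers)

env = PipelineEnv()
state, reward = env.step(workers=[2, 2, 1])  # a fixed heuristic allocation
print(state, reward)
```

An RL agent would replace the fixed allocation with a policy mapping queue states to worker assignments, trained against such a simulator before being evaluated on real infrastructure.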
Geometric deep learning has emerged as a powerful framework for the study and design of modern machine learning models, with broad applications in chemistry, physics, robotics, and engineering. At its core, it leverages invariance and equivariance with respect to the symmetry groups underlying the data. In this presentation, I will highlight two recent contributions from our research group in this area.
In the first part, I will present our work on protein design, where we introduce a generative model for protein backbone generation that leverages geometric products and higher-order message passing. Building on FrameFlow, a state-of-the-art model for protein backbone generation, we represent the frames of the protein backbone as elements of the projective geometric algebra. This formulation enables the use of geometrically more expressive bilinear geometric products as a paradigm for higher-order message passing. The proposed model achieves high designability and structural diversity, while generating protein backbones that more closely match the statistical distribution of secondary structures found in naturally occurring proteins, a capability that state-of-the-art generative models have so far achieved only partially. I will conclude this part with future directions for protein design and potential applications in materials science.
In the second part, I will present our contributions to a perhaps unexpected application area of geometric deep learning: large language models (LLMs). While LLMs demonstrate impressive capabilities across numerous applications, their robustness and factual correctness remain a critical concern. Another vulnerability of LLMs is their order sensitivity, i.e., a bias towards the sequence in which options or documents are presented. This issue manifests in multiple-choice reasoning, automated evaluation tasks, and retrieval-augmented generation, where input order significantly impacts reliability. To address this, we propose a modification of the transformer architecture that enables the processing of mixed set and text inputs with permutation invariance guarantees. This adaptation improves performance on tasks such as multi-document summarization and multi-document question answering, while preserving the runtime efficiency of the original model and eliminating order sensitivity.
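To illustrate the underlying principle rather than the proposed architecture itself: self-attention without positional encodings is permutation equivariant, so pooling over set elements yields a permutation-invariant representation. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(X):
    # Plain self-attention with no positional encodings.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return softmax(Q @ K.T / np.sqrt(d)) @ V

X = rng.standard_normal((5, d))  # a "set" of 5 tokens
perm = rng.permutation(5)

# Equivariance: permuting the inputs permutes the outputs the same way ...
assert np.allclose(attention(X[perm]), attention(X)[perm])
# ... so mean pooling gives a permutation-invariant set representation.
assert np.allclose(attention(X[perm]).mean(axis=0), attention(X).mean(axis=0))
```

The challenge the talk addresses is preserving such guarantees for mixed set-and-text inputs without giving up the efficiency of the standard transformer.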
Over the past decade or so, tools from algebraic topology have been shown to be very useful for the analysis and characterization of networks, in particular for exploring the relation of structure to function. I will describe some of these tools and illustrate their utility in neuroscience, primarily in the framework of a collaboration with the Blue Brain Project.
The Persistent Homology Transform (PHT) is a topological transform introduced by Turner, Mukherjee and Boyer in 2014. Its input is a shape embedded in Euclidean space; then to each unit vector the transform assigns the persistence module of the height function over that shape with respect to that direction. The PHT is injective on piecewise-linear subsets of Euclidean space, and it has been demonstrably useful in diverse applications as it provides a landmark-free method for quantifying the distance between shapes. One shortcoming is that shapes with different essential homology (i.e., Betti numbers) have an infinite distance between them. The theory of extended persistence for Morse functions on a manifold was developed by Cohen-Steiner, Edelsbrunner and Harer in 2009 to quantify the support of the essential homology classes. By using extended persistence modules of height functions over a shape, we obtain the extended persistent homology transform (XPHT), which provides a finite distance between shapes even when they have different Betti numbers. It may seem that the XPHT requires significant additional computational effort, but recent work by Katharine Turner and myself shows that when A is a compact n-manifold with boundary X, embedded in n-dimensional Euclidean space, the XPHT of A can be derived from the PHT of X and a signature for each local minimum of the height function on X. James Morgan has implemented the required algorithms for 2-dimensional binary images as an R package. This talk will provide an outline of our results and illustrate their application to shape clustering and symmetry quantification. These applications were studied by former students Jency Jiang and Nicholas Bermingham.
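For reference, the PHT of a suitably tame shape $M \subset \mathbb{R}^n$ sends each direction to the persistent homology of the corresponding height filtration,
\[
\mathrm{PHT}(M)\colon S^{n-1} \to \mathbf{Dgm}, \qquad v \mapsto \mathrm{PH}_\bullet\big(M, h_v\big), \quad h_v(x) = \langle x, v \rangle,
\]
and the XPHT is obtained by replacing ordinary persistence with extended persistence in this assignment.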
We apply mapper graphs—a widely used tool in topological data analysis and visualization—to investigate the topological structures of large language model (LLM) embedding spaces. The mapper's taxonomy includes elements such as nodes, edges, paths, components, and trajectories. We introduce the Explainable Mapper workspace and two mapper agents to support embedding investigation. These agents utilize summarization, comparison, and perturbation operations to generate and verify explanations of mapper elements, such as the linguistic aspects of clusters, connectivity, and transitions. This is joint work with Xinyuan Yan, Rita Sevastjanova, Sinie van der Ben, Mennatallah El-Assady, and Bei Wang. http://arxiv.org/abs/2507.18607
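For readers less familiar with the construction (standard definition, independent of this work): given data $X$, a lens $f\colon X \to \mathbb{R}^d$, and a cover $\mathcal{U} = \{U_i\}$ of the image, the mapper graph has
\[
\text{nodes} = \{\, C : C \text{ a cluster of } f^{-1}(U_i),\ U_i \in \mathcal{U} \,\}, \qquad \text{edges} = \{\, \{C, C'\} : C \cap C' \neq \emptyset \,\},
\]
so nodes summarize local clusters of embeddings and edges record their overlaps.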
Geometry reconstruction, the recovery of 3D geometric structures from 2D images, is a central challenge in computer vision. Stereo vision techniques, especially when applied to random-dot stereograms, underscore how crucial prior information is in resolving ambiguous depth cues and achieving reliable shape inference. At the other end, Gaussian Splatting offers a highly efficient representation for novel-view synthesis by modeling scenes as collections of 3D Gaussians. Yet it remains difficult to extract coherent geometry directly from these representations. A pragmatic solution is to render stereo-image pairs from these Gaussian models, apply pre-trained stereo-matching networks to infer depth maps, and fuse these maps into realistic meshes, yielding impressive reconstructions.
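Schematically, in the standard formulation of Gaussian Splatting (not specific to this talk), the scene is a set of anisotropic Gaussians
\[
G_i(x) = \exp\!\Big( -\tfrac{1}{2} (x - \mu_i)^{\top} \Sigma_i^{-1} (x - \mu_i) \Big),
\]
each carrying an opacity and view-dependent colour, rendered by projecting to the image plane and alpha-compositing front to back; this is fast for view synthesis but does not directly expose a surface, hence the detour through rendered stereo pairs.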
Turning to partial shape matching, I will first revisit the Wormhole Loss framework, which offers a principled strategy by aligning manifold fragments using intrinsic and extrinsic cues, specifically geodesic distances and proximity to boundaries. Next, we'll explore a spectral matching approach, first introduced by Rampini et al. The method encodes partiality masks within Hamiltonian operators and then aligns correspondences by matching operator spectra, yielding a robust solution for partial matching.
These advances offer a leap in both full-surface reconstruction via efficient Gaussian-based pipelines and partial-shape matching using spectral geometry.
Gromov-Wasserstein (GW) distances comprise a family of metrics on the space of (isomorphism classes of) metric measure spaces. Driven by specialized applications, a large number of variants of the GW distance have been introduced in the literature in recent years, each designed to provide meaningful comparisons between certain data objects with complex structure. These complex data objects include (attributed) graphs, hypergraphs, point clouds endowed with preferred persistent homology cycles, and many others. In this talk, I will survey some of these variants, focusing on those with connections to applied and computational topology. I will also describe recent joint work with Bauer, Mémoli and Nishino, which introduces a general framework that captures several of these variants, allowing us to derive broadly applicable theoretical properties.
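For concreteness, the quadratic GW distance between metric measure spaces $(X, d_X, \mu_X)$ and $(Y, d_Y, \mu_Y)$ is
\[
d_{\mathrm{GW}}(X, Y) \;=\; \frac{1}{2} \inf_{\pi \in \Pi(\mu_X, \mu_Y)} \Big( \iint \big| d_X(x, x') - d_Y(y, y') \big|^2 \, \mathrm{d}\pi(x, y)\, \mathrm{d}\pi(x', y') \Big)^{1/2},
\]
where $\Pi(\mu_X, \mu_Y)$ is the set of couplings; the variants surveyed in the talk modify the underlying spaces, the cost, or the admissible couplings.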
Steering diffusion processes towards a data distribution is an integral part of diffusion models in generative AI. For geometric data such as shape data, diffusion processes appear as models for stochastic dynamics, e.g. of species change through evolution, or for generating data distributions on non-linear spaces, e.g. when defining constructs such as the diffusion mean that rely on geometric equivalents of the Gaussian distribution. Score learning is here key for conditioning on observed data. Thus, score learning provides a connection between generative models and geometric statistics. The talk will concern this connection, bridge simulation on geometric spaces, and the application of score learning in geometric contexts. A specific example of this is conditioning diffusion processes in infinite dimensions, allowing shape observations to be used for phylogenetic inference in evolutionary biology.
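Schematically, in generic notation: conditioning a diffusion $\mathrm{d}X_t = b(X_t)\,\mathrm{d}t + \sigma(X_t)\,\mathrm{d}W_t$ on hitting an observation $y$ at time $T$ adds a score term to the drift (Doob's $h$-transform),
\[
\mathrm{d}X_t = b(X_t)\,\mathrm{d}t + \sigma\sigma^{\top}(X_t)\, \nabla_x \log p\big(T - t, X_t; y\big)\,\mathrm{d}t + \sigma(X_t)\,\mathrm{d}W_t,
\]
and score learning approximates the intractable $\nabla_x \log p$; on manifolds and shape spaces the same principle applies with the appropriate geometric operators.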
We introduce a representation learning framework that uses group representations arising from the underlying group action in data generation. We assume that the data consist of examples of the group action, comprising a point and its transformation under a group element, or sequences generated through the successive application of a group element. Utilizing an autoencoder architecture, our approach maps the data to a latent space in a manner that is equivariant to the group action, achieving an approximate group representation learned from data. By applying block diagonalization, we decompose the representation into irreducible representations, which we call the Neural Fourier Transform. This presents a generalized, data-driven approach to the Fourier transform. We validate our framework across various scenarios, such as image sequences, demonstrating that the derived irreducible representations effectively disentangle the underlying generative processes of the data. Theoretical results supporting our methodology are also presented.
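In symbols (my paraphrase of the setup): the encoder $\Phi$ and matrices $M(g)$ are learned so that
\[
\Phi(g \cdot x) \;\approx\; M(g)\,\Phi(x), \qquad M(g) \;\cong\; \bigoplus_i \rho_i(g) \ \text{after block diagonalization},
\]
with $\rho_i$ irreducible; for a cyclic group acting by shifts, this block diagonalization essentially recovers the classical discrete Fourier transform, which motivates the name.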
The rich spectral information of the graph Laplacian has been instrumental in graph theory, machine learning, and graph signal processing for a diverse range of applications. Arguably this is due to the fact that the Laplacian spectrum encodes important topological and geometric properties. In this talk we will argue that these ideas can be naturally generalised to discrete Hodge-Laplacian matrices, whose eigenvectors and eigenvalues provide additional information. Specifically, the eigenvectors of the Hodge-Laplacians provide us with vectorial representations that encode important topological information such as homology. We illustrate how these insights can be used in a range of applications, including clustering and the signal processing of flows on discrete spaces.
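For reference, with $B_k$ the boundary matrices of the complex, the $k$-th Hodge Laplacian is
\[
L_k \;=\; B_k^{\top} B_k + B_{k+1} B_{k+1}^{\top}, \qquad \ker L_k \;\cong\; H_k,
\]
so $L_0$ recovers the graph Laplacian, the kernel dimension of $L_k$ is the $k$-th Betti number, and the Hodge decomposition splits signals on $k$-simplices into gradient-like, curl-like, and harmonic parts.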
Graph Neural Networks (GNNs) excel at learning from relational data, but real-world systems—like biological or social networks—involve complex multiway interactions beyond simple pairwise relationships. The emerging field of Topological Deep Learning (TDL) captures these higher-order structures, yet lacks the standardized tools that made GNNs so accessible.
In this talk, I will introduce TopoTune: a lightweight framework that lets practitioners build and train powerful TDL models using any existing GNN—with unprecedented ease. Theoretical results show TopoTune generalizes the entire landscape of traditional TDL models, while experiments demonstrate it consistently matches or outperforms prior models, often with less complexity. I will also showcase new research in the community that leverages TopoTune for drastic reductions in computational cost as well as new scientific applications.
While dealing with scalar fields on surface meshes has been a staple of geometry processing, the need for discrete tangent vector fields on triangulated surfaces has grown steadily over the last two decades: they are crucial to encode both directions and sizing on surfaces, as commonly required in tasks such as texture synthesis, non-photorealistic rendering, digital grooming (i.e., creating hair and fur on characters), and meshing. In this talk, we explain how Cartan's moving frame method can be easily discretized on triangle meshes, giving rise to intuitive notions of parallel transport, connection, holonomy, and torsion. We show how to combine these definitions to design tangent vector fields (or frame fields) on discrete surfaces with full control of the singularities through only linear algebra. We also show how the same ideas can be exploited in higher dimensions to generalize the well-known Isomap approach for nonlinear dimensionality reduction to sampled domains that are not geodesically convex, removing a long-standing limitation of manifold learning.
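As a concrete instance of the discrete picture (standard discrete differential geometry, in my summary): with one reference frame per triangle, parallel transport across a shared edge is a single rotation angle, and the holonomy of the Levi-Civita connection around an interior vertex $v$ equals its angle defect,
\[
\mathrm{hol}(v) \;=\; 2\pi - \sum_{t \ni v} \theta_t(v),
\]
where $\theta_t(v)$ is the interior angle of triangle $t$ at $v$; prescribing singularities then reduces to a linear system for the connection angles, which is the "only linear algebra" referred to above.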
Bayesian analysis of deep neural networks has largely been unsuccessful, often resulting in significant under- or over-fitting. We argue that this is due to inappropriate treatment of the overparametrization that trivially follows from increasing network depth. We give a geometric characterization of overparametrization and show that respecting this geometric structure results in significant improvements to Bayesian approximations. We further discuss the numerical aspects.
When dealing with geometric models, turning a theoretical algorithm into a reliable computer program can become a tricky and sometimes frustrating task. Even when an algorithm is provably correct, its actual implementation may still fail because it approximates real numbers with finite floating-point representations. Exact geometric predicates and adaptive precision may help, but the cost in terms of performance loss can be significant.
This talk analyses and discusses how recent results have tackled this fundamental problem. Starting from Shewchuk's seminal work and the CGAL approach to robustness, I will show how geometric algorithms can be made robust with virtually no performance penalty. We will see how the concept of "indirect geometric predicates" has enabled the development of a whole new family of modern algorithms that resolve long-standing problems within the geometry processing community, including 3D arrangements, boolean operations, cascaded editing, volume meshing, and collision detection.
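A minimal illustration of the failure mode (an illustrative sketch, not code from the talk): the 2D orientation predicate is the sign of a determinant, and plain floating-point evaluation can get that sign wrong for near-degenerate inputs, while exact rational arithmetic cannot.

```python
from fractions import Fraction

def orient2d(ax, ay, bx, by, cx, cy):
    # Sign of det[b - a, c - a]: > 0 left turn, < 0 right turn, 0 collinear.
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)

# Scan points near the line through a and b: for near-degenerate inputs the
# floating-point sign may disagree with the exact (rational) sign.
a, b = (0.1, 0.1), (12.1, 12.1)
for i in range(64):
    c = (0.5 + i * 2.0 ** -53, 0.5)
    approx = orient2d(*a, *b, *c)
    exact = orient2d(*map(Fraction, a + b + c))  # exact arithmetic, same inputs
    if (approx > 0, approx < 0) != (exact > 0, exact < 0):
        print(f"i={i}: float gives {approx:+.1e}, exact sign is "
              f"{'+' if exact > 0 else '-' if exact < 0 else '0'}")
```

Exact predicates eliminate such sign inversions; the talk's point is that, with indirect predicates, this reliability can now be had with virtually no performance penalty.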
The data of computational anatomy are usually organ shapes extracted from medical images. In order to analyze them independently of their parametrisation, one has to deal with equivalence classes of sets of points, curves, surfaces, or images under the action of a reparametrisation group. In neuroimaging, connectomes extracted from functional MRI are encoded by the correlations between signals at different parcels of the brain, that is, the quotient of the SPD covariance matrices by diagonal rescalings, which is once again a quotient space. But quotient spaces are almost always non-linear, while statistics was essentially developed in a Euclidean setting. Thus, there is a need to redefine a consistent statistical framework for objects living in manifolds and Lie groups, a field now called geometric statistics. The objective of this talk is to give an overview of geometric statistics methods in Riemannian manifolds and some of their recent developments. The talk is motivated and illustrated by applications in medical image analysis, such as the regression of simple and efficient models of brain atrophy in Alzheimer's disease using the parallel transport of image deformations, and the recently proposed metrics on spaces of correlation matrices with applications in connectomics. We will also show that classical statistical tools like Principal Component Analysis (PCA) most often suffer in practice from a stability and interpretability problem (the curse of isotropy) and should be replaced by Principal Subspace Analysis, a relaxation of PCA based on flag spaces (sequences of nested subspaces generalizing Grassmannians).
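A prototypical object of geometric statistics is the Fréchet mean, which replaces the arithmetic mean on a metric space $(M, d)$:
\[
\bar{x} \;=\; \operatorname*{arg\,min}_{x \in M} \; \sum_{i=1}^{n} d(x, x_i)^2,
\]
which coincides with the usual mean in the Euclidean case but need neither exist nor be unique on a general manifold, one reason a dedicated statistical framework is required.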
While widespread, Transformers lack inductive biases for geometric symmetries common in science and computer vision. Existing equivariant methods often sacrifice the efficiency and flexibility that make Transformers so effective through complex, computationally intensive designs. We introduce the Platonic Transformer to resolve this trade-off. By defining attention relative to reference frames from Platonic solid symmetry groups, our method induces a principled weight-sharing scheme. This enables combined equivariance to continuous translations and Platonic symmetries, while preserving the exact architecture and computational cost of a standard Transformer. Furthermore, we show this attention is formally equivalent to a dynamic group convolution, which reveals that the model learns adaptive geometric filters and enables a highly scalable, linear-time convolutional variant. Across diverse benchmarks in computer vision (CIFAR-10), 3D point clouds (ScanObjectNN), and molecular property prediction (QM9, OMol25), the Platonic Transformer achieves competitive performance by leveraging these geometric constraints at no additional cost.
The effective resistance originates from electric circuit analysis and has become an important concept in graph theory due to its connection to random walks and random spanning trees. Notions of effective resistance for simplicial complexes have been introduced in various ways in the literature, as products of matrices acting on the simplices. The relationships among these definitions are not immediately evident. In this talk, we generalize the notion of effective resistance in simplicial complexes by providing a basis-free definition, which encompasses the existing matrix representations above, and we describe its theoretical properties.
This is joint work with Inés Garcia Redondo (Imperial College), Sarah Percival (New Mexico U), Anda Skeja (Uppsala U), Bei Wang (Utah U), and Ling Zhou (Duke U).
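For reference, on a graph the effective resistance between nodes $u$ and $v$ can be written via the pseudoinverse of the graph Laplacian $L$,
\[
R(u, v) \;=\; (e_u - e_v)^{\top} L^{+} (e_u - e_v),
\]
and, roughly speaking, the simplicial notions discussed in the talk play the analogous role one dimension up, with Hodge Laplacians of the complex in place of $L$.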
Topological data analysis (TDA) has been successfully applied to study life sciences data — often high-dimensional, noisy, and heterogeneous. In this talk, I will present recent applications of TDA in combination with machine learning to spatial data, drawing on both synthetic and real-world examples. I will introduce techniques from relational TDA that we develop to encode spatial heterogeneity of multispecies data, i.e. datasets with multiple subtypes of data points. These approaches can reveal meaningful biological patterns and integrate naturally with modern machine learning methods, such as graph neural networks (GNNs). I will discuss how combining relational TDA with GNNs can enhance performance and provide deeper insights into spatially structured data.
In this talk, I will discuss two recent studies on Mapper graphs. In the first, we build on a recently proposed optimization framework incorporating topology to provide the first filter optimization scheme for Mapper graphs. To achieve this, we propose a relaxed and more general version of the Mapper graph, whose convergence properties we investigate. In the second, we look for an appropriate, density-aware metric for comparing Reeb and Mapper graphs seen as metric measure spaces, in order to, e.g., quantify the rate of convergence of the Mapper graph to the Reeb graph. We use Gromov-Wasserstein metrics to compare these graphs directly, so as to better incorporate the probability measures from which the data points are sampled.