# Low-rank Optimization and Applications

## Abstracts for the talks

**Elena Angelini ***Università di Siena***On the identifiability of symmetric tensors beyond the Kruskal's bound**

I will discuss how the study of the Hilbert function and Cayley-Bacharach property of finite subsets in the complex projective space provide information on the minimality and identifiability of symmetric tensors, even beyond the range of applicability of Kruskal's criterion. As an application, I will describe some examples in the case of ternary forms. (joint works with Luca Chiantini, Andrea Mazzon, Nick Vannieuwenhoven).

**Grey Ballard ***Wake Forest University***Parallel Algorithms for CP and Tucker Decompositions**

Multidimensional data, coming from scientific applications such as numerical simulation or multimodal imaging, can often overwhelm the memory or computational resources of a single workstation. In this talk, we’ll describe parallel algorithms and software implementations for computing both CP and Tucker decompositions of large, dense tensors. The open-source software is designed for clusters of computers and has been benchmarked on various supercomputers. The algorithms are scalable, able to process terabyte-sized tensors and maintain high computational efficiency for 100s to 1000s of processing nodes. We will detail the data distribution and parallelization strategies for the key computational kernels within the algorithms, which include the matricized-tensor times Khatri-Rao product, the tensor times matrix product, and computation of (structured) Gram matrices.

**Rafael Ballester-Ripoll ***ETH Zürich***Tensor networks, weighted automata, and ANOVA-driven modeling**

Low-rank tensor decompositions can be extremely insightful when a model has high degrees of interaction between its inputs (in the ANOVA sense). Thanks to certain connections between tensor networks and weighted automata, we can now design efficient algorithms that measure and control the interplay between input variables. This can be done using any of a wide range of sensitivity metrics: effective dimensions, dimension distribution, Sobol indices and arbitrary aggregations thereof, and more. We will review those techniques and discuss how they open up new possibilities for informed modeling, optimization, and visualization of high-dimensional tensors.

**Paul Breiding ***Max Planck Institute for Mathematics in the Sciences***Pencil-based algorithms for tensor rank decomposition are not stable**

I will discuss the existence of an open set of n1× n2× n3 tensors of rank r on which a popular and eﬃcient class of algorithms for computing tensor rank decompositions is numerically unstable. Algorithm of this class are based on a reduction to a linear matrix pencil, typically followed by a generalized eigendecomposition. The analysis shows that the unstability is caused by the fact that the condition number of the tensor rank decomposition can be much larger for n1×n2×2 tensors than for the n1×n2×n3 input tensor. Joint work with Carlos Beltran and Nick Vannieuwenhoven.

**Jérémy Cohen ***INRIA Rennes***Identifiability of Low-Rank Sparse Component Analysis**

In this presentation, I will present a new result on the identifiability of Sparse Component Analysis (SCA). SCA is the following problem: given an input matrix M and an integer r, find a dictionary D with r columns and a sparse matrix B with r rows such that M=DB. A key issue in SCA is identifiability, that is, characterizing the conditions under which D and B are essentially unique (that is, they are unique up to permutation and scaling of the columns of D and rows of B). Although SCA has been vastly investigated in the last two decades, only a few works have tackled this issue in the deterministic scenario, and no work provides reasonable bounds in the minimum number of data points (that is, columns of M) that leads to identifiability. In a recent work, we provided new results in the deterministic scenario when the data has a low-rank structure, that is, when D has rank r, drastically improving with respect to previous results. In particular, we show that if each column of B contains at least s zeros then O(r^3/s^2) data points are sufficient to obtain an essentially unique decomposition, as long as these data points are well spread among the subspaces spanned by r-1 columns of D. This implies for example that for a fixed proportion of zeros (constant and independent of r, e.g., 10% of zero entries in B), one only requires O(r) data points to guarantee identifiability.

**Lieven De Lathauwer ***KU Leuven***Canonical Polyadic Decomposition of Incomplete Tensors: an Algebraic Study**

We discuss a new approach to find the CPD of a tensor that is not fully available. While standard techniques for tensor completion rely on random sampling of entries or on sampling full columns, rows as well as fibers (e.g. cross approximation), we follow a way in between --- we sample fibers in one mode and one mode only. We explain that under mild conditions this type of sampling allows us to reduce the computation of the CPD to rank-1 matrix completion. The reduction relies on linear algebra only. We also present generalizations, for fiber-sampled incomplete tensors, of algebraic conditions for CPD uniqueness.

This is joint work with Mikael Sorensen (University of Virginia, VA).

**Evgeny Frolov ***Skoltech Moscow***A straightforward generalization of low rank approximation approach for hybrid recommender systems**

We propose a new hybrid approach for matrix- and tensor-based recommender systems. Unlike the majority of hybrid recommenders, it directly ties collaborative user behavior with additional side information in an intuitive and straightforward way. It not only helps to address the problem of extreme data sparsity, but also allows to naturally exploit patterns in the observed interactions for constructing a compact and meaningful representation of user intents. We demonstrate the effectiveness of the proposed model on several standard benchmark datasets. The general formulation of the approach imposes no restrictions on the type of observed interactions and makes it potentially applicable for joint modelling of any type of contextual information along with side data.

**Jochen Garcke ***University of Bonn, and Fraunhofer SCAI***A Geometrical Method for Low-Dimensional Representations of Simulations**

We present a data analysis approach where simulations of an industrial product are contained in the space of surface meshes embedded in R^3. In this setting, a discrete Laplace-Beltrami operator can be constructed on the mesh, which is invariant to isometric transformations and therefore valid for all simulations. The eigenfunctions of such an operator are used as a common basis for all (isometric) simulations and one can use the projection coefficients instead of the full simulations for further analysis.

The data analysis approach is applied to time dependent datasets from numerical car crash simulations. One observes that only a few spectral coefficients are necessary to describe the data variability and low dimensional structures are obtained. The eigenvectors are seen to recover different independent variation modes such as translation, rotation, or global and local deformations. An effective analysis of the data from bundles of numerical simulations is made possible, in particular an analysis for many simulations in time.

**Griet Goovaerts ***KU Leuven***The power of tensors in cardiac applications**

The electrocardiogram (ECG) is a biomedical signal that is widely used to monitor the heart and diagnose cardiac problems. Depending on the clinical need, the ECG is recorded with one or multiple leads (or channels) from different body locations. The signals from different ECG leads represent the cardiac activity in different spatial directions and are thus complementary to each other. In traditional methods, the ECG signal is represented as a vector or a matrix and processed to analyze temporal information. When multiple leads are present, most methods process each lead individually and combine decisions from all leads in a later stage. While this approach is popular, it fails to exploit the structural information captured by the different leads.

The use of tensors allows the analysis of multiple modes simultaneously, using both temporal and spatial information. Tensor decomposition methods have been applied on ECG signals to solve different clinical challenges. We discuss the power of tensor-based methods for different ECG appications through various examples. In order to highlight the exible nature of tensor-based methods, we showcase various ways to construct and decompose tensors, depending on the signal characteristics and the particular application.

**Venera Khoromskaia ***Max-Planck Institute for Mathematics in the Sciences***Tensor numerical methods in computational quantum chemistry**

**Boris Khoromskij ***Max Planck Institute for Mathematics in the Sciences***QTT and range-separated tensor formats in scientific computing and data science**

**Sebastian Krämer ***RWTH Aachen***A Geometrical Description of Feasible Singular Values in Tree Tensor Formats**

Tree tensor networks such as the tensor train format are a common tool for high dimensional problems. The associated multivariate rank and accordant tuples of singular values are based on diﬀerent matricizations of the same tensor. While the behavior of these singular values is as essential as in the matrix case, here the question about the feasibility of speciﬁc constellations arises: which prescribed tuples can be realized as singular values of a tensor and what is this feasible set? We ﬁrst show the equivalence of the tensor feasibility problem (TFP) to the quantum marginal problem (QMP). In higher dimensions, in case of the tensor train (TT-)format, the conditions for feasibility can be decoupled. By present results for three dimensions for the QMP, it then follows that the tuples of squared, feasible TT-singular values form polyhedral cones. We further establish a connection to eigenvalue relations of sums of Hermitian matrices, which in turn are described by sets of interlinked, so called honeycombs, as they have been introduced by Knutson and Tao. Besides a large class of universal, necessary inequalities as well as the vertex description for a special, simpler instance, we present a linear programming algorithm to check feasibility and a simple, heuristic algorithm to construct representations of tensors with prescribed, feasible TT-singular values in parallel.

**Daniel Kressner ***EPF Lausanne***On adaptive cross approximation**

Adaptive cross approximation (ACA) is a popular techniques to construct low-rank approximations from entry-wise queries of a matrix or function. In the first part of this talk, we present a non-asymptotic a priori error analysis of ACA. This allows us to identify a nontrivial class of matrices for which the error of ACA is guaranteed to stay close to the best low-rank approximation error. In the second part of this talk, we present a novel extension of ACA to parameter-dependent matrices, as they arise - for example - in uncertainty quantification. This also yields a new variant of ACA for tensors of third order. If time permits, the modified ACA is compared with Riemannian optimizaton techniques. This talk is based on joint work with Alice Cortinovis, Stefano Massei, and others.

**Lek-Heng Lim ***University of Chicago***Complex tensor approximations**

We show that in finite-dimensional nonlinear approximations, the best r-term approximant of a function almost always exists over complex numbers but that the same is not true over the reals. Our result extends to functions that possess special properties like symmetry or skew-symmetry under permutations of arguments. For the case where we use separable functions for approximations, the problem becomes that of best rank-r tensor approximations. We show that over the complex numbers, any tensor almost always has a unique best rank-r approximation. This extends to other notions of tensor ranks such as symmetric rank and alternating rank, to best r-block-terms approximations, and to best approximations by tensor networks. When applied to sparse-plus-low-rank approximations, we obtain that for any given r and k, a general tensor has a unique best approximation by a sum of a rank-r tensor and a k-sparse tensor with a fixed sparsity pattern. The existential (but not the uniqueness) part of our result also applies to best approximations by a sum of a rank-r tensor and a k-sparse tensor with no fixed sparsity pattern, as well as to tensor completion problems. This is joint work with Mateusz Michalek and Yang Qi.

**Morten Mørup ***Technical University of Denmark***Tensor Decomposition using Variational Bayesian Inference**

Variational Bayesian inference has become a prominent framework within machine learning to account for parameter uncertainty rather than relying on maximum likelihood point estimates. In this talk it will be demonstrated that modeling uncertainty in tensor decomposition can have a profound impact in terms of 1) the tensor model's ability to predict missing data particularly when dealing with sparsely observed data, and 2) providing robustness to model misspecification when the number of components are incorrectly specified. This will be highlighted in the context of three prominent tensor decomposition approaches; the (non-negative) PARAFAC/Canonical Polyadic Decomposition (CPD), the PARAFAC2 model, and the Tensor Train Decomposition (TTD). The talk will further outline ongoing efforts in generating a general probabilistic n-way toolbox. The approaches are examined considering applications within neuroimaging and chemometrics.

**Maxim Rakhuba ***ETH Zürich***Robust numerical algorithms for electronic structure calculation using the quantized TT decomposition**

The idea of reshaping an array with 2^{d} elements into a multidimensional 2 ×× 2 array and then applying tensor train (TT) decomposition is known under the name quantized TT decomposition (QTT).

It has been shown in a number of works that arrays arising in the discretization of certain PDEs allow QTT representation with a small number of parameters. However, the quest for robust and at the same time eﬃcient QTT algorithm to solve PDEs with three (and more) physical dimensions is not over yet. In this talk, we address this problem using the example of PDEs arising in electronic structure calculation with a new algorithm, which is capable of solving PDEs discretized using 2^{100} grid points within minutes of computational time on a laptop.

**Reinhold Schneider ***TU Berlin***Variational Monte Carlo - Bridging between Numerics and Statistical Learning**

For solving solve high-dimensional PDE’s which can be casted into a variational framework restricted to appropriate possibly non-linear and even non-convex model classes. In Variational Monte Carlo we replace our objective functional by an empirical functional, in a similar way as for risk minimization or loss functions in statistical learning. For the optimization we need only computable gradients at sample points.At a price, we consider convergence in probability, i.e. error estimates holds with high probability. The analysis is carried out in the spirit of Cucker and Smale, and ﬁrst estimates about covering numbers for hierarchical tensors are available. As a model class we consider hierarchical tensors (HT) mainly in TT form. This approach has been carried out earlier for parametric problems in uncertainty quantifcation (with M. Eigel, P. Trunschke and S. Wolf). We present in this talk an application to approximate the meta-stable eigenfunctions of the corresponding Backward Kolmogorov operator by numerical approximation of the transfer operator (also know as Koopman operator), and vice versa the Fokker Planck operator (with K. Fackeldey, F. Nüske and F. Noe). We further consider the approximation of the value function for a closed loop bilinear control problem, described by the Hamilton Jacobi equation. The policy iteration from dynamic programming consists in alternating between updating the optimal feedback law and the solution of an inhomogneous Backward Kolmogorov equation. We solve the latter by means of HT/TT tensors and variational Monte Carlo. First numerical results are presented (with L. Sallandt and M. Oster).

**Hanie Sedghi ***Google Brain***The Singular Values of Convolutional Layers**

We characterize the singular values of the linear transformation associated with a standard 2D multi-channel convolutional layer, enabling their eﬃcient computation. This characterization also leads to an algorithm for projecting a convolutional layer onto an operator-norm ball. We show that this is an eﬀective regularizer; for example, it improves the test error of a deep residual network using batch normalization on CIFAR-10 from 6.2% to 5.3%.

**Volker Tresp ***LMU Munich***Deep Neural Networks in Representation Learning and Tensor Factorization**

Representation learning is hugely successful in natural language processing and knowledge graph modelling. The basic concept is that entities or words are mapped to latent vectors of a given dimension and a shallow or a deep neural network then maps those representations to one or several probabilistic statements. We discuss relationships between representation learning and tensor factorization. We show how deep neural networks can be used to learn representations for new entities and new events in the industrial TIA selection tool and in visual perception. We will also discuss how tensor-trains can be used in combination with recurrent neural networks for video classification.

**Madeleine Udell ***Cornell University***A practical one-pass algorithm for tensor decomposition**

We describe a practical one-pass algorithm for Tucker decomposition. The algorithm forms a linear sketch of the tensor and uses this sketch to construct approximations of low or fixed Tucker rank. The linear sketch is easy to compute online or using distributed computing resources, which enables Tucker decomposition of extremely large scale tensors, even those too large to fit in memory. Applications include compression of PDE simulations, large scale sensor networks, and video. We provide the first rigorous theoretical guarantees for any one pass Tucker decomposition method when the linear sketch is formed from a random Gaussian matrix. We also propose to use a lower memory sketch, the Tensor Dimension Reduction Map, which achieves similar approximation error in experiments and significantly reduces memory and storage costs.

**Bart Vandereycken ***University of Geneva***Critical points of quadratic low-rank optimization problems**

The absence of spurious local minima in certain non-convex minimization problems, e.g. in the context of recovery problems in compressed sensing, has recently triggered much interest due to its important implications on the global convergence of optimization algorithms. One example is low-rank matrix sensing under rank restricted isometry properties. It can be formulated as a minimization problem for a quadratic cost function constrained to a low-rank matrix manifold, with a positive semidefinite Hessian acting like a perturbation of identity on cones of low-rank matrices. We present an approach to show strict saddle point properties and absence of spurious local minima for such problems under improved conditions on the restricted isometry constants. This is joint work with André Uschmajew.

**Nico Vervliet ***KU Leuven***On the decomposition of tensors that are given implicitly**

In practice, tensor decomposition algorithms commonly try to minimize the least-squares error between the model and a given tensor. However, in some cases, the tensor is not available as such, but is given only implicitly as a solution of a linear system or is defined by a subspace of a matrix. Such problems appear, for example, in multilinear classification, tensor regression, compressed sensing, blind system identification, systems of polynomial equations and algebraic tensor decomposition algorithms. As the explicit construction of the tensor may not be possible due to nonuniqueness or may be undesirable because of the curse of dimensionality, new types of algebraic and optimization-based algorithms are required. In this talk, I will discuss the challenges encountered when developing algorithms for these implicitly given tensors, and I will illustrate the results for some applications.

**Rose Yu ***Northeastern University***Fast and Interpretable Tensor Methods for Spatiotemporal Analysis**

Multivariate spatiotemporal data is ubiquitous in science and engineering, from sports analytics to neuroscience. Such data can be naturally represented as a multiway tensor. Tensor latent factor models provide a powerful tool for reducing the dimensionality and discovering the higher-order latent structures from data. However, existing tensor models are often slow or fail to yield latent factors that are easy to interpret by domain experts. In this talk, I will demonstrate advances in tensor methods to generate interpretable latent factors for high-dimensional spatiotemporal data. In particular, I will discuss (1) a multiresolution tensor learning algorithm, that can leverage the multicale property of high resolution spatial data, to speed up training and learn interpretable patterns. (2) a tensor latent feature learning algorithm that can learn binary representations of data that are both memory efficient and easy to interpret. We provide theoretical guarantees for our optimization algorithms and demonstrate their applications to real-world data from basketball plays and neuroscience.

## Date and Location

**April 01 - 05, 2019**

Max Planck Institute for Mathematics in the Sciences

Inselstraße 22

04103 Leipzig

Germany

## Scientific Organizers

**Evrim Acar**

Simula Metropolitan Center for Digital Engineering**André Uschmajew**

MPI for Mathematics in the Sciences**Nick Vannieuwenhoven**

KU Leuven

## Administrative Contact

**Saskia Gutzschebauch**

MPI für Mathematik in den Naturwissenschaften

Contact by Email