We discuss a new approach to find the CPD of a tensor that is not fully available. While standard techniques for tensor completion rely on random sampling of entries or on sampling full columns, rows as well as fibers (e.g. cross approximation), we follow a way in between --- we sample fibers in one mode and one mode only. We explain that under mild conditions this type of sampling allows us to reduce the computation of the CPD to rank-1 matrix completion. The reduction relies on linear algebra only. We also present generalizations, for fiber-sampled incomplete tensors, of algebraic conditions for CPD uniqueness.
This is joint work with Mikael Sorensen (University of Virginia, VA).
We describe a practical one-pass algorithm for Tucker decomposition. The algorithm forms a linear sketch of the tensor and uses this sketch to construct approximations of low or fixed Tucker rank. The linear sketch is easy to compute online or using distributed computing resources, which enables Tucker decomposition of extremely large scale tensors, even those too large to fit in memory. Applications include compression of PDE simulations, large scale sensor networks, and video. We provide the first rigorous theoretical guarantees for any one pass Tucker decomposition method when the linear sketch is formed from a random Gaussian matrix. We also propose to use a lower memory sketch, the Tensor Dimension Reduction Map, which achieves similar approximation error in experiments and significantly reduces memory and storage costs.
In practice, tensor decomposition algorithms commonly try to minimize the least-squares error between the model and a given tensor. However, in some cases, the tensor is not available as such, but is given only implicitly as a solution of a linear system or is defined by a subspace of a matrix. Such problems appear, for example, in multilinear classification, tensor regression, compressed sensing, blind system identification, systems of polynomial equations and algebraic tensor decomposition algorithms. As the explicit construction of the tensor may not be possible due to nonuniqueness or may be undesirable because of the curse of dimensionality, new types of algebraic and optimization-based algorithms are required. In this talk, I will discuss the challenges encountered when developing algorithms for these implicitly given tensors, and I will illustrate the results for some applications.
The electrocardiogram (ECG) is a biomedical signal that is widely used to monitor the heart and diagnose cardiac problems. Depending on the clinical need, the ECG is recorded with one or multiple leads (or channels) from different body locations. The signals from different ECG leads represent the cardiac activity in different spatial directions and are thus complementary to each other. In traditional methods, the ECG signal is represented as a vector or a matrix and processed to analyze temporal information. When multiple leads are present, most methods process each lead individually and combine decisions from all leads in a later stage. While this approach is popular, it fails to exploit the structural information captured by the different leads.
The use of tensors allows the analysis of multiple modes simultaneously, using both temporal and spatial information. Tensor decomposition methods have been applied on ECG signals to solve different clinical challenges. We discuss the power of tensor-based methods for different ECG appications through various examples. In order to highlight the exible nature of tensor-based methods, we showcase various ways to construct and decompose tensors, depending on the signal characteristics and the particular application.
The idea of reshaping an array with $2^d$ elements into a multidimensional $2\times \dots \times 2$ array and then applying tensor train (TT) decomposition is known under the name quantized TT decomposition (QTT). It has been shown in a number of works that arrays arising in the discretization of certain PDEs allow QTT representation with a small number of parameters. However, the quest for robust and at the same time efficient QTT algorithm to solve PDEs with three (and more) physical dimensions is not over yet. In this talk, we address this problem using the example of PDEs arising in electronic structure calculation with a new algorithm, which is capable of solving PDEs discretized using $2^{100}$ grid points within minutes of computational time on a laptop.
Representation learning is hugely successful in natural language processing and knowledge graph modelling. The basic concept is that entities or words are mapped to latent vectors of a given dimension and a shallow or a deep neural network then maps those representations to one or several probabilistic statements. We discuss relationships between representation learning and tensor factorization. We show how deep neural networks can be used to learn representations for new entities and new events in the industrial TIA selection tool and in visual perception. We will also discuss how tensor-trains can be used in combination with recurrent neural networks for video classification.
Multivariate spatiotemporal data is ubiquitous in science and engineering, from sports analytics to neuroscience. Such data can be naturally represented as a multiway tensor. Tensor latent factor models provide a powerful tool for reducing the dimensionality and discovering the higher-order latent structures from data. However, existing tensor models are often slow or fail to yield latent factors that are easy to interpret by domain experts. In this talk, I will demonstrate advances in tensor methods to generate interpretable latent factors for high-dimensional spatiotemporal data. In particular, I will discuss (1) a multiresolution tensor learning algorithm, that can leverage the multicale property of high resolution spatial data, to speed up training and learn interpretable patterns. (2) a tensor latent feature learning algorithm that can learn binary representations of data that are both memory efficient and easy to interpret. We provide theoretical guarantees for our optimization algorithms and demonstrate their applications to real-world data from basketball plays and neuroscience.
We show that in finite-dimensional nonlinear approximations, the best r-term approximant of a function almost always exists over complex numbers but that the same is not true over the reals. Our result extends to functions that possess special properties like symmetry or skew-symmetry under permutations of arguments. For the case where we use separable functions for approximations, the problem becomes that of best rank-r tensor approximations. We show that over the complex numbers, any tensor almost always has a unique best rank-r approximation. This extends to other notions of tensor ranks such as symmetric rank and alternating rank, to best r-block-terms approximations, and to best approximations by tensor networks. When applied to sparse-plus-low-rank approximations, we obtain that for any given r and k, a general tensor has a unique best approximation by a sum of a rank-r tensor and a k-sparse tensor with a fixed sparsity pattern. The existential (but not the uniqueness) part of our result also applies to best approximations by a sum of a rank-r tensor and a k-sparse tensor with no fixed sparsity pattern, as well as to tensor completion problems. This is joint work with Mateusz Michalek and Yang Qi.
Low-rank tensor decompositions can be extremely insightful when a model has high degrees of interaction between its inputs (in the ANOVA sense). Thanks to certain connections between tensor networks and weighted automata, we can now design efficient algorithms that measure and control the interplay between input variables. This can be done using any of a wide range of sensitivity metrics: effective dimensions, dimension distribution, Sobol indices and arbitrary aggregations thereof, and more. We will review those techniques and discuss how they open up new possibilities for informed modeling, optimization, and visualization of high-dimensional tensors.
Mazen Ali Ulm University, GermanyLow-Rank Approximability and Entropy Area Laws for PDEs
Henrik Eisenmann Technische Universität Berlin, GermanyA Jacobi-Davidson Method on the manifold of Tensors with fixed TT-rank
We present a generalization of the Jacobi-Davidson method to manifolds of tensors with fixed tensor-train rank. If eigenvectors of symmetric Operators are known to be of approximately low rank, this method inherits the advantages of the original Jacobi-Davidson method while staying computational feasible. For this purpose a correction equation is derived by approximating the Riemannian Hessian of the Rayleigh-quotient on the tensor-train manifold for a Riemannian Newton method. It is shown how to approximately solve the correction equation, and finally ground states of molecular Hamiltonians are computed as numerical experiments.
Eric Evert KU Leuven, BelgiumExistence of best low rank approximations for tensors of order 3
Low rank tensor decomposition is a fundamental tool in many applications, includingdata analysis and machine learning. In practice we work with a measurement of a lowrank tensor which is corrupted by additive noise. Generically, this measurement has highrank. High rank tensors may fail to have best low rank approximations, and low rankdecompositions are uninterpretable when this failure occurs. We provide guarantees for theexistence of best low rank approximations over $\mathcal{R}$.
Zeynep Gündoğar Fatih Sultan Mehmet Vakıf Üniversity, TurkeyStatistics Rooted Decomposition Method: Tridiagonal Folmat Enhanced Multivariance Products Representation
This research presents a novel statistics-rooted decomposition method for multidimensional arrays which is called “Tridiagonal Folmat Enhanced Multivariance Products Representation (TFEMPR)”. It has been developed for decomposing multidimensional arrays with the philosophy behind Tridiagonal Matrix Enhanced Multivariance Products Representation (TMEMPR). TMEMPR method decomposes a matrix having finitely or infinitely many rows and columns to a sum of outer products whose coefficients can be gathered into a core matrix which is tridiagonal in contrast to the diagonal matrix structure of the Singular Value Decomposition (SVD). TFEMPR can be considered as higher order analogue of matrix decomposition since it focuses on folmats and folvecs which are in fact high order arrays counterparts of ordinary linear algebraic matrices and vectors.TFEMPR has widespread application area such as image and video processing, reconstruction problems, data compression and etc. In this research, some experimental results of applications will be given.
Kirandeep Kour Max Planck Institute Magdeburg, GermanyNonlinear classification: A Kernelized Support Tensor Train Machine
Patrick Kürschner KU Leuven, BelgiumMultivariate polynomial root finding with tensor techniques
We show how the root finding problem for systems of multivariate polynomials can solved in terms of tensor decompositions. One way is the reformulation of the polynomial system as linear system of equations where the solution vectors allow a representation as canonical polyadic decomposition of a tensor. As a consequence, this enables the application of specialized numerical solvers which exploit this structure. Another approach is to use the Macaulay matrix, whose null space admits a representation as tensor decomposition which reveals the roots.
Lana Perisa EPFL, SwitzerlandRandomized methods for recompresion of low rank tensors
Many basic linear algebra operations with low-rank tensors, like addition, matrix-vector multiplication or element-wise product, have a tendency to significantly increase the rank, even though the resulting tensors admit a good low-rank approximation. We use randomized algorithm to recompress these tensors when dealing with low-rank formats such as Tucker and Tensor Train, by employing random vectors with rank-1 structure which matches the structure of the tensors. In case of element-wise product of tensors, this has shown to significantly reduce the computational effort, while achieving a similar accuracy as the corresponding deterministic techniques.
Luca Sodomaco University of Florence, ItalyThe ED polynomial of the dual varieties of Segre-Veronese varieties
We outline the main properties of the ED polynomial of a real algebraic variety, where ED stands for "Euclidean Distance". Then we focus on ED polynomials of dual varieties of Segre-Veronese varieties, showing a close relationship with the Spectral Theory of partially-symmetric tensors. Their roots correspond to the singular values of a partially symmetric tensor. In particular, we investigate their lowest and highest coefficients and their corresponding vanishing loci.
Markus Wageringel Osnabrück University, GermanyMoment ideals of local Dirac mixtures
Moments are quantities that measure the shape of statistical or stochastic objects and have recently been studied from a more algebraic and combinatorial point of view. We study a special case of finite mixture models, the Dirac local mixtures, highlighting connections to algebraic statistics and signal processing. We focus on the moment varieties of first order local mixtures, providing generators for the ideals and showing a connection to moment ideals of Pareto distributions.Further, we consider mixture models of these distributions and investigate the problem of recovering the parameters of such a distribution from its low-rank moment matrix, using an extension of Prony's method. We showcase our results with an application in signal processing.This is joint work with Alexandros Grosdos Koutsoumpelias (Osnabrück University).
For solving solve high-dimensional PDE's which can be casted into a variational framework restricted to appropriate possibly non-linear and even non-convex model classes. In Variational Monte Carlo we replace our objective functional by an empirical functional, in a similar way as for risk minimization or loss functions in statistical learning. For the optimization we need only computable gradients at sample points.At a price, we consider {\em convergence in probability}, i.e. error estimates holds with high probability. The analysis is carried out in the spirit of Cucker and Smale, and first estimates about covering numbers for hierarchical tensors are available. As a model class we consider hierarchical tensors (HT) mainly in TT form.
This approach has been carried out earlier for parametric problems in uncertainty quantifcation (with M. Eigel, P. Trunschke and S. Wolf). We present in this talk an application to approximate the meta-stable eigenfunctions of the corresponding Backward Kolmogorov operator by numerical approximation of the transfer operator (also know as Koopman operator), and vice versa the Fokker Planck operator (with K. Fackeldey, F. N\"uske and F. Noe). We further consider the approximation of the value function for a closed loop bilinear control problem, described by the Hamilton Jacobi equation. The policy iteration from dynamic programming consists in alternating between updating the optimal feedback law and the solution of an inhomogneous Backward Kolmogorov equation. We solve the latter by means of HT/TT tensors and variational Monte Carlo. First numerical results are presented (with L. Sallandt and M. Oster).
The absence of spurious local minima in certain non-convex minimization problems, e.g. in the context of recovery problems in compressed sensing, has recently triggered much interest due to its important implications on the global convergence of optimization algorithms. One example is low-rank matrix sensing under rank restricted isometry properties. It can be formulated as a minimization problem for a quadratic cost function constrained to a low-rank matrix manifold, with a positive semidefinite Hessian acting like a perturbation of identity on cones of low-rank matrices. We present an approach to show strict saddle point properties and absence of spurious local minima for such problems under improved conditions on the restricted isometry constants. This is joint work with André Uschmajew.
In this presentation, I will present a new result on the identifiability of Sparse Component Analysis (SCA). SCA is the following problem: given an input matrix M and an integer r, find a dictionary D with r columns and a sparse matrix B with r rows such that M=DB. A key issue in SCA is identifiability, that is, characterizing the conditions under which D and B are essentially unique (that is, they are unique up to permutation and scaling of the columns of D and rows of B). Although SCA has been vastly investigated in the last two decades, only a few works have tackled this issue in the deterministic scenario, and no work provides reasonable bounds in the minimum number of data points (that is, columns of M) that leads to identifiability. In a recent work, we provided new results in the deterministic scenario when the data has a low-rank structure, that is, when D has rank r, drastically improving with respect to previous results. In particular, we show that if each column of B contains at least s zeros then O(r^3/s^2) data points are sufficient to obtain an essentially unique decomposition, as long as these data points are well spread among the subspaces spanned by r-1 columns of D. This implies for example that for a fixed proportion of zeros (constant and independent of r, e.g., 10% of zero entries in B), one only requires O(r) data points to guarantee identifiability.
I will discuss how the study of the Hilbert function and Cayley-Bacharach property of finite subsets in the complex projective space provide information on the minimality and identifiability of symmetric tensors, even beyond the range of applicability of Kruskal's criterion. As an application, I will describe some examples in the case of ternary forms. (joint works with Luca Chiantini, Andrea Mazzon, Nick Vannieuwenhoven).
Variational Bayesian inference has become a prominent framework within machine learning to account for parameter uncertainty rather than relying on maximum likelihood point estimates. In this talk it will be demonstrated that modeling uncertainty in tensor decomposition can have a profound impact in terms of 1) the tensor model's ability to predict missing data particularly when dealing with sparsely observed data, and 2) providing robustness to model misspecification when the number of components are incorrectly specified. This will be highlighted in the context of three prominent tensor decomposition approaches; the (non-negative) PARAFAC/Canonical Polyadic Decomposition (CPD), the PARAFAC2 model, and the Tensor Train Decomposition (TTD). The talk will further outline ongoing efforts in generating a general probabilistic n-way toolbox. The approaches are examined considering applications within neuroimaging and chemometrics.
Adaptive cross approximation (ACA) is a popular techniques to construct low-rank approximations from entry-wise queries of a matrix or function. In the first part of this talk, we present a non-asymptotic a priori error analysis of ACA. This allows us to identify a nontrivial class of matrices for which the error of ACA is guaranteed to stay close to the best low-rank approximation error. In the second part of this talk, we present a novel extension of ACA to parameter-dependent matrices, as they arise - for example - in uncertainty quantification. This also yields a new variant of ACA for tensors of third order. If time permits, the modified ACA is compared with Riemannian optimizaton techniques. This talk is based on joint work with Alice Cortinovis, Stefano Massei, and others.
We propose a new hybrid approach for matrix- and tensor-based recommender systems. Unlike the majority of hybrid recommenders, it directly ties collaborative user behavior with additional side information in an intuitive and straightforward way. It not only helps to address the problem of extreme data sparsity, but also allows to naturally exploit patterns in the observed interactions for constructing a compact and meaningful representation of user intents. We demonstrate the effectiveness of the proposed model on several standard benchmark datasets. The general formulation of the approach imposes no restrictions on the type of observed interactions and makes it potentially applicable for joint modelling of any type of contextual information along with side data.
Multidimensional data, coming from scientific applications such as numerical simulation or multimodal imaging, can often overwhelm the memory or computational resources of a single workstation. In this talk, we’ll describe parallel algorithms and software implementations for computing both CP and Tucker decompositions of large, dense tensors. The open-source software is designed for clusters of computers and has been benchmarked on various supercomputers. The algorithms are scalable, able to process terabyte-sized tensors and maintain high computational efficiency for 100s to 1000s of processing nodes. We will detail the data distribution and parallelization strategies for the key computational kernels within the algorithms, which include the matricized-tensor times Khatri-Rao product, the tensor times matrix product, and computation of (structured) Gram matrices.
We characterize the singular values of the linear transformation associated with a standard 2D multi-channel convolutional layer, enabling their efficient computation. This characterization also leads to an algorithm for projecting a convolutional layer onto an operator-norm ball. We show that this is an effective regularizer; for example, it improves the test error of a deep residual network using batch normalization on CIFAR-10 from 6.2\% to 5.3\%.
We present a data analysis approach where simulations of an industrial product are contained in the space of surface meshes embedded in R^3. In this setting, a discrete Laplace-Beltrami operator can be constructed on the mesh, which is invariant to isometric transformations and therefore valid for all simulations. The eigenfunctions of such an operator are used as a common basis for all (isometric) simulations and one can use the projection coefficients instead of the full simulations for further analysis.
The data analysis approach is applied to time dependent datasets from numerical car crash simulations. One observes that only a few spectral coefficients are necessary to describe the data variability and low dimensional structures are obtained. The eigenvectors are seen to recover different independent variation modes such as translation, rotation, or global and local deformations. An effective analysis of the data from bundles of numerical simulations is made possible, in particular an analysis for many simulations in time.
Tree tensor networks such as the tensor train format are a common tool for high dimensional problems. The associated multivariate rank and accordant tuples of singular values are based on different matricizations of the same tensor. While the behavior of these singular values is as essential as in the matrix case, here the question about the \textit{feasibility} of specific constellations arises: which prescribed tuples can be realized as singular values of a tensor and what is this feasible set?
We first show the equivalence of the \textit{tensor feasibility problem (TFP)} to the \textit{quantum marginal problem (QMP)}. In higher dimensions, in case of the tensor train (TT-)format, the conditions for feasibility can be decoupled. By present results for three dimensions for the QMP, it then follows that the tuples of squared, feasible TT-singular values form polyhedral cones. We further establish a connection to eigenvalue relations of sums of Hermitian matrices, which in turn are described by sets of interlinked, so called \textit{honeycombs}, as they have been introduced by Knutson and Tao.
Besides a large class of universal, necessary inequalities as well as the vertex description for a special, simpler instance, we present a linear programming algorithm to check feasibility and a simple, heuristic algorithm to construct representations of tensors with prescribed, feasible TT-singular values in parallel.
I will discuss the existence of an open set of n1× n2× n3 tensors of rank r on which a popular and eﬃcient class of algorithms for computing tensor rank decompositions is numerically unstable. Algorithm of this class are based on a reduction to a linear matrix pencil, typically followed by a generalized eigendecomposition. The analysis shows that the unstability is caused by the fact that the condition number of the tensor rank decomposition can be much larger for n1×n2×2 tensors than for the n1×n2×n3 input tensor. Joint work with Carlos Beltran and Nick Vannieuwenhoven.