In the field of causality we want to understand how a system reacts under interventions (e.g. in gene knock-out experiments). These questions go beyond statistical dependences and can therefore not be answered by standard regression or classification techniques. In this tutorial you will learn about the interesting problem of causal inference and recent developments in the field. No prior knowledge about causality is required.
Part 1: We introduce structural causal models and formalize interventional distributions. We define causal effects and show how to compute them if the causal structure is known.
Part 2: We present three ideas that can be used to infer causal structure from data: (a) finding (conditional) independences in the data, (b) restricting structural equation models and (c) exploiting the fact that causal models remain invariant in different environments.
Part 3: We show ideas on how more classical machine learning problems could benefit from causal concepts.
Markov chain Monte Carlo (MCMC), more than any other tool, has fueled the Bayesian revolution in applied statistics. However, MCMC can go badly wrong, in particular in high dimensions. Hamiltonian Monte Carlo (HMC) is an approach that simulates a physical system in order to adaptively and efficiently sample from a high-dimensional (tens of thousands of parameters) target distribution. I'll introduce the approach, show a minimal working algorithm, and discuss some of the most common difficulties in implementation.
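A minimal sketch of the kind of algorithm the talk refers to (illustrative only: the step size, trajectory length, and Gaussian target below are arbitrary choices, not taken from the talk):

```python
import numpy as np

def hmc_step(x, log_prob, grad_log_prob, step_size, n_leapfrog, rng):
    """One HMC transition: resample momentum, run the leapfrog
    integrator, then accept or reject with a Metropolis correction."""
    p = rng.standard_normal(x.shape)
    x_new, p_new = x.copy(), p.copy()
    p_new += 0.5 * step_size * grad_log_prob(x_new)   # half momentum step
    for _ in range(n_leapfrog - 1):
        x_new += step_size * p_new                    # full position step
        p_new += step_size * grad_log_prob(x_new)     # full momentum step
    x_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_prob(x_new)   # final half step
    h_old = -log_prob(x) + 0.5 * p @ p                # Hamiltonians
    h_new = -log_prob(x_new) + 0.5 * p_new @ p_new
    return x_new if np.log(rng.uniform()) < h_old - h_new else x

# Toy target: standard normal in 5 dimensions.
log_prob = lambda x: -0.5 * x @ x
grad_log_prob = lambda x: -x

rng = np.random.default_rng(0)
x = np.zeros(5)
samples = np.empty((2000, 5))
for i in range(2000):
    x = hmc_step(x, log_prob, grad_log_prob, 0.1, 20, rng)
    samples[i] = x
```

The Metropolis correction at the end is what keeps the sampler exact despite the discretization error of the leapfrog integrator; the talk's "common difficulties" largely concern tuning the step size and trajectory length used here.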
One of the main aims of categorical data analysis is to infer the association structure in multivariate discrete distributions that can be described using a contingency table. Multiple measures of association were proposed for the simplest case, the 2x2 table, with the odds ratio being the only measure that is variation independent from the marginal distributions of the table. The interaction in higher-dimensional tables can also be described using odds ratios of different types, and the variation independence entails that the lower order marginal distributions in a contingency table do not carry any information about higher order interactions. As illustrated by examples, the conditional independence does not necessarily follow from the marginal independence, and a reversal in the direction of association between marginal and conditional distributions, known as Simpson’s paradox, may also occur.
Hierarchical log-linear models are a conventional tool for describing association in a multiway complete contingency table, and can be specified by setting certain odds ratios in the table equal to one. When the data are represented by an incomplete table, the traditional log-linear models and their quasi variants do not always provide a good description of the association structure. These models assume the existence of a parameter common to all cells, the overall effect, which is not necessarily justified when the absent cells do not exist logically or in a particular population. Some examples, where models without the overall effect arise naturally, are given, and the consequences of adding a normalizing constant (the overall effect) to these models are discussed.
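The reversal of association mentioned above (Simpson's paradox) can be seen in a small numerical sketch; the counts below are invented for illustration. The odds ratio exceeds one in each stratum of a 2x2x2 table, yet falls below one in the marginal 2x2 table obtained by summing out the stratum:

```python
import numpy as np

def odds_ratio(table):
    """Odds ratio of a 2x2 table [[a, b], [c, d]]: (a*d) / (b*c)."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

# Success/failure counts for two treatments, split into two strata.
stratum_1 = np.array([[81, 6], [234, 36]])   # OR ≈ 2.08: treatment 1 ahead
stratum_2 = np.array([[192, 71], [55, 25]])  # OR ≈ 1.23: treatment 1 ahead
marginal = stratum_1 + stratum_2             # OR ≈ 0.75: direction reversed
```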
Johannes Ernst-Emanuel Buck (TUM, Germany): Recursive max-linear models with propagating noise
James Cheshire (Otto-von-Guericke-Universität Magdeburg, Germany): On the Thresholding Bandit Problem
We are concerned with a specific pure exploration stochastic bandit problem. The learner is presented with a number of "arms", each corresponding to some unknown distribution. The learner samples arms sequentially up to a fixed time horizon with the goal of finding the set of arms whose means are above a given threshold. We consider two cases, structured and unstructured. In the structured case one has the additional information that the means form a monotonically increasing sequence. In this poster we first present the problem, then propose an approach in the unstructured case. Finally we consider the difficulties that arise when one wishes to extend to the structured case.
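A naive baseline for the unstructured case can be sketched as follows (Gaussian arms, round-robin sampling, and all numbers are illustrative assumptions, not the approach proposed in the poster):

```python
import numpy as np

def uniform_thresholding(means, tau, horizon, rng):
    """Naive thresholding-bandit baseline: sample all arms uniformly
    (round-robin) up to the horizon, then return the arms whose
    empirical means exceed the threshold tau."""
    k = len(means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    for t in range(horizon):
        arm = t % k                              # round-robin sampling
        reward = rng.normal(means[arm], 1.0)     # Gaussian arm (assumption)
        counts[arm] += 1
        sums[arm] += reward
    return set(np.flatnonzero(sums / counts > tau))

rng = np.random.default_rng(1)
true_means = [0.1, 0.4, 0.9, 1.3]
answer = uniform_thresholding(true_means, tau=0.65, horizon=8000, rng=rng)
```

Adaptive strategies improve on this by reallocating samples to arms whose means are close to the threshold, which is where the difficulty of the problem concentrates.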
Dakota Cintron (University of Connecticut, USA): On the Relation of Latent Variable Models and Network Psychometrics
Tim Fuchs (TU Munich, Germany): Spherical t-designs
A spherical t-design is a finite set on the sphere such that the discrete average of any polynomial of degree at most t equals its uniform average (i.e. the normalized integral over the sphere). Spherical designs have further properties which make them useful for seemingly unrelated problems, e.g., derandomization of PhaseLift. While the existence of spherical t-designs for any t and dimension n can be shown, explicit examples are only known for a small range of parameters t and n. Therefore, the focus is on so-called approximate spherical t-designs, for which the discrete average only nearly equals the uniform average. We show that sufficiently large samples of the uniform distribution on the unit sphere yield an approximate t-design with high probability.
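The closing claim can be checked empirically: a large i.i.d. uniform sample on the sphere approximately reproduces the uniform average of a low-degree polynomial. The dimension, sample size, and test polynomial below are arbitrary choices for illustration; for p(x) = x_1^2 on the unit sphere in R^n, the exact uniform average is 1/n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 200_000
x = rng.standard_normal((N, n))
x /= np.linalg.norm(x, axis=1, keepdims=True)   # normalize: uniform on sphere
discrete_avg = np.mean(x[:, 0] ** 2)            # discrete average of x_1^2
exact_avg = 1.0 / n                             # uniform average of x_1^2
```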
Felix Gnettner (Otto-von-Guericke-Universität Magdeburg, Germany): Depth-based two-sample testing
Alexandros Grosdos (Universität Osnabrück, Germany): Certifying Solutions in Log-Concave Maximum Likelihood Estimation
In nonparametric statistics one abandons the requirement that a probability density function belongs to a statistical model. Given a sample of points, we want to find the best log-concave distribution that fits this data. This problem was originally studied by statisticians, who found that the optimal solutions have probability density functions whose logarithm is piecewise-linear and used numerical methods to approximate the pdf. In this work we use algebraic and combinatorial methods to provide exact solutions to this problem. At the same time we use tools from algebraic geometry to test if the solutions provided by statistical software can be certified to be correct. This is joint work with Alexander Heaton, Kaie Kubjas, Olga Kuznetsova, Georgy Scholten, and Miruna-Stefana Sorea. This poster is complementary to the one by Olga Kuznetsova.
Lucas Kania (Università della Svizzera italiana (USI), Switzerland): Optimal Experimental Design for Causal Discovery
Given a target covariate Y and a set of covariates X, our goal is to discover whether a causal set of covariates exists in X. Previous research (J. Peters et al., Causal inference using invariant prediction: identification and confidence intervals, Journal of the Royal Statistical Society, Series B 78(5):947-1012, 2016) assumed that the experimenters already had in their possession a sample from the joint distribution of all covariates (i.e. a sample from the observational distribution), as well as samples from interventions on one or more covariates in X. A more realistic framework is to assume that the experimenters possess a sample from the joint distribution of the covariates but have not performed any intervention yet. Thus, a set of interventions must be determined under an experimental budget (i.e. only N experiments can be performed). In this work, we restrict ourselves to the particular case of linear structural equations with additive noise and derive a test, based on the invariance principle, of whether a particular set is the true causal set. The power of this test depends on the particular experiments that are performed. Thus, we determine the set of N experiments that provides the most power.
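The invariance principle behind such tests can be sketched on a toy linear SEM (all model details below are invented for illustration): residuals from regressing Y on its causal parent have the same mean in every environment, while regressing Y on one of its effects does not give invariant residuals.

```python
import numpy as np

rng = np.random.default_rng(0)

def environment(shift, n=5000):
    """One environment of a toy linear SEM: x1 -> y -> x2, with an
    intervention shifting the mean of x1."""
    x1 = rng.normal(shift, 1, n)         # intervened-on causal parent
    y = 2.0 * x1 + rng.normal(0, 1, n)   # Y depends causally on x1
    x2 = y + rng.normal(0, 1, n)         # x2 is an effect of Y
    return x1, x2, y

def residual_mean_gap(xs, ys):
    """Pooled least-squares fit, then the spread of per-environment
    residual means (zero in expectation under invariance)."""
    x, y = np.concatenate(xs), np.concatenate(ys)
    beta = np.polyfit(x, y, 1)
    gaps = [np.mean(ye - np.polyval(beta, xe)) for xe, ye in zip(xs, ys)]
    return max(gaps) - min(gaps)

(x1a, x2a, ya), (x1b, x2b, yb) = environment(0.0), environment(2.0)
gap_causal = residual_mean_gap([x1a, x1b], [ya, yb])       # small: invariant
gap_noncausal = residual_mean_gap([x2a, x2b], [ya, yb])    # large: not invariant
```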
Sven Klaaßen (University of Hamburg, Germany): Uniform Inference in Generalized Additive Models
Philipp Klein (Otto-von-Guericke-Universität Magdeburg, Germany): Using MOSUM statistics to estimate change points in processes satisfying strong invariance principles
Change point analysis aims at finding structural breaks in stochastic processes and therefore plays an important role in a variety of fields, e.g. in finance, neuroscience and engineering. In this work, we consider stochastic point processes that satisfy strong invariance principles, e.g. renewal processes or partial sum processes. We present a procedure, first introduced by Messer et al. (2014), that uses moving sum (MOSUM) statistics in order to estimate the location of changes in the mean within these processes. With the help of the strong invariance principles, we are able to prove convergence rates for the rescaled estimators both in the case of fixed and local mean changes, and we show that those rates cannot be improved in general.
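The core MOSUM statistic can be sketched as follows (a plain Gaussian change-in-mean example; the window length and jump size are illustrative, and the threshold calibration of Messer et al. is omitted):

```python
import numpy as np

def mosum_statistic(x, h):
    """MOSUM statistic: for each time t, the difference between the
    sums over the right window [t, t+h) and the left window [t-h, t),
    scaled by sqrt(2h). Peaks of |M_t| mark candidate change points."""
    x = np.asarray(x, dtype=float)
    s = np.concatenate(([0.0], np.cumsum(x)))   # prefix sums
    t = np.arange(h, len(x) - h + 1)
    right = s[t + h] - s[t]
    left = s[t] - s[t - h]
    return (right - left) / np.sqrt(2 * h)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 500), rng.normal(2, 1, 500)])  # break at 500
h = 100
m = mosum_statistic(x, h)
estimate = h + int(np.argmax(np.abs(m)))        # map back to a time index
```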
Jens Kley-Holsteg (University of Duisburg-Essen, Germany): Probabilistic short-term water demand forecasting with lasso
Water demand is a highly important variable for operational control and decision-making. Hence, the development of accurate forecasts is a valuable field of research to further improve the efficiency of water utilities. Focusing on probabilistic multi-step-ahead forecasting, a time series model is introduced to capture typical autoregressive, calendar and seasonal effects, to account for time-varying variance, and to quantify the uncertainty and path-dependency of the water demand process. To deal with the high complexity of the water demand process, a high-dimensional predictor space is applied, which is efficiently tuned by an automatic shrinkage and selection operator (lasso). This yields an accurate, easily interpretable and fast-to-compute forecasting model, which is well suited for real-time applications. Moreover, as the control of storage capacities for balancing demand peaks or ensuring the smooth operation of pumps is of considerable practical relevance, the probabilistic forecasting framework allows for simulating not only the expected demand and marginal properties, but also the correlation structure between hours within the forecasting horizon. To appropriately evaluate the forecasting performance of the considered models, the energy score (ES), a strictly proper multidimensional evaluation criterion, is introduced. The methodology is applied to the hourly water demand data of a German water supplier.
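A toy version of the predictor-space-plus-lasso idea (a synthetic stand-in for hourly demand; all parameters are illustrative, and the probabilistic and path-dependency aspects of the actual model are ignored):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic "hourly demand": daily cycle plus noise.
rng = np.random.default_rng(0)
T = 2000
hour = np.arange(T) % 24
demand = 10 + 3 * np.sin(2 * np.pi * hour / 24) + rng.normal(0, 0.5, T)

# High-dimensional predictor space: autoregressive lags + calendar dummies.
max_lag = 48
X = np.column_stack(
    [np.roll(demand, l) for l in range(1, max_lag + 1)]   # lagged demand
    + [(hour == h).astype(float) for h in range(24)]      # hour-of-day dummies
)[max_lag:]
y = demand[max_lag:]

# The lasso shrinks most of the predictor space away.
model = Lasso(alpha=0.1).fit(X, y)
n_active = int(np.sum(model.coef_ != 0))
```

The sparsity of `model.coef_` is what makes the fitted model both interpretable and fast to compute, as the abstract emphasizes.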
Peter Koleoso (University of Ibadan, Nigeria): A Three-Parameter Gompertz-Lindley Distribution: Its Properties And Applications
Abstract: This research proposes a three-parameter probability distribution called the Gompertz-Lindley distribution, constructed using the Gompertz generalized (Gompertz-G) family of distributions. The distribution is proposed as an improvement on the Lindley distribution, offering greater flexibility when modelling real-life and survival data. Mathematical properties of the distribution such as the moments, moment generating function, survival function and hazard function are derived. The parameters are estimated by maximum likelihood, and the distribution is applied to model the strength of glass fibres. The proposed Gompertz-Lindley distribution performed best in modelling the strength of glass fibres (AIC = 62.8537), followed by the generalized Lindley distribution (AIC = 77.6237). The Lindley distribution had the weakest performance, with the highest information criterion values.
Keywords: Gompertz distribution; Lindley distribution; moment; survival function; hazard function.
References:
D. V. Lindley (1958): Fiducial Distributions and Bayes' Theorem. Journal of the Royal Statistical Society, Series B (Methodological) 20(1):102-107.
M. Alizadeh, G. M. Cordeiro, L. G. B. Pinho et al. (2017): The Gompertz-G Family of Distributions. Journal of Statistical Theory and Practice 11(1):179-207. doi:10.1080/15598608.2016.1267668.
M. Bourguignon, R. B. Silva and G. M. Cordeiro (2014): The Weibull-G Family of Probability Distributions. Journal of Data Science 12:53-68.
M. Mansour, G. Aryal, A. Z. Afify et al. (2018): The Kumaraswamy Exponentiated Frechet Distribution. Pakistan Journal of Statistics 34(3):177-193.
P. E. Oguntunde, O. S. Balogun, H. I. Okagbue et al. (2015): The Weibull-Exponential Distribution: its Properties and Applications. Journal of Applied Sciences 15(11):1305-1311.
W. Barreto-Souza, G. M. Cordeiro and A. B. Simas (2011): Some Results for Beta Frechet Distribution. Communications in Statistics - Theory and Methods 40(5):798-811.
Insan-Aleksandr Latipov (National Research University Higher School of Economics, Russia): Studying the Effect of Multilateral Trade Flows in a Gravity Trade Model: A Network Approach
This research provides a new way to estimate the gravity model of international trade by taking multilateral trade flows into account. The classical gravity trade model neglects the fact that each country maintains trade relationships with many partners at the same time, and omitting this fact leads to inconsistencies in model estimation. The main impediment to the practical application and interpretation of gravity models is that there is no theoretical foundation for incorporating trade-barrier and trade-cost factors into the model. The absence of such a theoretical basis breaks cause-and-effect relationships and biases statistical estimates. This research develops Anderson and van Wincoop’s theoretical justification for the gravity model of trade using a network analysis approach. By assuming that centrality measures of the International Trade Network (ITN) reflect the extent of integration into multilateral trade, it is possible to strengthen the foundation for Anderson’s multilateral trade resistance factor. This is done by estimating the contribution of multilateral trade flows to the size of bilateral trade using centrality measures. For the bilateral trade resistance factors, the CEPII BACI methodology was applied to give a strong rationale for the chosen estimates. The gravity model of trade was applied to a group of countries, and the contribution of multilateral trade flows was evaluated. Coherence with its theoretical foundation makes the network approach more robust in the case of trade between many countries.
Sourav Majumdar (Indian Institute of Management, Ahmedabad, India): Financial time series classification using topological data analysis
We propose a method for financial time series classification. We use the framework of persistent homology and time delay embedding to analyse time series data. We consider time series of stocks labelled by their sectors. We then use topological features, persistence landscapes, to predict the sector from the time series of a stock. We find that our method is very accurate on the sample considered. Our method outperforms several existing methods in time series classification.
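The first step of such a pipeline, time-delay embedding, can be sketched as follows (the series, embedding dimension, and delay are illustrative; computing persistent homology and persistence landscapes on the resulting point cloud would require a TDA library such as ripser or GUDHI and is omitted here):

```python
import numpy as np

def time_delay_embedding(x, dim, delay):
    """Takens-style time-delay embedding: map a scalar series to a
    point cloud in R^dim whose i-th coordinate is the series delayed
    by i*delay steps. Topological features are computed on this cloud."""
    n = len(x) - (dim - 1) * delay
    return np.column_stack([x[i * delay : i * delay + n] for i in range(dim)])

# A periodic series embeds to (a neighbourhood of) a loop in the plane,
# which persistent homology detects as a long-lived 1-cycle.
t = np.linspace(0, 4 * np.pi, 400)
cloud = time_delay_embedding(np.sin(t), dim=2, delay=25)
```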
Sascha Meyen (Universität Tübingen, Germany): A Theoretical Perspective on the Allocation of Informational Resources in Classification Tasks
In classification tasks, an optimal allocation of limited resources is desirable. Much work has been done to achieve good classification accuracy in a wide variety of tasks and with different architectures, for example, neural networks classifying image data. However, there is a lack of fundamental theoretical consideration regarding the optimal allocation of limited informational resources: Should all image types be learned equally well, or should architectures specialize to certain image types? To approach this topic, I go beyond the typical performance measure of classification accuracy and turn to Shannon’s Information Theory. I show how the two measures, classification accuracy and mutual information, are related and that specialization to certain types of images represents an optimal resource allocation in the sense that mutual information is increased even if classification accuracy remains constant. One benefit of specialization can be observed in the framework of confidence-weighted majority voting, where multiple architectures perform the same classification task and their predictions are combined. When combining specialized architectures with the same classification accuracy, their ensemble classification performance exceeds the performance of ensembles consisting of unspecialized architectures. This result provides theoretical guidance for improving architectures in classification tasks.
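The confidence-weighted majority voting scheme mentioned above can be sketched for binary labels (a standard log-odds weighting; the numbers are illustrative, not from the poster):

```python
import numpy as np

def combine(confidences, labels):
    """Confidence-weighted majority vote for labels in {-1, +1}:
    each classifier's vote is weighted by the log odds of its
    reported confidence (probability that its label is correct)."""
    c = np.asarray(confidences, dtype=float)
    weights = np.log(c / (1.0 - c))
    score = float(np.sum(weights * np.asarray(labels)))
    return 1 if score > 0 else -1

# One specialized, highly confident classifier can overrule two
# weakly confident ones voting the other way.
pred = combine([0.95, 0.6, 0.6], [1, -1, -1])
```

This illustrates why calibrated confidence, i.e. mutual information rather than raw accuracy, is the quantity the ensemble actually exploits.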
Gilles Mordant (UCLouvain, Belgium): Multivariate goodness-of-fit tests based on Wasserstein distance
In this poster, we present goodness-of-fit tests based on the Wasserstein distance and exhibit some of their properties, as well as some results from a simulation study. The results show that the proposed tests enjoy good discriminating power between the null and the alternative hypotheses, both for simple and composite null hypotheses. Further, the empirical Wasserstein distance is proved to converge to zero uniformly over some families of models, a result of potential independent interest.
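In one dimension the empirical Wasserstein distance reduces to matching sorted order statistics, which a short sketch can illustrate (the distributions and sample size are arbitrary choices; the poster's tests are multivariate):

```python
import numpy as np

def wasserstein_1d(x, y):
    """Empirical 1-Wasserstein distance between two equal-size samples;
    in one dimension the optimal coupling matches order statistics."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

rng = np.random.default_rng(0)
n = 20_000
null_dist = wasserstein_1d(rng.normal(0, 1, n), rng.normal(0, 1, n))  # H0 true
alt_dist = wasserstein_1d(rng.normal(0, 1, n), rng.normal(1, 1, n))   # mean shift
```

Under the null the distance is near zero, while under the mean-shift alternative it is near the true W1 distance of 1, which is the separation the tests exploit.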
Hoang Nguyen (Örebro University, Sweden): Variational Inference for high dimensional structured factor copulas
Factor copula models have been recently proposed for describing the joint distribution of a large number of variables in terms of a few common latent factors. In this paper, we employ a Bayesian procedure to make fast inferences for multi-factor and structured factor copulas. To deal with the high dimensional structure, a Variational Inference (VI) algorithm is applied to estimate different specifications of factor copula models. Compared to the Markov Chain Monte Carlo (MCMC) approach, the variational approximation is much faster and can handle sizeable problems in limited time. Another issue with factor copula models is that the bivariate copula functions connecting the variables are unknown in high dimensions. We derive an automatic procedure to recover the hidden dependence structure. By taking advantage of the posterior modes of the latent variables, we select the bivariate copula functions based on minimizing the Bayesian Information Criterion (BIC). Simulation studies in different contexts show that the procedure of bivariate copula selection can be very accurate in comparison to the true generated copula model. The proposed procedure is illustrated with two high dimensional real datasets.
Claude Renaux (ETH Zürich, Switzerland): Group Inference in High Dimensions with Applications to Hierarchical Testing
The development of high-dimensional group inference is an essential part of statistical methods for analyzing complex data sets, including hierarchical testing, tests of interaction, detection of heterogeneous treatment effects and local heritability. Group inference in regression models can be measured with respect to a weighted quadratic functional of the regression sub-vector corresponding to the group. Asymptotically unbiased estimators of these weighted quadratic functionals are constructed, and a procedure using these estimators for inference is proposed. We derive its asymptotic Gaussian distribution, which allows us to construct asymptotically valid confidence intervals and tests which perform well in terms of length or power.
Maximilian Steffen (Universität Hamburg, Germany): PAC-Bayesian Estimation in High-Dimensional Multi-Index Models
Bennet Ströh (Ulm University, Germany): Weak dependence of mixed moving average fields and statistical applications
We introduce an easily accessible new notion of weak dependence for random fields suitable for a general concept of causality (so-called θ-lex-weak dependence). Based on this measure of dependence, we derive a central limit theorem for stationary random fields under mild moment assumptions. As an example, we consider a mixed moving average (MMA) field X driven by a Lévy basis and prove that it is weakly dependent, with rates computable in terms of the moving average kernel and the characteristic quadruple of the Lévy basis. Using this property, we show conditions ensuring that the sample mean and autocovariances of X have a limiting normal distribution.
Theja Tulabandhula (University of Illinois at Chicago, USA): Block-Structure Based Time-Series Models For Graph Sequences
Although the computational and statistical trade-off for modeling single graphs, for instance, using block models is relatively well understood, extending such results to sequences of graphs has proven to be difficult. In this work, we take a step in this direction by proposing two models for graph sequences that capture: (a) link persistence between nodes across time, and (b) community persistence of each node across time. In the first model, we assume that the latent community of each node does not change over time, and in the second model we relax this assumption suitably. For both of these proposed models, we provide statistically and computationally efficient inference algorithms, whose unique feature is that they leverage community detection methods that work on single graphs. We also provide experimental results validating the suitability of our models and methods on synthetic and real instances.
Lea Wegner (Otto-von-Guericke-Universität Magdeburg, Germany): Optimal order of regret for Partial Monitoring
The lectures give an introduction to linear structural equation models with a focus on issues arising from the presence of latent variables or feedback loops. The opening lecture will highlight the models’ causal interpretation, their representation in terms of directed graphs, and the rich algebraic structure that emerges in the special case of linear structural equations. The subsequent two lectures will treat problems involving latent variables or feedback loops. We will present methods to decide parameter identifiability, review results on conditional independence relations and their use in model selection methods, and discuss relations among covariances that go beyond conditional independence.
Johannes Ernst-Emanuel Buck TUM, GermanyRecursive max-linear models with propagating noise
James Cheshire Otto-von-Guericke-Universität, Magdeburg, GermanyOn the Thresholding Bandit Problem
We are concerned with a specific pure exploration stochastic bandit problem. The learner is presented with a number of "arms", each corresponding to some unknown distribution. The learner samples arms sequentially up to a fixed time horizon with the goal finding the set of arms whose means are above a given threshold. We consider two cases, structured and unstructured. In the structured case one has the additional information that the means form a monotonically increasing sequence. During this poster we first present the problem, then propose an approach in the unstructured case. Finally we consider the difficulties that arise when one wishes to extend to the structured case.
Dakota Cintron University of Connecticut, USAOn the Relation of Latent Variable Models and Netwok Psychometrics
Tim Fuchs TU Munich, GermanySpherical t-designs
A spherical t-design is a finite set on the sphere such that the discrete average of any polynomial of degree at most t equals its uniform average (i.e. the normalized integral over the sphere). Spherical designs have further properties which make them useful for seemingly unrelated problems, e.g., derandomization of PhaseLift.While the existence of spherical t-designs for any t and dimension n can be shown, explicit examples are only known for a small range of parameters t and n. Therefore, the focus is on so called approximate spherical t-designs for which the discrete average only nearly equals the uniform average.We show that sufficiently large samples of the uniform distribution on the unit sphere yield an approximate t-design with high probability.
Felix Gnettner Otto-von-Guericke-Universität Magdeburg, GermanyDepth-based two sample testing
Alexandros Grosdos Universität Osnabrück, GermanyCertifying Solutions in Log-Concave Maximum Likelihood Estimation
In nonparametric statistics one abandons the requirement that a probability density function belongs to a statistical model. Given a sample of points, we want to find the best log-concave distribution that fits this data. This problem was originally studied by statisticians, who found that the optimal solutions have probability density functions whose logarithm is piecewise-linear and used numerical methods to approximate the pdf. In this work we use algebraic and combinatorial methods to provide exact solutions to this problem. At the same time we use tools from algebraic geometry to test if the solutions provided by statistical software can be certified to be correct.This is joint work with Alexander Heaton, Kaie Kubjas, Olga Kuznetsova, Georgy Scholten, and Miruna-Stefana Sorea. This poster is complementary to the one by Olga Kuznetsova.
Lucas Kania Università della Svizzera italiana (USI), SwitzerlandOptimal Experimental Design for Causal Discovery
Given a target covariate Y and a set of covariates X, our goal is to discover if a causal set of covariates exists in X.Previous research (J. Peters et al. Causal inference using invariant prediction: identification and confidence intervals, Journal of the Royal Statistical Society, Series B 78(5):947-1012, 2016) assumed that the experimenters already had in their possession a sample of the joint distribution of all covariates (i.e. a sample from the observational distribution), and samples from different interventions at one or more covariates from X.A more realistic framework is to consider that the experimenters possess a sample from the joint distribution of the covariates, but have not performed any intervention yet. Thus, a set of interventions must be determined under an experimental budget (i.e. only N experiments can be performed).In this work, we restrict ourselves to the particular case of linear structural equations with additive noise and derive a test that tests if a particular set is the true casual set based on the invariance principle. The power of this test depends on the particular experiments that are performed. Thus, we determine the set of N experiments that provides the most power.
Sven Klaaßen University of Hamburg, GermanyUniform Inference in Gerneralized Additive Models
Philipp Klein Otto-von-Guericke-Universität Magdeburg, GermanyUsing MOSUM statistics to estimate change points in processes satisfying strong invariance principles
Change point analysis aims at finding structural breaks in stochastic processes and therefore plays an important role in a variety of fields, e.g. in finance, neuroscience and engineering.In this work, we consider stochastic point processes that satisfy strong invariance principles, e.g. renewal processes or partial sum processes.We present a procedure first introduced by Messer et al. (2014) that uses moving sum (MOSUM) statistics in order to estimate the location of changes in the mean within these processes. With the help of the strong invariance principles, we are able to prove convergence rates for the rescaled estimators both in the case of fixed and local mean changes and are able to show that those rates cannot be improved in general.
Jens Kley-Holsteg Univeristy of Duisburg-Essen, GermanyProbabilisitc short-term water demand forcasting with lasso
Water demand is a highly important variable for operational control and decisionmaking. Hence, the development of accurate forecasts is a valuable field of research tofurther improve the efficiency of water utilities. Focusing on probabilistic multi-stepahead forecasting, a time series model is introduced, to capture typical autoregressive, calendar and seasonal effects, to account for time-varying variance, and to quantify the uncertainty and path-dependency of the water demand process. To deal with the high complexity of the water demand process a high-dimensional predictor space is applied, which is efficiently tuned by an automatic shrinkage and selection operator (lasso). It allows to obtain an accurate, simple interpretable and fast computable forecasting model, which is well suited for real-time applications. Moreover, as in practice the control of storage capacities for balancing demand peaks or ensuring the smooth operation of pumps is of considerable relevance, the probabilistic forecasting framework allows not only for simulating the expected demand and marginal properties, but also the correlation structure between hours within the forecasting horizon. To appropriately evaluate the forecasting performance of the considered models, the energy score (ES) as a strictly proper multidimensional evaluation criterion, is introduced. The methodology is applied to the hourly water demand data of a German water supplier.
Peter Koleoso University of Ibadan, NigeriaA Three-Parameter Gompertz-Lindley Distribution: Its Properties And Applications
Abstract: This research proposed a three-parameter probability distribution called Gompertz-Lindley distribution using Gompertz generalized (Gompertz-G) family of distributions. The distribution was proposed as an improvement on the Lindley distribution, for greater flexibility when modelling real life and survival data. The mathematical properties of the distribution such as moment, moment generating function, survival function and hazard function were derived. The parameters of the distribution were estimated using the method of maximum likelihood and the distribution was applied to model the strength of glass fibres. The proposed Gompertz-Lindley distribution performed best in modelling the strength of glass fibres (AIC = 62.8537), followed by Generalized Lindley distribution (AIC = 77.6237). The Lindley distribution had the least performance with the highest information criterion values.Keywords: Gompertz distribution; Lindley distribution; moment; survival function; hazard function.ReferencesD. V. Lindley (1958): Fiducial Distributions and Bayes' Theorem. Journal of Royal Statistical Society, Series B (Methodological) 20(1):102-107.M. Alizadeh, G. M. Cordeiro, L. G. B. Pinho et al. (2017): The Gompertz-G Family of Distributions. Journal of Statistical Theory and Practice 11(1):179–207. doi:10.1080/15598608.2016.1267668.M. Bourguignon, R. B. Silva and G. M. Cordeiro (2014): The Weibull-G Family of Probability Distributions. Journal of Data Science 12:53-68.M. Mansour, G. Aryal, A. Z. Afify et al. (2018): The Kumaraswamy Exponentiated Frechet Distribution. Pakistan Journal of Statistics 34(3):177-193.P. E. Oguntunde, O. S. Balogun, H. I. Okagbue et al. (2015): The Weibull-Exponential Distribution: its Properties and Applications. Journal of Applied Sciences 15(11):1305-1311.W. Barreto-Souza, G. M. Cordeiro and A. B. Simas (2011): Some Results for Beta Frechet Distribution. Communications in Statistics - Theory and Methods 40(5):798-811.
Insan-Aleksandr Latipov National Research University Higher School of Economics, RussiaStudying the Effect of Multilateral Trade Flows in a Gravity Trade Model: A Network Approach
This research provides a new way to estimate gravity model of international trade by taking into account the multilateral trade flows. A classical gravity trade model neglects the fact that each country maintains the trade relationships with other partners at the same time. It was revealed that omitting this fact leads to inconsistencies in model estimation. The main impediment for practical applying and interpretation of gravity models lies in the fact that there is no theoretical foundation for fitting trade barriers and trade costs factors into the model. The absence of theoretical basis for such features results into breaking cause-and-effect relationships and biased statistical estimates. This research develops Anderson and van Wincoop’s theoretical justification for gravity model of trade by using a network analysis approach. By assuming that centrality measures of International Trade Network (ITN) reflects the extent of integration into multilateral trade it is possible to improve foundation for Anderson’s multilateral trade resistance factor. It is presented in estimating the contribution of multilateral trade flows to the size of bilateral trade by using centrality measures. As for bilateral trade resistance factors, the CEPII BACI’s methodology was applied to give a strong rationale for choosing corresponding estimations. The gravity model of trade was used in case of group of countries with presenting the evaluation of multilateral trade flows contribution. Coherence with theoretical foundation makes a network approach more robust in case of trade between number of countries.
Sourav Majumdar, Indian Institute of Management, Ahmedabad, India
Financial time series classification using topological data analysis
We propose a method for financial time series classification. We use the framework of persistent homology and time-delay embedding to analyse time series data. We consider time series of stocks labelled by their sectors and use topological features, persistence landscapes, to predict a stock's sector from its time series. We find that our method is very accurate on the sample considered and outperforms several existing methods in time series classification.
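The first step of such a pipeline, time-delay (Takens) embedding, can be sketched in a few lines of numpy; persistent homology and persistence landscapes would then be computed on the resulting point cloud with a TDA library (not shown here). The parameter choices below are illustrative.

```python
import numpy as np

def time_delay_embedding(x, dim=3, tau=2):
    """Takens time-delay embedding: map a scalar series to a point
    cloud in R^dim whose topology (e.g. loops from periodic behaviour)
    persistent homology can then quantify."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i : i + n] for i in range(0, dim * tau, tau)], axis=1)

# a periodic signal embeds to a loop, which shows up as a persistent
# 1-dimensional homology class
t = np.linspace(0, 8 * np.pi, 400)
cloud = time_delay_embedding(np.sin(t), dim=3, tau=10)
```

Each row of `cloud` is one delayed window of the series; the point cloud, not the raw series, is what the persistence computation consumes.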
Sascha Meyen, Universität Tübingen, Germany
A Theoretical Perspective on the Allocation of Informational Resources in Classification Tasks
In classification tasks, an optimal allocation of limited resources is desirable. Much work has been done to achieve good classification accuracy in a wide variety of tasks and with different architectures, for example, neural networks classifying image data. However, there is a lack of fundamental theoretical consideration regarding the optimal allocation of limited informational resources: Should all image types be learned equally well, or should architectures specialize to certain image types? To approach this topic, I go beyond the typical performance measure of classification accuracy and turn to Shannon’s Information Theory. I show how the two measures, classification accuracy and mutual information, are related, and that specialization to certain types of images represents an optimal resource allocation in the sense that mutual information is increased even if classification accuracy remains constant. One benefit of specialization can be observed in the framework of confidence-weighted majority voting, where multiple architectures perform the same classification task and their predictions are combined. When combining specialized architectures with the same classification accuracy, their ensemble classification performance exceeds that of ensembles of unspecialized architectures. This result provides theoretical guidance for improving architectures in classification tasks.
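The claim that specialization can raise mutual information at constant accuracy is easy to check numerically. The sketch below uses illustrative numbers (not taken from the poster): a "uniform" binary classifier that is 80% correct on both classes is compared with a "specialized" one that is 90% correct on one class and 70% on the other, so both have overall accuracy 0.8 on a balanced task.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (in bits) of a joint pmf over
    (true label, predicted label)."""
    joint = joint / joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of true labels
    py = joint.sum(axis=0, keepdims=True)  # marginal of predictions
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (px @ py)[mask])).sum())

# rows: true class, columns: prediction; classes are balanced (factor 0.5)
uniform = 0.5 * np.array([[0.8, 0.2],
                          [0.2, 0.8]])   # 0.8 accuracy on both classes
special = 0.5 * np.array([[0.9, 0.1],
                          [0.3, 0.7]])   # 0.9 / 0.7, same overall accuracy
mi_uniform = mutual_information(uniform)
mi_special = mutual_information(special)
```

Both confusion matrices have trace 0.8 (the same accuracy), yet the specialized channel carries strictly more mutual information, which is exactly the effect exploited when confidence-weighted votes of specialized classifiers are combined.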
Gilles Mordant, UCLouvain, Belgium
Multivariate goodness-of-fit tests based on Wasserstein distance
On this poster, we present goodness-of-fit tests based on the Wasserstein distance and exhibit some of their properties, as well as results from a simulation study. The results show that the proposed tests enjoy good discriminating power between the null and alternative hypotheses, for both simple and composite null hypotheses. Further, the empirical Wasserstein distance is proved to converge to zero uniformly over some families of models, a result of potential independent interest.
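In one dimension, the empirical 1-Wasserstein distance underlying such test statistics has a simple order-statistics form; the sketch below (assuming equal sample sizes) illustrates the discriminating power on a toy location alternative.

```python
import numpy as np

def wasserstein_1d(x, y):
    """Empirical 1-Wasserstein distance between two samples of equal
    size: the mean absolute difference of the sorted samples, i.e. of
    the matched order statistics."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

rng = np.random.default_rng(0)
# under the null: two samples from the same N(0, 1) distribution
w_null = wasserstein_1d(rng.normal(size=1000), rng.normal(size=1000))
# under a location alternative: N(0, 1) versus N(1, 1)
w_alt = wasserstein_1d(rng.normal(size=1000),
                       rng.normal(1.0, 1.0, size=1000))
```

Under the null the statistic shrinks toward zero as the sample size grows, while under the alternative it stabilizes near the true distance between the distributions (here the mean shift of 1); calibrating the resulting test is where the distributional theory on the poster comes in.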
Hoang Nguyen, Örebro University, Sweden
Variational Inference for high dimensional structured factor copulas
Factor copula models have recently been proposed for describing the joint distribution of a large number of variables in terms of a few common latent factors. In this paper, we employ a Bayesian procedure to make fast inferences for multi-factor and structured factor copulas. To deal with the high-dimensional structure, a Variational Inference (VI) algorithm is applied to estimate different specifications of factor copula models. Compared to the Markov chain Monte Carlo (MCMC) approach, the variational approximation is much faster and can handle sizeable problems in limited time. Another issue with factor copula models is that the bivariate copula functions connecting the variables are unknown in high dimensions. We derive an automatic procedure to recover the hidden dependence structure: taking advantage of the posterior modes of the latent variables, we select the bivariate copula functions by minimizing the Bayesian Information Criterion (BIC). Simulation studies in different contexts show that the bivariate copula selection procedure can be very accurate in comparison to the true generating copula model. The proposed procedure is illustrated with two high-dimensional real datasets.
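BIC-based copula selection can be sketched in its simplest form: given bivariate data, choose between the independence copula and a Gaussian copula fitted on pseudo-observations. This toy version (function names and the two-family menu are illustrative, not the paper's implementation) shows the mechanics of the selection step.

```python
import numpy as np
from scipy.stats import norm, rankdata

def gaussian_copula_loglik(u, v, rho):
    """Log-likelihood of the bivariate Gaussian copula density at
    pseudo-observations (u, v) with correlation parameter rho."""
    z1, z2 = norm.ppf(u), norm.ppf(v)
    return float(np.sum(-0.5 * np.log(1 - rho ** 2)
                 + (2 * rho * z1 * z2 - rho ** 2 * (z1 ** 2 + z2 ** 2))
                   / (2 * (1 - rho ** 2))))

def select_by_bic(x, y):
    n = len(x)
    # pseudo-observations: rescaled ranks in (0, 1)
    u, v = rankdata(x) / (n + 1), rankdata(y) / (n + 1)
    # normal-scores estimate of the copula correlation
    rho = np.corrcoef(norm.ppf(u), norm.ppf(v))[0, 1]
    bic = {"independence": 0.0,  # no parameters, log-likelihood 0
           "gaussian": 1 * np.log(n) - 2 * gaussian_copula_loglik(u, v, rho)}
    return min(bic, key=bic.get), bic

rng = np.random.default_rng(0)
z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=500)
best, bic = select_by_bic(z[:, 0], z[:, 1])
```

In the paper's setting the same comparison runs over a larger menu of bivariate families, with the latent factors' posterior modes standing in for one of the two margins.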
Claude Renaux, ETH Zürich, Switzerland
Group Inference in High Dimensions with Applications to Hierarchical Testing
The development of high-dimensional group inference is an essential part of statistical methods for analyzing complex data sets, including hierarchical testing, tests of interaction, detection of heterogeneous treatment effects and local heritability. Group inference in regression models can be measured with respect to a weighted quadratic functional of the regression sub-vector corresponding to the group. Asymptotically unbiased estimators of these weighted quadratic functionals are constructed, and a procedure using these estimators for inference is proposed. We derive its asymptotic Gaussian distribution, which allows us to construct asymptotically valid confidence intervals and tests that perform well in terms of length and power.
Maximilian Steffen, Universität Hamburg, Germany
PAC-Bayesian Estimation in High-Dimensional Multi-Index Models
Bennet Ströh, Ulm University, Germany
Weak dependence of mixed moving average fields and statistical applications
We introduce an easily accessible new notion of weak dependence for random fields suitable for a general concept of causality (so-called θ-lex-weak dependence). Based on this measure of dependence, we derive a central limit theorem for stationary random fields under mild moment assumptions. As an example, we consider a mixed moving average (MMA) field X driven by a Lévy basis and prove that it is weakly dependent, with rates computable in terms of the moving average kernel and the characteristic quadruple of the Lévy basis. Using this property, we give conditions ensuring that the sample mean and autocovariances of X have a limiting normal distribution.
Theja Tulabandhula, University of Illinois at Chicago, USA
Block-Structure Based Time-Series Models For Graph Sequences
Although the computational and statistical trade-off for modeling single graphs, for instance using block models, is relatively well understood, extending such results to sequences of graphs has proven to be difficult. In this work, we take a step in this direction by proposing two models for graph sequences that capture: (a) link persistence between nodes across time, and (b) community persistence of each node across time. In the first model, we assume that the latent community of each node does not change over time; in the second model, we relax this assumption suitably. For both proposed models, we provide statistically and computationally efficient inference algorithms, whose unique feature is that they leverage community detection methods that work on single graphs. We also provide experimental results validating the suitability of our models and methods on synthetic and real instances.
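A minimal generative sketch of the first kind of model (fixed latent communities plus link persistence; all parameter names and values are illustrative, not the paper's specification): at each time step, every potential edge keeps its previous state with probability xi and is otherwise redrawn from a static stochastic block model.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 60, 5
z = rng.integers(0, 2, size=n)             # latent communities, fixed in time
B = np.array([[0.5, 0.05],
              [0.05, 0.5]])                # block connection probabilities
xi = 0.8                                   # link-persistence probability

def sample_sbm(z, B, rng):
    """One undirected SBM graph (adjacency matrix, no self-loops)."""
    P = B[z][:, z]
    A = (rng.random((len(z), len(z))) < P).astype(int)
    A = np.triu(A, 1)                      # keep upper triangle only
    return A + A.T                         # symmetrize

graphs = [sample_sbm(z, B, rng)]
for _ in range(1, T):
    fresh = sample_sbm(z, B, rng)
    keep = rng.random((n, n)) < xi         # which edges persist
    keep = np.triu(keep, 1)
    keep = keep | keep.T                   # symmetric persistence mask
    graphs.append(np.where(keep, graphs[-1], fresh))
```

Setting xi = 0 recovers independent SBM snapshots, so the persistence parameter interpolates between the static model and a frozen graph, which is what makes single-graph community detection usable on the sequence.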
Lea Wegner, Otto-von-Guericke Universität Magdeburg, Germany
Optimal order of regret for Partial Monitoring
In this lecture we survey statistical methodology for change point problems, i.e. estimation and detection problems where abrupt changes (discontinuities) have to be recovered from random data. Applications are broad, ranging from statistical finance to network analysis, medical imaging and genomics. We provide a principled approach based on statistical estimation and testing methodology. In the first lecture we survey classical results and methods for simple change point recovery, which will then be extended to more recent developments for multiscale change point detection in the second lecture. In the third lecture we will show how these multiscale methods can be used to analyze specific blind source separation problems. Special emphasis will be put on the underlying combinatorial (linear) algebraic structure of the model. Theory will be accompanied by various real-data examples from cancer genetics and physiology, and by comments on software and algorithms.
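For the simple single-change setting of the first lecture, a mean shift can be located with the classical CUSUM statistic; a minimal numpy sketch (simulated data, not one of the lecture's examples):

```python
import numpy as np

def cusum_changepoint(x):
    """Estimate a single mean-shift location by maximizing the
    normalized CUSUM statistic
    sqrt(k(n-k)/n) * |mean(x[:k]) - mean(x[k:])| over k."""
    n = len(x)
    ks = np.arange(1, n)
    csum = np.cumsum(x)[:-1]
    left = csum / ks                        # mean of the first k points
    right = (np.sum(x) - csum) / (n - ks)   # mean of the remaining points
    stat = np.sqrt(ks * (n - ks) / n) * np.abs(left - right)
    k_hat = int(ks[np.argmax(stat)])
    return k_hat, float(stat.max())

rng = np.random.default_rng(0)
# noise with a mean jump from 0 to 2 at position 100
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100)])
k_hat, stat = cusum_changepoint(x)
```

Comparing the maximal statistic to a critical value gives the corresponding detection test; the multiscale methods of the second lecture extend this single-scan idea to many unknown change points at once.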
In nonparametric statistics one abandons the requirement that a probability density function belongs to a statistical model with finitely many parameters, and instead requires that it satisfies certain constraints. In this talk, we consider log-concave densities. The logarithm of the log-concave maximum likelihood estimate has been shown to be a piecewise linear function. We study exact solutions to log-concave maximum likelihood estimation. This talk is based on joint work with Alex Grosdos, Alex Heaton, Olga Kuznetsova, Georgy Scholten and Miruna-Stefana Sorea.