# Spring School on Mathematical Statistics

## Abstracts for the talks

**Monday Osagie Adenomon** (*Nasarawa State University, Keffi, Nigeria*): **On the Volatility Half-life and Persistence of Daily Stocks of a Nigerian Bank: Evidence from GARCH Models**

The role of probability distributions in financial time series analysis cannot be overemphasized, given characteristics of such series like volatility, skewness, kurtosis and ARCH effects. GARCH(1,1)-type models with three error distributions (normal, Student-t and skewed Student-t) were fitted to the daily stock returns of Zenith Bank Nigeria Plc., using observations from October 21, 2004 to May 8, 2017. Based on the Akaike information criterion, the results revealed the superiority of the TGARCH(1,1) model with the skewed Student-t distribution over the models with normal and Student-t distributions.
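The two quantities in the talk's title have simple closed forms once a GARCH(1,1) model is fitted: persistence is the sum of the ARCH and GARCH coefficients, and the volatility half-life is the number of days for a shock to decay to half its size. A minimal sketch (the parameter values below are illustrative, not the estimates from the talk):

```python
import math

def garch_persistence_half_life(alpha, beta):
    """Persistence and volatility half-life of a GARCH(1,1) model.

    Persistence is alpha + beta (must be < 1 for stationarity); the
    half-life is ln(0.5) / ln(alpha + beta), the number of periods for
    a volatility shock to decay to half its initial size.
    """
    persistence = alpha + beta
    half_life = math.log(0.5) / math.log(persistence)
    return persistence, half_life

# Illustrative (not estimated) GARCH(1,1) parameter values:
persistence, half_life = garch_persistence_half_life(alpha=0.10, beta=0.85)
```

With persistence 0.95, for example, volatility shocks take roughly two to three trading weeks to halve.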

**Application of calculus in valuation of a firm**

Mathematics, Statistics and Finance go hand in hand in business applications; they are integrated. Calculus is a mathematical concept that is widely applied in Finance. The value of a firm can be found by aggregating value parameters or by disintegrating them; thus integral and differential calculus can be applied accordingly to value a firm. This poster highlights this application and concept.
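One standard way integral calculus enters valuation is the present value of a continuous cash-flow stream, V = ∫₀ᵀ C(t)·e^(−rt) dt. A minimal numerical sketch, not taken from the poster itself; the cash-flow rate, discount rate and horizon below are hypothetical:

```python
import math

def firm_value_constant_cashflow(C, r, T):
    """Closed-form present value of a constant cash-flow rate C over [0, T]:
    V = C * (1 - exp(-r*T)) / r, obtained by evaluating the integral."""
    return C * (1 - math.exp(-r * T)) / r

def firm_value_numeric(cashflow, r, T, n=10_000):
    """Trapezoidal approximation of V = integral of cashflow(t)*exp(-r*t)."""
    h = T / n
    total = 0.5 * (cashflow(0.0) + cashflow(T) * math.exp(-r * T))
    for i in range(1, n):
        t = i * h
        total += cashflow(t) * math.exp(-r * t)
    return total * h

# Hypothetical firm: cash flow of 100 per year, 5% discount rate, 10 years.
closed = firm_value_constant_cashflow(C=100.0, r=0.05, T=10.0)
numeric = firm_value_numeric(lambda t: 100.0, r=0.05, T=10.0)
```

The numerical version accepts any cash-flow function, which is where the "aggregating value parameters" idea becomes concrete.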

**Sadiah Aljeddani** (*University of Umm Al Qura*): **Variable Selection for Split-plot Design Experiments using Bayesian Analysis**

The selection of the best subset of variables, namely those with a strong effect on an outcome of interest, is fundamental to avoiding overfitting in statistical modelling. However, when there are many variables, it is computationally difficult to find this best subset, and the difficulties of variable selection become more complex when designs involve restricted randomization. The Stochastic Search Variable Selection (SSVS) technique is a Bayesian approach that performs variable selection and model estimation simultaneously, with the variance of all active factors sampled from one posterior distribution. Since a split-plot design has two different strata, we extend the SSVS approach to Bayesian variable selection for the analysis of data from restricted randomized experiments, introducing the Stochastic Search Variable Selection for Split-Plot Designs (SSVS-SPD), in which the variances of the active subplot and whole-plot factors are sampled from two different posterior distributions. We implement SSVS-SPD in R to carry out the sampling. Markov chain Monte Carlo (MCMC) is used to estimate the fixed parameters via Metropolis-Hastings within Gibbs sampling. In summary, the Bayesian approach supports the use of the SSVS-SPD method for the statistical analysis of data from experiments subject to restricted randomization.
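To fix ideas, the core SSVS mechanism can be sketched in a few lines: a spike-and-slab prior on each coefficient, and a Gibbs sampler that alternates between the coefficients and the inclusion indicators. This is a deliberately simplified single-stratum version (known error variance, one slab variance), not the SSVS-SPD method of the talk, and all tuning constants below are illustrative:

```python
import numpy as np

def ssvs_gibbs(X, y, n_iter=2000, tau=0.1, c=10.0, p_incl=0.5,
               sigma2=1.0, seed=0):
    """Minimal SSVS via Gibbs sampling (single stratum, sigma2 known).

    Spike-and-slab prior: beta_j ~ N(0, (c*tau)^2) if gamma_j = 1
    (active), beta_j ~ N(0, tau^2) otherwise.  Returns the posterior
    inclusion frequency of each variable after burn-in.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    gamma = np.ones(p, dtype=int)
    incl_counts = np.zeros(p)
    XtX, Xty = X.T @ X, X.T @ y
    for it in range(n_iter):
        # Sample beta | gamma, y from its multivariate normal conditional.
        v = np.where(gamma == 1, (c * tau) ** 2, tau ** 2)
        A = np.linalg.inv(XtX / sigma2 + np.diag(1.0 / v))
        beta = rng.multivariate_normal(A @ Xty / sigma2, A)
        # Sample each gamma_j | beta_j from a Bernoulli (spike vs. slab).
        slab = np.exp(-beta ** 2 / (2 * (c * tau) ** 2)) / (c * tau)
        spike = np.exp(-beta ** 2 / (2 * tau ** 2)) / tau
        prob = p_incl * slab / (p_incl * slab + (1 - p_incl) * spike)
        gamma = (rng.random(p) < prob).astype(int)
        if it >= n_iter // 2:            # discard the first half as burn-in
            incl_counts += gamma
    return incl_counts / (n_iter - n_iter // 2)

# Simulated check: 5 candidate variables, only the first two are active.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
y = X @ np.array([2.0, -1.5, 0.0, 0.0, 0.0]) + rng.standard_normal(100)
pip = ssvs_gibbs(X, y)
```

SSVS-SPD would replace the single slab variance by two, one per stratum, each with its own posterior draw.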

**Mathias Drton** (*Technical University Munich*): **Latent variables and feedback loops in linear structural equation models**

The lectures give an introduction to linear structural equation models with a focus on issues arising from the presence of latent variables or feedback loops. The opening lecture will highlight the models’ causal interpretation, their representation in terms of directed graphs, and the rich algebraic structure that emerges in the special case of linear structural equations. The subsequent two lectures will treat problems involving latent variables or feedback loops. We will present methods to decide parameter identifiability, review results on conditional independence relations and their use in model selection methods, and discuss relations among covariances that go beyond conditional independence.
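The algebraic structure mentioned above comes from the standard parameterization of a linear SEM: with equations X = ΛᵀX + ε and Cov(ε) = Ω, the implied covariance matrix is Σ = (I − Λ)⁻ᵀ Ω (I − Λ)⁻¹. A minimal sketch for the chain graph 1 → 2 → 3 with hypothetical edge coefficients, checking the conditional independence the graph implies:

```python
import numpy as np

# Path coefficients: Lambda[i, j] is the coefficient on the edge i -> j.
Lambda = np.array([[0.0, 2.0, 0.0],    # edge 1 -> 2 with coefficient 2
                   [0.0, 0.0, 0.5],    # edge 2 -> 3 with coefficient 0.5
                   [0.0, 0.0, 0.0]])
Omega = np.eye(3)                      # independent unit-variance errors

# Implied covariance: Sigma = (I - Lambda)^{-T} Omega (I - Lambda)^{-1}.
B = np.linalg.inv(np.eye(3) - Lambda)
Sigma = B.T @ Omega @ B

# The graph implies X1 is independent of X3 given X2; for a Gaussian
# model this is equivalent to a vanishing partial covariance.
partial_cov_13_given_2 = Sigma[0, 2] - Sigma[0, 1] * Sigma[1, 2] / Sigma[1, 1]
```

Constraints like this vanishing partial covariance are exactly the polynomial relations among covariances that the lectures study, including ones that go beyond conditional independence.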

**How Mathematics is Shaping Machine Learning**

Mathematics has always played an important role in human history. Nowadays machine learning, which drives many innovative ideas, has its core concepts rooted in mathematics and statistics. Mathematics in machine learning answers what is happening, why it is happening, and how the optimal solution can be derived. The computer represents knowledge in the form of linear algebra (as matrices). Calculus (multivariate calculus, or partial differentiation) is used for the optimization of a given convex function. Probability concepts enter machine learning in the form of Bayes' theorem. In fact, without knowledge of linear algebra, calculus and probability, a machine learning scientist cannot devise the best solution to a machine learning problem.
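The three ingredients named above meet in the most basic training loop: gradient descent on a convex function, where the gradient comes from multivariate calculus and the data lives in matrices. A minimal sketch on a hypothetical quadratic objective:

```python
import numpy as np

# Gradient descent on the convex quadratic f(x) = 0.5 x^T A x - b^T x.
# Its gradient, A x - b, comes from partial differentiation; the unique
# minimiser solves the linear system A x = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])      # symmetric positive definite
b = np.array([1.0, 1.0])

x = np.zeros(2)
step = 0.2                      # below 2 / lambda_max(A), so iterates converge
for _ in range(200):
    grad = A @ x - b            # gradient of f at the current point
    x = x - step * grad         # move downhill

x_star = np.linalg.solve(A, b)  # exact minimiser, for comparison
```

The same loop, with the quadratic replaced by a model's loss, is the workhorse behind most machine learning optimizers.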

**Anna Klimova** (*TU Dresden*): **Deceptive Simplicity: some paradoxes and challenges in categorical data**

One of the main aims of categorical data analysis is to infer the association structure in multivariate discrete distributions that can be described using a contingency table. Multiple measures of association were proposed for the simplest case, the 2x2 table, with the odds ratio being the only measure that is variation independent from the marginal distributions of the table. The interaction in higher-dimensional tables can also be described using odds ratios of different types, and the variation independence entails that the lower order marginal distributions in a contingency table do not carry any information about higher order interactions. As illustrated by examples, the conditional independence does not necessarily follow from the marginal independence, and a reversal in the direction of association between marginal and conditional distributions, known as Simpson’s paradox, may also occur.
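Simpson's paradox is easy to exhibit with odds ratios on a 2x2x2 table. In the hypothetical counts below, the association is negative within both strata but positive after collapsing over the stratifying variable:

```python
import numpy as np

def odds_ratio(t):
    """Odds ratio (a*d)/(b*c) of a 2x2 table [[a, b], [c, d]]."""
    return (t[0, 0] * t[1, 1]) / (t[0, 1] * t[1, 0])

# Hypothetical counts chosen so that the direction of association
# reverses between the conditional tables and the marginal table.
stratum1 = np.array([[1, 5], [10, 20]])
stratum2 = np.array([[20, 10], [5, 1]])
marginal = stratum1 + stratum2       # collapse over the stratifying variable

or1 = odds_ratio(stratum1)           # 0.4: negative association
or2 = odds_ratio(stratum2)           # 0.4: negative association
or_marg = odds_ratio(marginal)       # 1.96: positive association
```

This reversal is possible precisely because the odds ratio, while variation independent of the margins within each table, says nothing about how the strata are mixed.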

Hierarchical log-linear models are a conventional tool for describing association in a multiway complete contingency table, and can be specified by setting certain odds ratios in the table equal to one. When the data are represented by an incomplete table, the traditional log-linear models and their quasi variants do not always provide a good description of the association structure. These models assume the existence of a parameter common to all cells, the overall effect, which is not necessarily justified when the absent cells do not exist logically or in a particular population. Some examples, where models without the overall effect arise naturally, are given, and the consequences of adding a normalizing constant (the overall effect) to these models are discussed.

**Kaie Kubjas** (*Aalto University*): **Exact solutions in log-concave maximum likelihood estimation**

In nonparametric statistics one abandons the requirement that a probability density function belongs to a statistical model with finitely many parameters, and instead requires that it satisfies certain constraints. In this talk, we consider log-concave densities. The logarithm of the log-concave maximum likelihood estimate has been shown to be a piecewise linear function. We study exact solutions to log-concave maximum likelihood estimation. This talk is based on joint work with Alex Grosdos, Alex Heaton, Olga Kuznetsova, Georgy Scholten and Miruna-Stefana Sorea.

**Adewale Folaranmi Lukman** (*Landmark University*): **Combining modified ridge type and principal component regression estimators**

The ordinary least squares estimator (OLSE) performs inadequately when there is multicollinearity (MC) in a linear regression model. The principal components regression estimator and the modified ridge type estimator have been proposed, at different times, to handle MC. In this paper, we develop a new estimator by combining these two estimators and derive the necessary and sufficient condition for its superiority over other competing estimators. Furthermore, we establish the dominance of the new estimator over the others, in terms of estimated mean square error, through a numerical example and a simulation study.
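The combination can be sketched as regressing on the top principal components and shrinking the component coefficients with a ridge-type penalty of the form k(1 + d). This is a hedged sketch of the general idea, not the exact estimator of the paper; a useful sanity check is that with all components kept and no shrinkage it reduces to OLS:

```python
import numpy as np

def combined_pcr_mrt(X, y, r, k, d):
    """Sketch of a principal-components + modified-ridge-type estimator.

    Regress on the first r principal components of X and shrink with a
    modified ridge penalty k * (1 + d); with r = p and k = 0 this
    reduces to the ordinary least squares estimator.
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    T = Vt[:r].T                      # top-r eigenvectors of X'X
    Z = X @ T                         # principal component scores
    alpha = np.linalg.solve(Z.T @ Z + k * (1 + d) * np.eye(r), Z.T @ y)
    return T @ alpha                  # map back to the original coordinates

# Simulated design with severe multicollinearity between columns 0 and 3.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
X[:, 3] = X[:, 0] + 0.01 * rng.standard_normal(50)
y = X @ np.array([1.0, 0.5, -0.5, 0.0]) + 0.1 * rng.standard_normal(50)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_new = combined_pcr_mrt(X, y, r=3, k=0.5, d=0.5)
```

Dropping the near-null principal component (r = 3) is what stabilizes the estimate in the collinear direction.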

**Richard McElreath** (*MPI for Evolutionary Anthropology*): **Better Sampling with Physics: A very short introduction to Hamiltonian Monte Carlo**

Markov chain Monte Carlo (MCMC), more than any other tool, has fueled the Bayesian revolution in applied statistics. However, MCMC can go badly wrong, in particular in high dimensions. Hamiltonian Monte Carlo (HMC) is an approach that simulates a physical system in order to adaptively and efficiently sample from a high-dimensional (tens of thousands of parameters) target distribution. I'll introduce the approach, show a minimal working algorithm, and discuss some of the most common difficulties in implementation.
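A minimal working HMC algorithm of the kind the talk promises fits in a few dozen lines: resample a momentum, simulate the dynamics with the leapfrog integrator, and correct the discretization error with a Metropolis step. This sketch targets a one-dimensional standard normal; the step size and path length are illustrative, not tuned:

```python
import numpy as np

def hmc(log_prob_grad, n_samples=2000, eps=0.1, L=20, seed=0):
    """Minimal Hamiltonian Monte Carlo.

    log_prob_grad(q) returns (log p(q), d log p / dq).  Each iteration
    simulates Hamiltonian dynamics for L leapfrog steps of size eps and
    applies a Metropolis accept/reject using the change in total energy.
    """
    rng = np.random.default_rng(seed)
    q = 0.0
    logp, grad = log_prob_grad(q)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal()              # fresh momentum
        q_new, grad_new = q, grad
        p_new = p + 0.5 * eps * grad_new       # half step for momentum
        for step in range(L):
            q_new = q_new + eps * p_new        # full step for position
            logp_new, grad_new = log_prob_grad(q_new)
            if step < L - 1:
                p_new = p_new + eps * grad_new
        p_new = p_new + 0.5 * eps * grad_new   # final half step
        # Metropolis correction: H = -log p(q) + kinetic energy.
        h_old = -logp + 0.5 * p * p
        h_new = -logp_new + 0.5 * p_new * p_new
        if rng.random() < np.exp(h_old - h_new):
            q, logp, grad = q_new, logp_new, grad_new
        samples.append(q)
    return np.array(samples)

std_normal = lambda q: (-0.5 * q * q, -q)      # log density and gradient
draws = hmc(std_normal)
```

The common failure modes the talk mentions correspond to the two tuning constants here: too large a step size makes the energy error, and hence rejections, explode; too short a path makes the sampler diffuse like a random walk.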

**Axel Munk** (*Georg August University Göttingen & Max Planck Institute for Biophysical Chemistry*): **Multiscale Change Point Inference**

In this lecture series we survey statistical methodology for change point problems, i.e. estimation and detection problems where abrupt changes (discontinuities) have to be recovered from random data. Applications are broad, ranging from statistical finance to network analysis, medical imaging and genomics. We provide a principled approach based on statistical estimation and testing methodology. In the first lecture we survey classical results and methods for simple change point recovery, which are then extended to more recent developments in multiscale change point detection in the second lecture. In the third lecture we show how these multiscale methods can be used to analyze specific blind source separation problems. Special emphasis will be put on the underlying combinatorial and (linear) algebraic structure of the model. Theory will be accompanied by various real data examples from cancer genetics and physiology, and by comments on software and algorithms.
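The simplest instance of "simple change point recovery" is locating one mean shift with the classical CUSUM statistic, which the multiscale methods of the later lectures generalize by scanning over many intervals at once. A minimal sketch on simulated data:

```python
import numpy as np

def cusum_change_point(x):
    """Locate a single change in mean by maximising the CUSUM statistic.

    For each candidate split t the statistic compares the means of x[:t]
    and x[t:], weighted so that its null fluctuations are comparable
    across split positions.
    """
    n = len(x)
    best_t, best_stat = None, -np.inf
    for t in range(1, n):
        w = np.sqrt(t * (n - t) / n)
        stat = abs(w * (x[:t].mean() - x[t:].mean()))
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t

# Piecewise constant signal with one jump at index 50, plus noise.
rng = np.random.default_rng(0)
signal = np.concatenate([np.zeros(50), 2.0 * np.ones(50)])
x = signal + 0.5 * rng.standard_normal(100)
cp = cusum_change_point(x)
```

Applied recursively to the segments left and right of the estimate (binary segmentation), the same statistic recovers multiple change points.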

**Jonas Peters** (*University of Copenhagen*): **Causality**

In the field of causality we want to understand how a system reacts under interventions (e.g. in gene knock-out experiments). These questions go beyond statistical dependences and can therefore not be answered by standard regression or classification techniques. In this tutorial you will learn about the interesting problem of causal inference and recent developments in the field. No prior knowledge about causality is required.

Part 1: We introduce structural causal models and formalize interventional distributions. We define causal effects and show how to compute them if the causal structure is known.

Part 2: We present three ideas that can be used to infer causal structure from data: (a) finding (conditional) independences in the data, (b) restricting structural equation models and (c) exploiting the fact that causal models remain invariant in different environments.

Part 3: We show ideas on how more classical machine learning problems could benefit from causal concepts.
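The gap between interventional and observational questions is easy to see by simulating a small structural causal model. In the hypothetical SCM below, a hidden confounder H makes the observational regression of Y on X differ from the effect of intervening on X:

```python
import numpy as np

# Structural causal model with a hidden confounder H:
#   H := N_H,   X := H + N_X,   Y := 2*X + 3*H + N_Y.
# The causal effect of X on Y is 2, so E[Y | do(X := 1)] = 2.
rng = np.random.default_rng(0)
n = 100_000

# Observational distribution: sample from the unmodified assignments.
H = rng.standard_normal(n)
X = H + rng.standard_normal(n)
Y = 2.0 * X + 3.0 * H + rng.standard_normal(n)

# Interventional distribution under do(X := 1): replace the assignment
# for X, keep the mechanisms for H and Y.
H_do = rng.standard_normal(n)
X_do = np.ones(n)
Y_do = 2.0 * X_do + 3.0 * H_do + rng.standard_normal(n)

# Observational regression slope of Y on X is biased by the confounder
# (here it is 3.5, not the causal effect 2).
cov = np.cov(X, Y)
slope_obs = cov[0, 1] / cov[0, 0]
```

Regression answers "what do we expect to see?", while the intervention answers "what happens if we act?", and the two disagree exactly because of the unblocked path through H.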

**High-Dimensional Regression**

High-dimensional regression has been a hot topic for the last few decades. I explore different regression techniques available in the literature for high-dimensional settings (when n << p), and illustrate the advantages and drawbacks of these techniques in various situations through simulation studies.

This work is part of my ongoing master's thesis.
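The prototypical n << p technique in this literature is the lasso, which remains well defined when ordinary least squares is not. A minimal sketch via iterative soft-thresholding (ISTA), with hypothetical simulation settings:

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Lasso via iterative soft-thresholding (ISTA), a proximal-gradient
    method for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n)[-1]  # 1 / Lipschitz const.
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n               # smooth-part gradient
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta

# Sparse recovery with n = 50 observations and p = 200 predictors:
# only the first 5 coefficients are nonzero.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 3.0
y = X @ beta_true + 0.5 * rng.standard_normal(n)

beta_hat = lasso_ista(X, y, lam=0.2)
```

OLS is undefined here (X'X is singular with p > n), while the soft-threshold step sets most coefficients exactly to zero, which is precisely the advantage-versus-drawback trade-off such simulation studies probe.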

## Date and Location

**March 30 - April 03, 2020**

Max Planck Institute for Mathematics in the Sciences

Inselstr. 22

04103 Leipzig

## Scientific Organizers

**Carlos Améndola**

Technical University Munich

**Eliana Duarte Gelvez**

MPI for Mathematics in the Sciences

**Orlando Marigliano**

MPI for Mathematics in the Sciences

## Administrative Contact

**Saskia Gutzschebauch**

MPI für Mathematik in den Naturwissenschaften

Contact by Email