The heart of the scientific enterprise is a rational effort to understand the causes behind the phenomena we observe. In disciplines dealing with complex dynamical systems, such as the Earth system, replicated real experiments are rarely feasible. However, a rapidly increasing amount of observational and simulated data opens up the use of novel data-driven causal inference methods beyond the commonly adopted correlation techniques. In this talk I will present a recent Perspective Paper in Nature Communications that gives an overview of causal inference methods and identifies key tasks and major challenges where causal methods have the potential to advance the state of the art in Earth system sciences. I will also present the causal inference benchmark platform www.causeme.net, which aims to assess the performance of causal inference methods and to help practitioners choose the right method for a particular problem.
Runge, J., S. Bathiany, E. Bollt, G. Camps-Valls, D. Coumou, E. Deyle, C. Glymour, M. Kretschmer, M. D. Mahecha, J. Muñoz-Marí, E. H. van Nes, J. Peters, R. Quax, M. Reichstein, M. Scheffer, B. Schölkopf, P. Spirtes, G. Sugihara, J. Sun, K. Zhang, and J. Zscheischler (2019). Inferring causation from time series in Earth system sciences. Nature Communications 10 (1), 2553.

We will critically review the insights of Kelly in what was arguably the first convincing attempt at adopting concepts and tools from information theory for analysis in a different field – and this in light of Shannon’s famous caution against jumping on ‘the bandwagon’ of information theory. How deep were the links that Kelly had identified, and what was perhaps missed? We will then see how such ideas (optimizing the expected exponential rate of return, proportional betting and the use of a side-information channel) were treated in the context of evolutionary models of fitness maximization under fluctuating environments. Possible analytic limitations of such models and the potential for further extensions will be discussed.
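
To make the betting ideas concrete, here is a minimal sketch of proportional betting on a repeated binary bet. It is only an illustration under stated assumptions: the function names and the even-odds default are choices made here, not taken from the talk. For even odds, the optimal expected log-growth rate equals one bit minus the binary entropy of the win probability, which is the link to information theory that Kelly pointed out.

```python
import math


def kelly_fraction(p, b=1.0):
    """Growth-optimal fraction of wealth to stake on a bet that is won
    with probability p and pays b-to-1 (hypothetical helper)."""
    return max(0.0, (p * (b + 1.0) - 1.0) / b)


def growth_rate(f, p, b=1.0):
    """Expected log2 growth of wealth per bet when a fixed fraction f is staked."""
    return p * math.log2(1.0 + b * f) + (1.0 - p) * math.log2(1.0 - f)


def binary_entropy(p):
    """Shannon entropy (bits) of a coin with bias p, for 0 < p < 1."""
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


p = 0.6                       # assumed win probability (illustrative value)
f_star = kelly_fraction(p)    # about 0.2 for even odds
best = growth_rate(f_star, p)
print(f_star, best, 1.0 - binary_entropy(p))
```

Staking more or less than `f_star` lowers the long-run growth rate, which is why the fraction, and not the expected payoff of a single bet, is the quantity being optimized.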

At its core, Darwinian evolution can be viewed as flow in the space of fitness distributions. (Here "fitness" refers to the reproductive success of organisms in a given environment.) I will show that this process has universal features, including the convergence to limiting distributions and the emergence of dynamical scaling. I will argue that these features are definite predictions of evolution through natural selection, which can be tested in silico and, soon, in vivo. Finally I will outline a connection to information theory and entropy optimization principles.

Biologists and social scientists have long argued that some ideas, beliefs, and behavior spread because such ideas are adapted to and manipulate human psychology. These "memes" may evolve to spread at the expense of the human psychology that makes them possible. However, many scholars are equally skeptical of the meme concept, suggesting that human genes keep memes on a rather short leash. Unfortunately, there are so far no formal models of these arguments. Here I present my first attempts to mathematically model the coevolutionary dynamics of social learning and parasitic behavior. These first models are exceedingly simple, but like most simple models, they are still able to surprise us. I find that selfish memes can readily invade a population of socially learning organisms and, on short time scales, can impose substantial fitness costs on the population. However, coevolution between psychology and behavior on longer time scales is equally important.

Unlike other primates, humans specialize in skill-intensive subsistence requiring decades to learn and generations to invent. To understand the evolution of the human niche, we must integrate the cultural evolution of subsistence skills and technology with the genetic evolution of slow human life history. I present ongoing work analyzing the development of foraging skills, leveraging 15-thousand foraging records from 600 human foragers in 15 societies around the globe. These data exhibit many statistical maladies, including imbalance and missing values. Hamiltonian Monte Carlo allows fitting high-dimensional (more than 20-thousand parameter) Bayesian life history models that respect variation both within and between societies, while also imputing missing values and respecting measurement uncertainty. I'll outline the background theory, describe technical challenges, and present intermediate results.

Complex systems are increasingly being viewed as distributed information processing systems, particularly in Artificial Life, computational neuroscience and bioinformatics. This trend has resulted in a strong uptake of information-theoretic measures to analyse the dynamics of complex systems in these fields. This talk will briefly review the use of these measures as applied to complex systems, and then introduce a software toolkit for conducting such analysis -- the Java information dynamics toolkit (JIDT). JIDT provides a standalone, open-source code implementation of measures for information dynamics, i.e. measures to quantify information storage, transfer and modification, and the dynamics of these operations in space and time. Principally, the toolkit implements the transfer entropy, (conditional) mutual information and active information storage, for both discrete and continuous-valued data. Various types of estimator (e.g. Gaussian, Kraskov-Stoegbauer-Grassberger) are provided for each measure. Furthermore, while written in Java, the toolkit can be used directly in Matlab, Octave and Python. I will describe how to install the JIDT software, how to get started with typical usage scenarios, and where to seek further support information. We will describe how to get started analysing your own data sets, as well as showcasing more complex demonstrations, e.g. analysing information dynamics in cellular automata.
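
As a rough illustration of what a transfer entropy measurement does for discrete data, the following is a plug-in estimator written from scratch with history length 1. It is a sketch only and does not use or reproduce the JIDT API; the function name and the toy data are assumptions made here.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate (in bits) of transfer entropy source -> target,
    using one past step of each series (equal-length sequences assumed)."""
    triples = Counter()  # counts of (target_next, target_now, source_now)
    for t in range(len(target) - 1):
        triples[(target[t + 1], target[t], source[t])] += 1
    n = sum(triples.values())

    now_pairs = Counter()   # counts of (target_now, source_now)
    next_pairs = Counter()  # counts of (target_next, target_now)
    now_single = Counter()  # counts of target_now
    for (y1, y0, x0), c in triples.items():
        now_pairs[(y0, x0)] += c
        next_pairs[(y1, y0)] += c
        now_single[y0] += c

    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_given_both = c / now_pairs[(y0, x0)]              # p(y1 | y0, x0)
        p_given_self = next_pairs[(y1, y0)] / now_single[y0]  # p(y1 | y0)
        te += (c / n) * math.log2(p_given_both / p_given_self)
    return te


rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(5000)]  # independent fair bits
y = [0] + x[:-1]                              # y copies x with a one-step delay

print(transfer_entropy(x, y))  # close to 1 bit: x fully determines the next y
print(transfer_entropy(y, x))  # close to 0 bits: y says nothing about the next x
```

Plug-in estimates like this are biased for short series and long histories, which is one reason to use a toolkit with several estimator types.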

A nonlinear Penner-type external interaction is introduced and studied in the random matrix model of homo ribonucleic acid (RNA). The Penner interaction originally appeared in studies of the moduli space of punctured surfaces and is applied here (for the first time) to the problem of interacting RNA folding. An exact analytic formula for the generating function is derived using the orthogonal polynomial method. The partition function derived from the generating function for a given length enumerates all possible interacting RNA structures of possible topologies as well as the pairings. A numerical technique is developed to study the partition function, and a general formula is obtained for all lengths. The asymptotic large-length distribution functions are found and show a change in the critical exponent of the secondary structure contribution from $L^{3/2}$ for large $N$ (the size of the matrix, $N > L$, where $L$ is the length of the RNA chain) to $L^{1/2}$ for small $N$. This observation in the nonlinear model is similar to that observed in unfolding experiments on RNA with osmolytes and monovalent cations. Preliminary results on biological networks for an enzyme will be briefly discussed.
[1] H. Orland, A. Zee, Nucl. Phys. B [FS] 620, 456 (2002).
[2] G. Vernizzi, H. Orland, A. Zee, Phys. Rev. Lett. 94, 168103 (2005).
[3] P. Bhadola, I. Garg and N. Deo, Nucl. Phys. B [FS] 870, 384 (2013).
[4] P. Bhadola and N. Deo, Phys. Rev. E 88, 032706 (2013).
[5] P. Bhadola and N. Deo, in preparation.

Symbiotic technology can be defined as a technology that adapts to individual learning, group dynamics, and evolutionary changes in animal populations while managing them over possibly many generations. As an example, I will discuss a proposed project that studies the use of robots to manage fish swarms, similar to the way dogs are used for shepherding. Just as those dogs were bred, the robots can be designed using evolutionary algorithms, where selection is achieved by means of both predefined fitness functions and interaction with human users. I will discuss in detail experiments on how the complexity of behavior can grow in symbiotic coevolution.

Full and accurate reconstruction of dynamics from time-series data (e.g., via delay-coordinate embedding) is a real challenge in practice. In this talk, I will illustrate that, for forecasting purposes, information can be gleaned from incomplete embeddings. In particular, I will provide a proof of concept for a stream-forecasting technique using a tau-return map embedding of the data. Even though correctness of the topology is not guaranteed for these incomplete reconstructions, near-neighbor forecasts in these reduced-order spaces are as effective as (or more effective than) those using a traditional embedding. I will illustrate the efficiency of this method on synthetic time series generated from the Lorenz-96 atmospheric model, as well as on experimental data.
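
The basic mechanics of a near-neighbor forecast in a low-dimensional delay space can be sketched as follows. This is an assumption-laden toy, not the speaker's implementation: the function names, the default `dim=2` (a two-dimensional reconstruction from the value now and the value `tau` steps earlier), and the periodic test series are all chosen here for illustration.

```python
def delay_embed(series, dim, tau):
    """Delay vectors (x[t], x[t - tau], ..., x[t - (dim - 1) * tau])
    for every time index t at which all components exist."""
    start = (dim - 1) * tau
    return [tuple(series[t - k * tau] for k in range(dim))
            for t in range(start, len(series))]


def nearest_neighbor_forecast(series, dim=2, tau=1, horizon=1):
    """Forecast series[last + horizon] by finding the past delay vector closest
    to the current one and returning what followed it."""
    start = (dim - 1) * tau
    vectors = delay_embed(series, dim, tau)
    query = vectors[-1]
    best_time, best_dist = None, float("inf")
    # Only consider neighbors whose value `horizon` steps later is already known.
    for i, vec in enumerate(vectors[:-horizon]):
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        if dist < best_dist:
            best_dist, best_time = dist, start + i
    return series[best_time + horizon]


signal = [0, 1, 2, 3] * 10  # toy periodic series ending in ..., 2, 3
print(nearest_neighbor_forecast(signal, dim=2, tau=1))  # next value in the cycle
```

A practical version would average over several neighbors and exclude temporally adjacent points, and would need real chaotic data (such as Lorenz-96 output) to say anything about forecast skill.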

We analyze the dynamics of agent-based models (ABMs) from a Markovian perspective and derive explicit statements about the possibility of linking a microscopic agent model to the dynamical processes of macroscopic observables.
On the basis of a formalization of ABMs as random walks on graphs, we use well-known conditions for lumpability to establish the cases where the macro model is still Markov.
For such a purpose a crucial role is played by the type of probability distribution used to implement the stochastic part of the model.
The symmetries of this distribution translate into regularities of the micro chain corresponding to the ABM, and this means that certain ensembles of agent configurations can be interchanged without affecting the probabilistic structure.
If a favored level of observation is compatible with the symmetries of the distribution, we obtain a complete picture of the macro dynamics including the transient stage.
If it is not, a certain amount of memory is introduced by the transition from the micro to the macro level, and this is the fingerprint of emergence in ABMs.
We describe our analysis in detail with some specific models of opinion dynamics.
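
The lumpability condition can be checked mechanically for a small chain. The sketch below tests strong lumpability in the usual sense: a partition of the micro states is lumpable if, for every pair of blocks, all micro states in the first block have the same total probability of jumping into the second. The function name, the example matrix and the partitions are hypothetical and not taken from the opinion-dynamics models of the talk.

```python
def lump(P, partition, tol=1e-12):
    """Return the macro transition matrix if the micro chain P is strongly
    lumpable with respect to `partition`, otherwise return None."""
    lumped = []
    for block_a in partition:
        row = []
        for block_b in partition:
            # Probability of entering block_b from each micro state in block_a.
            probs = [sum(P[i][j] for j in block_b) for i in block_a]
            if max(probs) - min(probs) > tol:
                return None  # strong lumpability fails for this pair of blocks
            row.append(probs[0])
        lumped.append(row)
    return lumped


# Hypothetical 3-state micro chain (rows sum to 1).
P = [
    [0.5, 0.25, 0.25],
    [0.2, 0.4, 0.4],
    [0.2, 0.3, 0.5],
]

print(lump(P, [[0], [1, 2]]))  # lumpable: a 2x2 macro chain
print(lump(P, [[0, 1], [2]]))  # not lumpable: None
```

When the check fails, the aggregated process generally carries memory of the micro state, which corresponds to the non-Markovian macro dynamics described above.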

In this talk I explore the relationship between Machine-Learning (ML), Monte Carlo Integral Estimation (MC), and 'blackbox' optimization (BO - the kind of optimization one typically addresses with genetic algorithms or simulated annealing).
To begin, I show how to apply ML to automatically set annealing temperatures and other hyperparameters of *any* stochastic BO algorithm. Then I show how to extend this, first to transform any BO problem into a particular MC problem, and then to show that this MC problem is formally identical to the problem of how to do supervised learning. This extension allows us to apply *all* of the powerful techniques that have been created for supervised learning to solve BO problems. I demonstrate the power of this in experiments (movies!).
I end by showing how to improve the convergence of any of a broad set of MC algorithms, by using the ML technique of stacking to learn control variates.
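
For readers unfamiliar with control variates, the following sketch shows the plain textbook version with a single, hand-picked control and a fitted coefficient. It does not implement stacking or any learned control variate from the talk; the integrand, the control, the sample size and the seed are assumptions chosen here.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def control_variate_estimate(f_vals, g_vals, g_mean):
    """Estimate E[f] using a control g whose exact mean g_mean is known.
    The coefficient beta is fitted from the same samples (least squares)."""
    mf, mg = mean(f_vals), mean(g_vals)
    cov = sum((f - mf) * (g - mg) for f, g in zip(f_vals, g_vals))
    var = sum((g - mg) ** 2 for g in g_vals)
    beta = cov / var
    return mf - beta * (mg - g_mean), beta


rng = random.Random(1)
u = [rng.random() for _ in range(20000)]  # U ~ Uniform(0, 1)
f_vals = [math.exp(v) for v in u]         # target: E[exp(U)] = e - 1

plain = mean(f_vals)
corrected, beta = control_variate_estimate(f_vals, u, 0.5)  # control: U, mean 1/2
print(plain, corrected, math.e - 1)
```

The correction removes the part of the sampling error that is explained by the control, so the better the control tracks the integrand, the larger the variance reduction.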

I will first recall the main features of the mutual information (MI) as a measure of similarity or statistical dependence. In particular, I will discuss its embodiments in two versions of information theory: probabilistic (Shannon) versus algorithmic (Kolmogorov). I will compare two different strategies for estimating the algorithmic complexity of "texts", one involving sequence alignment and file compression ("zipping"), the other zipping alone. When applied to mitochondrial DNA, both versions lead to distance measures which outperform other distances currently in use. The last part of the talk will be devoted to estimating Shannon MI from real-valued data, and an application to microarray gene expression measurements. In particular, I will show that large differences between dependencies estimated from MI and from linear correlation measures may hint at interesting structures which can then be further explored.
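
A compression-only distance of the kind alluded to above can be sketched with a general-purpose compressor. The example uses the normalized compression distance with `zlib` as a stand-in for algorithmic complexity; this is one common formulation and not necessarily the estimator used in the talk, and the synthetic "DNA" sequences are made up for illustration.

```python
import random
import zlib


def compressed_size(data):
    """Length in bytes of the zlib-compressed data (a crude complexity proxy)."""
    return len(zlib.compress(data, 9))


def ncd(x, y):
    """Normalized compression distance between two byte strings."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


rng = random.Random(0)
seq_a = bytes(rng.choice(b"ACGT") for _ in range(2000))  # synthetic sequence
seq_c = bytes(rng.choice(b"ACGT") for _ in range(2000))  # unrelated sequence

# seq_b is seq_a with a point mutation every 100 positions.
mutated = bytearray(seq_a)
for i in range(0, len(mutated), 100):
    mutated[i] = ord("A") if mutated[i] != ord("A") else ord("C")
seq_b = bytes(mutated)

print(ncd(seq_a, seq_b))  # related sequences: smaller distance
print(ncd(seq_a, seq_c))  # unrelated sequences: larger distance
```

General-purpose compressors have limited windows and alphabet-agnostic models, so the quality of such a distance depends heavily on the compressor chosen.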

The statistical analysis of the frequency of different words reveals numerous similarities between language usage and other complex systems. Two prominent examples are Zipf's law, the power-law decay of the word-frequency distribution, and the presence of long-range correlations in texts. In this talk I will propose simple models which explain these two well-known empirical observations and shed some light on their origin. The unprecedented amount of written text available for investigation (e.g., on the Internet) provides new motivations and opportunities for the quantitative investigation of these and other problems in statistical natural language.

The presentation will offer an updated account of integrated information theory of consciousness (IIT) and some of its implications. IIT stems from thought experiments that lead to phenomenological axioms and ontological postulates. The information axiom asserts that every experience is specific – it is what it is by differing in its particular way from a large repertoire of alternatives. The integration axiom asserts that each experience is unified – it cannot be reduced to independent components. The exclusion axiom asserts that every experience is definite – it is limited to particular things and not others and flows at a particular speed and resolution. IIT formalizes these intuitions with three postulates. The information postulate states that only “differences that make a difference” from the intrinsic perspective of a system matter: a mechanism generates cause-effect information if its present state has specific past causes and specific future effects within a system. The integration postulate states that only information that is irreducible matters: mechanisms generate integrated information only to the extent that the information they generate cannot be partitioned into that generated within independent components. The exclusion postulate states that only maxima of integrated information matter: a mechanism specifies only one maximally irreducible set of past causes and future effects – a concept. A complex is a set of elements specifying a maximally irreducible constellation of concepts, where the maximum is evaluated at the optimal spatio-temporal scale. Its concepts specify a maximally integrated conceptual information structure or quale, which is identical with an experience. Finally, changes in information integration upon exposure to the environment reflect a system’s ability to match the causal structure of the world.
The presentation will briefly summarize how IIT accounts for empirical findings about the neural substrate of consciousness, address how various aspects of phenomenology may in principle be addressed in terms of the geometry of information integration, and consider some aspects of the relationship between information integration and causation.

While it is an important problem to identify the existence of causal associations between two components of a multivariate time series, it is even more important to assess the strength of their association in a meaningful way. In the present article we focus on the problem of defining a meaningful coupling strength using information-theoretic measures and demonstrate the shortcomings of the well-known mutual information and transfer entropy. Instead, we propose a certain time-delayed conditional mutual information, the momentary information transfer (MIT), as a measure of association that is general, causal and lag-specific, reflects a well-interpretable notion of coupling strength and is practically computable. MIT is based on the fundamental concept of source entropy, which we utilize to yield a notion of coupling strength that is, compared to mutual information and transfer entropy, well interpretable, in that for many cases it depends solely on the interaction of the two components at a certain lag. We formalize and prove this idea analytically and numerically for a general class of nonlinear stochastic processes and illustrate the potential of MIT on climatological data. The idea is also applicable to non-time-series data. References: Preprint of "Quantifying Causal Coupling Strength", arXiv:1210.2748 [physics.data-an]; J. Runge, J. Heitzig, V. Petoukhov, and J. Kurths, Phys. Rev. Lett. 108, 258701 (2012); B. Pompe and J. Runge, Phys. Rev. E 83, 051122 (2011).

Inherent uncertainties in models and initial conditions of complex systems render deterministic predictions useless: they are almost certainly wrong. A fair assessment of the uncertainty of the future, given the (lack of) knowledge of a complex system and its current state, is possible through probabilistic predictions. The forecast product is a probability distribution which is supposed to characterize our knowledge about the future. In practice, such predicted probabilities also suffer from inaccuracies, i.e., they have to be validated against forecast/observation pairs. In this talk, I will use the example of temperature forecasts to illustrate the concept, methods for verification, and difficulties of this approach.
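
One common way to validate probabilities against forecast/observation pairs is the Brier score, sketched below for a binary event such as "temperature exceeds a threshold". The abstract does not say which verification scores the talk covers, so the choice of score, the function name and the numbers are assumptions made here.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and binary
    outcomes (1 = event occurred, 0 = it did not). Lower is better."""
    pairs = list(zip(probabilities, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


# Hypothetical forecast/observation pairs for four days.
observed = [1, 0, 1, 0]
sharp_forecast = [0.9, 0.1, 0.8, 0.2]  # confident and mostly right
climatology = [0.5, 0.5, 0.5, 0.5]     # always forecasts the base rate

print(brier_score(sharp_forecast, observed))
print(brier_score(climatology, observed))
```

A single score hides the distinction between reliability and sharpness, so in practice it is usually read alongside reliability diagrams or a decomposition of the score.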

The interest in the modeling approach to sleep regulation has increased over the past decade. Models help delineate the processes involved in the regulation of sleep and thereby offer a conceptual framework for the analysis of existing and new data. Sleep homeostasis denotes a basic principle of sleep regulation. A sleep deficit elicits a compensatory increase in the intensity and duration of sleep, while excessive sleep reduces sleep propensity. It is as though 'sleep pressure' is maintained within a range delimited by an upper and lower threshold. Sleep homeostasis is represented in the two-process model of sleep regulation by process S that increases during waking and declines during sleep. The timing and propensity of sleep are modulated also by a circadian process. Electroencephalographic (EEG) slow-wave activity (SWA) serves as an indicator of sleep homeostasis in non-rapid eye movement sleep. The level of SWA, a correlate of sleep intensity, is determined by the duration of prior sleep and waking. Evidence is accumulating for the existence of a local, use-dependent facet of sleep regulation.

In an intuitive understanding, an open-loop (i.e. non-directed) evolution of an economy has no generally valid structural, or regularity, characteristics. In other words, there are no “general laws of evolution”. Transformed into a theoretical model setting, this means more precisely that generally valid regularity characteristics exist neither for the evolution of the dependent variables of the modelled evolving economy (e.g. equilibrium values of prices, subsidies, taxes et cetera), nor for the evolution of its independent variables (describing the successive states of the economy). But this view turns out not to be true in general. The presentation will show that there are generally valid regularity characteristics of open-loop economic evolutions on the levels of both dependent and independent variables.
In the first part we will present a model framework in which the dependent variables are equilibrium variables and will show that there is a universal near-continuity characteristic of the dependently evolving equilibria even though equilibria may be non-unique at any state of the evolution.
Universal regularities of the evolution of independent variables describing the momentary states of an evolving economy are by their nature related to the causality of the successive states in real time. Accordingly, in the second part we will provide a general method for measuring causality between diachronic states of an economy, which is based on the ideas of contingency and counterfactuality.

The brain’s decoding of fast sensory streams is currently impossible to emulate, even approximately, with artificial agents. For example, robust speech recognition is relatively easy for humans but exceptionally difficult for artificial speech recognition systems. In this talk, I propose that recognition can be simplified with an internal model of how sensory input is generated, when formulated in a Bayesian framework. We show that a plausible candidate for an internal or generative model is a hierarchy of ‘stable heteroclinic channels’. This model describes continuous dynamics in the environment as a hierarchy of sequences, where slower sequences cause faster sequences. Under this model, online recognition corresponds to the dynamic decoding of causal states, giving a representation of the environment with predictive power on several timescales. The ensuing decoding or recognition scheme uses synthetic sequences of syllables, where syllables are sequences of phonemes and phonemes are sequences of sound-wave modulations. The resulting recognition dynamics disclose inference at multiple time scales and are reminiscent of neuronal dynamics seen in the real brain.

The concept of the effective complexity of an entity, defined as the minimal description length of its regularities, was introduced by Gell-Mann and Lloyd. In this talk we present a proposal for a precise definition of the effective complexity of finite binary sequences. We briefly discuss its basic properties as well as its relation to other complexity measures, e.g. Kolmogorov complexity, Bennett's logical depth and Kolmogorov minimal statistics. We analyse the effective complexity of long typical realisations of stationary processes. Based on this analysis, we address the question of to what extent our results imply limitations of the concept.