Although the tenet that "correlation does not imply causation" remains an important guiding principle in language research, a number of techniques developed over the last few decades have opened new scenarios in which causal relations can actually be tested. Recent advances in information theory, time series analysis, phylogenetics, stochastic processes, dynamical systems, graphical models and Bayesian inference (among many others) have set the stage for a new and exciting chapter in the field.
In parallel, an unprecedented amount of data on a large number of language-related phenomena has become available over the same period. We have massive matrices of voxel activation in the neural circuits involved in speech production and comprehension, several years of annotated conversations between young children and their caregivers, hundreds of hours of phonetic and anatomical measurements, and environmental, genetic, and demographic variables for populations of speakers of a large number of the world's languages.
The aim of this workshop is to address these two issues: how do we properly test causal relations in (possibly noisy, sparse or incomplete) data, and how can we infer or test the mechanisms underlying them?
Following a half-day school on cutting-edge methods for causal analysis, world-class scientists will present their research on topics including language history, writing systems, speech processing, typology and lexical semantics.
We invite contributions from researchers facing specific problems in determining causality in language systems, as well as from researchers offering methodological and theoretical perspectives on causal inference.
Child care is available on request for the duration of the conference.
We strongly encourage women and minorities to apply.
The workshop is co-organized by the following Max Planck Institutes (MPI):
In this lecture, basic concepts and formal tools for the study of causality will be introduced and discussed. The focus of the lecture will be on Pearl's causality theory and Shannon's information theory.
The main formal object of Pearl's theory represents the cause-effect relations within a system in terms of arrows between the nodes of a network. Such a network, a so-called Bayesian network, has two components, a structural and a mechanistic one. General measurements in the network can display correlations that do not directly correspond to causal links: "correlation does not imply causation." Reichenbach's common cause principle, on the other hand, states an apparently contradictory law. Somewhat simplified, it says that any correlation of variables implies either a cause-effect relation between them or the existence of a common cause of these variables. In this sense, "correlation does imply causation."
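To make the contrast concrete, here is a minimal simulation sketch (illustrative only, not part of the lecture materials): two variables X and Y driven by a common cause Z are correlated although neither causes the other, and the correlation vanishes under an intervention on X, emulated here by severing X from its parent Z.

```python
# Common cause Z -> X, Z -> Y: correlation without direct causation,
# removed by an intervention on X (Pearl's do-operator).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Observational regime: Z is a common cause of X and Y.
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y = z + 0.5 * rng.normal(size=n)
print("observational corr(X, Y):", np.corrcoef(x, y)[0, 1])   # ~0.8

# Interventional regime do(X): X is set exogenously, the arrow Z -> X is cut.
x_do = rng.normal(size=n)            # X no longer listens to Z
y_do = z + 0.5 * rng.normal(size=n)
print("interventional corr(X, Y):", np.corrcoef(x_do, y_do)[0, 1])  # ~0
```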
In order to disentangle the causal relations that underlie correlations, the concept of experimental intervention is required. Pearl's framework makes it possible to formalise this operation in terms of his do-calculus. Surprisingly, actual experimental intervention is not always required in order to identify causal effects. One of the core results of the theory consists of sufficient criteria for the identification of causal effects from purely observational data. The lecture will highlight the utility of information theory in cases where these criteria are not satisfied. A very general quantitative extension of the common cause principle based on information theory will be presented in this regard.
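The flavour of such identification results can be sketched with the back-door adjustment formula, P(y | do(x)) = sum_z P(y | x, z) P(z). The toy simulation below (an illustrative sketch with invented parameter values, not the lecture's own example) shows the naive conditional contrast overestimating a causal effect that the adjustment recovers from observational data alone.

```python
# Back-door adjustment for binary Z -> X, Z -> Y, X -> Y:
#   P(y | do(x)) = sum_z P(y | x, z) P(z)
# differs from the naive conditional P(y | x) when Z confounds X and Y.
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

z = rng.random(n) < 0.5                       # confounder
x = rng.random(n) < np.where(z, 0.8, 0.2)     # Z raises P(X=1)
y = rng.random(n) < 0.2 + 0.3 * x + 0.4 * z   # true effect of X on Y is +0.3

naive = y[x].mean() - y[~x].mean()            # confounded contrast

# Back-door adjustment: stratify by Z, then average over P(z).
adjusted = sum(
    (y[x & (z == s)].mean() - y[~x & (z == s)].mean()) * (z == s).mean()
    for s in (True, False)
)
print(f"naive P(y|x=1) - P(y|x=0): {naive:.3f}")     # biased upward (~0.54)
print(f"back-door adjusted effect: {adjusted:.3f}")  # ~0.30, the true effect
```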
Despite breathtaking recent progress, causal inference remains a very complex issue, and it becomes even more complex when studying phenomena that bridge multiple scales, levels and disciplines. This talk will focus on such a case, attempting to link population genetic structure to patterns of linguistic diversity. In a nutshell, the proposal is that structural properties of language might be influenced by the genetics of the speakers. Such influence must be very weak and indirect. Weak, because we know that healthy children will natively acquire any language they happen to be raised in, no matter their genetic makeup. Indirect, because genes do not by themselves do anything to speech, let alone to group-level and historical phenomena such as language. Studying such cases raises a number of important questions, including: What sort of data, methods, criteria and discourse must we use to support such long inferential chains crossing so many levels (molecular to social), timescales (milliseconds to generations) and disciplinary boundaries (molecular genetics to historical linguistics)? When and how do we decide that we have a causal explanation, given the differences between disciplines (experimentation is not always possible)? How do we deal with loops in the process? How do we identify suppressors (for instance, compensatory strategies in speech articulation), and how do we control for them? In this talk I will use some more-or-less fictional examples to illustrate these difficulties, but also to suggest approaches that might make such cases tractable, suggesting ways forward that should apply more generally to phenomena that bring the cultural and the non-cultural together.
Cultural change can be quantified by temporal changes in the frequencies of different cultural artefacts. Based on those (observable) frequency patterns, researchers often aim to infer the nature of the underlying cultural transmission processes and thereby to identify the (unobservable) causes of cultural change. Especially in archaeological and anthropological applications this inverse problem gains particular importance, as occurrence or usage frequencies are commonly the only available information about past cultural traits or traditions and the forces affecting them. Matters are further complicated by the fact that observed changes often describe the dynamics in samples of the population of artefacts, whereas transmission processes act on the whole population. In this talk we begin to analyse this inference problem. We develop a generative inference framework which firstly establishes a causal relationship between underlying transmission processes and temporal changes in the frequency of cultural artefacts, and secondly infers which cultural transmission processes are consistent with observed frequency changes. In this way we aim to deduce underlying transmission modes directly from available data, without any optimality or equilibrium assumption. Importantly, this framework allows us to explore the theoretical limitations of inference procedures based on population-level data and to start answering the question of how much information about the underlying transmission processes can be inferred from frequency patterns. Our approach might help narrow down the range of possible processes that could have produced observed frequency patterns, and thus still be instructive in the face of uncertainty. Rather than identifying a single transmission process that explains the data, we focus on excluding processes that cannot have produced the observed changes in frequencies. We apply the developed framework to a dataset describing the LBK (Linearbandkeramik) culture.
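As a concrete, deliberately simplified illustration of this generative-inference logic, the sketch below uses a rejection scheme in the spirit of approximate Bayesian computation: simulate frequency change under a candidate transmission process, and exclude parameter values that cannot reproduce the observed change. The copying model, parameter names and data values are illustrative assumptions, not the authors' implementation.

```python
# Generative inference sketch: exclude transmission biases inconsistent
# with an observed frequency change (ABC-style rejection).
import numpy as np

rng = np.random.default_rng(2)

def simulate_change(bias, pop=1000, start_freq=0.2, generations=20):
    """Final frequency of one variant under biased copying: bias > 0 favours
    the majority variant (conformity), bias < 0 the minority."""
    f = start_freq
    for _ in range(generations):
        p = f + bias * f * (1 - f) * (2 * f - 1)   # conformist distortion
        p = min(max(p, 0.0), 1.0)
        f = rng.binomial(pop, p) / pop             # drift in a finite population
    return f

observed_change = 0.35     # hypothetical observed final frequency
tolerance = 0.05

# Rejection step: which transmission biases are consistent with the data?
candidate_biases = rng.uniform(-1, 1, size=5000)
accepted = [b for b in candidate_biases
            if abs(simulate_change(b) - observed_change) < tolerance]
print(f"accepted {len(accepted)} of 5000 candidate biases")
if accepted:
    print(f"consistent bias range: [{min(accepted):.2f}, {max(accepted):.2f}]")
```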
Reconstructing the deep history of human diversity draws on combined evidence from linguistics, genetics and biological anthropology. Early explorations of the association between languages and genes suggested that patterns of linguistic diversity paralleled those of genetic diversity. Most of these early studies used pairwise distance measures of genetic and linguistic dissimilarity to statistically assess the significance of their association (Derish and Sokal 1988 and much subsequent work). Other work on the phylogenetic structure of genetic and linguistic data assessed similarities in the topology of the generated trees (Cavalli-Sforza et al. 1988; 1992). The general conclusion drawn from this body of work was that when human populations separated and became genetically differentiated, their languages followed a similar evolutionary pattern.
Recent work using quantitative measures of linguistic diversity (Creanza et al., 2015; Longobardi et al., 2015, among others) largely confirms the gene-language correlation.
Likewise, skeletal morphology has been shown to be correlated with genetic signals (Harvati and Weaver 2006, among others). However, some anatomical regions are better suited to this type of analysis than others: the shape of the neurocranium (braincase) and cranial base, as measured either by 3-D geometric morphometric data or by conventional linear measurements, appears to track population history relatively closely. The face, on the other hand, has been found to respond to selection pressures caused by environmental factors such as climate or subsistence patterns.
In an ongoing study, we compared phenotypic distances based on different aspects of cranial morphology to linguistic distances. The latter were derived from word lists from the Automated Similarity Judgment Program (Wichmann et al., 2013). The study used cranial data from ca. 150 populations around the world. Quite surprisingly, we found that linguistic distances show a stronger and more robust correlation with the face than with the neurocranium. This suggests that linguistic diversity cannot adequately be explained by population diversification alone.
In the talk I will deploy phylogenetic comparative methods combined with causal inference, especially path equation modeling, to disentangle the vertical and horizontal processes governing phenotypic and linguistic diversity, and their causal structure.
Language happens in the here-and-now. Our memory for linguistic input is fleeting. New material rapidly obliterates previous material. How then can the brain deal successfully with the continual deluge of linguistic input? I argue that, to deal with this “Now-or-Never” bottleneck, the brain must incrementally compress and recode language input as rapidly as possible into increasingly abstract levels of linguistic representation. This perspective has profound implications for the nature of language processing, acquisition, and change. Focusing on language acquisition, I present a computational model that learns in a purely incremental fashion, through on-line processing of simple statistics, and offers broad, cross-linguistic coverage while uniting comprehension and production within a single framework. The model achieves strong performance across more than 200 single-child corpora representing 29 languages from the CHILDES database. I conclude that the immediacy of language processing provides a fundamental constraint on accounts of language acquisition, implying that acquisition fundamentally involves learning to process, rather than inducing a grammar.
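The following toy sketch conveys the flavour of such incremental, statistics-driven processing (an illustrative caricature, not the model presented in the talk): chunk boundaries are placed online wherever the running transitional probability between adjacent words dips below the average transitional probability seen so far.

```python
# Purely incremental chunking from simple on-line statistics (toy sketch).
from collections import defaultdict

bigram = defaultdict(int)
unigram = defaultdict(int)
tp_sum, tp_n = 0.0, 0          # running average of transitional probabilities

def process(utterance):
    """Process one utterance incrementally, returning its chunks."""
    global tp_sum, tp_n
    words = utterance.split()
    chunks, current = [], [words[0]]
    for prev, nxt in zip(words, words[1:]):
        unigram[prev] += 1
        bigram[(prev, nxt)] += 1
        tp = bigram[(prev, nxt)] / unigram[prev]   # on-line estimate
        avg = tp_sum / tp_n if tp_n else tp
        tp_sum, tp_n = tp_sum + tp, tp_n + 1
        if tp < avg:                               # low TP -> chunk boundary
            chunks.append(current)
            current = []
        current.append(nxt)
    chunks.append(current)
    return chunks

for utt in ["the dog chased the cat", "the cat saw the dog",
            "the dog saw the cat"]:
    print(process(utt))
```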
As an illustration of how my lab explores and tests causal links, I will present some of our research on functional biases in language change. Such biases can in turn explain typological patterns, including (hypothesized) linguistic universals. In tune with the motivation for this workshop, I’ll argue that advances in Big Data and experimental methods afford exciting new approaches to these questions, which have enjoyed continued prominence in the language sciences. For this talk, I’ll focus on two specific pressures on language use. The first pressure relates to the fact that linguistic communication takes place in the presence of noise, so listeners need to infer the intended message from noisy input, making less probable messages harder to infer (e.g., Levy, 2008; Norris & McQueen, 2008; Bicknell & Levy, 2012; Gibson et al., 2013; Kleinschmidt & Jaeger, 2015). The second pressure relates to memory demands during language processing, where longer dependencies are associated with slower processing (Gibson, 1998, 2000; Lewis et al., 2006; Vasishth & Lewis, 2005). Both pressures are well known to affect language processing, as evidenced by both experimental data (e.g., McDonald & Shillcock, 2003; Grodner & Gibson, 2005) and broad-coverage corpus studies (e.g., Demberg & Keller, 2008; Boston et al., 2010; Smith & Levy, 2013). This means that an ideal speaker (in the sense of ideal observers) should a) support low-probability, i.e., high-information, messages with ‘better’ linguistic signals to the extent that this is warranted given the effort it implies (e.g., by aiming for more precise articulations or by articulating additional words; cf. Lindblom, 1990; Jaeger, 2006, 2013; Gibson et al., 2013), and b) aim for short dependencies (e.g., by reordering constituents; Hawkins, 2004, 2014). Here, I ask whether the same pressures affect language learning, change, and/or the distribution of languages across the world. Case study 1 asks whether actual natural languages have syntactic properties that increase processing efficiency, as would be expected if processing efficiency biases language learning and/or change. Using data from five large syntactically annotated corpora, I show that natural languages have lower information density and shorter dependency lengths than expected by chance (Gildea & Jaeger, in prep; for dependency length, see also Gildea & Temperley, 2010; a minimal sketch of the dependency-length comparison follows the references below). Previous work has found similar properties for phonological and lexical systems (e.g., Manin, 2006; Piantadosi et al., 2011, 2012; Wedel et al., 2013). The present work is the first to show that the same properties hold even in the syntactic system (which involves considerably more complex latent structure and has often been assumed to be encapsulated from functional pressures). Case studies 2 and 3 apply a miniature language learning approach to the same question. I show that learners of such languages restructure them in ways that improve the inferability of messages and shorten dependencies (Fedzechkina et al., 2011, 2013, under review; Fedzechkina & Jaeger, 2015). Unlike approaches that rely on statistical modeling of typological data, miniature language learning does not suffer from data sparsity and can, if applied correctly, assess causality by directly manipulating the relevant factors.
Additionally, the approach I describe differs from almost all other miniature language learning experiments in that the observed learning biases were present neither in the input nor in the native language of the learners (see also Culbertson et al., 2012; Culbertson & Adger, 2014; though see Goldberg, 2013). Time permitting, I will also present big data on the role of probabilistic transfer during L2 learning (Schepens & Jaeger, 2015). I close with some caveats about statistical work on these questions.

Some references to related work from my lab:

Fedzechkina, M. and Jaeger, T. F. 2015. ‘Long before short’ preference in a head-final artificial language: In support of dependency minimization accounts. The 28th CUNY Sentence Processing Conference. USC, CA, March 19th-21st.

Fedzechkina, M., Jaeger, T. F., and Newport, E. 2013. Communicative biases shape structures of newly acquired languages. In Knauff, M., Pauen, M., Sebanz, N., & Wachsmuth, I. (eds.) Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci13), 430-435. Austin, TX: Cognitive Science Society.

Fedzechkina, M., Jaeger, T. F., and Newport, E. 2012. Language learners restructure their input to facilitate efficient communication. Proceedings of the National Academy of Sciences 109(44), 17897-17902. doi:10.1073/pnas.1215776109

Fedzechkina, M., Jaeger, T. F., and Newport, E. L. 2011. Functional biases in language learning: Evidence from word order and case-marking interaction. In Carlson, L., Hoelscher, C., and Shipley, T. F. (eds.) Proceedings of the 33rd Annual Meeting of the Cognitive Science Society (CogSci11), 318-323. Austin, TX: Cognitive Science Society.

Schepens, J. and Jaeger, T. F. 2015. L2 phonological learning in adults: The role of language background, length of exposure, and age of acquisition. DGfS, Leipzig, Germany, March 4th-6th.
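As flagged above, here is a minimal sketch of the dependency-length logic behind Case study 1 (toy sentence and invented parse; the actual study uses large annotated corpora and more careful random baselines): the attested order of a sentence tends to have a shorter total dependency length than random linearizations of the same dependency tree.

```python
# Total dependency length of an attested order vs. random reorderings.
import random

# Toy parse: heads[i] is the index of word i's head (-1 = root).
words = ["the", "dog", "chased", "the", "cat"]
heads = [1, 2, -1, 4, 2]

def dep_length(order):
    """Total dependency length under a given linear order of word indices."""
    pos = {w: i for i, w in enumerate(order)}
    return sum(abs(pos[d] - pos[h]) for d, h in enumerate(heads) if h != -1)

attested = dep_length(range(len(words)))

random.seed(0)
baseline = []
for _ in range(10_000):
    perm = list(range(len(words)))
    random.shuffle(perm)
    baseline.append(dep_length(perm))

print("attested total dependency length:", attested)       # 5
print("mean over random orders:", sum(baseline) / len(baseline))  # ~8
```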
Fakhteh Ghanbarnejad, Martin Gerlach, José M. Miotto, Eduardo G. Altmann
In this work we investigate how much information about the process of language change can be inferred from the shape of the adoption curve of a lexical innovation. We investigate simple models in which an innovation spreads in a network of speakers due to two different kinds of factors: those exogenous and those endogenous to the speakers. We propose a measure that quantifies the strength of each of these factors, and we test different methods for estimating this measure from adoption curves. We apply our methods to different historical examples of lexical change: the regularization of verbs in English, the romanization of Russian names in German and English, and orthographic reforms in German.
Reference: "Extracting information from S-curves of language change", J. R. Soc. Interface 11, 20141044 (2014)
Systems often achieve robustness and evolvability through the architectural principle of using large and thin layers: the layering allows largely modular processing, and the thin occupancy gives rise to 'digital' error-correcting capabilities. Innovations within the different layers that maintain this structure, and changes at the interfaces between the layers, both allow evolution while maintaining function. Such changes, when viewed from the perspective of a lower layer, appear as non-independent, coordinated changes affecting the entire system. Human language displays such a structure: the different layers (syntax, morphology, lexical tokens, phonemes, phones) are structurally largely modular, and this layering strongly shapes language evolution. For example, a change in the mapping of phonemes to underlying phones ultimately gives rise to regular correspondences. Such coordinated changes need to be, and can be, explicitly modeled in reconstructing histories, or 'phylogenies', of languages. Language can also be viewed as a distributed communication system, where agents constantly propose and adopt changes consistent with the layering while maintaining communicative intent. At the lexical level, these changes ultimately lead to word innovation and replacement. The adaptive aspect of this diffusive process may be studied as a distributed optimization problem running on an underlying semantic-network layer that is shared by the agents. Such a view allows one to build a state-process model of language change that can be exploited both to study human languages and as a model for artificial adaptive distributed communication systems.
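A deliberately tiny example of the layering point (invented pseudo-Latin forms and a Grimm's-law-like p to f shift, not the speaker's model): one change at the phoneme-to-phone interface surfaces, from the lower layer's perspective, as a coordinated change across the entire lexicon, i.e. as a regular correspondence.

```python
# One remapping at the phoneme-to-phone layer shifts every word containing
# the affected segment, producing a regular correspondence.
lexicon = ["pater", "ped", "piskis", "plenus"]

def apply_layer_change(words, mapping):
    """Re-spell every word under a changed phoneme-to-phone mapping."""
    return ["".join(mapping.get(seg, seg) for seg in w) for w in words]

shifted = apply_layer_change(lexicon, {"p": "f"})
for old, new in zip(lexicon, shifted):
    print(f"{old} : {new}")   # every p : f pair is a regular correspondence
```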
A central question in the language sciences is why languages have the semantic categories they do, and what those categories reveal about cognition and communication. Word meanings vary widely across languages, but this variation is constrained. I will argue that this pattern reflects a range of language-specific solutions to a universal functional challenge: that of communicating precisely while using minimal cognitive resources. I will present a general computational framework that instantiates this idea, and will show how that framework accounts for cross-language variation in several semantic domains, including color, spatial relations, kinship, and number.
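One common way such a trade-off can be formalised (a generic sketch, not necessarily the exact framework of the talk) is to score a category system by the listener's expected surprisal about the intended referent against the number of categories the speaker must maintain.

```python
# Precision-vs-resources trade-off for semantic category systems:
# communicative cost = expected surprisal of the listener's guess,
# cognitive cost = number of categories.
import math

referents = list(range(8))  # eight equally likely referents

def expected_cost(partition):
    """Expected surprisal -log2 P(referent | category), uniform within category."""
    return sum(len(cat) / len(referents) * math.log2(len(cat))
               for cat in partition)

coarse = [[0, 1, 2, 3], [4, 5, 6, 7]]       # 2 categories, 2.0 bits of cost
fine = [[0, 1], [2, 3], [4, 5], [6, 7]]     # 4 categories, 1.0 bit of cost
print("coarse:", expected_cost(coarse), "bits,", len(coarse), "categories")
print("fine:  ", expected_cost(fine), "bits,", len(fine), "categories")
```

Finer partitions communicate more precisely but cost more categories; attested systems can then be asked whether they lie near the optimal frontier of this trade-off.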
The growth of functional complexity in speech
Gonzalo Castillo (Universitat de Barcelona, Spain)
Please see the abstract as PDF file.

Disentangling style and priming using Generalized Additive Models
Aaron Ecay (University of Pennsylvania/University of York, USA)
Please see the abstract as PDF file.

How to constrain the set of possible causal explanations
Ramon Ferrer-i-Cancho (Universitat Politècnica de Catalunya, Spain)
Please see the abstract as PDF file.

Information-theoretic approach to measure vocabulary distance: How localized is a book in time?
Martin Gerlach (Max Planck Institute for the Physics of Complex Systems, Germany)
Authors: Martin Gerlach, Francesc Font-Clos, and Eduardo G. Altmann
In this work we investigate temporal and author variation in the usage of language. We quantify differences between vocabularies by looking at the statistics of word frequencies in written text and applying tools rooted in information theory (e.g. mutual information, Jensen-Shannon divergence). This not only yields a well-defined measure capturing how much information is shared between two different (samples of) texts, but also allows for a calculation of the expected fluctuations in order to assess the significance of these differences. We use these tools to compare individual books to a reference corpus with yearly resolution (the Google n-gram database). We confirm a good match between our results and the publication dates of the books, and we develop measures that quantify how innovative the vocabulary of individual books and authors was. (A minimal sketch of such a comparison follows the poster list below.)

Causality and converging evidence: overcoming methodological monism with corpus analysis and experimental approaches
Iulia Grosman (Université catholique de Louvain, Belgium)
Please see the abstract as PDF file.

Complex coordinative patterns in speech production
Leonardo Lancia (Max Planck Institute for Evolutionary Anthropology, Germany)
Please see the abstract as PDF file.

On causality and comorbidity: A view from the ‘schizophrenia-blindness-language’ triangle
Evelina Leivada (Universitat de Barcelona, Spain)
Please see the abstract as PDF file.

Testing iconicity: A quantitative study of causative constructions based on a parallel corpus of film subtitles
Natalia Levshina (F.R.S.-FNRS, Université catholique de Louvain, Belgium)
Please see the abstract as PDF file.

Against causality: science of language by the Prague Linguistic Circle
Marek Nagy (Palacký University, Czech Republic)
Please see the abstract as PDF file.

Complexity in language phenomena
Michał B. Paradowski (University of Warsaw, Poland)
Throughout history the language sciences have been dealing with numerous phenomena that are either inherently complex/dynamic systems, or which display typical qualities of such systems. Within an individual, one can bring to mind perceptual dynamics and categorisation in speech, the emergence of phonological templates, or word and sentence processing; across society, think of variation and typology, the rise of new grammatical constructions, semantic bleaching, language evolution in general, and the spread of and competition between both individual expressions and entire languages.
A representative handful of language phenomena will be depicted which have been known to exhibit such properties as hysteresis, phase transitions, bifurcation, attractor states, or power-law distributions. The multifaceted dynamism and complexity of the process of language acquisition will also be discussed, highlighting the importance of adopting designs with different timescales in order to trace language development as a process of change over time, the utility of time-series analyses, and the ability to determine optimal temporal integration windows, e.g. in analyses of dynamic motifs in human communication.

Cognate identification from word alignments: algorithms and scoring
Nancy Retzlaff (University of Leipzig, Germany)
Please see the abstract as PDF file.

The cultural evolution of functional morphology in an Iterated Learning experiment
Carmen Saldana (University of Edinburgh, United Kingdom)
Please see the abstract as PDF file.

Gender in Language and Economics: Is the epidemiological approach the road to causality?
Estefania Santacreu Vasut (ESSEC Business School, France)
Please see the abstract as PDF file.

Causality, arbitrariness, and a view from diachrony
Kevin Stadler (The University of Edinburgh, United Kingdom)
As has been pointed out, the quest to understand the causal origin of linguistic structures is ultimately an investigation into the diachronic processes that shape languages across time. Consequently, new approaches to causal inference can be informed and strengthened by insights on the nature of language change from the long tradition of diachronic studies. In this poster I investigate how the concept of arbitrariness, the sporadic nature of language change, and the so-called 'actuation problem' relate to causal explanations of language. I will try to argue that these central linguistic tenets call into question whether causal explanations of language are possible, or even desirable. Rather than completely discarding the role of causality in language, I hope to show that much could be gained from studying the inverse (but related) problem, i.e. from identifying and characterising the mechanisms which keep languages from becoming completely determined by their environment. A general understanding of the dynamics of linguistic change would seem to be an important prerequisite for linking particular changes to their underlying causes, particularly in terms of identifying the exact nature of the 'causal triggers' of change.
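As promised in the Gerlach, Font-Clos and Altmann entry above, here is a minimal sketch of an information-theoretic vocabulary comparison (toy strings in place of real books and corpora): the Jensen-Shannon divergence between the word-frequency distributions of two text samples.

```python
# Jensen-Shannon divergence (bits) between word-frequency distributions.
from collections import Counter
import math

def jsd(text_a, text_b):
    """JSD between the word distributions of two texts: 0 if identical,
    growing as the vocabularies diverge."""
    ca, cb = Counter(text_a.split()), Counter(text_b.split())
    na, nb = sum(ca.values()), sum(cb.values())
    d = 0.0
    for w in set(ca) | set(cb):
        pa, pb = ca[w] / na, cb[w] / nb
        m = (pa + pb) / 2
        if pa: d += 0.5 * pa * math.log2(pa / m)
        if pb: d += 0.5 * pb * math.log2(pb / m)
    return d

print(jsd("the cat sat on the mat", "the cat sat on the mat"))   # 0.0
print(jsd("the cat sat on the mat", "thou hast sat upon a mat")) # > 0
```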
For 5000 years humans have been using visible marks to encode spoken language. For a far longer period, they have been using visible marks to encode concepts, ideas or, in general, a variety of non-linguistic information. When faced with an ancient symbol system whose meaning is unknown, can one tell whether it was linguistic (and therefore worth trying to decipher as a language) or some sort of non-linguistic system?
On the face of it, it seems reasonable to use statistical information on the behavior of the symbols in the system as evidence. If the symbols are distributed in a way that is similar to the distribution of elements (phonemes, morphemes, words, etc.) in language, then this could serve as evidence that the system is writing. In causal terms, the fact that it is writing causes the system to show the statistical properties it has.
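To make this concrete, the sketch below computes one such statistic, the bigram conditional entropy of a symbol sequence, a measure that has figured prominently (and, as argued below, problematically) in this line of argumentation. The sequences are toy stand-ins, not real corpus data.

```python
# Conditional entropy H(next symbol | current symbol) of a symbol sequence.
from collections import Counter
import math

def conditional_entropy(seq):
    """H(Y|X) in bits over adjacent symbol pairs of seq."""
    pairs = Counter(zip(seq, seq[1:]))
    firsts = Counter(seq[:-1])
    n = len(seq) - 1
    return -sum(c / n * math.log2(c / firsts[x]) for (x, _), c in pairs.items())

structured = "ABABABABABAB"      # rigid ordering -> H(Y|X) = 0
unconstrained = "ABBABAABBBAA"   # freer ordering -> higher H(Y|X)
print(conditional_entropy(structured))
print(conditional_entropy(unconstrained))
```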
Recent work that has used this line of argumentation suffers from a variety of problems. First, while such work invariably claims that the statistical measures used are evidence for structure, the measures often actually tell us little or nothing about structure. Second, even if the measures do relate to structure, do they specifically imply linguistic structure? A parse tree looks very similar to a tree describing the structure of a mathematical formula, so structure per se hardly seems enough. This points to a third problem: such work depends to some degree on the widespread misconception that non-linguistic systems are structureless. Finally, there is the question of whether sample sizes for such systems are ever large enough to support robust statistical claims.
In this talk I review the results of my own work on the statistics of non-linguistic symbol systems, and draw a mostly negative conclusion about the possibility of finding statistical measures that are useful in answering this question.
Phylogenetic correlation is important: a major concern of functional/typological approaches to linguistics is the investigation of causal interactions between aspects of language structure. Modern phylogenetic comparative methods offer new ways to address these questions. Dunn et al. (2011) used these methods in one such attempt: a large-scale study of word order variation in four language families showed that many so-called "universals" of language structure were not reliably detectable in all language families, belying their universal status. Beyond the evidence for the possible non-universality of language universals, this paper also presented instances of apparent correlated evolutionary change which violated typological predictions.
Maddison and FitzJohn (2015) enumerate some serious methodological challenges outstanding in the investigation of phylogenetic correlation, and identify a series of phylogenetic contexts where standard phylogenetic comparative methods such as Pagel's (1994) perform poorly. For example, standard phylogenetic tests of correlation between two characters tend to overestimate the degree of correlation where there exists some third factor driving change in one character which is co-distributed with the other by chance (a toy illustration follows the references below). These (and other) challenges to the method do not undermine the entire enterprise, but there is clearly a need for improved statistical tests.
In this paper I investigate the inferred histories of word order features and attempt to classify our previously identified correlations between word order characters into the "safe" and the various "at risk" categories. In some of the latter, "at risk", cases of apparent correlation, historical linguistics can give us a more particular and detailed account of diachronic processes of change. Through investigating these known cases I hope to clarify where phylogenetic tests of correlation perform well on linguistic data, and to contribute to future efforts to solve the problem of inferring correlation where current methods perform poorly.
Dunn, Michael, Simon J. Greenhill, Stephen C. Levinson, and Russell D. Gray. 2011. Evolved structure of language shows lineage-specific trends in word-order universals. Nature 473: 79–82.

Maddison, Wayne P., and Richard G. FitzJohn. 2015. The unsolved challenge to phylogenetic correlation tests for categorical characters. Systematic Biology 64(1): 127–136.

Pagel, Mark. 1994. Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters. Proceedings of the Royal Society B: Biological Sciences 255: 37–45.
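The co-distribution pitfall mentioned above has a simple worked caricature (hypothetical data): if both characters changed just once, on the same deep branch, a test that treats the 16 descendant languages as independent observations returns a dramatic p-value on the strength of a single evolutionary event.

```python
# Two clades of 8 languages; characters A and B each changed once, on the
# branch leading to clade 2, so the language-level 2x2 table is perfectly
# associated: [[8, 0], [0, 8]]. A two-sided Fisher exact test computed by
# hand (hypergeometric probability of this table and its mirror image):
from math import comb

p = 2 * comb(8, 8) * comb(8, 0) / comb(16, 8)
print(f"naive p = {p:.6f}")   # ~0.000155, yet only ONE independent event
```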
For over 2000 years, ever since the Indian grammarian Pāṇini developed the first precise grammar of a language, linguistics has essentially been an engineering enterprise: how can we capture the rules of a language most concisely and most elegantly? This perspective persists even when the focus shifts from individual languages to larger sets of languages: traditional approaches here chiefly consist in an open-ended search for concise and elegant generalizations and correlations that are argued to hold across grammatical systems, either universally or regionally (Sprachbünde). Explanations enter only post hoc.
Here I explore an alternative line of thinking that approaches linguistic structures as natural phenomena whose distribution in time and space can be predicted by causal theories. Such theories are rooted outside grammar: on the one hand, in what we know about the mechanisms of structure copying in language contact, and therefore in what we know about population history; on the other hand, in what we know about the biological conditions of language, e.g. about the neurophysiology of language processing.
I will illustrate such causal theories, their predictions and their testability with recent case studies: (a) a case study of how population movements have caused large-scale spreads of linguistic structures around the Pacific and inside Eurasia, and (b) a case study of how stable properties of the language comprehension system cause case marking systems to show universal preferences in how they evolve over time (e.g. away from ergativity).