Causality in the language sciences
Abstracts of the talks
Max Planck Institute for the Physics of Complex Systems, Germany
Quantifying the strength of factors leading to lexical change
Fakhteh Ghanbarnejad, Martin Gerlach, José M. Miotto, Eduardo G. Altmann
In this work we investigate how much information on the process of
language change can be inferred from the shape of the adoption curve of
a lexical innovation. We investigate simple models in which an
innovation spreads in a network of speakers due to two different
factors: exogenous and endogenous to the speakers. We propose a
measure that quantifies the strength of each of these factors and we
test different methods to estimate this measure from the adoption
curves. We apply our methods to different historical examples of lexical
change: regularization of verbs in English, romanization of Russian
names in German and English, and orthographic reforms in German.
Reference: "Extracting information from S-curves of language change",
J. R. Soc. Interface 11, 20141044 (2014)
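A minimal sketch of the kind of model described above, assuming a well-mixed population (no network structure) and a Bass-type equation d(rho)/dt = (a + b * rho) * (1 - rho), where rho is the fraction of adopters. The names `a` (exogenous rate), `b` (endogenous rate) and `adoption_curve` are illustrative assumptions and are not taken from the abstract.

```python
# Hypothetical illustration, not the authors' code: adoption curve driven by
# an exogenous rate `a` and an endogenous (speaker-to-speaker) rate `b`.

def adoption_curve(a, b, rho0=0.0, dt=0.01, steps=2000):
    """Euler integration of d(rho)/dt = (a + b * rho) * (1 - rho)."""
    rho = rho0
    curve = [rho]
    for _ in range(steps):
        rho += dt * (a + b * rho) * (1.0 - rho)
        curve.append(rho)
    return curve

# Purely exogenous pressure: adoption starts immediately and slows down.
exogenous = adoption_curve(a=0.5, b=0.0)

# Purely endogenous spread: needs a small seed of adopters and gives an S-curve.
endogenous = adoption_curve(a=0.0, b=0.5, rho0=0.01)
```

Under these assumptions the exogenous curve rises fastest at the very start, whereas the endogenous curve has its steepest growth near rho = 0.5. That difference in shape is the sort of signal an estimator of the two strengths could exploit.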
Max Planck Institute for Mathematics in the Sciences, Germany
Concepts and formal tools for causality studies
In this lecture, basic concepts and formal tools for the study of causality will be introduced and discussed.
The focus of the lecture will be on Pearl's causality theory and Shannon's information theory.
The main formal object of Pearl's theory represents the cause-effect relations within a system in terms of arrows between the nodes of a network. Such a network, a so-called Bayesian network, has two components, a structural and a mechanistic one. General measurements in the network can display correlations that do not directly correspond to causal links: "correlation does not imply causation." Reichenbach's common cause principle, on the other hand, states an apparently contradictory law. Somewhat simplified, it says that any correlation of variables implies a cause-effect relation or the existence of a common cause of these variables. In this sense, "correlation does imply causation."
In order to disentangle the causal relations that underlie correlations, the concept of experimental intervention is required. Pearl's framework allows one to formalise this operation in terms of his do-calculus. Surprisingly, actual experimental intervention is not always required in order to identify causal effects. One of the core results of that theory is given in terms of sufficient criteria for the identification of causal effects based on purely observational data. The lecture will highlight the utility of information theory in cases where these criteria are not satisfied. A very general quantitative extension of the common cause principle based on information theory will be presented in this regard.
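As a small worked illustration of identifying a causal effect from observational quantities alone, the sketch below applies the back-door adjustment formula P(y | do(x)) = sum_z P(y | x, z) P(z) to a toy network Z -> X, Z -> Y, X -> Y. The network, the variable names and all probabilities are made-up assumptions for illustration and do not come from the lecture.

```python
# Toy binary model (all numbers are invented): Z confounds X and Y.
p_z = {0: 0.5, 1: 0.5}                       # P(Z = z)
p_x1_given_z = {0: 0.2, 1: 0.8}              # P(X = 1 | Z = z)
p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.5,   # P(Y = 1 | X = x, Z = z)
                 (1, 0): 0.3, (1, 1): 0.7}

def p_x_given_z(x, z):
    """P(X = x | Z = z) for binary x."""
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]

def observational(x):
    """P(Y = 1 | X = x): plain conditioning, which mixes in the confounder Z."""
    num = sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] for z in p_z)
    den = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    return num / den

def interventional(x):
    """P(Y = 1 | do(X = x)) via back-door adjustment over Z."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)

naive_contrast = observational(1) - observational(0)
causal_contrast = interventional(1) - interventional(0)
```

In this toy model the naive contrast is about 0.44 while the adjusted, causal contrast is about 0.20: the correlation overstates the effect of X on Y because Z drives both.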
Santa Fe Institute, USA
Language change and layered systems
Systems often achieve robustness and evolvability through the architectural principle of using large and thin layers---the layering allowing largely modular processing and the thin occupancy giving rise to `digital' error-correcting capabilities. Innovations in the different layers that maintain this structure, and changes at the interfaces between the different layers, both allow evolution while maintaining function. These changes, when viewed from the perspective of a lower layer, appear as non-independent, coordinated changes affecting the entire system. Human language displays such a structure, where the different layers---syntax, morphology, lexical tokens, phonemes, phones---are structurally largely modular, and this strongly affects language evolution. For example, a change of mapping of phonemes to underlying phones ultimately gives rise to regular correspondences. Such coordinated changes need to be, and can be, explicitly modeled in reconstructing histories, or `phylogenies', of languages. Language can also be viewed as a distributed communication system, where agents constantly propose and adopt changes consistent with layering and maintaining communicative intent. At the lexical level, these ultimately lead to word innovation and replacement. The adaptive aspect of this diffusive process may be studied as a distributed optimization problem running on an underlying layer of semantic network that is shared by the agents. Such a view allows one to build a state-process model of language change that can be exploited both to study human languages and as a model for artificial adapting distributed communication systems.
University of Zurich, Switzerland
Moving beyond Pāṇini: causal theories in linguistics
For over 2000 years, ever since the Indian philosopher Pāṇini developed the first precise grammar of a language, linguistics has essentially been an engineering enterprise: how can we capture the rules of a language most concisely and most elegantly? This perspective persists even when the focus shifts from individual languages to larger sets of languages: traditional approaches here chiefly consist in an open-ended search for concise and elegant generalizations and correlations that are argued to hold across grammatical systems, either universally or regionally (Sprachbünde). Explanations come in only post-hoc.
Here I explore an alternative line of thinking that approaches linguistic structures as natural phenomena whose distribution in time and space can be predicted by causal theories. Such theories are rooted outside grammar: on the one hand, in what we know about the mechanisms of structure copying in language contact, and therefore in what we know about population history, and, on the other hand, in what we know about the biological conditions of language, e.g. about the neurophysiology of language processing.
I will illustrate such causal theories, their predictions and their testability with recent case studies: (a) a case study on how population movements have caused large-scale spreads of linguistic structures around the Pacific and inside Eurasia, and (b) a case study on how stable properties of the language comprehension system cause case marking systems to show universal preferences in how they evolve over time (e.g. away from ergativity).
Morten H. Christiansen
Cornell University, USA
The now-or-never bottleneck: a fundamental constraint on language
Language happens in the here-and-now. Our memory for linguistic input is fleeting. New material rapidly obliterates previous material. How then can the brain deal successfully with the continual deluge of linguistic input? I argue that, to deal with this “Now-or-Never” bottleneck, the brain must incrementally compress and recode language input as rapidly as possible into increasingly abstract levels of linguistic representation. This perspective has profound implications for the nature of language processing, acquisition, and change. Focusing on language acquisition, I present a computational model that learns in a purely incremental fashion, through on-line processing of simple statistics, and offers broad, cross-linguistic coverage while uniting comprehension and production within a single framework. The model achieves strong performance across over 200 single-child corpora representing 29 languages from the CHILDES database. I conclude that the immediacy of language processing provides a fundamental constraint on accounts of language acquisition, implying that acquisition fundamentally involves learning to process, rather than inducing a grammar.
Max Planck Institute for Psycholinguistics, Netherlands
Across levels and timescales: causally linking genes, language processing and cultural evolution
Despite breathtaking recent progress, causal inference remains a very complex issue which becomes even more complex when studying phenomena that bridge multiple scales, levels and disciplines. This talk will focus on such a case, trying to link the population genetic structure to patterns of linguistic diversity. In a nutshell, this proposal suggests that structural properties of language might be influenced by the genetics of the speakers. Such influence must be very weak and indirect. Weak because we know that healthy children will acquire natively any language they happen to be raised into no matter their genetic makeup. Indirect because genes do not by themselves do anything to language and speech, not to mention group-level and historical phenomena such as language. Studying such cases raises a number of important questions including: What sort of data, methods, criteria and discourse must we use to support such long inferential chains crossing so many levels (molecular to social), timescales (milliseconds to generations) and disciplinary boundaries (molecular genetics to historical linguistics)? When and how do we decide that we have a causal explanation given the differences between disciplines (experimentation is not always possible)? How do we deal with loops in the process? How do we identify suppressors (for instance, compensatory strategies in speech articulation) and how do we control for them? In this talk I will use some more-or-less fictional examples to illustrate these difficulties but also to suggest approaches that might make such cases tractable, suggesting ways forward that should apply more generally to phenomena that bring the cultural and non-cultural together.
Uppsala University, Sweden
Overestimating correlation between typological features
Phylogenetic correlation is important: a major concern of functional/typological approaches to linguistics is the investigation of the causal interactions between aspects of language structure. Modern phylogenetic comparative methods offer new ways to address these questions. Dunn et al. (2011) use these methods in one such attempt: a large-scale study of word order variation in four language families showed that many so-called "universals" of language structure were not reliably detectable in all language families, belying their universal status. Beyond the evidence for the possible non-universality of language universals, this paper also presented instances of apparent correlated evolutionary change which violated typological predictions.
Maddison and FitzJohn (2015) enumerate some serious methodological challenges outstanding in the investigation of phylogenetic correlation, and identify a series of phylogenetic contexts where standard phylogenetic comparative methods such as Pagel (1994) perform poorly. For example, standard phylogenetic tests of correlation between two characters tend to overestimate the degree of correlation where there exists some third factor driving change in one character, but which is co-distributed with the other by chance. These (and other) challenges to the method do not undermine the entire enterprise, but there is clearly a need for improved statistical tests.
In this paper I investigate the inferred histories of word order features and attempt to classify our previously identified correlations between word order characters into "safe" and the various "at risk" categories. In some of these latter "at risk" cases of apparent correlation historical linguistics can give us a more particular and detailed account of diachronic processes of change. Through investigating these known cases I hope to clarify where phylogenetic tests of correlation are performing well on linguistic data, and to contribute to future efforts to solve the problem of inferring correlation where current methods perform poorly.
Dunn, Michael, Simon J. Greenhill, Stephen C. Levinson, and Russell D. Gray. 2011. Evolved structure of language shows lineage-specific trends in word-order universals. Nature 473:79–82.
Maddison, Wayne P., and Richard G. FitzJohn. 2015. The unsolved challenge to phylogenetic correlation tests for categorical characters. Systematic Biology 64(1): 127–36.
Pagel, Mark. 1994. Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters. Proceedings of the Royal Society B: Biological Sciences 255:37–45.
Eberhard Karls University Tübingen, Germany
(joint work with Katarina Harvati and Hugo Reyes-Centeno)
Words and bones. Combining linguistic and phenotypic data to probe deep history
Reconstructing the deep history of human diversity draws on combined evidence from linguistics, genetics and biological anthropology. Early explorations on the association between languages and genes suggested that patterns of linguistic diversity paralleled those of genetic diversity. Most of these early studies used pairwise distance measures of genetic and linguistic dissimilarity to statistically compare the significance of their association (Derish and Sokal 1988 and much subsequent work). Other work on the phylogenetic structure of genetic and linguistic data assessed similarities in the topology of generated trees (Cavalli-Sforza et al. 1988; 1992). The general conclusion drawn from this body of work was that when human populations separated and became genetically differentiated, their languages followed a similar evolutionary pattern.
Recent work using quantitative measures of linguistic diversity (Creanza et al., 2015; Longobardi et al., 2015, among others) largely confirms the gene-language correlation.
Likewise, skeletal morphology has been shown to be correlated with genetic signals (Harvati and Weaver 2006, among others). However, some anatomical regions are better suited for this type of analysis than others: the shape of the neurocranium (brain case) and cranial base, as measured either by 3-D geometric morphometric data or by conventional linear measurements, appear to track population history relatively closely. The face, on the other hand, was found to respond to selection pressures caused by environmental factors such as climate or subsistence patterns.
In an ongoing study, we compared phenotypic distances related to different aspects of cranial morphology to linguistic distances. The latter were derived from word lists from the Automatic Similarity Judgment Program (Wichmann et al., 2013). The study used cranial data from ca. 150 populations around the world. Quite surprisingly, we found that linguistic distances show a stronger and more robust correlation with the face than with the neurocranium. This suggests that linguistic diversity cannot adequately be explained by population diversification alone.
In the talk I will deploy phylogenetic comparative methods combined with causal inference --- especially path equation modeling --- to disentangle the vertical and horizontal processes governing phenotypic and linguistic diversity and their causal structure.
T. Florian Jaeger
University of Rochester, USA
Pressures for processing and communicative efficiency bias language development: evidence from big(ish) data and direct causal manipulation
As an illustration of how my lab explores and tests causal links, I will present some of our research on functional biases in language change. Such biases can in turn explain typological patterns, including (hypothesized) linguistic universals. In tune with the motivation for this workshop, I’ll argue that advances in Big Data and experimental methods afford exciting new approaches to these questions, which have enjoyed continued prominence in the language sciences.
For this talk, I’ll focus on two specific pressures on language use. The first pressure relates to the fact that linguistic communication takes place in the presence of noise, so listeners need to infer the intended message from noisy input, which makes less probable messages harder to infer (e.g., Levy, 2008; Norris & McQueen, 2008; Bicknell & Levy, 2012; Gibson et al., 2013; Kleinschmidt & Jaeger, 2015). The second pressure relates to memory demands during language processing, where longer dependencies are associated with slower processing (Gibson, 1998, 2000; Lewis et al., 2006; Vasishth & Lewis, 2005). Both pressures are well known to affect language processing, as shown by both experimental data (e.g., McDonald & Shillcock, 2003; Grodner & Gibson, 2005) and broad-coverage corpus studies (e.g., Demberg & Keller, 2008; Boston et al., 2010; Smith & Levy, 2013). This means that an ideal speaker (in the sense of ideal observers) should a) support low-probability (i.e., high-information) messages with ‘better’ linguistic signals to the extent that this is warranted against the effort it implies (e.g., due to aiming for more precise articulations or due to articulating additional words, cf. Lindblom, 1990; Jaeger, 2006, 2013; Gibson et al., 2013) and b) aim for short dependencies (e.g., by reordering constituents, Hawkins, 2004, 2014).
Here, I ask whether the same pressures affect language learning, change, and/or the distribution of languages across the world. Case study 1 asks whether actual natural languages have syntactic properties that increase processing efficiency, as would be expected if processing efficiency biases language learning and/or change. Using data from five large syntactically annotated corpora, I show that natural languages have lower information density and shorter dependency lengths than expected by chance (Gildea & Jaeger, in prep; for dependency length, see also Gildea & Temperley, 2010). Previous work has found similar properties for phonological and lexical systems (e.g., Manin, 2006; Piantadosi et al., 2011, 2012; Wedel et al., 2013). The present work is the first to find that the same properties affect even the syntactic system (which involves considerably more complex latent structure and has often been assumed to be encapsulated from functional pressures).
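To make "shorter dependency lengths than expected by chance" concrete, here is a minimal sketch that computes the total dependency length of one sentence and compares it with the average over random word orders of the same tree. The encoding of a parse as a list of 1-based head positions (0 for the root), the function names, and the fully unconstrained random baseline are all assumptions chosen for illustration; the baseline in particular is cruder than what a real corpus study would use.

```python
import random

def total_dependency_length(heads):
    """Sum of |position - head position| over all non-root words.

    `heads[i]` is the 1-based position of the head of word i + 1; 0 marks the root.
    """
    return sum(abs(i - h) for i, h in enumerate(heads, start=1) if h != 0)

def shuffled_baseline(heads, samples=1000, seed=0):
    """Mean total dependency length when the same tree is linearised at random."""
    rng = random.Random(seed)
    n = len(heads)
    total = 0
    for _ in range(samples):
        positions = list(range(1, n + 1))
        rng.shuffle(positions)               # positions[w - 1] = new position of word w
        total += sum(abs(positions[w - 1] - positions[h - 1])
                     for w, h in enumerate(heads, start=1) if h != 0)
    return total / samples

# "the dog chased the cat": the->dog, dog->chased, chased = root, the->cat, cat->chased
heads = [2, 3, 0, 5, 3]
observed = total_dependency_length(heads)
baseline = shuffled_baseline(heads)
```

For this sentence the observed total is 5, and the random-order baseline comes out clearly higher, which is the direction of the effect described above.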
Case studies 2 and 3 employ a miniature language learning approach to the same question. I show that learners of such languages restructure them in a way that improves both the inferability of messages and the dependency length (Fedzechkina et al., 2011, 2013, under review; Fedzechkina & Jaeger, 2015). Unlike approaches that rely on statistical modeling of typological data, miniature language learning does not suffer from data sparsity and can, if applied correctly, assess causality by directly manipulating the relevant factors. Additionally, the approach I describe differs from almost all other miniature language learning experiments in that the observed learning biases were neither in the input nor in the native language of learners (see also Culbertson et al., 2012; Culbertson & Adger, 2014; though see Goldberg, 2013).
Time permitting, I will also present big data on the role of probabilistic transfer during L2 learning (Schepens & Jaeger, 2015). I close with some caveats about statistical work on these questions.
Some references to related work from my lab
- Fedzechkina, M. and Jaeger, T. F. 2015. ‘Long before short’ preference in a head-final artificial language: In support of dependency minimization accounts. The 28th CUNY Sentence Processing Conference. USC, CA, March 19th-21st.
- Fedzechkina, M., Jaeger, T. F., and Newport, E. 2013. Communicative biases shape structures of newly acquired languages. In Knauff, M., Pauen, M., Sebanz, N., and Wachsmuth, I. (eds.) Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci13), 430-435. Austin, TX: Cognitive Science Society.
- Fedzechkina, M., Jaeger, T. F., and Newport, E. 2012. Language learners restructure their input to facilitate efficient communication. Proceedings of the National Academy of Sciences 109(44), 17897-17902. [doi:10.1073/pnas.1215776109]
- Fedzechkina, M., Jaeger, T. F., and Newport, E. L. 2011. Functional Biases in Language Learning: Evidence from Word Order and Case-Marking Interaction. In Carlson, L., Hoelscher, C., and Shipley, T. F. (eds.) Proceedings of the 33rd Annual Meeting of the Cognitive Science Society (CogSci11), 318-323. Austin, TX: Cognitive Science Society.
- Schepens, J. and Jaeger, T. F. 2015. L2 phonological learning in adults: The role of language background, length of exposure, and age of acquisition. DGfS, Leipzig, Germany, March 4th-6th.
City University London, United Kingdom
Inferring cultural transmission processes from frequency data
Cultural change can be quantified by temporal frequency changes of different cultural artefacts. Based on those (observable) frequency patterns researchers often aim to infer the nature of the underlying cultural transmission processes and therefore to identify the (unobservable) causes of cultural change. Especially in archaeological and anthropological applications this inverse problem gains particular importance as occurrence or usage frequencies are commonly the only available information about past cultural traits or traditions and the forces affecting them. Matters are further complicated by the fact that observed changes often describe the dynamics in samples of the population of artefacts whereas transmission processes act on the whole population. In this talk we start analysing the described inference problem. We develop a generative inference framework which firstly establishes a causal relationship between underlying transmission processes and temporal changes in frequency of cultural artefacts and secondly infers which cultural transmission processes are consistent with observed frequency changes. In this way we aim to deduce underlying transmission modes directly from available data without any optimality or equilibrium assumption. Importantly this framework allows us to explore the theoretical limitations of inference procedures based on population-level data and to start answering the question of how much information about the underlying transmission processes can be inferred from frequency patterns. Our approach might help narrow down the range of possible processes that could have produced observed frequency patterns, and thus still be instructive in the face of uncertainty. Rather than identifying a single transmission process that explains the data, we focus on excluding processes that cannot have produced the observed changes in frequencies. We apply the developed framework to a dataset describing the LBK culture.
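The exclusion idea described above can be sketched as simulate-and-compare: generate frequency trajectories under a candidate transmission process and rule the process out if no simulated run comes close to the observed series. This is only a toy illustration under strong assumptions and not the framework developed in the talk; the two-variant population, the `bias` parameter (0 for unbiased copying, positive for a majority-favouring bias), the tolerance rule and all function names are hypothetical.

```python
import random

def simulate(bias, n=200, generations=20, p0=0.5, rng=None):
    """Frequency series of variant A under random copying with optional majority bias.

    Each of n individuals adopts A with probability q, where q equals the current
    frequency p tilted towards the majority variant when bias > 0 (0 <= bias <= 1).
    """
    rng = rng or random.Random()
    p, series = p0, [p0]
    for _ in range(generations):
        q = p + bias * p * (1.0 - p) * (2.0 * p - 1.0)
        p = sum(rng.random() < q for _ in range(n)) / n
        series.append(p)
    return series

def consistent(observed, bias, runs=200, tol=0.1, seed=0):
    """True if at least one simulated run stays within `tol` of the observed series."""
    rng = random.Random(seed)
    for _ in range(runs):
        sim = simulate(bias, generations=len(observed) - 1, p0=observed[0], rng=rng)
        if max(abs(s - o) for s, o in zip(sim, observed)) <= tol:
            return True
    return False
```

For instance, in this toy model a variant that starts at frequency 1.0 can never be lost, so an observed series such as `[1.0, 0.0]` is excluded for every value of `bias`, while a constant series `[1.0, 1.0, 1.0]` is not. A real application would also have to model sampling from the population, which this sketch ignores.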
Fermin Moscoso del Prado
University of California, Santa Barbara, USA
Granger causality, information theory, and language change: assessing the causal relations for change across the tiers of Icelandic grammar
University of California, Berkeley, USA
Word meanings across languages support efficient communication
A central question in the language sciences is why languages have the semantic categories they do, and what those categories reveal about cognition and communication. Word meanings vary widely across languages, but this variation is constrained. I will argue that this pattern reflects a range of language-specific solutions to a universal functional challenge: that of communicating precisely while using minimal cognitive resources. I will present a general computational framework that instantiates this idea, and will show how that framework accounts for cross-language variation in several semantic domains, including color, spatial relations, kinship, and number.
Google, Inc., USA
The statistics of non-linguistic symbol systems
For 5000 years humans have been using visible marks to encode spoken language. For a far longer period, they have been using visible marks to encode concepts, ideas or, in general, a variety of non-linguistic information. When faced with an ancient symbol system whose meaning is unknown, can one tell if it was linguistic (and therefore worth trying to decipher as a language), or some sort of non-linguistic system?
On the face of it, it seems reasonable to use as evidence statistical information on the behavior of symbols in the system. If the symbols distribute in a way that is similar to the distribution of elements (phonemes, morphemes, words, etc) in language, then this could serve as evidence that the system is writing. In causal terms, the fact that it is writing causes the system to show the statistical properties it has.
Recent work that has used this line of argumentation suffers from a variety of problems. First, while such work invariably makes the claim that the statistical measures used are evidence for structure, often the measures actually tell us little or nothing about structure. Second, even if the measures do relate to structure, do they specifically imply linguistic structure? A parse tree looks very similar to a tree that describes the structure of a mathematical formula, so structure per se hardly seems enough. This leads to a third problem with such work in that it depends to some degree on a widespread misconception that non-linguistic systems are structureless. Finally there is the question of whether sample sizes for such systems are ever large enough to make robust statistical claims.
In this talk I review the results of my own work on the statistics of non-linguistic symbol systems, and draw a mostly negative conclusion about the possibility of finding statistical measures that are useful in answering this question.
Date and Location
April 13 - 15, 2015
Max Planck Institute for Evolutionary Anthropology
Deutscher Platz 6
- Damián Blasi, MPI for Mathematics in the Sciences and MPI for Evolutionary Anthropology, Leipzig
- Jürgen Jost, MPI for Mathematics in the Sciences, Leipzig
- Peter Stadler, Leipzig University, Interdisciplinary Centre for Bioinformatics, Leipzig
- Russell Gray, MPI for Human History, Jena
- Bernard Comrie, MPI for Evolutionary Anthropology, Leipzig
- Stephen C. Levinson, MPI for Psycholinguistics, Nijmegen
- Nihat Ay, MPI for Mathematics in the Sciences, Leipzig
- Sean Roberts, MPI for Psycholinguistics, Nijmegen
- Leonardo Lancia, MPI for Evolutionary Anthropology, Leipzig
External Scientific Committee
- Ewa Dabrowska, University of Northumbria, Newcastle upon Tyne
- Nick Enfield, University of Sydney, Sydney
- Simon Greenhill, Australian National University, Canberra
- Martin Haspelmath, MPI for Evolutionary Anthropology, Leipzig
- Steve Piantadosi, University of Rochester, Rochester
- Maria Polinsky, Harvard University, Cambridge
- Søren Wichmann, MPI for Evolutionary Anthropology, Leipzig
Max Planck Institute for Mathematics in the Sciences