Statistical signatures of complexity in natural language
- Eduardo Altmann (Max-Planck-Institut für Physik komplexer Systeme)
Statistical analysis of word frequencies reveals numerous similarities between language usage and other complex systems. Two prominent examples are Zipf's law, the power-law decay of the word-frequency distribution, and the presence of long-range correlations in texts. In this talk I will propose simple models that explain these two well-known empirical observations and shed some light on their origin. The unprecedented amount of written text available for investigation (e.g., on the Internet) provides new motivation and opportunities for the quantitative investigation of these and other problems in the statistics of natural language.
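
As a rough illustration of the first observation: Zipf's law is usually written as f(r) ∝ r^(-α), where f(r) is the frequency of the r-th most frequent word and α is close to 1 for many texts. The sketch below is not taken from the talk. The function names `rank_frequency` and `zipf_exponent` are hypothetical, the tokenizer is a deliberately naive assumption, and a least-squares fit on log-log axes is only a crude estimator of α.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return word counts sorted from most to least frequent.

    Tokenization is a naive assumption: lowercase runs of letters
    and apostrophes count as words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def zipf_exponent(counts):
    """Estimate alpha in f(r) ~ r**(-alpha) from rank-ordered counts.

    Uses an ordinary least-squares fit of log(count) against log(rank)
    and returns the negated slope.
    """
    if len(counts) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx
```

Calling `zipf_exponent(rank_frequency(text))` on a long text gives a first estimate of α. The fit is sensitive to the low-frequency tail, so restricting it to a range of ranks may be preferable in practice.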