A new data-driven approach to a more sophisticated and detailed investigation of the history of chemistry is presented by a research team from MPI MiS and the Interdisciplinary Center for Bioinformatics at the University of Leipzig.
Publication in PNAS on June 11, 2019
Exploration of the chemical space and its three historical regimes
Eugenio J. Llanos, Wilmer Leal, Duc H. Luu, Jürgen Jost, Peter F. Stadler, and Guillermo Restrepo
Chemical research unveils the structure of chemical space, which is spanned by all chemical species. Its exploration has been documented for more than 200 years in the literature and in databases. Very little is known, however, about the large-scale patterns of this exploration. Using mathematical tools, the scientists analysed millions of chemical reactions documented in Elsevier’s Reaxys database to study the growth in the production of chemicals and to address its variability.
In the paper “Exploration of the chemical space and its three historical regimes”, published in PNAS, the research team Prof. Jürgen Jost, Prof. Peter Stadler, Dr. Guillermo Restrepo, Dr. Duc H. Luu, Wilmer Leal and Eugenio J. Llanos presents a new data-driven approach to the history of chemistry.
The questions are: Has the exploration been affected by social events, especially the World Wars, or by scientific influences such as the introduction of new theories? Is chemical synthesis as central to the exploration as is generally accepted? As there are more and more substances, and therefore more available substrates, can one identify how substrates are selected to explore the chemical space? Are chemists actually reaching new regions of the space?
The exploration of the space has followed three statistically distinguishable regimes. The first one was marked by uncertain year-to-year output of organic and inorganic compounds and ended about 1860, when structural theory ushered in a century of more regular and guided production, the organic regime. The current organometallic regime is the most regular one. Analysing the details of the synthesis process, the scientists found that chemists have had preferences in the selection of substrates, and they could identify the workings of such a selection. Regarding reaction products, the discovery of new compounds has been dominated by very few elemental compositions.
In the exploration of the chemical space from 1800 to 2015, chemists have reported new compounds at an exponential rate. The yearly reporting of new compounds shows two drastic historical drops, both during the World Wars, which temporarily reduced production. However, after each war, chemistry recovered from these setbacks and returned to its long-term growth curve of about 4.4% annual growth. A similar pattern holds for scientific events such as the introduction of structural theory and the rise of organometallic chemistry, which marked the transitions between regimes. At each transition, growth rates were somewhat perturbed, but again, chemistry quickly returned to the historical growth trend of 4.4%. This raises the question of why chemistry maintains such a stable growth rate across different regimes despite major external perturbations. The scientists speculate that this derives from the intrinsic structure of the underlying network of chemical reactions, and they devise formal models to analyze this.
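To give a feel for what a steady 4.4% annual growth rate implies, the following minimal sketch computes the corresponding doubling time and the cumulative growth factor over the 1800 to 2015 period. It assumes an idealized, perfectly constant rate, which the real data only approximate; the function names are hypothetical and are not taken from the paper.

```python
import math

# Figures quoted in the text; the constant-rate model is a simplifying assumption.
ANNUAL_GROWTH = 0.044
START_YEAR, END_YEAR = 1800, 2015


def doubling_time(rate: float) -> float:
    """Years needed for yearly output to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate)


def growth_factor(rate: float, years: int) -> float:
    """Factor by which yearly output grows after `years` of compounding."""
    return (1 + rate) ** years


if __name__ == "__main__":
    span = END_YEAR - START_YEAR
    print(f"Doubling time: {doubling_time(ANNUAL_GROWTH):.1f} years")
    print(f"Growth factor over {span} years: {growth_factor(ANNUAL_GROWTH, span):,.0f}x")
```

Under this idealization, the yearly output of new compounds doubles roughly every 16 years.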
They found that synthesis has driven the exploration of the space since the early 19th century, that is, even before Wöhler’s synthesis of urea in 1828, which is traditionally considered the beginning of organic synthesis. Nevertheless, for a long period, extraction was about as important as synthesis, and the latter became the established tool for reporting new compounds only around 1900, i.e., 70 years after Wöhler’s synthesis and 40 years after the introduction of structural theory. This time lag for a systematic shift in the practice of chemistry is remarkable.
In terms of the use of substrates and the production of compounds, chemists have been conservative in the selection of their starting materials, presumably as a disciplinary consequence of starting from substances that are readily available or as a way to develop valid and reliable expert intuition to explore the chemical space.
The exploration of chemical space, however, seems to have been rather uneven, with only a handful of compositions extensively explored. The set of explored combinations is narrow in the sense that a fixed-substrate approach is preferred. In fact, reported reactions typically include two substrates: one less well known, and the other drawn from the synthetic toolkit of preferred substrates, among which acetic anhydride has been the leading one since 1940.
In contrast to text-based approaches that focus on themes and topics, the computational approach presented in this paper, using chemical compounds and reactions, goes to the very core of chemistry as a science. The scientists anticipate that the present work will serve as a starting point for more sophisticated and detailed studies of the history of chemistry.
Picture: The historical exponential expansion of the chemical space, made of species reported in the scientific literature. Three statistical regions are identified with three different colours, along with their most relevant chemical species.