Exploring the Chemical Space (01.05.2020)

Wilmer Leal, Eugenio J. Llanos, Duc H. Luu, Marzieh Eidi, Peter F. Stadler, Guillermo Restrepo, Jürgen Jost

Chemical reactions turn educts into products. By now, more than 50 million reactions involving more than 30 million chemical substances have been reported in the literature and in patent databases. All this information is collected in the Reaxys© database, which continues the traditional chemical handbooks of Gmelin and Beilstein in electronic form, so that one can follow the annual growth of chemical knowledge over more than two centuries. We have explored statistical properties of this data set and are developing mathematical tools to analyze the emerging chemical space.

We have found that the annual growth rate of chemical knowledge has been remarkably constant over time, at about 4.4%. It was interrupted only by the two world wars and recovered quickly afterwards. Nevertheless, we can distinguish three statistical regimes: inorganic, organic, and organometallic. They differ in the variability of compound production and are separated by rather sharp transitions, from the most exploratory regime, the inorganic one, to the least fluctuating one, the current organometallic regime. The transition from the inorganic to the organic regime, around 1860, coincided with the adoption of structural theory. The transition to the organometallic regime, around 1980, was characterized by a sharp increase in carbon-metal compounds.
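To give a feeling for what a constant annual growth rate of about 4.4% implies, the following minimal sketch computes the corresponding doubling time and the cumulative growth factor over a given period. It assumes idealized, uninterrupted exponential growth; the function names are illustrative and not taken from the cited work.

```python
import math

def doubling_time(annual_rate):
    """Years needed to double under constant compound growth."""
    return math.log(2) / math.log(1 + annual_rate)

def growth_factor(annual_rate, years):
    """Cumulative multiplication factor after the given number of years."""
    return (1 + annual_rate) ** years

rate = 0.044  # annual growth rate of about 4.4%, as stated in the text
print(f"doubling time: {doubling_time(rate):.1f} years")
print(f"growth factor over 100 years: {growth_factor(rate, 100):.0f}x")
```

Under this idealization, the yearly output doubles roughly every 16 years.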

We have also analyzed patterns in the use of particular educts and in the synthesis of particular products, and have observed both conservative patterns and transitions in the history of chemistry.

From a more abstract perspective, we can view the chemical space as a (huge) directed hypergraph whose vertices are the chemical substances and whose hyperedges link sets of educts to sets of products. Analyzing this hypergraph requires the development of new mathematical tools. We want to understand how the actual chemical hypergraph differs from a random one, and for that we need baseline models of random hypergraphs. We also want to identify those hyperedges that are particularly important for bringing together heterogeneous ingredients and branching out in different directions. We want to see the local density of alternatives to chemical pathways. And we want to detect large-scale structural properties. Among the powerful mathematical tools that we have developed for these purposes are so-called hypergraph curvatures. They are called curvatures because they share abstract properties of curvatures in Riemannian geometry, such as relative closeness of neighborhoods, volume growth, or coupling properties of random walks.
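The directed hypergraph view can be made concrete with a small sketch. Each reaction is stored as a hyperedge, a pair (set of educts, set of products), and a deliberately simple random baseline redraws the substances of each hyperedge while keeping its numbers of educts and products. The toy reactions, the function names, and the choice of null model are assumptions for illustration only; this is neither the baseline model nor the curvature construction of [2]-[4].

```python
import random
from collections import Counter

# Toy reactions (illustrative only, not taken from Reaxys): each hyperedge
# maps a set of educts to a set of products.
reactions = [
    (frozenset({"H2", "O2"}), frozenset({"H2O"})),
    (frozenset({"CH4", "O2"}), frozenset({"CO2", "H2O"})),
    (frozenset({"CO2", "H2O"}), frozenset({"H2CO3"})),
]

def degrees(hyperedges):
    """Count how often each substance appears as an educt and as a product."""
    as_educt, as_product = Counter(), Counter()
    for educts, products in hyperedges:
        as_educt.update(educts)
        as_product.update(products)
    return as_educt, as_product

def random_baseline(hyperedges, seed=0):
    """Hypothetical null model: redraw the substances of every hyperedge
    uniformly at random, keeping its numbers of educts and products."""
    rng = random.Random(seed)
    substances = sorted({s for e, p in hyperedges for s in e | p})
    randomized = []
    for educts, products in hyperedges:
        k = len(educts)
        drawn = rng.sample(substances, k + len(products))
        randomized.append((frozenset(drawn[:k]), frozenset(drawn[k:])))
    return randomized

as_educt, as_product = degrees(reactions)
print("used as educt:", dict(as_educt))
print("obtained as product:", dict(as_product))
print("randomized:", random_baseline(reactions))
```

Comparing statistics such as these degree counts between the empirical hypergraph and many randomized copies is one way to see where the real network deviates from chance. A null model that also preserves the degree of every substance would be a stricter comparison.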


[1]   E. Llanos, W. Leal, D. H. Luu, J. Jost, P. Stadler, G. Restrepo, Exploration of the chemical space and its three historical regimes, PNAS 116, 12660–12665, 2019
[2]   W. Leal, G. Restrepo, P. Stadler, J. Jost, Forman-Ricci curvature for hypergraphs, arXiv:1811.07825, 2018
[3]   W. Leal, M. Eidi, J. Jost, Ricci curvature of random and empirical directed hypernetworks, Applied Network Science, 2020
[4]   M. Eidi, J. Jost, Ollivier Ricci curvature of directed hypergraphs, arXiv:1907.04727, 2019

04.05.2020, 14:56