Workflows for structuring mathematical research data


In the mathematical sciences, large amounts of data are generated routinely. This includes numerical data used and produced in specific computations, but also information on models, calculation methods, software, notations, etc. While there is a growing awareness of the need to make numerical data findable, accessible, interoperable and reusable (FAIR), this awareness is much less pronounced for the other forms of data mentioned. However, the research community can benefit greatly from precisely these forms of data if they are stored according to the FAIR principles, as we will explain in our talk.

As part of a solution approach, we present and discuss first results obtained by the interdisciplinary working group of the mathematical research data initiative MaRDI, which enable FAIR storage of information on the workflows and models used. Potentially complex work processes are recorded in a structured manner in a workflow schema. The schema combines a description of what is done with auxiliary information such as the software and hardware used, the applied mathematical models and solution techniques, and the algorithms with their corresponding properties. For the models used in research, we develop an ontology that allows individual components and quantities to be classified and a knowledge graph to be implemented, which can then be browsed via a web platform. In this way, a wide variety of models and problems are linked so that researchers can identify connections between different applications and exploit possible synergies.
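To make the idea of a structured workflow record and of linked models more concrete, the following minimal Python sketch may help. It is not the MaRDI schema or ontology: the class names, field names, the helper `shared_quantities` and the two example entries are all hypothetical and chosen only for illustration. The sketch assumes that a workflow record bundles a description with software, hardware, models and methods, and that two models count as linked when they share at least one quantity.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, List, Tuple


@dataclass
class Model:
    # Hypothetical fields: a model name and the quantities it involves.
    name: str
    quantities: List[str]


@dataclass
class WorkflowRecord:
    # Hypothetical fields mirroring the kinds of information named in the text.
    description: str
    software: List[str]
    hardware: List[str]
    models: List[Model]
    methods: List[str]


def shared_quantities(records: List[WorkflowRecord]) -> Dict[Tuple[str, str], List[str]]:
    """Link every pair of models that share at least one quantity."""
    quantities_by_model = {}
    for record in records:
        for model in record.models:
            quantities_by_model[model.name] = set(model.quantities)

    links = {}
    names = sorted(quantities_by_model)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            common = quantities_by_model[first] & quantities_by_model[second]
            if common:
                links[(first, second)] = sorted(common)
    return links


# Illustrative placeholder entries, not taken from any real use case.
records = [
    WorkflowRecord(
        description="Parameter identification for a heat conduction experiment",
        software=["example-solver 1.0"],
        hardware=["workstation"],
        models=[Model("heat conduction", ["temperature", "thermal conductivity"])],
        methods=["least-squares fit"],
    ),
    WorkflowRecord(
        description="Simulation of a thermally loaded component",
        software=["example-solver 1.0"],
        hardware=["compute cluster"],
        models=[Model("thermoelasticity", ["temperature", "displacement"])],
        methods=["finite element method"],
    ),
]

links = shared_quantities(records)
serialized = json.dumps(asdict(records[0]), indent=2)
```

In this toy setting the two workflows become connected through the quantity they share, which is the kind of cross-application link a knowledge graph is meant to expose. Serializing a record with `json.dumps(asdict(...))` is one simple way to store it in a machine-readable form; a real implementation would more likely use an ontology language and persistent identifiers.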

The concepts designed so far are demonstrated on specific use cases involving complex mathematical models, parameter identification and data acquisition. The use cases originate from applied research and touch on diverse areas of mathematics, such as modeling, optimization, statistics and machine learning, so they are not limited to a single area. The gains from documenting research data in such a scheme are compared with the expected additional effort.

Tabea Bacher

Max-Planck-Institut für Mathematik in den Naturwissenschaften

Tabea Krause

Universität Leipzig

Daniel Mietchen

FIZ Karlsruhe

Björn Schembera

Universität Stuttgart

Olaf Teschke

FIZ Karlsruhe