Research Data Management (RDM)

Open access to scientific publications is closely linked to open access to research data. In keeping with the principle of “as open as possible, as closed as necessary”, research data management (RDM) is a central concern not only for funders but also for research institutions themselves. Improving the visibility and availability of research data leads to better reproducibility and transparency of research, and also helps raise the visibility of the results of the researcher and the institution as a whole.

What are research data?

Research data are those data that are created and collected in order to produce, verify or better understand research results. Research data can be qualitative or quantitative, factual or non-factual, numerical, textual or audiovisual. Research data can be digital or non-digital.

They can include, for example:

Observational data that are obtained in real time and are unique and irreplaceable (e.g. brain images, interview data).
Experimental data from laboratory equipment.
Simulation or model data (e.g. economic or climate models).
Derived or compiled data that result from the processing or combination of other “raw” data (e.g. text mining).

Research data are therefore not just tables, but can also be audio or video recordings, laboratory diaries, questionnaires, photographs, as well as software and scripts.

What is RDM good for?

Research data management involves the acquisition, organization, long-term storage and sharing of data. It is good to address this area at the beginning of and throughout the research; a Data Management Plan (DMP) can help you with this.

A DMP is a living document that you keep up to date so that it really describes what is happening to your data. The DMP states what data you will work with and how you will process them: how you will generate the data, where they will be stored, who will have access to them and under what conditions they can be reused.
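As a rough illustration of the questions a DMP answers, here is a minimal sketch in Python. The class and field names are hypothetical and chosen only to mirror the questions listed above; they do not follow the structure of DMPonline, the Data Stewardship Wizard or the Science Europe template.

```python
from dataclasses import dataclass, fields


@dataclass
class DataManagementPlan:
    """Hypothetical DMP skeleton: one field per question named in the text."""
    data_description: str = ""   # what data you will work with
    generation: str = ""         # how the data will be generated
    storage: str = ""            # where the data will be stored
    access: str = ""             # who will have access to the data
    reuse_conditions: str = ""   # under what conditions the data can be reused

    def unanswered(self):
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]


# Example values are invented for illustration only.
plan = DataManagementPlan(
    data_description="Survey responses (CSV)",
    storage="Institutional network drive with regular backups",
)
print(plan.unanswered())
```

Because a DMP is a living document, a check like unanswered() can be rerun whenever the plan is updated to see which sections still need attention.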

You can use DMPonline or the Data Stewardship Wizard to create a DMP. The Science Europe template could be useful as well.

Why should you bother with research data management?

When you share your data, anyone can replicate your research and results, so you can easily defend your conclusions and head off future attempts to cast doubt on them.
Good planning will help you anticipate potential problems (e.g. how colleagues from other institutions can access the data, long-term storage, possible anonymisation of data) as well as the necessary costs (e.g. storage space), which you can include in grant applications.
You can easily find and reuse the data in the future.
A DMP is required by some funders (e.g. the European Commission).

Florian Markowetz from the University of Cambridge also summarized the main reasons for good RDM in his talk: https://doi.org/10.1186/s13059-015-0850-7.

For those interested in the issue, we also recommend the e-learning course of the Open Science Support Centre at Charles University. It is possible to log in as a guest and study the materials.

FAIR data

Not all research data need to be open and shared, but they should be FAIR. What does that mean? FAIR is an acronym for Findable, Accessible, Interoperable and Reusable. FAIR data are therefore:

Findable: both humans and machines are able to find your data. This depends mainly on a good machine-readable metadata description; metadata describe what your data contain. In addition, your data and metadata are assigned a unique and persistent identifier (e.g. a DOI) and are registered in sources that are indexed by search engines. For example, you can deposit and register your data in the Zenodo repository, where we have a university account.
Accessible: metadata should always be available. The data themselves should be available only where nothing prevents it (protection of personal data, etc.). Potential users should be able to find out easily under what conditions they can access the data and use them.
Interoperable: the data can be combined with other applications and systems (e.g. with other datasets). This means, for example, that both data and metadata follow the standards of the field and are made available in appropriate formats.
Reusable: the data can be reused by other interested parties because they are shared under an open license. The data must also be well described so that it is clear to everyone how you obtained and processed them.
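To make the idea of a machine-readable metadata description more concrete, here is a minimal sketch in Python. The field names, the placeholder DOI and the mapping of fields to FAIR principles are illustrative assumptions; they are not the metadata schema of Zenodo or of any other repository, and passing this check does not by itself make a dataset FAIR.

```python
import json

# Hypothetical metadata record. Field names and values are illustrative
# assumptions, not an official repository schema.
record = {
    "identifier": "10.1234/example-doi",  # placeholder persistent identifier
    "title": "Interview transcripts, pilot study",
    "creators": ["Doe, Jane"],
    "description": "Anonymised transcripts of semi-structured interviews.",
    "license": "CC-BY-4.0",               # open license (Reusable)
    "access": "restricted",               # data may be closed, metadata stay public
    "access_conditions": "Available on request under a data use agreement.",
    "format": "text/csv",                 # common open format (Interoperable)
}

# Fields this sketch treats as the minimum for each FAIR principle (assumed).
REQUIRED = {
    "Findable": ["identifier", "title", "description"],
    "Accessible": ["access", "access_conditions"],
    "Interoperable": ["format"],
    "Reusable": ["license", "creators"],
}


def missing_fields(rec):
    """Return, per FAIR principle, the required fields that are absent or empty."""
    return {
        principle: [name for name in names if not rec.get(name)]
        for principle, names in REQUIRED.items()
    }


if __name__ == "__main__":
    print(json.dumps(record, indent=2))   # machine-readable form of the metadata
    print(missing_fields(record))
```

Note that the record stays complete even though access to the data is restricted: the metadata and the access conditions are still published, which is the point of "as open as possible, as closed as necessary".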

You can check how FAIR your data are with the help of a simple checklist.