What is research data?

Research data is a broad concept and covers many different types of data and code.

At the heart of good and successful data management is the ability to identify your own research data and their key characteristics, so that you can plan and implement their management in accordance with good scientific practice and FAIR principles.

By definition, research data refers to the data produced and/or used during the research, on which the research findings are based. In broad terms, everything between the starting point of the research and the final results can be interpreted as research material. Within a single study, this can include data collected for the research (e.g. numerical data, text, images, AV data, software, algorithms, measurement results, samples), data resulting from the research process (e.g. documentation, laboratory and field work diaries, analyses, databases), and previously collected data (e.g. from Statistics Finland or archives) that is further processed with code. The term research data is used when referring specifically to digital research material.

[Figure: a process graph describing the phases of research data management]

It is particularly important to identify databases, compilations and similar secondary and derived data produced during the research. These should be managed in the same way as the original (primary) data and are, in fact, often both easier and more meaningful to open and publish than the primary data. This applies especially to studies that are based on previously collected data.

For example, in historical research, original data are held by archives, which are responsible for their preservation and distribution, and may also manage the rights to them. In contrast, databases, corpora and other derived material compiled from these materials are managed and, where appropriate, published by the researcher.

Composition and main characteristics of research data

Research data can consist of several different parts. For data management, it is important to identify the parts of the data that share the key characteristics affecting their management.
Dividing the data correctly and treating the parts separately makes it possible to target measures properly, minimises risks and allows some parts of the data to be opened, as long as the parts that must stay closed are kept separate.

Key characteristics of the data include the origin of the data (e.g. whether it is produced, self-collected, a sample, reused or derived), the file format and size, the rights of use and ownership of the data, whether the data contains personal data or other sensitive information, and the intended life cycle of the data (whether it is destroyed, retained, archived or opened). These characteristics may vary from one type of data to another within the same study.
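As an illustration only, the characteristics listed above could be recorded per data part in a small inventory. The following is a minimal Python sketch, not a prescribed tool: the class names, field names and the three example parts are hypothetical and chosen purely to mirror the characteristics named in the text.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    # Origins named in the text above
    PRODUCED = "produced"
    SELF_COLLECTED = "self-collected"
    SAMPLE = "sample"
    REUSED = "reused"
    DERIVED = "derived"


class LifeCycle(Enum):
    # Intended end points named in the text above
    DESTROYED = "destroyed"
    RETAINED = "retained"
    ARCHIVED = "archived"
    OPENED = "opened"


@dataclass(frozen=True)
class DataPart:
    """One part of the research data with its key characteristics (hypothetical schema)."""
    name: str
    origin: Origin
    file_format: str
    size_mb: float
    rights_holder: str
    contains_personal_data: bool
    life_cycle: LifeCycle


def group_by_life_cycle(parts):
    """Group part names by intended life cycle so each group can be handled separately."""
    groups = {}
    for part in parts:
        groups.setdefault(part.life_cycle, []).append(part.name)
    return groups


# Hypothetical example: three parts of one study with different characteristics.
inventory = [
    DataPart("interview_recordings", Origin.SELF_COLLECTED, "wav", 12000.0,
             "research group", True, LifeCycle.DESTROYED),
    DataPart("anonymised_transcripts", Origin.DERIVED, "txt", 35.0,
             "research group", False, LifeCycle.ARCHIVED),
    DataPart("analysis_scripts", Origin.PRODUCED, "py", 0.5,
             "research group", False, LifeCycle.OPENED),
]

groups = group_by_life_cycle(inventory)
for life_cycle, names in groups.items():
    print(life_cycle.value, names)
```

Keeping one record per part makes it visible when parts of the same study differ, for example when only one part contains personal data, which is exactly the situation in which the parts need to be managed separately.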

When considering the composition of the research data, particular attention should be paid to two interlinked questions: the personal data contained in the data and the life cycle of the data. Without careful planning and without informing the research participants appropriately, data containing personal data should be destroyed at the end of the study. The discoverability, accessibility and reusability of the data can only be ensured by informing participants properly in advance, in a way that takes the end of the data's life cycle into account.

If the end point of the data is publication or archiving, also consider how the data should be divided into sections and how those sections should be described, so that they are easy to discover and use. The instructions for opening the data cover this in more detail.