Data management glossary

Key terms in data management

Research planning

The University of Jyväskylä requires a data management plan for all research data resulting from research conducted at the University. The data management plan outlines the key characteristics of the data, the management of ethical issues, rights, and obligations related to the data, the planned lifecycle of the data, and the plan for opening the data.

Research data management is guided by the FAIR principles, and research data are aligned with them. The FAIR principles ensure the findability, accessibility, interoperability, and reusability of the data, and they can be implemented in a variety of ways. Research at the University of Jyväskylä aims for the broadest possible implementation of the FAIR principles, taking into account the specific characteristics and limitations of each dataset.

The University and key funders require that the management of research data be planned and implemented to ensure that data can be made available for reuse in the most open way possible at the appropriate stage. Openness is the guiding principle, provided there are no ethical or legal restrictions, such as those involving personal data, confidentiality, or trade secrets. Where restrictions apply, the principal investigator must justify the reasons in the data management plan and in the descriptive metadata. Open data and FAIR data are not synonymous; data can be highly restricted in terms of access rights and still be considered FAIR. If open publication is not feasible, openness can be maintained, for example, by archiving the data in a repository, where it can be made available upon request through a contract. See also "Opening up the data".

Ethical and legal actions

The rights to use, publish, and distribute research data generally belong to the original creators, unless otherwise agreed, for example, in a project or collaboration agreement. When the data creator is employed by a university or produces data with the support of an external funder (especially the Research Council of Finland or the EU), the rights to the data belong to the creator's home organization. This makes it easier, for example, to transfer the data to the organization for publication or archiving when the research concludes. According to the current interpretation of the legislation, research data cannot be owned as such; what can be held and agreed upon are the rights associated with the data.

Disclosure of personal data refers to the transfer of data for a legitimate reason to another research project while the original research is ongoing. The University of Jyväskylä's data privacy notice includes a model text for such disclosures. The disclosure is always for a limited time, and a separate contract is established between the data controller and the receiving party.

Detailed instructions: Data privacy at the University of Jyväskylä: Instructions for researchers

Documentation and metadata

The basic descriptive information (metadata) is maintained in the data section of the University's Converis system. Metadata are always published unless otherwise restricted by mandatory legislation, official orders, or contractual agreements with the data provider. The publication of metadata is the minimum requirement to ensure the findability of the data.
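As a rough illustration, basic descriptive information can be thought of as a small structured record that is checked for completeness before publication. The sketch below is not the University's actual metadata schema: the class `DatasetMetadata`, its field names, and the helper `missing_fields` are assumptions made up for this example.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetMetadata:
    """Hypothetical minimal descriptive record; not an official schema."""
    title: str
    creators: list[str]
    description: str
    access: str = "open"      # illustrative values: "open" or "restricted"
    identifier: str = ""      # persistent identifier, once one is assigned
    keywords: list[str] = field(default_factory=list)


def missing_fields(record: DatasetMetadata) -> list[str]:
    """Return the names of core descriptive fields that are still empty."""
    required = ("title", "creators", "description")
    return [name for name in required if not getattr(record, name)]


record = DatasetMetadata(
    title="Example interview study",
    creators=["Example Researcher"],
    description="",  # left empty on purpose to show the check
)
print(missing_fields(record))
print(json.dumps(asdict(record), indent=2))
```

Serializing the record to JSON is one simple way to keep the description machine-readable alongside the data files.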

In addition to the basic descriptive information, it is necessary to provide more detailed descriptions of the data, such as its variables, methodology and processing. This information, along with contextual and structural descriptive information, or documentation, is maintained in separate files alongside the data and linked to the basic descriptive information in Converis.

Documentation also refers to the process of documentation as part of the study's workflows. A sufficiently comprehensive set of explanatory documentation is essential for ensuring the long-term interoperability and usability of the data.

A README file is an efficient way to create documentation, i.e., a description of the content. The purpose of a README file is to serve as an accompanying document that explains the content and structure of the data, ensuring it can be accurately interpreted and correctly understood long after the data have been collected. This helps prevent the material from becoming unusable, misinterpreted, or misused.
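A README skeleton can be generated when a data folder is first created, so that the documentation grows together with the data. The following is a minimal sketch under stated assumptions: the section headings, the file name `README.txt`, and the functions `build_readme` and `write_readme` are illustrative choices, not an official template.

```python
from pathlib import Path

# Section headings are illustrative assumptions, not an official template.
README_SECTIONS = (
    "Title and creators",
    "Content and structure of the data",
    "Variables and units",
    "Methodology and processing",
    "Access conditions and license",
)


def build_readme(dataset_title: str, sections: tuple[str, ...] = README_SECTIONS) -> str:
    """Return a plain-text README skeleton with one underlined heading per section."""
    lines = [f"README for: {dataset_title}", ""]
    for heading in sections:
        lines += [heading, "-" * len(heading), "TODO: describe.", ""]
    return "\n".join(lines)


def write_readme(folder: Path, dataset_title: str) -> Path:
    """Write README.txt into the data folder and return its path."""
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "README.txt"
    target.write_text(build_readme(dataset_title), encoding="utf-8")
    return target
```

Calling `write_readme(Path("my_dataset"), "Example interview study")` would place the skeleton next to the data files, where each TODO is then replaced with an actual description.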

Lifecycle of research data when the research project ends

Making research data available means that the researcher provides the appropriate parts of the data for further use, such as for new research or other purposes, either openly or with limited access rights. Open availability means the material can be downloaded online by anyone from a high-quality repository. However, openness may be restricted if the material cannot be published openly due to ethical or legal reasons (e.g., personal data). In such cases, data management can be structured so that the researcher deposits the data in the repository with limited, controlled access rights at the appropriate stage of the research. For example, the material may be made available solely for research use or for research, teaching, and thesis purposes only. The repository assists the researcher in publishing, alongside the metadata, the restriction criteria and conditions for authorized access.

Archiving means that the researcher assesses the value of the data and determines that its nature and content make it worth archiving from a research, historical, or cultural perspective. The researcher then hands the data over to a research archive of their choice, which is responsible for its long-term preservation, distribution, and curation. Storing the data on university or personal storage systems such as Nextcloud, web drives, or external hard drives is not archiving; it is only continued storage. Examples of trusted repositories include the future repository of the University of Jyväskylä (in development), the Language Bank, the Finnish Literature Society's archive, and the Finnish Social Science Data Archive.

It is possible and advisable to publish the data at the appropriate stage of the research, provided it does not contain personal data or other confidential information—that is, it is anonymous and otherwise non-sensitive. The researcher(s) must also have the rights to publish the data. For this reason, data produced by the researchers themselves is generally suitable for publication. Publishing involves the researcher providing the material and the necessary documentation to a digital repository of their choice, which then receives and publishes it online for open downloading. A repository that complies with the FAIR principles supplements the data with descriptive information and assigns it a permanent identifier (usually a DOI). The researcher should also select an appropriate open license for the material, such as Creative Commons Attribution 4.0 International. The repository will assist with the choice of license if necessary.

A data repository is an open publication service for research data that provides tools for describing the data (metadata) and publishes it in accordance with the FAIR principles, including the use of a persistent identifier (PID). For example, JYX is a repository. In contrast, a data archive operates under archival legislation and is responsible for archiving research data.
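When the persistent identifier is a DOI, it can be turned into a resolvable link by prefixing it with the `https://doi.org/` resolver address. The sketch below shows this with a deliberately loose format check; the function name `doi_to_url` and the DOI `10.1234/example-dataset` are made up for illustration, and the pattern is a simplification rather than a full validation of DOI syntax.

```python
import re

# Loose pattern for a DOI: "10.", a registrant code, a slash, then a suffix.
# This is a simplified check, not a complete validation of DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into a link that goes through the doi.org resolver."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Does not look like a DOI: {doi!r}")
    return f"https://doi.org/{doi}"


# "10.1234/example-dataset" is a made-up identifier used only for illustration.
print(doi_to_url("10.1234/example-dataset"))
```

Citing the data through such a link, rather than through a repository page address, keeps the reference valid even if the repository later reorganizes its site.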

When data is published online and made openly accessible to all, the author chooses the appropriate license. The license allows the data to be used for new purposes in accordance with the FAIR principles, as long as the user is clearly informed of the purpose and conditions under which the data may be used. The license must be both human-readable and machine-readable, and the repository acting as the publishing channel is responsible for publishing it along with the material. If necessary, the Open Science Centre's data specialists can assist with choosing the appropriate license.

For research material, the usual licenses are those from the Creative Commons 4.0 family.

For code and software publications, the recommended licenses are the GNU and MIT licenses.
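One common way to make a license both human-readable and machine-readable is to record its SPDX identifier and canonical URL next to a plain sentence. The sketch below assumes a small lookup table and a hypothetical helper `license_statement`; the output field names (`spdx_id`, `url`, `text`) are illustrative and do not follow any particular repository's schema.

```python
import json

# SPDX identifiers and canonical URLs for a few common licenses.
LICENSES = {
    "CC-BY-4.0": "https://creativecommons.org/licenses/by/4.0/",
    "CC-BY-SA-4.0": "https://creativecommons.org/licenses/by-sa/4.0/",
    "MIT": "https://opensource.org/licenses/MIT",
}


def license_statement(spdx_id: str, dataset_title: str) -> dict:
    """Build a license entry with a readable sentence and machine-readable fields."""
    url = LICENSES[spdx_id]  # raises KeyError for identifiers not listed above
    return {
        "spdx_id": spdx_id,
        "url": url,
        "text": f"{dataset_title} is licensed under {spdx_id} ({url}).",
    }


# The dataset title is a placeholder used only for illustration.
print(json.dumps(license_statement("CC-BY-4.0", "Example interview study"), indent=2))
```

In practice the repository usually attaches this information itself; the point of the sketch is only that the same license can be expressed once for readers and once for software.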
