Personal data
Do you study people? If you collect data from or about a person, start with the assumption that you are processing personal data.
You must handle data responsibly and comply with the EU General Data Protection Regulation (GDPR) and Finland's national data protection law. Responsible processing of personal data is one of the basic starting points of ethically conducted research.
What is personal data?
The definition of personal data is broad. Personal data includes any information or characteristic relating to a person that could make the individual identifiable. Identification does not require precise information, and a dataset usually contains some personal data whenever people participate in research or you collect information from or about people. Identification can also happen by combining the data with additional information, for example information found on the Internet.
Examples of personal data:
- Age
- Image (recording etc.)
- Speaking voice (for example, in interviews)
- Educational background
- Industry, workplace or profession
- Residential area
- Statements and opinions characteristic of the person
- Email address
- Ethnic background or nationality (special personal data)
- Health data (special personal data)
- Income
- Exercise habits
- Fingerprints
- Walking style
- Characteristic physical traits
- Information on the research subject's friends or other persons related to the research subject
- Information on the research subject's family or other third parties, for example information regarding a teacher's students
Personal data is therefore much broader than just name or background variables.
For example, interview responses or survey responses themselves often contain personal data. Similarly, belonging to the target group may constitute personal data.
For example, if you ask secondary school PE teachers about their own sports hobbies, the indirect personal data would already include their profession and sports habits/hobbies/physical activity.
Personal data comes in different levels: some data is sufficient on its own to identify a person, but any information relating to an identifiable person counts as personal data.
Interviews always contain personal data, as the speaking voice is a so-called direct identifier, i.e. information that alone is sufficient for identification.
Personal data does not have to be particularly secret or intimate. What matters is whether the person can be identified. Nor does recognizability require that anyone can identify a person. It is enough if only family members or colleagues could identify the person. (Source: Data protection guidelines for researchers)
Justify the collection of personal data based on your research plan and/or research questions
In your research plan, you describe the research design and research objectives. It must be possible to justify the processing of personal data on the basis of these.
Minimizing information
- You may only collect personal data that you need for your research. For example, if the age of the research subject does not matter in your study, do not ask for it. Data minimisation is one part of protecting research subjects.
- It is also generally better to collect information such as age or years of work as categories; for example, work experience 0-4 years, 5-10 years, 11-15 years etc.
- Sometimes the interviewee tells more about themselves than the interviewer has asked. In that case, remove the excess information from the data.
- Try to avoid collecting special categories of personal data (such as health data) or sensitive data (such as personal experiences of domestic violence or bullying) combined with direct identifying information (such as voice or name).
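The categorisation mentioned in the list above is best built into the question itself, so that exact values are never collected. If exact values have nevertheless been collected, they can be coarsened before the data is stored. The following is only a minimal sketch: the function name, the open-ended top category and the sample values are assumptions made for illustration, and the category boundaries follow the work experience example above.

```python
# Category boundaries follow the example in the text (0-4, 5-10, 11-15 years).
# The open-ended top category is an assumption added for this sketch.
EXPERIENCE_BINS = [
    (0, 4, "0-4 years"),
    (5, 10, "5-10 years"),
    (11, 15, "11-15 years"),
]
TOP_LABEL = "16 years or more"


def categorise_experience(years: int) -> str:
    """Return the category label for a whole number of years of work experience."""
    if years < 0:
        raise ValueError("years of experience cannot be negative")
    for low, high, label in EXPERIENCE_BINS:
        if low <= years <= high:
            return label
    return TOP_LABEL


# Made-up values; only the category is kept, the exact figure is discarded.
exact_years = [2, 5, 10, 13, 30]
categories = [categorise_experience(y) for y in exact_years]
print(categories)
```

Storing only the category means that a rare exact value, such as an unusually long career, no longer stands out in the data.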
When you process personal data, you may not store it just anywhere. For example, you can't use Google's cloud services or iCloud. You will also need secure software and tools if, for example, you conduct interviews or surveys. The topic of information security will be discussed later.
Data protection is not intended to prevent research from being conducted, but the purpose is to protect research participants. It is important that the processing of personal data has a justification and legal basis and that the research subject is aware of what is being done with their data.
The guidelines for processing personal data do not apply to deceased or fictitious persons.
Special categories of personal data
- Ethnicity
- Political opinions
- Religious or philosophical beliefs
- Trade union membership
- Health information
- Sexual orientation or behavior
- Genetic and biometric data for the purpose of identifying a person
If your data contains special categories of personal data, it is very important that the data is handled responsibly.
The privacy notice has its own section where you inform about the processing of special categories of personal data.
Follow these protective measures (more on these later):
- Stricter requirements related to data security.
- Data minimization: you may only collect special categories of personal data that are necessary for carrying out the research. The amount of special personal data must be proportionate to the objective of the research.
- Pseudonymisation should always be carried out if it is possible. More on pseudonymisation later on this page under Pseudonymisation and anonymisation.
- Documentation of the processing of personal data: keep track of what you do, where the data is stored and what you have agreed with the research subjects.
For example, if you studied fatigue experienced by university students, you could get information about a respondent's depression, anemia, or other health data. In other words, the data would contain special categories of personal data.
If you do not intend to collect special categories of personal data, plan the data collection so that they cannot end up in the data by accident. If special categories of personal data might appear, plan the collection of data with the assumption that you are processing special categories of personal data.
TEST: DOES MY DATA CONTAIN PERSONAL DATA?
Test whether your research data contains personal data or special categories of personal data:
- You can change the language of the test in the top right-hand corner.
- Remember that the test results are indicative.
Research notification, privacy notice and consent form
According to the law, a privacy notice must always be made if personal data is processed in research. In addition, research participants will be given a research notification and a consent form. This is how you inform research participants, i.e. tell them about the processing of their personal data. As a rule, research participants have the right to know about the processing of their personal data.
You can find the university's instructions for students on the university's website. The aim has been to summarise the main parts of the instructions on the university's website in this course material.
Templates
The university's website contains templates for the privacy notice, research notification and consent forms.
Instructions for making a privacy notice:
- The privacy notice is a form in which you tell the research participant, for example, what personal data is collected, why, what is done with it and how it is secured. This is how you inform the research participants (i.e. the data subjects).
- The university's data protection guidelines contain templates for forms (privacy notice, notification, consent form).
- You can use the privacy notice template and also look at the pre-filled example.
- If necessary, ask your supervisor for help.
- The privacy notice will be provided to research subjects.
- Research participants are informed about the processing of personal data by means of a privacy notice and a research notification and consent form. The consent form is issued even if no signature is requested, as this provides additional information specifically related to consent.
- The author of the thesis is the data controller, i.e. responsible for the processing of personal data.
- The privacy notice explains the lawful basis for processing personal data (section 4. Lawful basis for processing personal data).
- The lawful basis for processing personal data may be the consent of the research subject. Use the template for privacy notice, research notification and consent form found under "Participants in coursework or theses".
- If the data contains special categories of personal data, the lawful basis for processing them is explicit consent, which is a separate section in the privacy notice.
- The template also offers legitimate interest as an alternative, which is only suitable for certain situations and requires a so-called balancing test.
- If the thesis plan meets the criteria for scientific research according to your supervisor's assessment, public interest can be used as the lawful basis for processing. In this case, use the template in the scientific research privacy notice, research notification and consent form.
- If necessary, more information can be found in the university's data protection guidelines under Legal basis for processing personal data and requesting consent and Informing the data subject in the processing of personal data
What is the significance of the lawful basis for processing personal data?
- The rights of research subjects depend on the basis for processing stated in the privacy notice. For example, if the basis for processing is consent and the research subject withdraws from the study, all collected data is deleted, even if that is difficult. If the basis for processing is public interest, previously collected data is not deleted, but the collection of data is suspended.
If you are writing your thesis in a research group or from ready-made data, the controller is often a university or other research organisation.
- In this case, you usually do not make the privacy notice yourself. In the research project, the privacy notice may have already been taken care of and you have been mentioned in the privacy notice as a processor of personal data. In this case, you will have access to the data or part of it confidentially.
- A commitment to data processing is made with the project.
If the research subject is under 15 years of age, the consent of the guardian is usually required for the research.
The privacy notice, notification and consent form are given to both the guardian and the child. The child should be informed in an age appropriate manner so that the child understands.
Providing a privacy notice to research participants
The privacy notice can be attached to an e-mail, given to research participants on paper or linked to the beginning of a Webropol survey.
- If you want to link the privacy notice in the survey, you can 1) share it via SharePoint or 2) post it on the personal JYU website that the university provides for students, and then link to it at the beginning of your Webropol survey.
- In Webropol, you need to turn on the text editor to add a link (two T's next to each other in the upper right corner).
If providing a privacy notice to each research subject would require unreasonable effort, publish it instead. The personal website mentioned above can be used for this. For example, if you study comments on a public social media channel, post a link to the privacy notice in the comments.
What if I don't plan to publish the personal information I receive? Do I need to make a privacy notice?
It doesn't matter if you publish the information you receive. What matters is that you process this data, so yes, make a privacy notice.
Consent to participate in research
Research participants are always asked for their consent to participate in research.
The consent must be documented, i.e. it must be verifiable afterwards.
For example, document consent like this:
- By requesting a signature on the consent form or
- By asking for consent at the beginning of the recording of the interview or by having the research participant tick the box "I have read the privacy notice" at the beginning of the survey.
Consent is requested regardless of whether consent is the lawful basis for processing in the privacy notice. Research participants must have enough information about the research before they can agree to participate in it. For example, research participants must know what data is collected about them and why.
When consent is the legal basis for processing (i.e. it is not scientific research), special attention should be paid to the content of consent and how to request it, because giving consent must be an active action.
In the previous section, Privacy notice, Research notification and consent form, there were links to consent forms. These links are also listed here.
- Consent as a lawful basis for data processing: form Consent to personal data processing and participation
- Public interest as a lawful basis for data processing (scientific research): form Appendix 7. Consent form for research subjects (the informed consent process 2/2) | University of Jyväskylä (jyu.fi)
How do I verify consent in a survey?
At the beginning of the survey, attach the privacy notice, research notification and consent form. Put a mandatory box to tick to acknowledge the documents as read and understood.
Even if the consent form is not signed, it is a good idea to give the research participant a version of the form without a signature box (or equivalent information). This ensures that the research subject has the necessary information.
If the research subject is under 15 years of age, parental consent is usually required for the research.
Pseudonymisation and anonymisation
Personal data shall be pseudonymised whenever possible.
- This is a key part of protecting research subjects.
- Anonymisation may not be possible.
Pseudonymisation
- The names, place of residence and other personal data of research subjects are replaced with codes.
- The code key is stored separately from the data in a secure location, such as a locked desktop drawer. It is still possible to establish the identity of the research subject.
- In practice, a code key is a list containing, for example, the name of the research subject and its corresponding alias or number sequence.
- In addition to the coding of direct identifiers, pseudonymisation requires that all indirect identifiers, for example in the open responses of a survey or in interview text, are removed.
- Indirect identifiers can be removed, for example, by using non-overlapping categories (e.g., age groups 15-19 years old, 20-24 years old, etc.) or by coarsening them (e.g., Viitasaari -> a municipality in Central Finland; Cygnaeus School -> a primary school in Jyväskylä). Note: not all background variables are identifiers! For pseudonymisation, only the data that would help in deducing the identity of the research participant needs to be modified.
- Rule of thumb: pseudonymised data is almost anonymous data; the difference is that the code key and the consent forms have not yet been destroyed.
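The coding of direct identifiers described in the list above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the field names `name` and `participant_code`, the code format and the sample records are all hypothetical. It covers direct identifiers only; indirect identifiers in open responses or interview text still have to be reviewed and removed separately.

```python
import secrets


def pseudonymise(records, id_field="name"):
    """Replace the direct identifier in each record with a random code.

    Returns (pseudonymised_records, code_key). The code key maps each code
    back to the original identifier and must be stored separately from the data.
    """
    code_for = {}       # identifier -> code, so one person always gets one code
    used_codes = set()
    pseudonymised = []
    for record in records:
        identifier = record[id_field]
        if identifier not in code_for:
            # Random code, so the code itself reveals nothing about the person.
            code = "P" + secrets.token_hex(4)
            while code in used_codes:  # very unlikely, but keep codes unique
                code = "P" + secrets.token_hex(4)
            used_codes.add(code)
            code_for[identifier] = code
        cleaned = {k: v for k, v in record.items() if k != id_field}
        cleaned["participant_code"] = code_for[identifier]
        pseudonymised.append(cleaned)
    code_key = {code: identifier for identifier, code in code_for.items()}
    return pseudonymised, code_key


# Made-up records for illustration only.
interviews = [
    {"name": "Example Person A", "answer": "First answer"},
    {"name": "Example Person B", "answer": "Second answer"},
    {"name": "Example Person A", "answer": "Follow-up answer"},
]
data, code_key = pseudonymise(interviews)
```

Here `data` is what you would analyse, while `code_key` is the list that has to be kept in a separate, secure location. As long as the code key exists, the data remains personal data.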
Anonymisation
- The data is edited so that all personal data is deleted. This also applies to indirect personal data, such as place of residence or occupation. It is no longer possible to identify the research subject in any way.
- Note that genuine anonymisation is challenging. The anonymisation of data may even be impossible because anonymisation would remove so much content that the rest of the data would no longer be relevant. In addition, as technology advances, new ways of combining data may emerge.
- Anonymisation is one of the possibilities to open data to downstream users.
- So, if you're considering anonymization, think about whether it's realistic.
- Do not promise research participants that the data will be anonymous if this is not really possible.
If the data is genuinely anonymised, it is no longer personal data. Pseudonymised data, on the other hand, are still personal data.
I'm doing a survey asking for age and county. However, the data cannot be linked to voice mail. Do I need to file a privacy notice? Is it anonymous data or personal data?
The survey can be anonymous if, for example, all questions are on a scale of 1 to 5 and only a few categorical background variables are asked. For example, ask for age in categories rather than as an exact age, e.g. 20-29 years. If the survey has open-ended question fields, there is a greater chance that it also contains personal data. If the survey is anonymous, i.e. no one can be identified directly or indirectly by combining the data, a privacy notice does not need to be submitted. However, you will need a research notification and consent to participate.
If there is even the slightest chance of identifying the respondent, especially if you collect indirect identification data and open-ended responses, also prepare a privacy notice.
Always ensure that you use secure survey software such as Webropol (basic personal data) or REDCap survey software (special categories of personal data). The use of Google Forms, for example, is prohibited. N.B. Always remember to remove survey responses from the software after processing has ended.
When is a person identifiable?
Remember that identification can take place by combining information from different sources.
- If the research subject's place of residence and occupation are mentioned, and it's a small town or a rare occupation, the information may be sufficient to identify the person.
- For example, the president is easily recognizable because the presidency is a position held by only one person at a time. However, the president could also be interviewed, as long as they are told that they are identifiable.
- Personal data is classified into direct identifiers, strong indirect identifiers and indirect identifiers.
- Direct identifiers, such as name, are obviously personal data, but personal data also includes other information, characteristics, factors, actions and behaviour relating to a person.
- For more information, see FSD's
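One rough way to spot the combination risk described above is to count how many respondents share each combination of indirect identifiers; a combination held by only one or two people is a warning sign. This counting approach comes from the idea of k-anonymity and is not prescribed by this material. The function name, the field names, the threshold of 3 and the sample records are assumptions made for illustration, and passing the check does not prove that the data is anonymous.

```python
from collections import Counter


def small_groups(records, indirect_identifiers, threshold=3):
    """Return identifier combinations shared by fewer than `threshold` records."""
    combos = Counter(
        tuple(record[field] for field in indirect_identifiers)
        for record in records
    )
    return {combo: n for combo, n in combos.items() if n < threshold}


# Made-up records for illustration only.
respondents = [
    {"residence": "large city", "occupation": "teacher"},
    {"residence": "large city", "occupation": "teacher"},
    {"residence": "large city", "occupation": "teacher"},
    {"residence": "small town", "occupation": "lighthouse keeper"},
]
risky = small_groups(respondents, ["residence", "occupation"])
print(risky)
```

In this made-up data the only flagged combination is the single respondent with a rare occupation in a small town, which matches the example given in the list above.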
Other sensitive or confidential material
In addition to personal data, sensitive and confidential data may include, for example:
- business secrets,
- information on endangered animal or plant species,
- information relating to national security,
- criminal convictions, etc.
Please note that the thesis should not deal with highly sensitive information, such as the aforementioned "national security information". The thesis is a public document. Sometimes a topic of interest to a student may be too ethically challenging for a thesis. It is not necessarily about the student's skills, but about what it is possible to do within the framework of the thesis.
Checklist for handling personal data
- Identify what personal data you collect or process.
- Who is the data controller?
- Assess risks.
- A light, informal risk assessment is part of all personal data planning and collection. Consider whether the collection and/or processing of personal data could pose risks to the subjects, yourself, or outsiders.
- Data protection impact assessment (DPIA).
- If the processing of personal data poses a high risk to people’s rights and freedoms, a data protection impact assessment must be carried out before collecting and processing the data. High-risk situations include, for example, processing personal data with new technology and large-scale processing of personal data. On the intranet of the University of Jyväskylä, you can conduct an initial mapping to determine whether your research requires an impact assessment and, if necessary, carry out a DPIA with your supervisor. You can find links to the forms on the Assessment tools to support data protection implementation website.
- Provide the research participant with a privacy notice, a research notification and a consent form (this is done to inform the research participants).
- If you receive a dataset from a project, for example, you do not submit the privacy notice yourself. Instead, a commitment is made with you regarding the processing of personal data (the related forms are only available to staff).
- Verify and document the consent of the research subject.
- Do you need a research permit?
- Organisations may require a research permit if their students or staff participate in research, for example if you want to interview teachers at a certain school. Check the permission policies with the target organisation. For an example, see JYU's permit requirements: Research permit | University of Jyväskylä (jyu.fi)
- Only collect personal data that is relevant to your research (minimisation).
- Document the processing of personal data.
- If possible for your research: do pseudonymisation or anonymisation and avoid collecting direct identifiers to begin with.
- Ensure data security (e.g. storing data on the university's U-drive and using secure devices).
- Use safe software and equipment for collecting and processing personal data.
- Safe software and equipment are introduced in the section Data security.
- Ethical review:
- If a study meets certain criteria, an ethical review must be requested.
- In their theses, students should avoid research designs that would require an ethical review or a DPIA.