AI in information seeking

Artificial intelligence (AI) is an umbrella term encompassing the theory, development, and study of systems and machines that perform tasks typically requiring human intelligence. Here, we use the term to refer to software and applications that use AI technologies to assist in information retrieval and information production.

AI systems and applications each have their own purposes: some are developed for very narrow tasks, while others are general-purpose. The ability to choose the right tool for the right purpose is part of being AI literate. However, AI literacy also includes testing and critically evaluating the performance of different tools for different tasks.

Links to tool introductions and usage tips can be found in the following text section.

Most of the applications presented on this site utilise large language models (LLMs) to some extent, and have a generative component. Generative AI refers to an AI model that can create new content, such as text, images, audio, or code, based on training data or prompts.

Generative AI models – as well as AI more broadly – raise research-ethical, legal, and moral questions that every user should be aware of. In most cases, the user is ultimately responsible for the ethical and lawful use of AI tools. You can find more information about these issues at the bottom of this page.

Please note that information related to AI becomes outdated quickly. Application interfaces are updated, and the legislation and interpretations regulating AI evolve. Although these pages are updated regularly, always use other sources as well and ensure that you have the most current information!

AI tools

Purposes and uses of AI tools

From the perspective of information seeking, useful applications can be divided into three categories based on their purpose:

  • Searching for scientific sources – search engines that utilize AI and are either integrated into academic databases or otherwise linked to databases indexing citations.
  • Searching for up-to-date information online – combinations of AI and search engines.
  • Brainstorming the research and information seeking process – chatbot applications.

Instructions and guidelines

Students and researchers affiliated with the university must use AI tools in accordance with the guidelines and policies of the University of Jyväskylä.

Research ethics

From the perspective of research ethics, the challenges of generative AI models include:

  • The “black box” problem – The decision-making processes of AI systems, especially those utilizing deep learning, are often opaque. This makes it difficult for users to fully understand and explain how, for example, a generative chatbot arrives at a particular outcome.
  • Replicability – One of the goals of scientific research is replicability, meaning the ability to repeat the research setup and verify the results. However, generative AI models designed for creative tasks contain randomness in their operation. Even if the input is the same, the outcome may vary each time the model is run.
  • Training data – The data used to train an AI model may be of poor quality and may contain biases, misinformation, disinformation, and harmful stereotypes. A generative chatbot may plagiarize training data either directly or by slightly modifying the original text. Additionally, data input by users themselves may end up in the training material for models unless this is explicitly prohibited.
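The replicability point above can be illustrated with a toy sampler. This is a minimal sketch, not how any particular chatbot is implemented: the token scores in `TOY_LOGITS`, the function names, and the `seed` parameter are all assumptions made for illustration. It shows that sampling-based generation varies between runs unless the random number generator is fixed.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Sample one token from a dict of token -> score using temperature scaling."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    # Softmax weights; subtracting the maximum keeps exp() numerically stable.
    peak = max(scaled.values())
    weights = {tok: math.exp(val - peak) for tok, val in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]


def generate(logits, length, temperature, seed=None):
    """Draw `length` tokens; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(length)]


# Hypothetical next-token scores, invented for this illustration.
TOY_LOGITS = {"yes": 2.0, "no": 1.5, "maybe": 1.0}

run_a = generate(TOY_LOGITS, length=20, temperature=1.0, seed=42)
run_b = generate(TOY_LOGITS, length=20, temperature=1.0, seed=42)
print(run_a == run_b)  # True: same input, same settings, same seed

# Without a seed, two runs on the same input may differ.
run_c = generate(TOY_LOGITS, length=20, temperature=1.0)
run_d = generate(TOY_LOGITS, length=20, temperature=1.0)
```

In practice, an AI application may not expose a seed or temperature setting at all, which is one reason to record the exact tool, version, date, and prompt when AI output is used in research.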

The guidelines of the Finnish National Board on Research Integrity (TENK) emphasize ethically sustainable methods for data acquisition, research, and analysis. However, it can be difficult for an individual user to assess the ethical sustainability of AI applications, as this involves questions such as: Under what working conditions was the training data cleaned and annotated? What environmental, economic, and societal impacts might the training and use of the underlying AI model have?

Document and report

  • Transparency: When utilizing AI in scientific activities, both software developers and users should strive for maximum transparency and openness. Developers should explain how the software works and what data it has been trained on. Users, on the other hand, should clarify the role of AI in the research process and provide detailed information on what tools were used and how they were utilized in the research.

When you use AI tools in your studies or scientific activities, document the usage carefully. Documentation is a prerequisite for being able to report the use of AI truthfully and in accordance with good scientific practice. Remember that according to the student guidelines of the University of Jyväskylä: "If generative artificial intelligence applications have been used in completing assignments or theses related to studies, the work must mention which application was used and how it was used."

The requirement for transparency permeates all scientific activities. When planning scientific publications, it is also advisable to familiarize oneself with the policies of the future publication channel regarding the use and reporting of AI applications.
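One way to keep the documentation described above is a simple structured log that is filled in at the time of use. The sketch below is only a suggestion: the record fields, the `AIUsageRecord` class, and the tool name "ExampleChat 1.0" are hypothetical, and neither the university guidelines nor publishers prescribe this format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class AIUsageRecord:
    """One documented use of an AI tool (field names are hypothetical)."""
    used_on: str              # ISO date of use
    tool: str                 # application name and version as shown in its interface
    purpose: str              # what the tool was used for
    prompt_summary: str       # short description of what was asked
    how_output_was_used: str  # how the result fed into the work


def to_log_line(record):
    """Serialize a record as one JSON line, suitable for appending to a log file."""
    return json.dumps(asdict(record), ensure_ascii=False)


record = AIUsageRecord(
    used_on=date(2025, 1, 15).isoformat(),
    tool="ExampleChat 1.0",  # placeholder, not a real product
    purpose="Brainstorming search terms",
    prompt_summary="Asked for synonyms for the thesis topic keywords",
    how_output_was_used="Selected terms were tested in database searches",
)
print(to_log_line(record))
```

A log like this makes it straightforward to state later which application was used and how, for example in a methods section or an appendix.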

AI, personal data, and data protection – Guidance for undergraduate and doctoral students

Personal data refers to any information that relates to an identified or identifiable individual. Data protection, on the other hand, involves safeguarding personal data and regulating its processing.

If the University of Jyväskylä acts as the data controller, or if data is handled in an employment relationship with the university:

Refer to the university's policies regarding which (AI) applications are permitted to process personal data. Only input personal data into an application if it is specifically designed to process such data in compliance with data protection laws, is offered by the University of Jyväskylä, and is accessed using JYU credentials.

If you are an independent data controller and not employed by the university:

Check if the university provides a centrally acquired AI tool applicable for student use.

Before utilising an AI-assisted application, gather essential details regarding how the application handles personal data:

  • What personal data does the application collect from its users (i.e. you)?
  • Does the application provider place any limitations on processing personal data within the application (i.e. is the entry of personal data permitted or prohibited)?
  • Does the application comply with the General Data Protection Regulation (GDPR) of the European Union? Where does the processing of your personal data – or any data you input – take place?
  • Does the service provider permanently retain the personal data entered into the application? If so, then do not use the application!

Assess the data you intend to enter into the application:

  • Did you obtain the material, or any portion of it, from a third party? Are there any contractual limitations on the usage of the material?
  • Is it possible for you to anonymize the material before using AI or minimize the personal data input into the application?
  • Is the use of the application for processing personal data in line with how you have informed the data subjects? For example, is the application provider mentioned as a data processor in your privacy notice? Or does using the application result in the transfer of personal data outside the EU, despite informing the subjects that processing occurs within the EU?

Ensure that your actions are consistent with the information you have communicated to the research participants or other data subjects!
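The idea of minimizing personal data before input can be sketched as a pre-processing step. The example below is a rough illustration under stated assumptions: the two regular expressions and the function name `mask_direct_identifiers` are invented here, the patterns are deliberately simple, and masking of this kind is not anonymization, because names and indirect identifiers remain in the text.

```python
import re

# Deliberately simple patterns; they will miss many real-world formats.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s-]{6,}\d")


def mask_direct_identifiers(text):
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    text = PHONE_PATTERN.sub("[PHONE]", text)
    return text


sample = "Contact Maija at maija.example@example.org or +358 40 123 4567."
print(mask_direct_identifiers(sample))
# prints: Contact Maija at [EMAIL] or [PHONE].
```

Note that the first name is still present in the output. Automated masking can reduce the amount of personal data entered into an application, but the result still needs to be reviewed by hand before it is treated as anonymized.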

AI and copyrights – Guidance for undergraduate and doctoral students

Copyright basics and legislation

Copyright grants the author the exclusive right to control the usage of their work, subject to specific limitations. The present copyright legislation does not specifically address artificial intelligence. As a result, the current law needs to be interpreted and adapted to accommodate this emerging context. Copyright concerns encompass both the use of works as training data for AI and the submission of copyrighted material for AI processing. The discussion is further complicated by the fact that copyright legislation differs somewhat between the United States and Europe/Finland. U.S. copyright law acknowledges the "fair use" principle, which has been cited to support the training of AI using publicly accessible, yet copyrighted, material from the internet.

The EU AI Act came into force on August 1, 2024, and will be mainly applied from August 2, 2026. The AI Act complements copyright legislation. The definitions and references to current copyright laws included in the AI Act help to evaluate and address copyright issues when using AI. The AI Act's central regulatory target, however, is the AI system, defined as "a machine-based system that is designed to operate with varying levels of autonomy and that may exhibit adaptiveness after deployment, and that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments."

Reproduction of permanent copies

When material is uploaded into an AI system and remains in the system, it undoubtedly constitutes reproduction, which requires the copyright holder's consent. Additionally, the material may be made available to the public if the AI system subsequently replicates the uploaded work either in substantial portions or in adaptations that are too similar to the original. However, it is more difficult for a typical AI system user to evaluate this later usage. Check how the AI system functions and, if needed, deactivate the storage of copies and the use of inputted material for training.

Rule of thumb: get permission from the copyright holder if the AI system provider retains a copy of the material you upload for its own purposes.

Temporary reproduction

From a copyright viewpoint, the scenario is less clear-cut if the AI system maintains only temporary copies of the material you upload. This might fall under temporary reproduction allowed by Section 11a of the Finnish Copyright Act as an integral and essential part of a technological process. For private use (which includes educational use), it is also possible to invoke Section 12, which allows the reproduction of works for private use, provided the work has been made public. Nonetheless, reproducing a computer program or a database for personal use is prohibited. 

There is no case law yet on this so-called input data. Rights holders and the organizations representing them are less permissive about the use of AI. For example, one such position is that when copyright-protected content is uploaded into an AI system, it constitutes copying (or reproduction) for which the author's or publisher's permission is required. This perspective may arise from concerns about the professional use of AI-generated content and the belief that copyright-protected material stays with the AI system's provider.

From the AI user's perspective, the situation is unfortunate. It is well known that masses of copyright-protected material have been used in AI training (although legal cases concerning the use of works for AI training are still unresolved). At the same time, rights holders seek to restrict AI use on copyright grounds, even in use cases where AI's productivity-enhancing features would come into play. Strictly interpreted, the reproduction restriction would prevent, for example, having an AI system summarize a scientific article unless copyright permission has been obtained. In practice, however, having an AI summarize an article or answer questions about its content is the first action AI applications suggest, and what many of them are designed for. For study purposes, a practical interpretation is that reproduction needing the copyright holder's consent happens only if the copy stays with the AI service provider beyond a temporary period.

Reproduction of works for text and data mining

Section 13b of the Copyright Act allows the reproduction and storage of works for text and data mining. Text and data mining activities are permissible unless explicitly restricted by the authors. However, if the mining is carried out for scientific research within a research institute or cultural heritage facility, rights holders cannot prevent it. A requirement for mining is lawful access to the material, which may include content behind a paywall if access rights have been appropriately acquired.

The exception for text and data mining can also permit the reproduction of works to be analysed using AI. Even though you can use an AI application for data mining, you cannot grant the AI application rights to permanently retain the input works or use them for AI training purposes.

The EU directive on copyright and related rights in the Digital Single Market (the so-called DSM directive, EU 2019/790) specifies that the exception for text and data mining applies not only to research organizations but also to individuals associated with them. As a result, the exception benefits not only university employees but also students, emeritus/emerita professors, and grant researchers. However, the research must be conducted under JYU affiliation and meet the criteria for scientific research.

Utilizing AI-generated results

Generally, AI-generated content is not subject to copyright as it is not deemed original. A piece of work is considered original only if it embodies the personality of a human individual, demonstrated through their free and creative decisions. Nevertheless, it remains possible that a method could eventually be invented wherein AI usage genuinely reflects the user's independent creative decisions, thereby granting copyright to the user of the AI. However, the standard for this would be very high.

A common concern is whether AI outputs might violate existing copyrights. This can happen if AI generates an adaptation or even a direct copy of an existing work. When users ask AI to create something like a literary or visual piece with just a short instruction, it is challenging for them to judge the risk of copyright infringement, unless the works are widely known or recognized by the user. Emulating the distinctive style of a specific artist represents a borderline scenario. This practice could have detrimental effects on content creators. Therefore, for example, an AI tool may be designed to reject user requests to imitate the style of a living artist. It is also in the interest of AI service providers that users feel safe using the service and that no copyright infringement claims arise from its use. Microsoft, for example, has committed to responding to copyright infringement claims arising from the (proper) use of its Copilot AI.

The user is in the best position to evaluate any potential copyright infringement of the material they upload to AI systems. When inputting material into an AI system and prompting it, the user should ensure that they do not ask AI to create, and later use, a work that could be interpreted as an adaptation of a copyright-protected work uploaded to the system. Modifying a work and making it available to the public requires the consent of the copyright holder. Using AI to remove, add, or change parts of a work creates an adaptation. Examples of such use could be requests like "enlarge the mermaid's tail in this image by 30%" or "add a section on the U.S. fair use doctrine to this article and compare it to the Finnish copyright law." The exclusive right of the author also includes deciding on the translation of their work into another language. However, the Copyright Act allows the use of a published work in parody, caricature, or pastiche.

Summary 

  • What is permitted:
    • Using AI for study purposes to analyze texts
    • Using AI for translation when the purpose is to increase your understanding of the subject
  • What is prohibited without the copyright holder's permission:
    • Allowing AI systems rights to the input content beyond just temporary analysis, such as for AI training purposes
    • Publishing adaptations of existing works created with AI. Parody is, however, permitted.
    • Producing AI-generated works that closely mimic another artist's style

It is important to keep in mind the university's general policies regarding the use of AI in academic work, as well as any specific instructions related to your courses. For instance, AI usage may be completely banned in certain subjects like language studies.