AI in information seeking
Artificial intelligence (AI) is an umbrella term encompassing the theory, development, and study of systems and machines performing tasks that would normally require human intelligence. Here, we use the term to refer to software and applications that use AI technologies to assist in information retrieval and information production.
AI systems and applications each have their own purposes: some are developed for very narrow tasks, while others are general-purpose. The ability to choose the right tool for the right purpose is part of being AI literate. However, AI literacy also includes testing and critically evaluating the performance of different tools for different tasks.
AI and copyrights – Guidance for undergraduate and doctoral students
Copyright basics and legislation
Copyright grants the author the exclusive right to control the usage of their work, subject to specific limitations. The present copyright legislation does not specifically address artificial intelligence. As a result, the current law needs to be interpreted and adapted to accommodate this emerging context. Copyright concerns encompass both the use of works as training data for AI and the submission of copyrighted material for AI processing. The discussion is further complicated by the fact that copyright legislation differs somewhat between the United States and Europe/Finland. U.S. copyright law acknowledges the "fair use" principle, which has been cited to support the training of AI using publicly accessible, yet copyrighted, material from the internet.
The EU AI Act came into force on August 1, 2024, and will be mainly applied from August 2, 2026. The AI Act complements copyright legislation. The definitions and references to current copyright laws included in the AI Act help to evaluate and address copyright issues when using AI. The AI Act's central regulatory target, however, is the AI system, defined as "a machine-based system that is designed to operate with varying levels of autonomy and that may exhibit adaptiveness after deployment, and that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments."
Reproduction of permanent copies
When material is uploaded into an AI system and remains in the system, it undoubtedly constitutes reproduction, which requires the copyright holder's consent. In addition, the material may be made available to the public if the AI system subsequently replicates the uploaded work, either in substantial portions or in adaptations that are too similar to the original. However, this later usage is more difficult for a typical AI system user to evaluate. Check how the AI system functions and, if needed, deactivate the storage of copies and the use of inputted material for training.
Rule of thumb: get permission from the copyright holder if the AI system provider retains a copy of the material you upload for its own purposes.
Temporary reproduction
From a copyright viewpoint, the scenario is less clear-cut if the AI system maintains only temporary copies of the material you upload. This might fall under temporary reproduction allowed by Section 11a of the Finnish Copyright Act as an integral and essential part of a technological process. For private use (which includes educational use), it is also possible to invoke Section 12, which allows the reproduction of works for private use, provided the work has been made public. Nonetheless, reproducing a computer program or a database for personal use is prohibited.
There is no case law yet on this so-called input data. Rights holders or organizations representing them are less permissive about the use of AI. For example, one stated position is that when copyright-protected content is uploaded into an AI system, it constitutes copying (or reproduction) for which the author's or publisher's permission is required. This perspective may arise from concerns about the professional use of AI-generated content and the belief that copyright-protected material stays with the AI system's provider.
From the AI user's perspective, the situation is unfortunate. It is well known that masses of copyright-protected material have been used in AI training (although legal cases concerning the use of works for AI training are still unresolved). At the same time, rights holders seek to restrict AI use on copyright grounds, even in use cases where AI's productivity-enhancing features would come into play. Strictly interpreted, the reproduction restriction would prevent, for example, having an AI system summarize a scientific article unless copyright permission has been obtained. In practice, however, having an AI summarize an article or answer questions about its content is the first action AI applications suggest, and one for which many of them are designed. For study purposes, a practical interpretation is that reproduction needing the copyright holder's consent happens only if the copy stays with the AI service provider beyond a temporary period.
Reproduction of works for text and data mining
Section 13b of the Copyright Act allows the reproduction and storage of works for text and data mining. Text and data mining activities are permissible unless explicitly restricted by the authors. However, if the mining is carried out for scientific research within a research institute or cultural heritage facility, rights holders cannot prevent it. A requirement for mining is lawful access to the material, which may include content behind a paywall if access rights have been appropriately acquired.
The exception for text and data mining can also permit the reproduction of works to be analysed using AI. Even though you can use an AI application for data mining, you cannot grant the AI application rights to permanently retain the input works or use them for AI training purposes.
The EU directive on copyright and related rights in the Digital Single Market (the so-called DSM directive, EU 2019/790) specifies that the exception for text and data mining applies not only to research organizations but also to individuals associated with them. As a result, the exception benefits not only university employees but also students, emeritus/emerita professors, and grant researchers. However, the research must be conducted under JYU affiliation and meet the criteria for scientific research.
Utilizing AI-generated results
Generally, AI-generated content is not subject to copyright as it is not deemed original. A piece of work is considered original only if it embodies the personality of a human individual, demonstrated through their free and creative decisions. Nevertheless, it remains possible that a method could eventually be invented wherein AI usage genuinely reflects the user's independent creative decisions, thereby granting copyright to the user of the AI. However, the standard for this would be very high.
A common concern is whether AI outputs might violate existing copyrights. This can happen if AI generates an adaptation or even a direct copy of an existing work. When users ask AI to create something like a literary or visual piece with just a short instruction, it is challenging for them to judge the risk of copyright infringement, unless the works are widely known or recognized by the user. Emulating the distinctive style of a specific artist represents a borderline scenario. This practice could have detrimental effects on content creators. Therefore, some AI tools are designed to reject user requests to imitate the style of a living artist. It is also in the interest of AI service providers that users feel safe using the service and that no copyright infringement claims arise from its use. Microsoft, for example, has committed to responding to copyright infringement claims arising from the (proper) use of its Copilot AI.
The user is in the best position to evaluate any potential copyright infringement of the material they upload to AI systems. When inputting material into an AI system and prompting it, the user should ensure that they do not ask AI to create, and then later use, a work that could be interpreted as an adaptation of a copyright-protected work uploaded to the system. Modifying a work and making it available to the public requires the consent of the copyright holder. Using AI to remove, add, or change parts of a work produces an adaptation. Examples of such use could be requests like "enlarge the mermaid's tail in this image by 30%" or "add a section on the U.S. fair use doctrine to this article and compare it to the Finnish copyright law." The exclusive right of the author also includes deciding on the translation of their work into another language. However, the Copyright Act allows the use of a published work in parody, caricature, or pastiche.
Summary
- What is permitted:
  - Using AI for study purposes to analyze texts
  - Using AI for translation when the purpose is to increase your understanding of the subject
- What is prohibited without the copyright holder's permission:
  - Allowing AI systems rights to the input content beyond just temporary analysis, such as for AI training purposes
  - Publishing adaptations of existing works created with AI. Parody, however, is permitted.
  - Producing AI-generated works that closely mimic another artist's style
It is important to keep in mind the university's general policies regarding the use of AI in academic work, as well as any specific instructions related to your courses. For instance, AI usage may be completely banned in certain subjects like language studies.