Generative AI and Copyright

by William Eitrem and Thomas Hagen



1. Introduction

Society is witnessing a revolutionary development of artificial intelligence (AI), especially within generative AI. The technological advancement thrusts us into uncharted territory with a range of legal dilemmas, where principled and complex questions about rights and AI arise. With current legislation primarily centred around humans, there is an increasing necessity to explore and analyse the framework of rights issues in this AI-driven era. Doing this requires knowledge of the technology at hand and its legal implications.

2. The GPT technology

In this article, we focus on OpenAI's generative models (GPT models). These models are used to generate both text (ChatGPT) and images (Dall-E). By processing large amounts of data, they learn to recognize patterns in the data, and these patterns can then be used to generate new content. An essential clarification is this: the models do not copy previous combinations of words or pixels. For example, ChatGPT receives an input and calculates which words are most probable as a response, based on how it has seen similar words used during training. It then creates an output. Dall-E follows the same process, with pixels.
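The idea of calculating which word is most probable can be illustrated with a minimal sketch. It is not OpenAI's implementation: the candidate words, their scores, and the function names softmax and pick_next_word are all invented for illustration. It only shows that the next word is drawn from a probability distribution rather than copied from a stored text.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores.values())
    exps = {word: math.exp(s - highest) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


def pick_next_word(scores, rng):
    """Sample one word in proportion to its probability."""
    probs = softmax(scores)
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


# Hypothetical scores a trained model might assign to candidate next words
# after the input "The cat sat on the". The numbers are invented.
toy_scores = {"mat": 3.0, "sofa": 2.0, "roof": 1.0, "moon": -1.0}

probabilities = softmax(toy_scores)
next_word = pick_next_word(toy_scores, random.Random(0))
```

In this toy setting the highest-scoring word is the most likely choice, but any candidate can be selected. A real model repeats a step of this kind for each new word, over a far larger vocabulary.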

3. Copyright

3.1 Work of Mind

Anyone who holds the copyright to a protected work of the mind fundamentally has the exclusive right to prevent others from doing two things. The first is making the work available to the public. The second is producing copies of the work, regardless of the method and form of production, and whether the production is permanent or temporary.

When a computer processes text, images, or other forms of digitally represented works of the mind, the digital representation must necessarily be copied into the computer's memory. Such production occurs, for example, when an image is displayed on a computer screen. This is the rationale behind the existence of the Copyright Act Section 4, which contains an exception from the exclusive right to copy a work when the production "constitutes an integrated and essential part of a technical process ...", and the sole purpose of the production is "to enable ... legal use of a work, or ... a transfer in a network by an intermediary on behalf of third parties". In other words, to avoid infringing on copyright with the production of copies of protected material, one of the exceptions must apply. We will assess these exceptions for two of the generative AI models' "phases" in the following – namely training and generation, as well as consider other issues that arise for these two phases.

3.2 Training

The technical processes that occur when a generative AI model is trained are not entirely clear. However, it is evident that a reproduction occurs in the machine's memory when training is conducted. Such temporary reproduction occurs even if the information is not stored in the form in which the model trains on it, cf. section 2 above. The reproduction does not occur to enable legal use of a work, nor to transfer it in a network by an intermediary on behalf of third parties. Therefore, it is likely that the reproduction constitutes an infringement of the copyright holder’s rights.

Let us assume that a copyright holder sues the owner of an AI model for this potential infringement. If a court of law finds that an infringement has occurred, the question arises of what consequences this should have for the company that conducted and oversaw the training of the AI model. Chapter 5 of the Copyright Act contains a number of sanctions that can be applied.

The Copyright Act Section 78 provides the authority to prohibit the repetition of the action. This provision is likely to be invoked, and a ruling that establishes a prohibition against repetition will provide some security to the creator from the time of the decision going forward. However, the copyright infringement has already been completed. Therefore, this sanction offers little protection for rights holders whose rights have already been infringed.

Another possibility is that the person who conducted the training of the AI model is subject to criminal punishment, cf. the Copyright Act Sections 79 and 80. In that case, the four general conditions for criminal punishment must be met. This is, however, a less practical sanction, due to both evidentiary considerations and the government's ability to ensure effective enforcement.

The Copyright Act Section 81 stipulates that the creator can demand remuneration and compensation. A creator who invokes this provision must claim economic compensation under one of the following alternatives:

a. reasonable remuneration for the use, as well as compensation for damage resulting from the infringement that would not have occurred with an agreement on use

b. compensation for damage resulting from the infringement, or

c. remuneration corresponding to the profit obtained from the infringement.

Letter b states that the violator must pay compensation for damage incurred by the creator, while letter c expresses that remuneration equivalent to the violator's gain should be paid.

For creative professionals like artists and writers, the use of their copyrighted work in training tools like ChatGPT can feel invasive and unjust. However, proving that ChatGPT's training directly resulted in financial loss for an author or monetary gain for the violator is complex. For instance, if ChatGPT's owner profits by selling books generated by a model that was trained on an author's work, the loss to the author is a consequence of the GPT model's output, not a result of its training process. OpenAI might theoretically benefit from not paying for copyrighted materials, but quantifying this is challenging due to the vast amount of data ChatGPT is trained on. Moreover, the economic impact, whether loss or gain, is not clearly visible, posing difficulties for those tasked with determining it.

Therefore, letter a, "reasonable remuneration for the use", is likely to be the most relevant. However, calculating this remuneration can also raise significant doubt. There is currently no case law on this issue, and the ratio between a single creator's works and the overwhelming size of the aggregated training data makes the calculation highly complex.

In the case of intentional or grossly negligent breaches, it may also be appropriate that "consideration is also given to non-economic damage inflicted on the injured party by the violation", cf. Section 81 first paragraph last sentence. In addition, the Copyright Act Section 81 second paragraph allows the remuneration to be set at up to twice the reasonable remuneration for the use where the violator has acted with gross negligence or intent. The challenge with these alternatives will not necessarily lie in calculating the remuneration, but in proving subjective guilt. For example, generative AI models are trained on extensive amounts of data, and the collection and training are probably largely automated, making it difficult to establish intent or gross negligence. However, the degree of difficulty will depend on the specific circumstances of the case, such as the type of model and the amount of training data.
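The arithmetic behind the alternatives in Section 81 can be sketched as follows. This is an illustration only, built on the description of the provision given in this article. All amounts are hypothetical, the function names claim_options and doubled_fee_ceiling are invented, and a court's actual assessment is discretionary and cannot be reduced to a formula.

```python
def claim_options(reasonable_fee, extra_damage, total_damage, infringer_profit):
    """Amount under each alternative in Section 81 first paragraph,
    as the alternatives are described in this article."""
    return {
        # a: reasonable remuneration plus damage that an agreement
        #    on use would have avoided
        "a": reasonable_fee + extra_damage,
        # b: compensation for the damage caused by the infringement
        "b": total_damage,
        # c: remuneration corresponding to the violator's profit
        "c": infringer_profit,
    }


def doubled_fee_ceiling(reasonable_fee):
    """Upper limit under the second paragraph (intent or gross
    negligence): up to twice the reasonable remuneration."""
    return 2 * reasonable_fee


# Hypothetical figures: damage and profit are hard to prove, so they are low.
options = claim_options(
    reasonable_fee=10_000,
    extra_damage=2_000,
    total_damage=5_000,
    infringer_profit=0,
)
best = max(options, key=options.get)
ceiling = doubled_fee_ceiling(10_000)
```

With these invented figures, alternative a gives the highest amount, which mirrors the point made above: where loss and gain are difficult to document, the claim in practice rests on the reasonable remuneration.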

3.3 Generation

Generated material will be a temporary or permanent production. As with training, there is no reason to believe that such reproduction will be exempt, cf. the rules described under section 3. We therefore proceed to consider other questions that arise in connection with generative AI models' production of new material and copyright consequences.

Theoretically, it is easy to imagine the following: Dall-E generates a work that is identical to an existing artwork created by a human, or ChatGPT generates a series of words that collectively are identical to a copyright-protected poem. Without going into more depth on technical issues, it is important to clarify that the likelihood of something identical occurring is very low. In practice, therefore, an important question will be how similar something must be before it infringes copyright.

Until there are indications to the contrary, it is reasonable to assume that the assessment will follow existing practice for infringements committed by humans.

Another interesting question is whether generative AI models can infringe copyright even if the copyright-protected material is demonstrably not used in training (such proof will be possible for AI companies to provide). This question will probably not be raised often, but it is entirely possible – and increasingly likely as generative AI is used more and more – that it will happen.

Again, we must look at the situation for humans. Humans obtain copyright protection the moment the creation meets the requirements of the Copyright Act. If a human writes a poem that meets the requirements of the Copyright Act Section 2, it will be protected. The creator can put the poem in a drawer without anyone ever seeing it, and it will still be protected. If another person creates a poem that resembles it to such an extent that it constitutes an infringement, it will be an infringement even if this person has never seen the first poem. That the former must prove that they created their poem first is a separate matter. The same applies to the question of whether the latter acted in good faith and with due care.

The assessment will probably start here. If no fundamental reason can be established why these situations should be treated differently, we will probably see that the line from previous legal practice is maintained. The decisive factor is therefore, as mentioned above, whether the produced material actually constitutes an infringement or not.

Regarding sanctions related to possible copyright infringements, many of the same challenges arise as described under section 3.2 above. Even if a copyright infringement is established, it is difficult to measure compensation, relinquishment of gain, etc., because the generated material under many circumstances will not cause the creator a loss or create direct gain for the provider of the AI model, since the provider usually offers completely different products and/or services than the creator. In such cases, one ends up having to determine a license fee based on the Copyright Act Section 81 first paragraph letter a, alternatively the second paragraph if intent or gross negligence is demonstrated by the violator.

For the rest, we refer to section 3.2 above.

4. Conclusion

Overall, it is clear that the legal rules, as they stand today, can make it difficult to safeguard both creators' rights, in relation to generative AI's training and its generation of new works, and the purposes of the Copyright Act as stated in Section 1.

AI and rights is an exciting area where continuous advancements are happening. We expect that legislators will step in with more detailed regulations regarding the relationship to copyright-protected material. Additionally, we anticipate more clarifications in the near future on how existing regulations can be applied, through case law and administrative practice both nationally and internationally.

An interesting example in the field of copyright could arise following the launch of Microsoft Copilot, Microsoft's new generative AI solution. Microsoft has confirmed that it will cover legal costs for anyone sued for infringements, including copyright, based on material generated by this solution.

From a marketing perspective, this is undoubtedly a bold and strategic move, as many fear that the use of generative AI tools may lead them to commit copyright infringements without their knowledge or intent. However, Microsoft's pledge may also signal its viewpoint on copyright and generative AI. Due to technical challenges in establishing a causal relationship between training on material and generating works identical to works protected by copyright, as we have seen in this article, it is not unlikely that Microsoft intends to have alleged infringements tested in court. It will be exciting to see how such questions, and cases already raised based on other generative AI tools, will be resolved.
