Newsletter
by Halvor Manshaus, Partner, Oslo
Published:
The English High Court of Justice has recently delivered judgment in a much-discussed case concerning the use of unlicensed copyright-protected material for training an AI model. The case was brought by a group of claimants connected to Getty Images against Stability AI Limited as defendant.
Getty Images (Getty) is a global image agency offering licensed images, photographs, music and videos. In the image database available on the website gettyimages.com, users can, for example, search amongst 11,461,132 different images. On the same site, 1,549,905 video clips are currently made available for licensing and use. According to an article from October 2025 in Yahoo Finance, Getty holds a total of 562 million visual objects, with a turnover of 946 million dollars in 2024. The material is offered primarily through the aforementioned website, as well as gettyimages.co.uk and istockphoto.com. Getty is thus a major global operator with an extensive rights library and significant turnover. Users of the service include various types of businesses, private individuals, media organisations worldwide and other consumers of image material. The business model is based on individual users paying a licence fee to use the material. The individual companies in the Getty group have different commercial models, but all depend on the use of material being subject to a paid licence.
Stability AI Limited (Stability) was founded in 2019 and became known to a wider audience in August 2022 through the launch of the AI model Stable Diffusion. This AI model is capable of, amongst other things, generating images based on descriptions that the user types into a text field. The service can also be used for other image processing, such as editing existing images or upscaling quality. Several distinctive features have set Stable Diffusion apart from other existing AI models. Stability has, for example, made the underlying source code available, which in turn opens the way for further development and modification. Later versions have somewhat stricter licensing requirements, but the core technology remains available for research and development outside Stability's control. Another characteristic of Stable Diffusion is that the service is very efficient and requires relatively little computing power to process requests from the user. This has great practical significance in that the service can be run on ordinary computers, provided they have a sufficiently powerful graphics card. The combination of open-source code and the ability to run the model on ordinary gaming PCs means that many users use the service to experiment and familiarize themselves with the technology.
Stable Diffusion was trained on approximately 2.3 billion images drawn from the dataset known as LAION-5B. Technically, Stable Diffusion is a text-driven AI model of the type known as a generative diffusion model. This means that the service is designed to create content that resembles the dataset on which it was trained, combined with the user's input. The term "diffusion" is familiar from physics, namely the spreading of particles from an area of high concentration to an area of lower concentration. Applied to Stable Diffusion, data diffusion means that what initially consists of random connections, or noise, is refined and improved through an iterative process, which ultimately results in coherent images of high quality. Concentrations of data are thus gradually formed that the training indicates should belong together. The generative modelling means that the service is trained to recognize patterns and connections in data that has been linked together. After the completed learning period, the service is able to generate new data or image material based on inferences derived from the known material from the training.
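The iterative refinement described above can be illustrated with a toy sketch. The "image" here is just a short list of numbers and the update rule is a deliberately simplified stand-in for real denoising; nothing in this sketch comes from Stable Diffusion itself.

```python
import random

# Toy illustration of the "diffusion" idea: start from pure noise and
# refine it step by step towards coherent structure. The target stands
# in for the patterns the model has learnt during training.
target = [0.1, 0.5, 0.9, 0.5, 0.1]

# Begin with random noise.
random.seed(42)
image = [random.random() for _ in target]

# Each iteration removes a fraction of the remaining noise, mimicking
# how a diffusion model denoises over many small steps.
for step in range(50):
    image = [pixel + 0.2 * (goal - pixel) for pixel, goal in zip(image, target)]

# After enough steps the noise has converged to the learnt structure.
print([round(pixel, 2) for pixel in image])
```

After fifty steps the random starting values are indistinguishable from the target pattern, which is the essence of the iterative process the text describes.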
AI models such as Stable Diffusion have been well received by the public and are used by millions of users daily. A number of industries have quickly realized the utility of being able to generate or adapt image material in this way. Advertising agencies generate images for use in marketing, commercials, brochures, posters and advertisements. Within art and design, an entirely new generation of art is being developed, and AI-generated works have won art competitions and are exhibited in numerous museums and exhibitions. Within the gaming and entertainment industry, visual content, textures and entire worlds are created based on AI-generated content. Within education and research environments, visual material is created for use in learning resources, presentations and modelling. There are numerous other examples, for instance within architecture, fashion and product development. These are very different industries, but all can use AI-generated content to model, test and showcase new ideas and concepts.
The launch of the various AI models now emerging also raises a number of questions. One question that is often raised is how wise it is to release these models, apparently without any form of coordinated governance or control. Development is proceeding at such a pace that it is impossible to draw any firm conclusions about what this field will look like in just 10 to 15 years. Another objection concerns the training method itself, which is based on large-scale exploitation of millions of existing works. The material produced by AI models also has an unclear intellectual property status. Is this legally a copyrighted product, and if so: who holds the rights to the individual work? Is it the user who has instructed the AI model, the owner of the model, the software developer, the rights holders to the training material or someone else? In a slightly different direction lies the discussion of whether images and films produced in this way deserve to be called art at all. As noted above, AI images have already won art competitions, provoking strong reactions amongst human artists who see not only a threat to their own practice, but also a development where the concept of art changes in line with the emergence of AI technology.
The case concerned, in principle, the question of whether unlicensed use of existing works in the training of Stable Diffusion constituted a copyright infringement. The core point of the claim was the use of material from Getty during the training period. An extensive claim was originally issued with a long list of demands directed at Stability. The main claims concerned infringements of copyright, trademark rights and passing off. Shortly before the deadline for the closing submissions, however, Getty withdrew several of these grounds. It was, amongst other things, acknowledged by Getty that there was no evidence that the training of Stable Diffusion had taken place in Great Britain. Thus, perhaps the most interesting aspect of this case was withdrawn prior to the oral hearing. Getty nevertheless maintained that there was a copyright infringement in that Stable Diffusion itself constituted an "infringing copy". This article examines the court's discussion of this question. The other questions relating to trademark use and other topics fall outside the scope of this article.
Getty possesses a large database of image material. The database also contains what the court describes as sophisticated, curated metadata about each individual work. Thus, each individual data file contains not only the image itself, but also associated information such as production date, type of image, file size, number of pixels, information about the creator of the piece as well as title and keywords. It was argued by Getty that these metadata were particularly relevant and suitable for machine learning. Reference was made, amongst other things, to the fact that the image material was of very high quality, and that the metadata provided unique additional information about each individual work of great significance for the learning process itself.
The point of contact between Getty and Stable Diffusion lies in the dataset LAION-5B. LAION stands for Large-Scale Artificial Intelligence Open Network. This is a German organisation that works with various AI models and is known for having made available several large datasets. These datasets have been used to train several well-known AI models, including all published versions of Stable Diffusion. The training has taken place on various subsets of LAION-5B, whose metadata comprises approximately 5.85 billion CLIP-filtered image-text pairs with URL references. In brief, this means that the setup from LAION contains links to nearly 6 billion images. CLIP stands for Contrastive Language-Image Pre-training. CLIP links text and images in large datasets and is based on what is called contrastive learning. The model distinguishes between paired data points by maximising similarity within a class representing a correct answer, and minimising similarity in another class representing an incorrect answer. If the text, for example, refers to a black horse, an image of a horse may be highlighted. Where the text instead indicates a red post-box, the horse will no longer be as relevant. The learning method itself has much in common with a picture book for children. One page in the book shows a picture of a ball, and underneath is the text "ball". The two elements consisting of image and text are locked together into a linguistic unit. This is stored for future use.
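The matching principle can be sketched in code. The embedding vectors below are invented purely for illustration; a real CLIP model learns such vectors from billions of pairs, but the contrastive idea, maximise similarity for the correct pair and minimise it for incorrect ones, is the same.

```python
import math

# Hand-made, hypothetical embeddings for two images and two captions.
# In real CLIP these vectors are learnt, not written by hand.
image_embeddings = {
    "photo_of_black_horse": [0.9, 0.1, 0.0],
    "photo_of_red_postbox": [0.0, 0.2, 0.9],
}
text_embeddings = {
    "a black horse": [0.8, 0.2, 0.1],
    "a red post-box": [0.1, 0.1, 0.8],
}

def cosine(a, b):
    # Cosine similarity: 1.0 means the vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_image(text):
    # The image whose embedding is most similar to the text wins.
    return max(image_embeddings,
               key=lambda img: cosine(text_embeddings[text], image_embeddings[img]))

print(best_image("a black horse"))
print(best_image("a red post-box"))
```

As in the article's example, the horse image is the closest match for "a black horse", and drops out of relevance when the text describes a red post-box.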
In this way, patterns and relationships are learned, and the model continuously adjusts its internal parameters to create more precise connections between the training data and queries to be processed. An important form of such parameters is weights. Weights function somewhat like the connections between neurons in the human brain. Each connection has an associated weight or value that indicates both the strength and importance of the individual connection. When an AI model, for example, is to assess whether an email is advertising, it may analyse the content based on defined factors. For instance, the AI model will assess keywords, linguistic contexts, rhetorical devices, sentence length, punctuation and the like. Each element is linked to a defined weight, which together yield a conclusion as to whether the content is advertising or not. These weights are factors that can be adjusted during training, so that the outcome of the assessment becomes more precise or better adapted to the developer's requirements.
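A minimal sketch of the email-advertising example follows. The feature names, weight values and threshold are all invented assumptions for illustration; no real classifier is this simple.

```python
# Hypothetical weights: each feature's contribution to the verdict.
# In training, these values would be adjusted to reduce mistakes.
weights = {
    "keyword_hits": 1.5,          # e.g. "sale", "offer", "free"
    "exclamation_marks": 0.8,
    "avg_sentence_length": -0.3,  # long prose argues against advertising
}

def advertising_score(features):
    # Weighted sum: each feature value multiplied by its weight.
    return sum(features[name] * weight for name, weight in weights.items())

# A hypothetical email: 3 keyword hits, 4 exclamation marks,
# average sentence length of 5 words.
email = {"keyword_hits": 3, "exclamation_marks": 4, "avg_sentence_length": 5}

score = advertising_score(email)
is_advertising = score > 2.0  # the threshold is also a tunable parameter
print(score, is_advertising)
```

Training amounts to nudging the numbers in `weights` so that the model's verdicts line up with the examples it is shown, which is exactly the adjustment process the text describes.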
As part of the training process, it is necessary to download the images to which the URL list in the LAION dataset refers. This process is referred to as materialisation. It is through materialisation that the AI model gains access to the actual image, which in this case was temporarily stored on an Amazon cloud server. In addition, temporary copies of the image were made in the memory chip on the graphics cards that ran the actual training session in the AI model. Stability's own description of this process made it clear that first there was a copying and storage on the server itself, and then a further and temporary copying when the image was handled through the RAM chip on the graphics card. The judgment indicates that Getty acknowledged that the datasets from LAION actually contained references to Getty's images and that these would most likely have been used in the training. The parties clarified this in the case preparation, with Getty presenting 11 images as examples. The underlying assertion was nevertheless that in practice millions of images had been used. Stability, for its part, acknowledged that some of these 11 examples had been used by Stable Diffusion.
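The materialisation step can be sketched as follows. `fetch_image` is a hypothetical placeholder for a real HTTP download; the point of the sketch is the two transient copies, on the server and in working memory, that the court's discussion turns on.

```python
from io import BytesIO

# Hedged sketch of "materialisation": the LAION dataset supplies only
# URL references, so each image must be fetched before training can use it.
def fetch_image(url):
    # Placeholder: a real implementation would issue an HTTP GET.
    return b"<image bytes for %s>" % url.encode()

urls = ["https://example.com/img1.jpg", "https://example.com/img2.jpg"]

processed = 0
for url in urls:
    data = fetch_image(url)   # first copy: stored on the training server
    buffer = BytesIO(data)    # second, transient copy in working memory
    # ... the training step would read from 'buffer' and update the weights ...
    buffer.close()            # both copies are discarded once the step is done
    processed += 1

print(processed)
```

Nothing of the image itself survives the loop; only the adjusted weights remain, which foreshadows the court's later reasoning.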
As noted above, there was no evidence that any of these actions had taken place in Great Britain, and Getty had therefore, prior to the oral hearings, abandoned those parts of the claim that concerned this form of use and copying. In the case, it was acknowledged by Stability that the training involved the use of images that were retrieved from Getty's database via the URL link in LAION. Stability also argued that the number of images that were specifically used would vary depending on the setup for the individual training round. In cases where Stable Diffusion produced images with Getty's trademark integrated, Stability argued that this was due to use by a third party (the user of the service), and that the use was not relevant in trademark terms. The background to this discussion was that Stable Diffusion could produce images that contained a watermark from Getty. Getty has itself placed watermarks on the images in its own database. When the user searches for images at Getty, these are displayed with the watermark integrated in the image. Only when the user pays a licence for the image is the watermark removed, so that the image can be downloaded and used without the watermark. The question of infringement of trademark rights falls outside the scope of this article.
English law distinguishes between primary and secondary infringement of copyright. Copying and performance are typical direct and primary infringements. Sale of copies of works, as well as rental, import and possession, are acts that under English law are categorised as secondary infringements. The remaining assertion of copyright infringement was that Stable Diffusion itself constituted a secondary infringement of copyright. Getty's assertion was not that Stable Diffusion itself constituted a copying of the relevant works, or that such copies were stored in Stable Diffusion. According to Getty, the infringement occurred in that the development of the weights (which were developed through the training) used in Stable Diffusion would have constituted a secondary infringement of copyright if the action had taken place in Great Britain. The Copyright, Designs and Patents Act 1988 (CDPA), section 27, defines what constitutes an infringing copy of a work:
"27. Meaning of 'infringing copy'.
(1) In this Part 'infringing copy' in relation to a copyright work, shall be construed in accordance with this section.
(2) An article is an infringing copy if its making constituted an infringement of the copyright in the work in question.
(3) An article is also an infringing copy if-
(a) it has been or is proposed to be imported into the United Kingdom, and
(b) its making in the United Kingdom would have constituted an infringement of the copyright in the work in question, or a breach of an exclusive licence agreement relating to that work."
The crucial point in the court's assessment therefore became whether there was an "article" under section 27, with reference to sections 22 and 23, and whether this was an "infringing copy". Getty argued that the conditions in section 27(3)(a) and (b) were satisfied in that there had been such an import as described in paragraph (a), and that making the "article" would have been unlawful in Great Britain. Getty referred to the fact that the training that took place involved unlicensed copying both on the server and on the RAM chip. The "article" referred to in this context is not Stable Diffusion itself, but the underlying weights that were derived and optimised through the training process. The weights were directly affected by the review of the various copies, and this exposure simultaneously changed fundamental aspects of the weights' character.
Stability argued, for its part, that the concept of "article" in the legal sense was limited to tangible objects. The provision in section 27 could therefore not be applied to abstract information such as weights in an AI model. Furthermore, Stability argued that the concept of "infringing copy", as defined in section 27, could not apply for several reasons, amongst others because the weights themselves do not store or produce copies of the relevant works.
The court states in paragraph 553 that the assessment must be based on how an AI model such as Stable Diffusion is trained, and how it subsequently produces its own images. In order to determine whether Stable Diffusion can constitute an infringement of copyright at all, the court must first have a clear understanding of what Stable Diffusion actually is. This question had been the subject of several expert assessments, and the court refers to Professor Brox, who had stated the following, reproduced in paragraph 554 of the judgment:
"8.36 ... in order for a diffusion model to successfully generate new images, that model must learn patterns in the existing training data so that it can generate entirely new content without reference to that training data.
8.37 Rather than storing their training data, diffusion models learn the statistics of patterns which are associated with certain concepts found in the text labels applied to their training data, i.e. they learn a probability distribution associated with certain concepts. This process of learning the statistics of the data is a desired characteristic of the model and allows the model to generate new images by sampling from the distribution.
[...]
8.40 ... For models such as Stable Diffusion, trained on very large datasets, it is simply not possible for the models to encode and store their training data as a formula .... It is impossible to store all training images in the weights. This can be seen by way of a simple (example) calculation. As I explained in paragraph 6.28 above, the LAION-5B dataset is around 220TB when downloaded. In contrast, the model weights for Stable Diffusion 1.1-1.4 can be downloaded as a 3.44GB binary file. The model weights are therefore around five orders of magnitude smaller than a dataset which was used in training those weights."
The expert thus points out that the AI model does not store the actual images in the weights. This is evident, amongst other things, from the fact that the model weights for Stable Diffusion can be downloaded as a relatively small binary file. The file size of the binary file is under 4 gigabytes, whilst the LAION-5B dataset, which was used for training, amounts to approximately 220 terabytes. During training, the model weights do not store pixel values linked to all the millions or billions of images that are reviewed. Instead, the image information is converted from pixel units to a so-called latent space using an autoencoder. Such a latent space contains a compressed representation of the image, in a format that is far more efficient both for storage and subsequent calculations or use of the information. All of this suggests that the image material cannot be loaded into the actual weights that lie in Stable Diffusion.
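The expert's size comparison can be checked with simple arithmetic, using the figures quoted in the judgment:

```python
import math

# Figures quoted in the judgment: ~220 TB of training data (LAION-5B
# when downloaded) versus a 3.44 GB file of model weights.
dataset_bytes = 220 * 10**12
weights_bytes = 3.44 * 10**9

ratio = dataset_bytes / weights_bytes
orders_of_magnitude = math.log10(ratio)

# The dataset is roughly 64,000 times larger than the weights file,
# i.e. close to five orders of magnitude, matching the expert's estimate.
print(round(ratio), round(orders_of_magnitude, 1))
```

At that ratio the weights simply cannot contain the training images byte for byte, which is the core of the expert's point.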
At the same time, the experts agreed that Stable Diffusion can generate images that are distinct and different from the training images, but that it can also produce images that are approximately identical to the learning material. It does not emerge clearly from the judgment why Getty had not pursued this line further. Insofar as Stable Diffusion is capable of reproducing images from the training, it could be argued that this in itself constitutes unlawful copying. Depending on the precise facts surrounding this particular process, it could perhaps be argued that both the storage and the new production constitute infringements of copyright. The argument would then be that it is not decisive whether the images are stored in the same pixel format as the originals. As long as the images can be recreated, this is a strong indication that they are stored, albeit in a different format, in the weights that lie in the AI model. A relevant parallel may be the use of a compression program to compress a file. Even if the new file is smaller in pure file size, it may still be said that there is a copy of the original work as long as this can be restored from the new file. This argument does not, however, appear to have been advanced in the case.
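The compression parallel is easy to demonstrate with Python's standard zlib module: the compressed bytes are smaller and look nothing like the original, yet the work is exactly recoverable.

```python
import zlib

# A compressed file stores different, smaller bytes than the original,
# yet the original work can be restored exactly.
original = b"imagine these bytes are a copyright-protected image" * 100

compressed = zlib.compress(original)
assert len(compressed) < len(original)   # smaller, different representation

restored = zlib.decompress(compressed)
assert restored == original              # ...but the work is fully recoverable

print(len(original), len(compressed))
```

The suggestion in the text is that, insofar as an AI model can recreate a training image, its weights might be argued to resemble this kind of transformed-but-recoverable storage; the court's reasoning, by contrast, rests on the weights never having contained the works at all.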
The court's assessment of the concept of "article" and whether this also encompasses more ephemeral digital information is too extensive to reproduce in full here. In an earlier article (Nation States and Hacking, Lov & Data No. 4/2024), I have discussed the English interpretative principle of "always speaking", which opens the way for a dynamic statutory interpretation. The judge addresses this principle in the present case as well, and states in paragraph 580 that it suggests that electronically stored data can be covered by the concept of "article":
"I agree with Getty Images that the 'always speaking principle' is of assistance in these circumstances. Stability does not suggest that the statute was intended to be 'frozen' in time and I consider that modern storage methods in intangible media amount to a fresh set of facts which fall within the same genus of facts as those to which the original expressed policy has been formulated. The fresh set of facts arises by reason of the prevalence in the modern world of intangible electronic storage which has been brought about by enormous strides in technology since the date of commencement of the CDPA. The purpose of the Act - the protection of copyright owners - would, in my judgment, be fulfilled by an interpretation which encompassed modern technology."
Although the court concludes that the law may also in principle cover digital information stored on various media, the question remains whether there is an "infringing copy". Stability presented an apt formulation of this problem: "... [a]n infringing copy must be a copy" (see paragraph 592, underlined in the judgment). What Stability seeks to convey is that there must be a copy, a specimen that can physically and practically be found somewhere. Since the model weights in Stable Diffusion do not store the visual information contained in the original works, there cannot be a copy or specimen in the legal sense.
The court then formulates the issue as follows in paragraph 599:
"... [w]hether an article whose making involves the use of infringing copies, but which never contains or stores those copies, is itself an infringing copy such that its making in the UK would have constituted an infringement. Taking the specific facts with which I am concerned, is an AI model which derives or results from a training process involving the exposure of model weights to infringing copies itself an infringing copy?"
This is discussed further in the central paragraph 600 of the judgment, where the court concludes that there is no copying in the legal sense:
"In my judgment, it is not. It is not enough, as it seems to me, that (in Getty Images' words) 'the time of making of the copies of the Copyright Works coincides with the making of the Model' (emphasis added). While it is true that the model weights are altered during training by exposure to Copyright Works, by the end of that process the Model itself does not store any of those Copyright Works; the model weights are not themselves an infringing copy and they do not store an infringing copy. They are purely the product of the patterns and features which they have learnt over time during the training process. Getty Images' central submission that 'as soon as it is made, the AI model is an infringing copy' is, accordingly, in my judgment, entirely misconceived. Unlike the RAM chip in Sony v Ball which became an infringing copy for a short time, in its final iteration Stable Diffusion does not store or reproduce any Copyright Works and nor has it ever done so. The fact that its development involved the reproduction of Copyright Works (through storing the images locally and in cloud computing resources and then exposing the model weights to those images) is of no relevance. Furthermore, that there is no requirement that an article which is an infringing copy must continue to retain a copy does not assist Getty Images, because it is implicit in the word 'continue' that at some point the article has in fact contained an infringing copy. The model weights for each version of Stable Diffusion in their final iteration have never contained or stored an infringing copy."
Getty therefore did not succeed with its arguments relating to this claim. The court concluded that the model weights are a result of the learning process, but do not reproduce or store the image works as such. The court considers that this must apply even though the model weights are affected and changed continuously through the training and contact with the image works. The court bases its reasoning on the fact that the model weights can never be said to have contained, stored or copied the relevant works. The claim for secondary infringement of copyright therefore did not succeed, but the court agreed on the question of principle that an "article" can encompass more ephemeral digital information.
The judgment covers several other questions, and the court offers some further discussion of copyright infringement. Getty partially succeeded with minor claims relating to infringement of trademark rights. However, the judgment's principal contribution lies in the court's exposition and assessments relating to what an AI model is in various legal senses. The court's discussions regarding model weights and how these are involved in the training of this type of AI model are notably comprehensive and provide valuable guidance. This also means that the judgment is an excellent springboard for further discussions and clarifications going forward within this field, regardless of whether one agrees or disagrees with the court's analysis and conclusions.
At the same time, this particular case is characterized by several uncertainties relating to Getty's claims and arguments, which were changed substantially towards the end of the case preparation phase. It appears that the facts surrounding where the actual training took place were not sufficiently clarified at the time the claim was issued. The absolutely central question of whether the training itself and the use of copyright-protected works for this purpose constituted infringement was therefore not assessed. It is, however, clear that we can expect more cases in this field going forward.