Text-to-video AI models – mitigating disruption in the film and TV industry
In February 2024 OpenAI introduced its ground-breaking text-to-video tool, Sora, heralding a new wave of potential disruption by generative AI (GenAI) within the film and television industry.
In this article we consider how current UK legislation could assist in mitigating the impact of text-to-video models, especially in the context of how copyright might subsist in AI output, and who might own such copyright. Given the widespread unauthorised use of copyright works as training data, we also examine how copyright infringement could be curbed, in light of developing case law, the rules on fair dealing, licensing, and the use of other technology to police datasets. Actors also have a role to play in protecting their image and performances through reliance on image rights, union agreements and data protection.
Background
The videos generated by Sora, created from a mere text prompt, are hyper-realistic and remarkably complex. They have been generated in a variety of styles, are well-lit and appear to be colour-graded, and some involve shots from multiple angles.
Understandably, OpenAI’s announcement has raised concerns about the level of disruption that text-to-video tools could cause to a film and TV industry already facing pressure from a tough economic climate and the after-effects of last year’s union strikes in the US. While Sora can, for now, only produce videos of up to one minute (with a variety of imperfections), there is a broad consensus that longer, more complex output will be viable from text-to-video models in the near future. That opens up possibilities for various use cases for text-to-video models, both for at-home content creation and in a professional filmmaking environment.
The absence of AI-dedicated legislation in the UK also raises questions about how legislation will protect the work of creatives in the film and TV industry as the technology improves.
Current legislation
At present, there is no legislation specifically dedicated to AI in the UK. Instead, to protect creative works, creatives must rely on existing legislation, including, in particular, the Copyright, Designs and Patents Act 1988 (CDPA).
In March 2023 the UK Government published a white paper detailing plans for a “pro-innovation approach” to AI regulation.[1] There were plans for a code of practice for copyright and AI, but those were scrapped in the Government’s response to an AI consultation on 6 February 2024, as the Government considered that the working group (set up by the IPO and consisting of rights-holders and AI developers) could not agree on an effective voluntary code.
A report by the House of Lords Communications and Digital Committee was published on 2 February 2024, examining trends over the next three years and identifying ten priority actions.[2] Those included the better management of immediate risks and the protection of copyright. The Committee commented that, if the process towards a code of practice remained unresolved by spring 2024, the Government should prepare to resolve the dispute definitively, including through legislative changes if necessary.
By contrast, the EU is heading towards the implementation of the AI Act, which will be the world’s first AI-dedicated legislation, following its unanimous endorsement by member states’ representatives on 2 February 2024.[3] Post-Brexit, the AI Act will not apply to the UK, but it is useful to be mindful of the EU’s approach and how it may influence domestic legislative changes. The AI Act focuses on transparency, requiring the disclosure of datasets used to train models, and compels developers to adhere to minimum standards.
Copyright protection under current legislation
Subsistence
For copyright to subsist in a work, it must first fall within a category protected by the CDPA. A “film” is one of those, and it is defined as “a recording on any medium from which a moving image may be produced”. The output from text-to-video models is not technically a “recording” per se, but instead a series of predictions generated by the model based on data on which it has been trained. So legislation might need to be amended to capture text-to-video outputs as “films” more accurately under the CDPA. For literary, dramatic, musical or artistic works, there is a requirement under the CDPA that the work also be both original and fixed. Yet that is not a requirement for films, meaning that it is arguably more straightforward for text-to-video outputs to be protected.
Ownership
Under the CDPA the first owner of copyright is usually the author. A film is treated as a work of joint authorship between the principal director and the producer (i.e. the person who made the arrangements necessary for making the film). It is hard to determine exactly who would fill those roles for text-to-video outputs, as each could potentially be either the AI developer or the user who entered the prompt.
There is a more specific provision in the CDPA that deals with the authorship of computer-generated literary, dramatic, musical and artistic works (but not films). The author of those works is taken to be the person by whom the arrangements necessary for the creation of the work are undertaken, a definition similar to that of a producer. Usually, the effect of this provision has been that the author is the owner of the system that generated the work. This suggests that, in the case of text-to-video outputs, the author is likely to be the AI developer, absent a contractual agreement to different effect.
Practically speaking, then, the terms of use for any text-to-video model will also have to be considered. The current terms of use for OpenAI’s ChatGPT assign any copyright in the output to the user.[4] At the time of writing, OpenAI has not indicated whether Sora’s terms will take a similar approach.
Copyright infringement
Concerns have been raised about the potential for copyright infringement by GenAI, including by text-to-video models. Infringement might take place through (a) the data used to train the model and/or (b) the output produced by the model.
Training data
OpenAI has not revealed what training data has been used to create Sora. In the technical report published alongside Sora’s release, OpenAI states that it takes “inspiration from large language models which acquire generalist capabilities by training on internet-scale data”,[5] but does not expand on the source of such “internet-scale” data. It is generally assumed that the datasets would have been obtained by “scraping” data from the internet, which is likely to have included the use of copyright audiovisual content.
A handful of test cases in both the UK and US courts address how large language models are trained. No rulings have been handed down at the time of writing, but the outcomes of those cases will be fundamental to understanding how producers might act if they discover that their copyright works have been used to train text-to-video models without authorisation.
A key case currently in the High Court is Getty Images (US) v Stability AI Ltd.[6] Getty Images is claiming copyright and database-right infringement by Stability AI through the unauthorised use of millions of image-text pairings from its library to train Stability AI’s text-to-image model, Stable Diffusion. A crucial question for the court is whether there is any evidence that the training and development of the model took place in the UK. Stability AI’s CEO has claimed that no computing resources based in the UK were ever used for Stable Diffusion. If that is the case, it may be that Getty Images has no cause of action under the CDPA. Meanwhile, a concurrent case has been brought by Getty Images against Stability AI in the US.[7] If no cause of action is found in the UK case, it will be important to watch how the US case progresses.
Another case brought in the US is New York Times v OpenAI and Microsoft.[8] The New York Times claims that its copyright has been infringed both through the use of its articles to train ChatGPT and through ChatGPT’s production of certain output that appears identical to New York Times content.
In the US, a key defence in this type of case is likely to be “fair use”, which permits the unlicensed use of copyright works for certain purposes. Various factors are considered, including the purpose and character of the use, the nature of the copyright work, the amount and substantiality of the portion used and the effect of the use on the potential market for or value of the copyright work. At this early stage in proceedings, it is hard to predict how the defence might be received by the court, but the outcome is likely to inform how models are trained in the future.
Output
Recent advances in text-to-video models hold great promise for at-home content creation. If longer-form creations are truly possible, those models offer a much simpler avenue for aspiring film-makers to make high-quality work that circumvents typical production and distribution channels. This has raised concerns about how the distribution and exploitation of content produced by text-to-video models might infringe existing works.
If text-to-video models have been trained on existing copyright works, they will be capable of producing new content that is based on them. Fan fiction of both a written and visual nature has long been the province of superfans of various major franchises (e.g. Harry Potter, Twilight or Marvel), some of which has achieved financial success in its own right. Text-to-video models may make it easy for fans to produce high-quality new work based on existing characters and worlds, work that infringes copyright in the originals. Rights-holders in films will find themselves presented with new challenges when it comes to protecting their works, and how they manage that will be affected by the outcomes of the cases discussed above (particularly the New York Times case, given that it concerns both inputs and outputs).
One way to deal with this would be to work with AI developers by choosing to license works to the owners of text-to-video models. Some rights-holders may enter into a form of partnership that enables controlled exploitation of works produced by at-home creators, protecting the brand of a franchise, while creating new revenue streams.
Those who wish to avoid this and want to maintain the integrity of their original works may have to try harder to protect them. Under the CDPA, the permitted act of “fair dealing” for the purpose of caricature, parody or pastiche may cover some works produced by at-home creators. Yet not all AI-generated works would necessarily fall into these categories, nor is it currently clear how far fair dealing would apply to text-to-video outputs.
Blockchain technologies might have a part to play: a system of digital certificates could be used to trace AI-generated content back to the original films used to generate it, helping rights-holders detect where their works have been infringed. Services similar to “Have I Been Trained?” – which enables image owners to check whether their work has been used in one of the largest open-access image-text datasets, and to opt out of it – may be developed to allow film owners to police datasets and to choose to remove their films from them. Notably, though, this relies on transparency from AI developers about their datasets. Given the struggles that rights-holders and developers had in attempting to agree a code of practice last year, it remains to be seen whether appropriate compromises in relation to transparency can be reached, or whether new legislation might be required to achieve that.
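By way of illustration only, the minimal sketch below (in Python, with hypothetical names throughout) shows the basic principle behind such dataset-policing tools: a rights-holder lodges a content fingerprint in a registry, and entries in a scraped dataset are checked against it. A production system would more likely use perceptual hashing, which survives re-encoding, and, in a blockchain scheme, a tamper-evident ledger; nothing here reflects how “Have I Been Trained?” is actually implemented.

```python
import hashlib
from pathlib import Path

# Hypothetical registry of fingerprints lodged by rights-holders. In a
# blockchain-based scheme each entry would be anchored in a tamper-evident
# ledger rather than held in a plain in-memory set.
registry: set[str] = set()

def fingerprint(path: Path) -> str:
    """Compute an exact SHA-256 content fingerprint for a media file.
    (A real service would use a perceptual hash that also survives
    re-encoding, cropping and compression.)"""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read 1 MiB at a time
            digest.update(chunk)
    return digest.hexdigest()

def register_work(path: Path) -> None:
    """Lodge a film's fingerprint in the registry."""
    registry.add(fingerprint(path))

def matches_in_dataset(dataset_files: list[Path]) -> list[Path]:
    """Return the dataset entries that match a registered work,
    i.e. candidates for an opt-out or removal request."""
    return [p for p in dataset_files if fingerprint(p) in registry]
```

One attraction of matching on fingerprints rather than on the films themselves is that a rights-holder could, in principle, demonstrate that a work appears in a dataset without ever redistributing the underlying footage.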
Actors’ rights
Longer-term use cases for text-to-video models posit the use of either real or synthetic actors in AI-generated films. There are protections under existing legislation that actors could invoke to manage the use of their image, and that may shape future use cases.
Image rights
Image rights do not exist in the UK as a stand-alone right; instead, an actor must rely on the causes of action that might arise when their image is used, such as passing off or defamation. An actor may well have grounds to bring such claims if their image or likeness is used without authorisation.
By contrast, in the US, there is a positive right of publicity, whereby an individual has a right to protect themselves against the use of their likeness for profit. It is easy to imagine how an actor might seek to rely on that right if an unauthorised performance based on their likeness were used in text-to-video output that is exploited for profit.
Union agreements
A key amendment made to the SAG-AFTRA TV/Theatrical Contract as a result of the US union strike last year was the insertion of a clause concerning AI, which split use cases for AI into two different categories: employment-based digital replicas and independently created digital replicas. The former occurs where an actor is already working on a project, and their image is then used via an artificially generated extension. The latter occurs where an actor’s recognisable likeness is entirely artificially generated for a project. Both require consent from the actor, which continues after death (and may also be granted by their estate), and there must be a “reasonably specific description of the intended use”.[9]
Equity, the UK actors’ union, does not currently have any equivalent in place in its agreements, but has issued an AI Vision Statement, which sets out eight principles for the film and TV industry to adopt for “performance cloning”.[10] The principles include consent, the right to license a performance/likeness on a time-limited basis and the right to be remunerated for the licence. Equity is campaigning to implement those principles in its collectively bargained agreements and asserts that UK legislation does not currently go far enough to protect performers’ rights.[11]
Unions both here and across the Atlantic will evidently continue to monitor closely how actors’ images are being used by GenAI, ensuring that any future use cases for text-to-video models are shaped by their concerns. It remains possible, though, that an actor may choose to consent to the use of their image by text-to-video models, which may mean that their career could evolve into entirely new and unanticipated formats both in life and well after their death.
Data protection
Under the UK GDPR,[12] “personal data” means any information that relates to an identified or identifiable living person. Text-to-video models will have been trained using footage that constitutes actors’ personal data, whether it is used to augment a real performance already captured or to produce an entirely synthetic actor. While the former uses the personal data of a specific actor, the latter will probably still need to be trained on footage of real actors to generate new content (particularly as models look to improve synthetic actors’ ability to emote and produce dialogue), thereby processing their personal data.
Actors are entitled to be informed about how and why their personal data are being used, to have their personal data erased, to restrict the use of their data, to object to the use of their personal data and to withdraw their consent at any time. Actors cannot validly contract out of those rights.
There must also be a recorded lawful basis for sharing personal data. Of the six bases set out in the UK GDPR, the most relevant is likely to be consent to process the data for a specific purpose. Producers using GenAI tools must ensure that they have obtained consent from the actors on whose performances the outputs are based. If an actor can prove that AI companies developing text-to-video models have used their personal data to train systems without their consent, that could be a breach of data protection laws.
The UK GDPR does make provision for “special purposes”, which exempt processing carried out for those purposes from certain of its provisions. Those include an “artistic purpose”, which could potentially apply to the use of an actor’s pre-existing performances to train models. Yet for an “artistic purpose” to apply, the controller must reasonably believe that the processing is in the public interest, and that applying the listed UK GDPR provisions would be incompatible with that purpose. If a text-to-video model is the purpose in question, that may well be hard to prove.
Conclusion
There is already an imagined future where individual GenAI users could simply enter a text prompt and generate an entire feature film with their ideal cast. That would be difficult to achieve lawfully without appropriate consents (and/or exceptions) under laws relating to copyright, image rights and data protection. Accordingly, use cases for text-to-video models are likely to be shaped by existing legislation, as much as new legislation may be informed by GenAI technology itself.
We look forward to seeing what approach the UK government takes to any legislative amendments and/or any fresh self-regulatory initiatives, and we shall be keeping a close eye on developments in case law both in the UK and in the US.
Article written for Entertainment Law Review.