by Frank Bilotto
OK, I’ve promised the executive committee of Creative Licensing International that the focus of Content Licensing Brief will not be all AI all the time. And it won’t be. But I’ve been bothered by the legal approach to AI-generated content for the last year, which culminated this week with the lawsuit filed by the New York Times (“NYT”) against OpenAI, owner of the well-known ChatGPT. So, for the third issue in a row, I will put forth my ideas on this subject before turning to some of the other important issues that face publishers licensing their content.
Last week, The New York Times officially launched legal action against the threat that artificial intelligence poses to the future of journalism when it filed a federal lawsuit against OpenAI and Microsoft, seeking to end the practice of using its stories to train chatbots. The Times had been in negotiations with OpenAI and Microsoft for the past nine months, but the talks failed to produce an agreement. So, the Times went to federal court, alleging that OpenAI and Microsoft “threaten the Times’s ability to provide its service” by “effectively stealing billions of dollars worth of work by its journalists,” and replacing the market for NYT content, thereby depriving NYT of “subscription, licensing, advertising, and affiliate revenue.”
Most of you reading this article know that facts are not copyrightable, and news is essentially a retelling of facts. What is copyrightable is the expression of those facts: the word choice, the format of presentation and so on. But determining whether a news article is entitled to copyright protection is a tightrope walk. If the text of the article is so closely tied to the facts being expressed that the facts cannot be expressed in another way, then the article is not subject to copyright protection. So, if ChatGPT is taking the facts presented in an NYT article and presenting them in a different format, with different word choices, etc., or if the facts are so integral to the article that the article could not be written any other way, it is unlikely that a court would determine that ChatGPT is infringing on copyright.
The NYT lawsuit is not the first case to address the financial implications of generative AI content, but it is the first to make money the primary issue. While there have been a plethora of lawsuits filed against AI companies in recent years, three others stand out as being particularly relevant to the financial impact of generative AI on publishers.
Four years ago, while ChatGPT was still in its early stages of development, UAB Planner 5D filed a copyright infringement complaint against Facebook, Inc. Planner 5D operates a home design website that allows users to create virtual interior design scenes, populating them from a library of virtual objects such as tables and chairs. Planner 5D claimed to be the copyright owner of both the objects and the scenes.
Planner 5D alleged that Facebook downloaded the entire collection of objects and scenes because of the commercial potential of scene recognition technology. Facebook argued that “Planner 5D’s works are data files that cannot be copyrighted as literary works because they lack human authorship, or as pictorial works because they lack originality.” The case has yet to go to trial, but it is likely to address the financial implications of Facebook’s conduct for both Facebook and Planner 5D. It will be difficult to prove that the objects, or even the scenes created with them, are eligible for copyright protection, but there is little question that Facebook can commercially benefit from the acquisition and use of the content. Publishers should keep a close eye on this case. If Planner 5D prevails, publishers win, because the decision will pave the way for other courts to bar the unauthorized use of content for commercial benefit, even if that content is not eligible for copyright protection.
In 2020, Thomson Reuters and Westlaw sued ROSS Intelligence Inc. for copyright infringement relating to the unlawful use of Westlaw’s unique platform capabilities. ROSS partnered with LegalEase, which “used a bot to download and store mass quantities of West’s proprietary information,” which it then provided to ROSS. The proprietary information in question is not the case summaries, but the Key Number System and the Headnotes.
Thomson argued that the Westlaw content is creative, which weighs against fair use, and that ROSS harmed the market for Westlaw content by taking and using it simply to generate a ROSS product that displaces Westlaw’s product.
ROSS argued that “any copying was intermediate and the final ROSS product does not contain any copyrighted materials,” and that the ROSS product does not replace the market for Westlaw’s works.
If and when it reaches the courtroom, this case will address two issues that are important to publishers:
1) The court will determine whether training AI on copyrighted materials constitutes transformative fair use. A transformative use adds something new to the copyrighted material, with a further purpose or different character, and does not substitute for the original use of the work. This case is likely to provide a unique opportunity to understand how courts will analyze a fair use defense when AI is trained on materials that include legal opinions which, while not themselves subject to copyright protection, are accompanied by creative, expressive materials created and owned by Thomson Reuters.
2) The court is also likely to consider whether downloading material under a subscription for AI training purposes violates the terms of service and constitutes a breach of contract. Nothing in the Westlaw subscription agreement prohibits a subscriber from downloading the entire repository of content. However, ingesting that content into a machine learning technology raises the question of whether the generative AI will produce content that could replace the market for Westlaw.
In February, Getty Images filed a lawsuit against Stability AI, alleging infringement of Getty’s copyrighted photographs, among other things. While the case has not yet reached the discovery stage, Stability AI’s defense will likely be that training AI on copyright-protected materials qualifies as a transformative purpose that weighs heavily in favor of fair use.
The outcome of this case will have a significant impact on whether AI will continue to undermine content creators’ ability and right to license under the Copyright Act, and potentially jeopardize the livelihoods of millions of human creators.
How a court will decide NYT vs. OpenAI and Microsoft may have as much to do with the state of the industry as with the law. The media has already been pummeled by a migration of readers to a variety of online platforms. While many publications, most notably the Times, have successfully carved out a digital space, the rapid development of AI threatens to significantly upend the publishing industry. No matter which side you think the law should fall on, consider what I learned in law school thirty-five years ago and have seen play out over my career. Courts don’t always rely on the law and precedent to reach decisions. Occasionally, courts make rulings based on what they believe is the best public policy. My instinct tells me that even if AI training is considered transformative fair use, it’s bad public policy to permit AI to utilize content to directly compete against the creator of that content. Courts, therefore, must do whatever is needed to protect the sustainability of publishers.
So, the long-term viability of the publishing world may rest on the outcomes of these four cases. Planner 5D will decide whether the commercial value of content that is typically considered not copyrightable is protected from digital poaching. Thomson will determine whether AI-generated content is transformative and whether the ingestion of content into machine learning technology is a breach of a subscription agreement. Getty will determine whether training AI on copyrighted content is transformative and, therefore, fair use. And NYT will determine whether generative AI content effectively replaces the market of the publisher.
However, it is more likely that none of these cases will ever see a courtroom. There is too much risk on both sides to allow a court to decide the fate of publishing or AI. Both sides of each case have winnable and losable arguments. There will be more lawsuits filed by copyright holders in the coming months and years, until the law catches up to the technology. My fear is that by that point, it will be too late to reverse the current. Publishers, particularly of news, seem to have the mindset of “if you can’t beat ‘em, join ‘em.” Axel Springer and OpenAI executed an agreement that will pay Axel Springer an estimated $10 million. And, while not directly related to AI, the New York Times will reportedly receive $100 million over the next three years to let Google share firewalled NYT content with its audience.
I don’t think it’s prudent for publishers to wait for the courts and the law to protect their content and profitability. Doing the research to write this article has only strengthened my previous opinion. Publishers and AI companies need to make deals for what they both need going forward. Publishers need predictable revenue streams, and AI companies need content to train their generative AI engines.
The question is how and when. If you wait too long to get into the game, it’s likely that your content will be worth less, because other publishers in your space will have already made deals. But if you get in too early, you may not be maximizing your revenue opportunity. For example, if you do a fixed-fee deal with ChatGPT now, your fee will likely be based upon what ChatGPT believes your content is worth to its current business. But ChatGPT is expected to grow exponentially in the coming years, which means your content may be worth more in five years than it is right now.
I write this just two weeks after I advised publishers to negotiate up-front fixed-fee deals. I still think this is the way to go. But the terms of the agreements that I envision now are much different than they were then. The up-front fixed fee must be adjusted during the term of the agreement to reflect the growth of the AI company over the same period. And then we must consider what happens if and when a license agreement is terminated. Once ingested, the content is forever part of the machine learning technology.
If companies like OpenAI and Google become the biggest publishers and aggregators in the world in the next 10 years, as I predict, revenues from AI-generated content will be exponentially larger than anyone can foresee today. Our clients, the publishers, need to be compensated fairly for use of their archived content well into the future. I already have some ideas on how to do it. I think I might have to go back to the executive committee and ask for one more week of AI.