The New York Times (NYT) has filed a lawsuit against artificial intelligence research company OpenAI and its partner Microsoft, alleging the companies violated copyright law by scraping NYT articles without permission to train AI systems like ChatGPT. This landmark case tests the legality of using copyrighted content to develop AI systems that can generate human-like text.
Background of the Dispute
OpenAI, backed by its partner Microsoft, launched the ChatGPT chatbot in November 2022. ChatGPT is powered in part by machine learning models that were trained on vast troves of text data scraped from the internet without explicit permission from content creators.
The NYT alleges OpenAI and Microsoft copied “millions of articles” from the NYT to train AI systems to complete tasks like summarizing articles, translating text, and answering questions in a conversational manner. While using copyrighted material to train AI is common, doing so without permission raises legal questions about the scope of copyright protection.
The NYT sent OpenAI a cease-and-desist letter on November 28 demanding that it stop using NYT content. OpenAI did not respond, prompting the lawsuit.
Details of the Lawsuit
The NYT complaint, filed December 27, 2023 in federal court, states:
- OpenAI and Microsoft created AI products that can scan, scrape, and copy content without permission or compensation
- Their actions violate copyright protections by creating derivative works and unlawfully copying original expressions
- Their infringement was “systematic, willful, and ongoing”
- The NYT suffered significant economic and reputational damages
The lawsuit seeks a permanent injunction barring unauthorized use of NYT content for model training. It also asks that OpenAI and Microsoft pay damages, including any profits earned from infringing systems.
OpenAI’s Defense
OpenAI published a blog post defending its practices and laying out principles for responsible AI development, including correctly attributing data sources. The company maintains that scraping public web pages at scale for AI training constitutes “fair use” under copyright law, and it plans to defend itself vigorously in court.
Microsoft has not issued a public statement.
Implications of the Ruling
Legal experts say this case will likely prompt important decisions on how copyright rules apply to AI systems:
| Issue | Open question |
|---|---|
| Defining AI model outputs | Are AI model outputs like ChatGPT responses “derivative works” subject to copyright, or new creative works? |
| Fair use standards | Does scraping copyrighted online content to train AI constitute “fair use”? |
| Liability distribution | Who bears legal responsibility: the AI developer or end users? |
The court’s rulings on these issues could determine what content AI systems can legally access and how culpable different parties are for potential infringement.
If the NYT wins, the ruling could force major changes in how training data is sourced. Tech companies may need to pursue commercial licensing deals with publishers, pay royalties, or use only public domain content.
However, if fair use protections apply, more of the burden would fall on copyright holders to monitor for infringement and challenge it case by case.
What Happens Next
The lawsuit will likely take months to play out. In the meantime:
- OpenAI and Microsoft will aim to continue ChatGPT’s rapid adoption while defending against claims of copyright violation
- The NYT and other media companies will advocate for stronger legal protections and compensation for use of their content
- Policymakers may propose new regulations around AI ethics and rights protections
- AI researchers may need to overhaul how they source training data
While this case raises thorny issues, it underscores the fast-rising impact of AI and the need for clear rules of the road to govern ethical development. How the court balances the interests of copyright holders, AI innovators, and public access to information remains to be seen.
To err is human, but AI does it too. Whilst factual data is used in the production of these articles, the content is written entirely by AI. Double check any facts you intend to rely on with another source.