AI and Copyright Law: Who Owns AI-Generated Content and Training Data Disputes
In 2023–2024, the U.S. Copyright Office and federal courts began drawing the first legal boundaries around AI and copyright: the Copyright Office has refused registration for purely AI-generated images, and multiple federal lawsuits challenge whether training large AI models on copyrighted works is fair use or mass infringement.
A New Technology, an Old Law, and Billions of Dollars in Dispute
In February 2023, the U.S. Copyright Office issued its first major decision on AI-generated works, declining to register images produced by the Midjourney AI system in the comic book Zarya of the Dawn. The author, Kristina Kashtanova, retained copyright in her text and in her selection and arrangement of the images, but the AI-generated images themselves were held unprotectable because copyright requires human authorship. That decision, followed in March 2023 by formal registration guidance for works containing AI-generated material, set off a cascade of legal challenges, congressional hearings, and industry lobbying that has made AI copyright one of the most contested legal questions of the decade.
The Human Authorship Requirement
U.S. copyright law has required human authorship since at least 1884, when the Supreme Court in Burrow-Giles Lithographic Co. v. Sarony upheld copyright in a photograph by reasoning that the photographer exercised creative judgment. The Copyright Office has consistently refused to register works produced by non-humans, including a 1984 rejection of a computer-generated work and a 2022 refusal to register an AI-generated painting submitted by Dr. Stephen Thaler.
The human authorship doctrine creates a spectrum of copyright eligibility for AI-assisted works:
| Scenario | Copyright Eligibility | Basis |
|---|---|---|
| Human writes text; AI fixes grammar | Full copyright to human author | Human creative expression dominates |
| Human provides detailed prompt; AI generates image | Partial: the generated image itself is generally unprotected, but human selection/arrangement may qualify | Copyright Office case-by-case review |
| Human presses button; AI produces entire work | No copyright protection | Insufficient human authorship |
| AI autonomously creates work | No copyright protection | No human author |
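The Copyright Office's guidance emphasizes that the critical question is whether a human author made sufficiently creative choices. Under its March 2023 guidance, prompts alone generally do not suffice, because the user does not control how the system interprets them. Protection instead attaches to identifiable human contributions, such as creative selection, arrangement, or modification of AI-generated material.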
The Training Data Problem
The more commercially significant legal battle concerns the upstream side of generative AI: whether training large models on copyrighted text, images, and code constitutes copyright infringement. AI companies have relied principally on fair use as a defense, which rests on three main arguments:
- Transformation: Training a model transforms copyrighted works into statistical weights—a fundamentally different purpose and character than consuming the original work.
- No market substitution: A trained model does not reproduce copies of the training data in a way that substitutes for the originals in the market.
- Public benefit: Training AI models serves the transformative public purpose of advancing technology.
Rights holders counter that the scale of copying—billions of works ingested without license or compensation—cannot be fair use regardless of purpose, and that AI outputs do compete with the original works as substitutes.
Active Litigation Landscape
Multiple high-stakes lawsuits are testing these questions in U.S. federal courts:
| Case | Court | Plaintiffs | Key Issue |
|---|---|---|---|
| New York Times v. Microsoft & OpenAI | S.D.N.Y. | New York Times | Verbatim reproduction of articles in ChatGPT outputs |
| Andersen v. Stability AI | N.D. Cal. | Visual artists | Training image generators on scraped artwork |
| Getty Images v. Stability AI | D. Del. | Getty Images | Watermark reproduction; mass unauthorized copying |
| Authors Guild v. OpenAI | S.D.N.Y. | Fiction authors | Training GPT models on copyrighted books |
| Thaler v. Vidal | Fed. Cir. | Dr. Stephen Thaler | Whether an AI system can be named as an inventor on a patent (a related patent question; the court held in 2022 that inventors must be human) |
What AI Companies and Rights Holders Are Seeking
The legal disputes have generated parallel policy debates. Rights holders are lobbying for:
- A mandatory licensing regime requiring AI developers to compensate creators whose works are used for training.
- Opt-out registries so individual creators can exclude their works from training datasets.
- Transparency requirements compelling AI companies to disclose what training data they used.
AI companies are seeking:
- Judicial confirmation that training is fair use, establishing a stable legal foundation for the industry.
- Safe harbor protections analogous to the DMCA's protection for internet platforms.
- Clarity on ownership of AI-generated outputs to encourage investment in AI products.
International Approaches
Other jurisdictions have taken different stances. Japan has explicitly adopted a permissive approach: text and data mining for AI training is generally not copyright infringement under Japanese law, even for commercial purposes. The European Union's AI Act and Copyright Directive impose transparency and opt-out requirements without prohibiting training outright. The UK's Copyright, Designs and Patents Act includes a text and data mining exception (section 29A) limited to non-commercial research. Commercial AI training therefore falls outside it, and UK courts have yet to decide whether such training infringes.
The unresolved state of AI copyright law creates significant uncertainty for businesses building AI products, for creators trying to protect their works, and for consumers relying on AI-generated content. Courts are likely to shape AI copyright doctrine through conflicting decisions before Congress acts—a pattern that mirrors how copyright law adapted, slowly and imperfectly, to the arrival of the internet.
This article is for informational purposes only and does not constitute legal advice. Consult a qualified attorney for guidance on copyright issues related to AI systems.