AI and Copyright Law: Who Owns AI-Generated Content and Training Data Disputes

A New Technology, an Old Law, and Billions of Dollars in Dispute

In February 2023, the U.S. Copyright Office issued one of its first major decisions on AI-generated works, declining to extend registration to the images produced with the Midjourney AI system in the comic book Zarya of the Dawn. The author, Kristina Kashtanova, retained copyright in her text and in the selection and arrangement of the images, but the AI-generated images themselves were held unprotectable because copyright requires human authorship. That decision, followed in March 2023 by formal registration guidance for works containing AI-generated material, came amid a wave of lawsuits, Congressional hearings, and industry lobbying that has made AI copyright one of the most contested legal questions of the decade.

The Human Authorship Requirement

Courts and the Copyright Office trace the human authorship requirement at least to 1884. In that year, the Supreme Court in Burrow-Giles Lithographic Co. v. Sarony upheld copyright in a photograph, reasoning that the photographer had exercised creative judgment in composing it. The Copyright Office has consistently refused to register works produced by non-humans, including a 1984 rejection of a computer-generated work and a 2022 refusal to register an AI-generated artwork submitted by Dr. Stephen Thaler.

The human authorship doctrine creates a spectrum of copyright eligibility for AI-assisted works:

Scenario	Copyright Eligibility	Basis
Human writes text; AI fixes grammar	Full copyright to human author	Human creative expression dominates
Human provides detailed prompt; AI generates image	Partial: image itself generally unprotected; human selection/arrangement may qualify	Copyright Office case-by-case review
Human presses button; AI produces entire work	No copyright protection	Insufficient human authorship
AI autonomously creates work	No copyright protection	No legal person capable of holding copyright

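The Copyright Office's guidance emphasizes that the critical question is whether a human author exercised creative control over the work's expressive elements. Prompting alone has so far fallen short. In Zarya of the Dawn, the Office concluded that Kashtanova's many iterative Midjourney prompts did not make her the author of the resulting images. Human contributions can still qualify, however, including original text, creative selection and arrangement, and substantial modification of AI output.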

The Training Data Problem

The more commercially significant legal battle concerns the upstream side of generative AI: whether training large models on copyrighted text, images, and code constitutes copyright infringement. AI companies have relied principally on fair use as a defense, advancing three main arguments:

Transformation: Training a model transforms copyrighted works into statistical weights—a fundamentally different purpose and character than consuming the original work.
No market substitution: A trained model does not reproduce copies of the training data in a way that substitutes for the originals in the market.
Public benefit: Training AI models serves the transformative public purpose of advancing technology.

Rights holders counter that the scale of copying—billions of works ingested without license or compensation—cannot be fair use regardless of purpose, and that AI outputs do compete with the original works as substitutes.

Active Litigation Landscape

Multiple high-stakes lawsuits are testing these questions in U.S. federal courts:

Case	Court	Plaintiffs	Key Issue
New York Times v. Microsoft & OpenAI	S.D.N.Y.	New York Times	Verbatim reproduction of articles in ChatGPT outputs
Andersen v. Stability AI	N.D. Cal.	Visual artists	Training image generators on scraped artwork
Getty Images v. Stability AI	D. Del.	Getty Images	Watermark reproduction; mass unauthorized copying
Authors Guild v. OpenAI	S.D.N.Y.	Fiction authors	Training GPT models on copyrighted books
Thaler v. Vidal	Fed. Cir.	Dr. Stephen Thaler	Patent analogue (decided 2022): whether an AI system can be listed as an inventor; court held inventors must be natural persons

What AI Companies and Rights Holders Are Seeking

The legal disputes have generated parallel policy debates. Rights holders are lobbying for:

A mandatory licensing regime requiring AI developers to compensate creators whose works are used for training.
Opt-out registries so individual creators can exclude their works from training datasets.
Transparency requirements compelling AI companies to disclose what training data they used.

AI companies are seeking:

Judicial confirmation that training is fair use, establishing a stable legal foundation for the industry.
Safe harbor protections analogous to the DMCA's protection for internet platforms.
Clarity on ownership of AI-generated outputs to encourage investment in AI products.

International Approaches

Other jurisdictions have taken different stances. Japan has adopted an explicitly permissive approach. Under Article 30-4 of its Copyright Act, text and data mining for AI training is generally not copyright infringement, even for commercial purposes, unless the use would unreasonably prejudice the copyright owner's interests. The European Union's AI Act and Copyright Directive impose transparency and opt-out requirements without prohibiting training outright. The UK's Copyright, Designs and Patents Act includes a text and data mining exception only for non-commercial research, so commercial AI training falls outside it. How the rest of UK copyright law applies to commercial training remains unsettled.

The unresolved state of AI copyright law creates significant uncertainty for businesses building AI products, for creators trying to protect their works, and for consumers relying on AI-generated content. Courts seem likely to shape AI copyright doctrine before Congress acts, possibly through conflicting decisions. That would mirror how copyright law adapted, slowly and imperfectly, to the arrival of the internet.

This article is for informational purposes only and does not constitute legal advice. Consult a qualified attorney for guidance on copyright issues related to AI systems.
