AI and Copyright Law: Who Owns AI-Generated Content and Training Data Disputes

In 2023–2024, U.S. courts and regulators began drawing the first legal boundaries around AI and copyright. The U.S. Copyright Office has refused registration for purely AI-generated images, and multiple federal lawsuits challenge whether training large AI models on copyrighted works is fair use or mass infringement.

The InfoNexus Editorial Team | May 23, 2026 | 9 min read

A New Technology, an Old Law, and Billions of Dollars in Dispute

In February 2023, the U.S. Copyright Office issued its first major decision on AI-generated works, refusing to register images produced by the Midjourney AI system in the comic book Zarya of the Dawn. The author, Kristina Kashtanova, retained copyright in her text and in her selection and arrangement of the images. The AI-generated images themselves were held unprotectable because copyright requires human authorship. In March 2023, the Office followed that decision with formal registration guidance for works containing AI-generated material. Together they set off a cascade of legal challenges, congressional hearings, and industry lobbying that has made AI copyright one of the most contested legal questions of the decade.

The Human Authorship Requirement

U.S. copyright law has required human authorship since at least 1884. That year, in Burrow-Giles Lithographic Co. v. Sarony, the Supreme Court upheld copyright in a photograph on the reasoning that the photographer exercised creative judgment. The Copyright Office has consistently refused to register works produced by non-humans, including a 1984 rejection of a computer-generated work and a 2022 refusal to register an AI-generated image submitted by Dr. Stephen Thaler.

The human authorship doctrine creates a spectrum of copyright eligibility for AI-assisted works:

Scenario | Copyright eligibility | Basis
Human writes text; AI fixes grammar | Full copyright to the human author | Human creative expression dominates
Human writes a detailed prompt; AI generates an image | Generally none in the image itself; human selection, arrangement, or modification of the output may qualify | Copyright Office case-by-case review
Human presses a button; AI produces the entire work | No copyright protection | Insufficient human authorship
AI autonomously creates a work | No copyright protection | No human author; the AI system itself cannot be an author

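The Copyright Office's guidance emphasizes one critical question: did a human author determine the work's expressive elements? In the Office's current view, prompts alone generally do not meet that standard, even detailed or repeatedly revised ones, because the user does not control how the system interprets them. Protection can still attach to a human's creative selection and arrangement of AI-generated material, or to the human's own modifications of it.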

The Training Data Problem

The more commercially significant legal battle concerns the upstream side of generative AI: whether training large models on copyrighted text, images, and code constitutes copyright infringement. AI companies have relied principally on fair use as a defense, resting the argument on three claims:

  • Transformation: Training converts copyrighted works into statistical model weights. The purpose and character of that use differ fundamentally from reading or viewing the original works.
  • No market substitution: A trained model does not output copies of its training data that substitute for the originals in the market.
  • Public benefit: Training AI models serves the transformative public purpose of advancing technology.

Rights holders counter that the scale of copying—billions of works ingested without license or compensation—cannot be fair use regardless of purpose, and that AI outputs do compete with the original works as substitutes.

Active Litigation Landscape

Multiple high-stakes lawsuits are testing these questions in U.S. federal courts:

Case | Court | Plaintiffs | Key issue
New York Times v. Microsoft & OpenAI | S.D.N.Y. | The New York Times | Verbatim reproduction of articles in ChatGPT outputs
Andersen v. Stability AI | N.D. Cal. | Visual artists | Training image generators on scraped artwork
Getty Images v. Stability AI | D. Del. | Getty Images | Watermark reproduction; mass unauthorized copying
Authors Guild v. OpenAI | S.D.N.Y. | Fiction authors | Training GPT models on copyrighted books
Thaler v. Vidal | Fed. Cir. | Dr. Stephen Thaler | Whether an AI system can be listed as an inventor on a patent (a related patent case, decided in 2022: inventors must be natural persons)

What AI Companies and Rights Holders Are Seeking

The legal disputes have generated parallel policy debates. Rights holders are lobbying for:

  • A mandatory licensing regime requiring AI developers to compensate creators whose works are used for training.
  • Opt-out registries so individual creators can exclude their works from training datasets.
  • Transparency requirements compelling AI companies to disclose what training data they used.

AI companies are seeking:

  • Judicial confirmation that training is fair use, establishing a stable legal foundation for the industry.
  • Safe harbor protections analogous to those the DMCA (17 U.S.C. § 512) gives online service providers.
  • Clarity on ownership of AI-generated outputs to encourage investment in AI products.

International Approaches

Other jurisdictions have taken different stances. Japan has adopted an explicitly permissive approach. Under Article 30-4 of its Copyright Act, text and data mining for AI training is generally not copyright infringement, even for commercial purposes. The European Union's AI Act and Copyright Directive impose transparency and opt-out requirements without prohibiting training outright. The UK's Copyright, Designs and Patents Act includes a text and data mining exception, but that exception is limited to non-commercial research. Whether commercial AI training infringes UK copyright remains unresolved in the courts.

The unresolved state of AI copyright law creates significant uncertainty for businesses building AI products, for creators trying to protect their works, and for consumers relying on AI-generated content. Courts are likely to shape AI copyright doctrine before Congress acts, possibly through conflicting decisions. Copyright law followed a similar path when it adapted, slowly and imperfectly, to the arrival of the internet.

This article is for informational purposes only and does not constitute legal advice. Consult a qualified attorney for guidance on copyright issues related to AI systems.
