AI Licensing: Who Owns the Model, the Data, and the Output Is Still Being Litigated

Gurpreet S. Bal explains the unsettled AI licensing landscape — training data copyright, open source vs commercial model licenses, AI output ownership, the EU AI Act, and vendor indemnification gaps.

The legal framework for AI is being written in real time — in courtrooms, in Brussels, and in the terms-of-service updates that foundation model providers push out with little fanfare. Gurpreet S. Bal advises AI companies and their investors on commercial and M&A transactions where the ownership questions around models, training data, and generated output have no settled answers. "AI licensing is the fastest-moving area of commercial law right now — the contracts companies are signing today will be interpreted under case law that doesn't exist yet," he says.

Bal has advised on AI transactions spanning foundation model licensing, enterprise AI deployment agreements, and acquisitions where the IP stack of the target included both proprietary models and open source components.

How does copyright law apply to AI training data and is fair use a defense?

Whether training a model on copyrighted content is infringement is being litigated in multiple simultaneous cases brought by publishers, authors, and code repositories. The fair use defense developers rely on has support from prior transformative-use cases but also real uncertainty given the scale of copying and the commercial nature of the products. Companies building on foundation models should focus on what the provider represents and warrants about its training data and whether meaningful indemnification backs it; companies training their own models should check that their data licenses, many of which predate AI training, actually authorize that use.

Every large language model was trained on data scraped, licensed, or otherwise obtained from the internet and proprietary sources. The central legal question — whether training a model on copyrighted content constitutes infringement — is being litigated in multiple simultaneous cases involving publishers, authors, and code repositories. The fair use defense that foundation model developers rely on has significant support from prior transformative-use cases, but also significant uncertainty given the scale of copying and the commercial nature of the resulting products. For companies building on top of foundation models, the immediate practical concern is what representations and warranties the model provider makes about its training data, and whether those representations are backed by meaningful indemnification. For companies training their own models on proprietary datasets or licensed data, the scope of the data license needs to be reviewed carefully — many content licenses predate AI training as a use case and don't expressly authorize it.

What's the key difference between open source and commercial AI model licenses?

"Open source" is not self-defining for AI models — Meta's Llama, Mistral's releases, and Stability AI's offerings each carry different restrictions on commercial use, fine-tuning, redistribution, and derivative models. Llama's community license, for instance, restricts use above certain monthly-active-user thresholds and bars using its outputs to train competing models; these are custom commercial licenses with open source aesthetics. An enterprise should analyze what commercial uses are permitted, what attribution is required, what happens if a threshold is crossed, and whether fine-tuning creates a derivative work triggering further obligations.

The term "open source" applied to AI models is not self-defining. Meta's Llama models, Mistral's releases, and Stability AI's offerings are each licensed under different terms that impose different restrictions on commercial use, fine-tuning, redistribution, and derivative model creation. Llama's community license, for instance, restricts use by companies above certain monthly active user thresholds and prohibits using Llama outputs to train competing models. These are not traditional open source terms; they are custom commercial licenses with open source aesthetics. For an enterprise deploying an open source model, the relevant analysis covers four questions: what commercial uses are permitted, what attribution is required, what happens if a license threshold is crossed, and whether fine-tuning creates a derivative work that triggers additional obligations. Getting this wrong has consequences both for IP ownership and for the representations made to counterparties about the freedom to use the resulting technology.
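For engineering teams that track model licenses alongside the legal review, the questions in the paragraph above can be recorded as structured data. What follows is a minimal sketch, not legal advice and not a reading of any actual license: the `ModelLicenseReview` class, its field names, the `open_issues` helper, and every value in the example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelLicenseReview:
    """Hypothetical record of one model license review (illustrative fields only)."""
    model_name: str
    commercial_use_permitted: bool
    attribution_required: bool
    mau_threshold: Optional[int]  # None if the license sets no user threshold
    outputs_may_train_competing_models: bool
    fine_tune_is_derivative: Optional[bool]  # None while the question is unresolved


def open_issues(review: ModelLicenseReview, current_mau: int) -> List[str]:
    """Return the points that still need legal attention for this deployment."""
    issues = []
    if not review.commercial_use_permitted:
        issues.append("commercial use not permitted")
    if review.attribution_required:
        issues.append("attribution obligations to implement")
    if review.mau_threshold is not None and current_mau >= review.mau_threshold:
        issues.append("user threshold crossed; check what the license requires")
    if not review.outputs_may_train_competing_models:
        issues.append("outputs may not be used to train competing models")
    if review.fine_tune_is_derivative is None:
        issues.append("derivative-work status of fine-tuned model unresolved")
    return issues


# Placeholder values, not taken from any real license.
example = ModelLicenseReview(
    model_name="example-model",
    commercial_use_permitted=True,
    attribution_required=True,
    mau_threshold=1_000_000,
    outputs_may_train_competing_models=False,
    fine_tune_is_derivative=None,
)

for issue in open_issues(example, current_mau=250_000):
    print(issue)
```

A record like this does not answer any of the questions; it only makes visible which ones are still open when a representation about freedom to use the technology is being drafted.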

Who owns the output generated by an AI model?

U.S. copyright law requires human authorship, and the Copyright Office has stated — with courts beginning to affirm — that AI-generated output is not copyrightable absent sufficient human creative expression. So content generated entirely by an AI system may not be protectable at all, a foundational IP risk for companies whose product or asset library is AI-generated. Meaningful human curation, selection, or arrangement can shift the analysis, but the line is unsettled, and contractual ownership provisions cannot create copyright protection that the law does not confer.

Copyright law in the United States requires human authorship. The Copyright Office has stated, and courts have begun to affirm, that AI-generated output is not copyrightable in the absence of sufficient human creative expression. The practical consequence is that content generated entirely by an AI system — text, images, code — may not be protectable by copyright at all. For companies whose product is AI-generated output, or who are building content libraries on AI-generated assets, this is a foundational IP risk. The analysis shifts where there is meaningful human curation, selection, or arrangement — but the line is unsettled. Contractual ownership provisions in AI vendor agreements may also purport to assign output ownership, but those provisions cannot create copyright protection that copyright law doesn't confer. Due diligence for any company whose value depends on AI-generated content needs to include a frank assessment of what, if anything, is actually protectable.

What obligations does the EU AI Act impose on AI providers and deployers?

The EU AI Act sets a tiered compliance framework based on risk classification, with general-purpose AI models subject to their own transparency and technical documentation requirements. It distinguishes providers (who develop and place AI systems on the market) from deployers (who use them in their own products), and both carry independent obligations whose allocation is a contractual question vendor agreements are only beginning to address. High-risk applications — employment, credit, critical infrastructure, education, biometrics — face the most demanding requirements, so companies deploying AI in Europe must understand their own classification and the assurances they can expect from the provider.

The EU AI Act creates a tiered compliance framework based on risk classification, with general-purpose AI models subject to their own transparency and technical documentation requirements. The Act draws a critical distinction between AI providers (the entities that develop and place AI systems on the market) and deployers (the entities that use AI systems in their own products and services). Both categories carry independent compliance obligations, and the allocation of responsibility between them is a contractual question that AI vendor agreements are only beginning to address. High-risk AI applications — in employment, credit, critical infrastructure, education, biometrics — face the most demanding requirements: conformity assessments, technical documentation, human oversight mechanisms, and registration in an EU database. Companies deploying AI systems in Europe need to understand both their own classification as a deployer and what compliance assurances they are entitled to receive from the provider.

What vendor indemnification is available for claims arising from AI output?

The indemnification landscape for AI output is thin. Major providers offer IP indemnities — such as Microsoft's Copilot Copyright Commitment and Google's offerings for Workspace and Cloud — but they are narrowly scoped, restriction-bound, and capped. They generally cover copyright claims from the model's output when used as intended, but exclude claims arising from customer-provided inputs, out-of-scope jurisdictions, customer modification or fine-tuning, and defamation or privacy violations. Understanding exactly what is and isn't covered is essential before building a commercial product on a foundation model and making IP representations downstream.

The indemnification landscape for AI output is thin. Major foundation model providers have introduced intellectual property indemnification programs, such as Microsoft's Copilot Copyright Commitment and Google's comparable offering for Workspace and Cloud customers, but these indemnities are narrowly scoped, subject to usage restrictions, and capped. They generally cover copyright infringement claims arising from the model's output when the product is used as intended. They do not cover claims arising from customer-provided inputs, claims in jurisdictions outside the scope of the program, claims where the customer modified or fine-tuned the model, or claims that the model's output defames a third party or violates privacy rights. Understanding exactly what the indemnification covers, and what it doesn't, is essential before building a commercial product on top of a foundation model and making representations to downstream customers about IP ownership.
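The coverage pattern described above can be expressed as a simple triage check. This is a rough sketch under stated assumptions, not a model of any provider's actual terms: the `Claim` class, its fields, and the `exclusion_reasons` function are hypothetical names, and the rules encode only the general exclusions listed in the paragraph. An empty result means only that none of those general exclusions was flagged; caps and the program's actual wording still apply.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Claim:
    """Hypothetical description of a third-party claim (illustrative fields only)."""
    kind: str  # e.g. "copyright", "defamation", "privacy"
    arises_from_customer_input: bool
    jurisdiction_in_program: bool
    model_modified_or_fine_tuned: bool
    used_as_intended: bool


def exclusion_reasons(claim: Claim) -> List[str]:
    """List the general exclusions from the text that this claim appears to hit."""
    reasons = []
    if claim.kind != "copyright":
        reasons.append("not a copyright infringement claim")
    if claim.arises_from_customer_input:
        reasons.append("arises from customer-provided input")
    if not claim.jurisdiction_in_program:
        reasons.append("jurisdiction outside the program")
    if claim.model_modified_or_fine_tuned:
        reasons.append("customer modified or fine-tuned the model")
    if not claim.used_as_intended:
        reasons.append("product not used as intended")
    return reasons


# Illustrative claims; the values are made up.
output_copyright = Claim("copyright", False, True, False, True)
fine_tuned_defamation = Claim("defamation", False, True, True, True)

print(exclusion_reasons(output_copyright))
print(exclusion_reasons(fine_tuned_defamation))
```

The second example hits two exclusions at once, which is the practical point: a product built on a fine-tuned model can fall outside these programs before the nature of the claim is even considered.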

Gurpreet S. Bal is a corporate partner with 16 years advising on private equity, merger transactions, and public offerings for companies and investors at three of the world's top law firms. He has represented clients in hundreds of transactions with aggregate deal value exceeding $60 billion across AI, semiconductors, fintech, and emerging technology. For more information and to get in touch, visit gurpreetbal.com.