The legal framework for AI is being written in real time — in courtrooms, in Brussels, and in the terms-of-service updates that foundation model providers push out with little fanfare. Gurpreet S. Bal advises AI companies and their investors on commercial and M&A transactions where the ownership questions around models, training data, and generated output have no settled answers. "AI licensing is the fastest-moving area of commercial law right now — the contracts companies are signing today will be interpreted under case law that doesn't exist yet," he says.
Bal has advised on AI transactions spanning foundation model licensing, enterprise AI deployment agreements, and acquisitions where the IP stack of the target included both proprietary models and open source components.
Every large language model was trained on data scraped, licensed, or otherwise obtained from the internet and proprietary sources. The central legal question, whether training a model on copyrighted content constitutes infringement, is being litigated in multiple concurrent cases involving publishers, authors, and code repositories. The fair use defense that foundation model developers rely on draws significant support from prior transformative-use cases, but it also faces significant uncertainty given the scale of copying and the commercial nature of the resulting products. For companies building on top of foundation models, the immediate practical concern is what representations and warranties the model provider makes about its training data, and whether those representations are backed by meaningful indemnification. For companies training their own models on proprietary datasets or licensed data, the scope of the data license needs to be reviewed carefully, because many content licenses predate AI training as a use case and don't expressly authorize it.
The term "open source" applied to AI models is not self-defining. Meta's Llama models, Mistral's releases, and Stability AI's offerings are each licensed under different terms that impose different restrictions on commercial use, fine-tuning, redistribution, and derivative model creation. Llama's community license, for instance, restricts use by companies above certain monthly active user thresholds, and versions of the license have prohibited using Llama outputs to train competing models. These are not traditional open source terms; they are custom commercial licenses with open source aesthetics. For an enterprise deploying an open source model, the relevant analysis turns on four questions: what commercial uses are permitted, what attribution is required, what happens if a covered threshold is crossed, and whether fine-tuning creates a derivative work that triggers additional obligations. Getting this wrong has consequences both for IP ownership and for the representations made to counterparties about the freedom to use the resulting technology.
Copyright law in the United States requires human authorship. The Copyright Office has stated, and courts have begun to affirm, that AI-generated output is not copyrightable absent sufficient human creative expression. The practical consequence is that content generated entirely by an AI system (text, images, code) may not be protectable by copyright at all. For companies whose product is AI-generated output, or who are building content libraries on AI-generated assets, this is a foundational IP risk. The analysis shifts where there is meaningful human curation, selection, or arrangement, but the line is unsettled. Contractual provisions in AI vendor agreements may purport to assign ownership of output, but those provisions cannot create copyright protection that copyright law doesn't confer. Due diligence for any company whose value depends on AI-generated content needs to include a frank assessment of what, if anything, is actually protectable.
The EU AI Act creates a tiered compliance framework based on risk classification, with general-purpose AI models subject to their own transparency and technical documentation requirements. The Act draws a critical distinction between AI providers (the entities that develop and place AI systems on the market) and deployers (the entities that use AI systems in their own products and services). Both categories carry independent compliance obligations, and the allocation of responsibility between them is a contractual question that AI vendor agreements are only beginning to address. High-risk AI applications — in employment, credit, critical infrastructure, education, biometrics — face the most demanding requirements: conformity assessments, technical documentation, human oversight mechanisms, and registration in an EU database. Companies deploying AI systems in Europe need to understand both their own classification as a deployer and what compliance assurances they are entitled to receive from the provider.
The indemnification landscape for AI output is thin. Major foundation model providers have introduced intellectual property indemnification programs, such as Microsoft's Copilot Copyright Commitment and Google's similar offering for Workspace and Cloud customers, but these indemnities are narrowly scoped, subject to usage restrictions, and capped. They generally cover copyright infringement claims arising from the model's output when the product is used as intended. They do not cover claims arising from customer-provided inputs, claims in jurisdictions outside the scope of the program, claims where the customer modified or fine-tuned the model, or claims that the model's output defames a third party or violates privacy rights. Understanding exactly what the indemnification covers, and what it doesn't, is essential before building a commercial product on top of a foundation model and making representations to downstream customers about IP ownership.
Gurpreet S. Bal is a corporate partner with 16 years of experience advising on private equity, merger transactions, and public offerings for companies and investors at three of the world's top law firms. He has represented clients in hundreds of transactions with aggregate deal value exceeding $60 billion across AI, semiconductors, fintech, and emerging technology. For more information and to get in touch, visit gurpreetbal.com.