AI Acquisitions and Reps & Warranties: Training Data, Model Ownership, and the Gaps Nobody Discloses

Gurpreet S. Bal identifies the AI-specific gaps in standard tech R&W schedules — training data provenance, open source contamination, and model IP ownership questions.

The standard representations and warranties schedule in a technology acquisition was built for a different asset. It asks about software IP, source code ownership, employee invention assignments, third-party licenses, and open source usage in the product. These are the right questions for a software company. They are incomplete questions for an AI company — where the primary asset is not software in the traditional sense, but a trained model whose value depends on what it was trained on, how it was built, and who owns the resulting weights.

"The standard tech reps schedule asks about software IP. It doesn't ask whether the training data was licensed. Those are very different questions," says Gurpreet S. Bal. "I've had deals where the AI model was the primary asset and the IP reps didn't specifically cover it. That's a problem."

Gurpreet is a corporate partner representing investors and companies in fundraising and exit transactions, and is known for a straightforward, cut-to-the-chase approach in dealings with clients and counterparties. In 2026, with AI regulation advancing rapidly across the US and EU, acquirers are increasingly asking questions the standard tech R&W schedule was never designed to answer — and sellers are discovering the scope of disclosure required.

What are the key AI-specific representation gaps in standard tech deals?

Standard tech M&A rep and warranty templates were not designed for AI companies. They contain critical gaps around training data provenance, model performance claims, foundation model license compliance, and regulatory status under the EU AI Act. Buyers who rely on generic tech reps without AI-specific additions may close without adequate protection against the liabilities that matter most in an AI acquisition, and those liabilities can be large.

Gurpreet S. Bal identifies four categories of AI-specific exposure that standard tech reps schedules consistently fail to address. First, training data provenance: who owns the data the model was trained on, was it licensed for training purposes, and are there any outstanding claims related to its use? Second, open source contamination in model weights: certain open source model licenses restrict commercial use, which can limit the acquirer's ability to deploy the model freely. Third, regulatory exposure for the model's outputs: if the model has been used in regulated applications such as lending, hiring, or medical contexts, there may be accumulated regulatory exposure that the standard IP reps don't capture. Fourth, IP ownership of fine-tuned or customized models: who owns a model that was built on a third-party foundation model and fine-tuned with proprietary data? The answer is frequently unclear and is not addressed in standard reps.

Why is training data provenance the hardest issue to disclose?

Training data provenance is hard to disclose fully because many AI companies cannot identify with certainty every source that contributed data to their training corpus, whether each source authorized commercial use, and whether any source's data was scraped in violation of terms of service or copyright. Sellers who give accurate and complete training data provenance reps are rare; buyers who rely on broad general reps without specific disclosure schedules face significant undisclosed liability.

Training data provenance is particularly challenging because many AI companies, especially those founded between 2019 and 2023, built their initial models on internet-scraped data without clear licensing. The legal status of using publicly available data for model training remains contested, and pending litigation in multiple jurisdictions addresses these questions directly. Sellers in AI acquisitions face a dilemma: disclosing that training data was scraped without specific licenses invites acquirer concern and potential deal repricing, while not disclosing it creates post-closing indemnification exposure. Gurpreet S. Bal notes that acquirers are increasingly demanding specific reps about training data provenance precisely because the issue is known and material, and because post-closing liability for training data claims can be significant. The disclosure conversation is uncomfortable but unavoidable in any well-structured AI acquisition due diligence process.

How do foundation model licenses create unexpected M&A complications?

Foundation model licenses from providers like OpenAI, Anthropic, Google, and Meta contain commercial use restrictions, redistribution limitations, and assignment provisions that may prohibit or restrict transfer in an M&A transaction without provider consent. Buyers who acquire a company without checking whether the foundation model licenses can be assigned in a merger or asset sale may discover post-closing that their primary AI capability requires renegotiation or replacement.

Many AI companies build products by fine-tuning foundation models provided by major AI labs. These foundation models typically come with licenses that impose restrictions on commercial use, sublicensing, and in some cases on the ownership of derivative models built on top of them. In an M&A context, the acquirer inherits the company's license position, including any restrictions that come with it. Gurpreet S. Bal has encountered deals where the target company's core product was built on a foundation model license that prohibited transfer or assignment without the licensor's consent. In practice, this turns the licensor's consent into a closing condition: the acquirer needs it before the acquisition can close. In competitive M&A processes, a consent requirement of this kind, which depends on engaging a potentially unsympathetic third party, can create significant deal uncertainty. Reviewing foundation model license terms is now a standard part of AI M&A diligence.

How are AI-specific reps and warranties being structured in current deals?

Current AI M&A deals are adding specific representations covering training data sourcing and licensing, compliance with applicable AI regulations including the EU AI Act, accuracy of stated model performance metrics, absence of known biases that could create regulatory or litigation exposure, and compliance with all foundation model license terms. These reps typically have longer survival periods and are carved out from standard indemnification caps given their potentially large liability exposure.

Gurpreet S. Bal describes the current market as actively developing its standard approach. Best-practice AI acquisitions in 2026 include a dedicated AI-specific representations section addressing: the completeness and accuracy of the data inventory provided in diligence, the licensing status of all training data used in the company's models, the absence of open source model components whose licenses restrict commercial deployment, the company's compliance with applicable AI regulations (including EU AI Act requirements for any EU operations), and the ownership of all model weights and fine-tuned versions. These reps are backed by a specific disclosure schedule that requires the company to list its training data sources, foundation model licenses, and regulatory interactions. The disclosure schedule often reveals issues that the seller's team had not fully inventoried — and that conversation, while sometimes difficult, is far better to have before closing than after.

Gurpreet S. Bal is a corporate partner with 16 years advising on private equity, merger transactions, and public offerings for companies and investors at three of the world's top law firms. He has represented clients in hundreds of transactions with aggregate deal value exceeding $60 billion across AI, semiconductors, fintech, and emerging technology. For more information and to get in touch, visit gurpreetbal.com.