AI Company Due Diligence in 2026: The Checklist Has Changed

By Gurpreet S. Bal, Silicon Valley M&A and Technology Partner
Technology acquisition due diligence has a standard rhythm: intellectual property ownership, employee IP assignments, software licenses, open source usage, material contracts, and regulatory compliance. That checklist was built for software companies whose primary risk exposure sits in conventional IP and employment law. AI companies in 2026 have a fundamentally different risk profile, and the standard checklist doesn't address it. Gurpreet S. Bal, who has led due diligence on AI acquisitions across enterprise software, infrastructure, and applied AI categories, describes the current state of practice candidly: "The companies being acquired today weren't building the same things five years ago. The diligence framework has to keep up." Gurpreet is a corporate partner representing investors and companies in fundraising and exit transactions, and is known for a straightforward, cut-to-the-chase approach with clients and counterparties. The expanded AI due diligence checklist includes categories that didn't exist as deal risk factors when most practitioners were trained, and gaps in covering them are now surfacing in post-closing indemnification claims.

Why is training data provenance the M&A due diligence risk you can't find in the source code?

Training data provenance risk is invisible in standard code review because the risk is not in the code — it is in the dataset that trained the model. Web-scraped training data, licensed datasets with commercial use restrictions, and data contributed by users under ambiguous terms of service all create potential copyright and contractual liability that cannot be detected by reviewing the model architecture or inference code. The only way to assess this risk is through a dedicated training data audit.

The most important new category in AI company due diligence is training data provenance — and it's the category that receives the least attention from practitioners who learned diligence on conventional software companies. Training data drives AI model quality and, increasingly, AI model legal exposure. Data scraped from the web without proper licensing, data acquired through terms-of-service violations, data that includes personal information subject to GDPR or CCPA restrictions, and data sourced from third parties without clear rights to sublicense — each creates a different category of post-closing liability. Gurpreet S. Bal identifies this as the area where recent AI transactions are generating unexpected risk: "I've had deals where the biggest risk wasn't in the source code — it was in the training data. That's new." A proper training data audit requires understanding how each training dataset was acquired, what license or terms governed its use, and whether those rights survive the acquisition.

How does open source contamination in model weights create M&A deal risk?

If a model was fine-tuned using code or data subject to a copyleft license, such as the GPL or a Creative Commons ShareAlike license, the resulting model weights may be treated as subject to the same license terms, which could require the acquirer to make the weights available under those terms. Whether trained weights are a derivative work of their training inputs remains legally unsettled, but if the argument succeeds it can destroy the commercial value of the acquired model entirely. The contamination may have occurred at any point in the training pipeline and may not be documented in the company's records.

Traditional open source diligence focuses on identifying copyleft-licensed code in a company's software stack and assessing the risk that it infects proprietary code. AI model diligence requires a different analysis. Foundation models built on open source architectures, fine-tuned on open source datasets, or trained with open source tooling may carry license obligations that affect the acquirer's ability to use or commercialize the resulting model. In 2026, several foundation model licenses also impose restrictions that go beyond standard open source terms: usage restrictions, prohibited-use categories, and commercial licensing requirements. An AI company that has built on a foundation model released under one of these restricted licenses may be transferring obligations the acquirer didn't anticipate. Before the acquisition closes, the acquirer needs a full inventory of every model component (foundation model, fine-tuning datasets, and training tooling) and the license that governs each one.

How does EU AI Act compliance status affect tech M&A deal risk?

The EU AI Act classifies AI systems by risk level and imposes compliance obligations on the providers that place those systems on the market and, to a narrower extent, on the deployers that use them. A high-risk AI system that has not completed the required conformity assessment cannot lawfully be placed on the EU market or put into service once the applicable compliance deadline has passed. An acquirer that buys a non-compliant AI system inherits both the compliance obligation and the regulatory exposure, including the potential loss of EU market access, and bears the cost of bringing the system into compliance after closing.

As of 2026, the EU AI Act's obligations are phasing in on a staggered timeline: the prohibitions on certain AI practices and the obligations for general-purpose AI models already apply, with the requirements for high-risk AI systems following. Companies with European operations or European customers are at various stages of compliance readiness. AI company acquisitions now require specific diligence on EU AI Act compliance status: whether any of the target's AI systems fall into a prohibited or high-risk category, whether any of its models qualify as general-purpose AI models under the Act, what conformity assessment requirements apply, and how complete the target's compliance documentation is. Gurpreet S. Bal treats EU AI Act compliance as a material risk factor in any AI acquisition with European exposure, both because non-compliance creates regulatory liability and because remediating it after closing can be unexpectedly expensive and time-consuming.

What does the expanded 2026 AI due diligence checklist require?

A comprehensive 2026 AI due diligence checklist covers: training data sourcing and licensing documentation for every dataset used at every training stage, foundation model license terms and compliance status, EU AI Act risk classification and conformity assessment status, model cards and performance benchmarks, bias assessments and mitigation measures, AI incident history, and third-party audit reports where available. This checklist is materially longer than what most sellers have prepared, which creates significant diligence friction.

The 2026 AI due diligence checklist that Gurpreet S. Bal uses in practice adds several categories that are now standard and gives particular weight to one already on the core list. Compute contract assignability: large GPU reservation agreements and cloud AI credits often include assignment restrictions and change-of-control provisions that must be reviewed before the deal closes. AI governance policies: acquirers increasingly require documentation of the target's internal AI governance framework, including what review processes exist for model deployment and what internal policies govern acceptable AI use cases. Key person dependency: AI companies often have extreme talent concentration in one or two researchers whose departure would materially affect the acquired technology. AI incident history: any prior model failures, harmful outputs, or regulatory inquiries should be treated as material disclosure items. Each of these categories requires specific diligence requests, document review protocols, and risk quantification; the standard technology diligence template is not sufficient.


Gurpreet S. Bal is a corporate partner with 16 years advising on private equity, merger transactions, and public offerings for companies and investors at three of the world's top law firms. He has represented clients in hundreds of transactions with aggregate deal value exceeding $60 billion across AI, semiconductors, fintech, and emerging technology. For more information and to get in touch, visit gurpreetbal.com.