What does California's AI Training Data Transparency Act require?

AB-1892 requires all AI companies deploying models commercially in California to publish detailed summaries of their training data, including data domains, whether copyrighted or personal data was used, and the time range of the training data. The law takes effect January 1, 2027.

Which AI companies are affected by California's training data law?

The law applies to any company that deploys AI models commercially in California, which effectively covers every major AI provider. OpenAI and Meta have opposed the law, citing trade secret concerns, while Anthropic has taken a neutral stance.

California Governor Signs AI Training Data Transparency Act Into Law

California Governor Gavin Newsom signed AB-1892, the AI Training Data Transparency Act, into law on Monday. The bill makes California the first US state to require AI companies to disclose detailed information about the data used to train commercially deployed models. It takes effect January 1, 2027.

What the Law Requires

Any company operating an AI model commercially in California must publish a training data summary covering four categories:

Data domains: A breakdown of source types used in training, including web crawls, books, code repositories, social media posts, academic papers, and other categories
Copyrighted material: Whether the training data included copyrighted works, and if so, a description of the types of works and how they were obtained
Personal data: Whether personally identifiable information was present in the training data, and if so, what steps were taken to filter or anonymize it
Time range: The date range of the data used, from earliest to most recent sources

The summaries must be published on the company's website and updated within 90 days of any new model deployment. The California Attorney General's office will enforce the law, with fines of up to $10,000 per day of non-compliance.
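For readers who want to see how the numbers work, the four disclosure categories, the 90-day update window, and the fine ceiling can be sketched in a few lines of Python. This is an illustration only: the class, field, and function names are hypothetical and do not come from the bill text, and the sketch assumes the 90 days are counted as calendar days from the deployment date and that the maximum fine accrues for each full day of non-compliance.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Figures reported in the article; how the statute counts days is an assumption.
UPDATE_WINDOW_DAYS = 90
MAX_DAILY_FINE_USD = 10_000


@dataclass
class TrainingDataSummary:
    """Hypothetical record mirroring the four disclosure categories."""
    data_domains: list[str]            # e.g. web crawls, books, code repositories
    used_copyrighted_material: bool
    copyrighted_material_notes: str    # types of works and how they were obtained
    contained_personal_data: bool
    personal_data_handling: str        # filtering or anonymization steps
    earliest_source: date
    latest_source: date


def update_deadline(deployment_date: date) -> date:
    """Latest date to update the summary, assuming calendar days."""
    return deployment_date + timedelta(days=UPDATE_WINDOW_DAYS)


def max_fine_usd(days_non_compliant: int) -> int:
    """Upper bound on fines for a given number of non-compliant days."""
    return max(days_non_compliant, 0) * MAX_DAILY_FINE_USD
```

Under these assumptions, a model deployed on the law's effective date, January 1, 2027, would need an updated summary by April 1, 2027, and 30 days of non-compliance would expose a company to as much as $300,000 in fines.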

Who Backed It and Who Fought It

AB-1892 was introduced by Assemblymember Rebecca Bauer-Kahan and co-sponsored by a coalition of publishers, authors' guilds, and artists' organizations. Supporters argued that without mandatory disclosure, there is no way to hold AI companies accountable for using copyrighted work or personal data without consent.

OpenAI and Meta lobbied against the bill throughout its passage through the legislature. Both companies argued that detailed training data disclosures would expose trade secrets and could undermine their competitive positions. Meta's head of public policy called the requirements "an invitation for competitors to reverse-engineer proprietary training pipelines."

Anthropic took a neutral stance, with a spokesperson noting that the company already publishes model cards and system documentation. "We support the principle of transparency and believe the industry benefits from clear standards," the spokesperson said, while adding that some technical details of the disclosure requirements need refinement.

National Implications

The law is the first of its kind in the United States. While the EU AI Act includes training data documentation requirements, no US federal law currently addresses the issue. Three similar bills are pending in Congress, and legislators in New York, Illinois, and Washington have introduced state-level versions modeled on AB-1892.

The copyright dimension is particularly significant. Multiple lawsuits from publishers, visual artists, and musicians against AI companies remain unresolved in federal courts. California's disclosure mandate does not resolve the question of whether training on copyrighted material constitutes fair use, but it ensures that the underlying facts, namely what data was actually used, become part of the public record.

Whether AB-1892 survives the inevitable legal challenges may determine how quickly training data transparency becomes a national standard.
