California Governor Gavin Newsom signed AB-1892, the AI Training Data Transparency Act, into law on Monday. The bill makes California the first US state to require AI companies to disclose detailed information about the data used to train commercially deployed models. It takes effect January 1, 2027.
What the Law Requires
Any company operating an AI model commercially in California must publish a training data summary covering four categories:
- Data domains — A breakdown of source types used in training, including web crawls, books, code repositories, social media posts, academic papers, and other categories
- Copyrighted material — Whether the training data included copyrighted works, and if so, a description of the types of works and how they were obtained
- Personal data — Whether personally identifiable information was present in the training data, and what steps were taken to filter or anonymize it
- Time range — The date range of the data used, from earliest to most recent sources
The summaries must be published on the company's website and updated within 90 days of any new model deployment. The California Attorney General's office will enforce the law, with fines of up to $10,000 per day of non-compliance.
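For readers who want to see how these requirements might look in practice, here is a minimal Python sketch of the four disclosure categories, the 90-day update window, and the daily fine cap. The 90-day and $10,000 figures come from the requirements described above. Everything else is an assumption made for illustration and does not come from the text of AB-1892: the class and field names, the function names, and especially the rule that non-compliance is counted in whole days after the deadline.

```python
from dataclasses import dataclass
from datetime import date, timedelta

UPDATE_WINDOW_DAYS = 90         # article: update within 90 days of a new deployment
MAX_FINE_PER_DAY_USD = 10_000   # article: fines of up to $10,000 per day


@dataclass
class TrainingDataSummary:
    """Hypothetical record mirroring the four categories listed above."""
    data_domains: list[str]          # e.g. "web crawls", "books", "code repositories"
    includes_copyrighted: bool       # were copyrighted works present?
    copyrighted_description: str     # types of works and how they were obtained
    includes_personal_data: bool     # was personally identifiable information present?
    pii_mitigation: str              # filtering or anonymization steps taken
    earliest_source: date            # start of the data's time range
    latest_source: date              # end of the data's time range


def update_deadline(deployed_on: date) -> date:
    """Last day to publish an updated summary after a new model deployment."""
    return deployed_on + timedelta(days=UPDATE_WINDOW_DAYS)


def max_fine_exposure(deployed_on: date, today: date) -> int:
    """Upper bound on fines, ASSUMING each full day past the deadline counts once."""
    days_late = (today - update_deadline(deployed_on)).days
    return max(days_late, 0) * MAX_FINE_PER_DAY_USD
```

Under these assumptions, a model deployed on January 15, 2027 would need an updated summary by April 15, 2027, and a company still out of compliance ten days later would face at most $100,000 in fines. How the Attorney General's office would actually count days of non-compliance is not specified here.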
Who Backed It and Who Fought It
AB-1892 was introduced by Assemblymember Rebecca Bauer-Kahan and co-sponsored by a coalition of publishers, authors' guilds, and artists' organizations. Supporters argued that without mandatory disclosure, there is no way to hold AI companies accountable for using copyrighted work or personal data without consent.
OpenAI and Meta lobbied against the bill throughout its passage through the legislature. Both companies argued that detailed training data disclosures would expose trade secrets and could undermine their competitive positions. Meta's head of public policy called the requirements "an invitation for competitors to reverse-engineer proprietary training pipelines."
Anthropic took a neutral stance, with a spokesperson noting that the company already publishes model cards and system documentation. "We support the principle of transparency and believe the industry benefits from clear standards," the spokesperson said, adding that some technical details of the disclosure requirements still need refinement.
National Implications
The law is the first of its kind in the United States. While the EU AI Act includes training data documentation requirements, no US federal law currently addresses the issue. Three similar bills are pending in Congress, and legislators in New York, Illinois, and Washington have introduced state-level versions modeled on AB-1892.
The copyright dimension is particularly significant. Multiple lawsuits from publishers, visual artists, and musicians against AI companies remain unresolved in federal courts. California's disclosure mandate does not resolve the question of whether training on copyrighted material constitutes fair use, but it ensures that the underlying facts — what data was actually used — become part of the public record.
Whether AB-1892 survives the inevitable legal challenges may determine how quickly training data transparency becomes a national standard.



