The U.S. government has formalized one of the most consequential pieces of its AI oversight framework. The Center for AI Standards and Innovation (CAISI), housed within the National Institute of Standards and Technology (NIST) at the Department of Commerce, announced on May 5 that it has signed agreements with Google DeepMind, Microsoft, and xAI to test their frontier models for national security risks before those systems reach the public.
The agreements give CAISI pre-deployment access to frontier AI systems, post-deployment assessment rights, and the ability to run evaluations inside classified environments. Crucially, developers will provide CAISI with model variants that have reduced or removed safeguards, so evaluators can probe what these systems are actually capable of when stripped of guardrails.
A reorganized federal AI oversight stack
CAISI was reconstituted from the former U.S. AI Safety Institute under the Trump administration's AI Action Plan, with a tighter focus on national security and international competition rather than broad safety concerns. The new agreements with Microsoft, Google DeepMind, and xAI build on earlier 2024 partnerships with Anthropic and OpenAI, both of which have now been renegotiated to reflect CAISI's updated directives from the secretary of commerce.
"Independent, rigorous measurement science is essential to understanding frontier AI and its national security implications," CAISI Director Chris Fall said in the announcement.
According to NIST, CAISI has completed more than 40 evaluations to date, including assessments of unreleased state-of-the-art models. Evaluators from multiple federal agencies participate through the TRAINS Taskforce, an interagency body that pools expertise across the national security community.
What the labs are agreeing to
The scope of the agreements goes beyond the usual red-teaming exercises. The signed terms cover:
- Pre-deployment evaluations of new models before public release
- Post-deployment assessments and longer-running research
- Classified testing environments for sensitive capability probes
- Reduced-safeguard model access for adversarial evaluation
- Information-sharing and voluntary product improvements based on findings
The terms were deliberately drafted to stay flexible enough to keep pace with the cadence of frontier model releases, which has compressed from yearly to monthly across the major labs.
Implications for the AI industry
The agreements effectively standardize pre-launch government testing across five of the most capable U.S. AI developers: Microsoft, Google DeepMind, xAI, OpenAI, and Anthropic. For enterprises building on these models, that creates both a new layer of assurance and a new bottleneck between the end of a model's training and its general availability.
One notable absence from the announcement is Meta, whose open-weight Llama models sit outside this release-and-test cadence. The CAISI framework is built around closed labs that can hold a model back for federal evaluation; how it adapts to open-weight releases, where the model itself is the deliverable, remains an open question.
The agreements arrive amid a tense period for the federal AI portfolio, including a separate court fight over Anthropic's Pentagon blacklist, and signal that the Trump administration's preferred posture is structured collaboration with U.S. labs rather than laissez-faire or hard regulation.
— Michael Ouroumis