
CAISI Signs Frontier AI Testing Pacts With Microsoft, Google DeepMind, and xAI

Michael Ouroumis · 3 min read

The U.S. government has formalized one of the most consequential pieces of its AI oversight framework. The Center for AI Standards and Innovation (CAISI), housed inside the National Institute of Standards and Technology at the Department of Commerce, announced on May 5 that it has signed agreements with Google DeepMind, Microsoft, and xAI to test their frontier models for national security risks before those systems reach the public.

The agreements give CAISI pre-deployment access to frontier AI systems, post-deployment assessment rights, and the ability to run evaluations inside classified environments. Crucially, developers will provide CAISI with model variants that have reduced or removed safeguards, so evaluators can probe what these systems are actually capable of when stripped of guardrails.

A reorganized federal AI oversight stack

CAISI was reconstituted under the Trump administration's AI Action Plan from the former U.S. AI Safety Institute, with a tighter focus on national security and international competition rather than broad safety. The new agreements with Microsoft, Google DeepMind, and xAI build on earlier 2024 partnerships with Anthropic and OpenAI, both of which have now been renegotiated to reflect CAISI's updated directives from the secretary of commerce.

"Independent, rigorous measurement science is essential to understanding frontier AI and its national security implications," CAISI Director Chris Fall said in the announcement.

According to NIST, CAISI has already completed more than 40 evaluations to date, including assessments of unreleased state-of-the-art models. Evaluators from multiple federal agencies participate through the TRAINS Taskforce, an interagency body that pools expertise across the national security community.

What the labs are agreeing to

The scope of the agreements goes beyond the usual red-teaming exercises. The signed terms cover pre-deployment access to frontier systems, post-deployment assessment rights, evaluations inside classified environments, and access to model variants with reduced or removed safeguards.

The wording was deliberately drafted to stay flexible enough to keep pace with the cadence of frontier model releases, which has compressed from yearly to monthly across the major labs.

Implications for the AI industry

The news effectively standardizes pre-launch government testing across five of the most capable U.S. AI developers: Microsoft, Google DeepMind, xAI, OpenAI, and Anthropic. For enterprises building on these models, that creates a new layer of assurance — and a new bottleneck — between a model's training cutoff and its general availability.

One notable absence in the announcement is Meta, whose open-weight Llama models sit outside the same release-and-test cadence. The CAISI framework is built around closed labs that can hold a model back for federal evaluation; how it adapts to open-weight releases, where the model is the deliverable, remains an open question.

The agreements arrive amid a tense period for the federal AI portfolio — including a separate court fight over Anthropic's Pentagon blacklist — and signal that the Trump administration's preferred posture is structured collaboration with U.S. labs over either laissez-faire or hard regulation.

— Michael Ouroumis

