CAISI Signs Frontier AI Testing Pacts With Microsoft, Google DeepMind, and xAI

Michael Ouroumis · 3 min read

The U.S. government has formalized one of the most consequential pieces of its AI oversight framework. The Center for AI Standards and Innovation (CAISI), housed inside the National Institute of Standards and Technology at the Department of Commerce, announced on May 5 that it has signed agreements with Google DeepMind, Microsoft, and xAI to test their frontier models for national security risks before those systems reach the public.

The agreements give CAISI pre-deployment access to frontier AI systems, post-deployment assessment rights, and the ability to run evaluations inside classified environments. Crucially, developers will provide CAISI with model variants that have reduced or removed safeguards, so evaluators can probe what these systems are actually capable of when stripped of guardrails.

A reorganized federal AI oversight stack

CAISI was reconstituted under the Trump administration's AI Action Plan from the former U.S. AI Safety Institute, with a tighter focus on national security and international competition rather than broad safety. The new agreements with Microsoft, Google DeepMind, and xAI build on earlier 2024 partnerships with Anthropic and OpenAI, both of which have now been renegotiated to reflect CAISI's updated directives from the secretary of commerce.

"Independent, rigorous measurement science is essential to understanding frontier AI and its national security implications," CAISI Director Chris Fall said in the announcement.

According to NIST, CAISI has completed more than 40 evaluations to date, including assessments of unreleased state-of-the-art models. Evaluators from multiple federal agencies participate through the TRAINS Taskforce, an interagency body that pools expertise across the national security community.

What the labs are agreeing to

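The scope of the agreements goes beyond the usual red-teaming exercises. As noted above, the signed terms cover pre-deployment access to frontier systems, post-deployment assessment rights, evaluations inside classified environments, and access to model variants with reduced or removed safeguards.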

The wording was deliberately drafted to remain flexible enough to keep up with the cadence of frontier model releases, which have been compressing from yearly to monthly across the major labs.

Implications for the AI industry

The news effectively standardizes pre-launch government testing across five of the most capable U.S. AI developers: Microsoft, Google DeepMind, xAI, OpenAI, and Anthropic. For enterprises building on these models, that creates a new layer of assurance — and a new bottleneck — between a model's training cutoff and its general availability.

One notable absence in the announcement is Meta, whose open-weight Llama models sit outside the same release-and-test cadence. The CAISI framework is built around closed labs that can hold a model back for federal evaluation; how it adapts to open-weight releases, where the model is the deliverable, remains an open question.

The agreements arrive amid a tense period for the federal AI portfolio — including a separate court fight over Anthropic's Pentagon blacklist — and signal that the Trump administration's preferred posture is structured collaboration with U.S. labs over either laissez-faire or hard regulation.

— Michael Ouroumis
