How far behind is China in AI, according to the US government?

Roughly eight months. CAISI says DeepSeek V4 Pro — the most capable PRC model it has evaluated — performs similarly to GPT-5, which was released about eight months earlier.

Which benchmarks did CAISI use to evaluate DeepSeek V4 Pro?

Tests spanned cyber (CTF-Archive-Diamond), software engineering (SWE-Bench Verified, PortBench), science (GPQA-Diamond, FrontierScience), abstract reasoning (ARC-AGI-2 semi-private) and math (OTIS-AIME-2025, PUMaC 2024, SMT 2025).

Is DeepSeek actually cheaper than US AI models?

Mostly, but not always. DeepSeek V4 Pro was more cost-efficient than GPT-5.4 mini on five of the seven benchmarks where CAISI compared costs, with per-task results ranging from 53% cheaper to 41% more expensive.

NIST CAISI: China's DeepSeek V4 Pro Trails US Frontier AI by Eight Months

The US government has put a number on China's AI gap — and it is shrinking, but still real. The Center for AI Standards and Innovation (CAISI), housed at the National Institute of Standards and Technology, published its May 2026 evaluation of DeepSeek V4 Pro on Friday, concluding that the open-weight model lags the US frontier by approximately eight months while continuing to undercut American models on cost.

The report has rippled through industry coverage today, with The Decoder framing it bluntly as evidence that "China is falling behind in the AI race" by a US government benchmark — even as independent analysts question whether an eight-month gap meaningfully constrains Chinese deployment plans.

Eight Months Behind the Frontier

CAISI describes DeepSeek V4 Pro as the most capable PRC AI model it has evaluated to date. Across a broad benchmark suite, however, the model performs similarly to GPT-5, which shipped roughly eight months ago. That places it well below current frontier US systems such as GPT-5.5 and Claude Opus 4.6 on the agency's reasoning, science, and agent-style tasks.

The evaluation covered nine public and semi-private benchmarks: ARC-AGI-2 semi-private for abstract reasoning; CTF-Archive-Diamond for cybersecurity; PortBench and SWE-Bench Verified for software engineering; FrontierScience and GPQA-Diamond for natural sciences; and OTIS-AIME-2025, PUMaC 2024 and SMT 2025 for mathematics. CAISI also notes a discrepancy worth flagging for buyers: DeepSeek's own published evaluations portray V4 Pro as competitive with frontier models, while CAISI's independent runs show clearer gaps on reasoning and agentic workloads.

A Cost Advantage That Won't Disappear

The more uncomfortable finding for US labs is on price. Compared with GPT-5.4 mini — Washington's chosen cost-efficient reference model — DeepSeek V4 Pro was more cost-efficient on five of the seven benchmarks where pricing was compared, with task-level results ranging from 53% cheaper to 41% more expensive.

That gap matters for downstream developers choosing between a more capable US model and a somewhat less capable but often cheaper open-weight alternative they can self-host. A capability deficit measured in months can be acceptable when paired with lower inference costs on most tasks and the freedom to deploy weights on private infrastructure.

Implications for Export Controls and the AI Race

The May 2026 report builds on CAISI's September 2025 evaluation of DeepSeek's R1, R1-0528, and V3.1 models, which had highlighted concerns around jailbreak susceptibility and the propagation of CCP-aligned narratives. The new V4 Pro analysis focuses on capability and cost rather than safety, and stops short of policy recommendations.

Still, the framing is unmistakable. By publishing a quantified, government-branded measure of the US lead — eight months, narrowing — CAISI is giving the Trump administration, Congress, and the AI industry a shared yardstick to argue over export controls, federal procurement, and the pace of frontier model releases. For Chinese labs, the takeaway is more direct: closing the gap is now an explicit US benchmark, not a vibes contest.
