The US government has put a number on China's AI gap — and it is shrinking, but still real. The Center for AI Standards and Innovation (CAISI), housed at the National Institute of Standards and Technology, published its May 2026 evaluation of DeepSeek V4 Pro on Friday, concluding that the open-weight model lags the US frontier by approximately eight months while continuing to undercut American models on cost.
The report has rippled through industry coverage today. The Decoder bluntly framed it as evidence, by a US government benchmark, that "China is falling behind in the AI race," even as independent analysts question whether an eight-month gap meaningfully constrains Chinese deployment plans.
Eight Months Behind the Frontier
CAISI describes DeepSeek V4 Pro as the most capable PRC AI model it has evaluated to date. Across a broad benchmark suite, however, the model performs similarly to GPT-5, which shipped roughly eight months ago. That places it well below current frontier US systems such as GPT-5.5 and Claude Opus 4.6 on the agency's reasoning, science, and agent-style tasks.
The evaluation covered nine public and semi-private benchmarks: ARC-AGI-2 semi-private for abstract reasoning; CTF-Archive-Diamond for cybersecurity; PortBench and SWE-Bench Verified for software engineering; FrontierScience and GPQA-Diamond for natural sciences; and OTIS-AIME-2025, PUMaC 2024, and SMT 2025 for mathematics. CAISI also flags a discrepancy that buyers should note: DeepSeek's own published evaluations portray V4 Pro as competitive with frontier models, while CAISI's independent runs show clearer gaps on reasoning and agentic workloads.
A Cost Advantage That Won't Disappear
The more uncomfortable finding for US labs is on price. Against GPT-5.4 mini, the cost-efficient reference model CAISI chose for the comparison, DeepSeek V4 Pro was more cost-efficient on five of the seven benchmarks where pricing was compared. Task-level results ranged from 53% cheaper to 41% more expensive.
That price gap matters for downstream developers choosing between a frontier-class US model and a slightly older but markedly cheaper open-weight alternative they can self-host. A capability deficit measured in months can be acceptable when it comes with substantially lower inference costs and the freedom to deploy weights on private infrastructure.
Implications for Export Controls and the AI Race
The May 2026 report builds on CAISI's September 2025 evaluation of DeepSeek's R1, R1-0528, and V3.1 models, which had highlighted concerns around jailbreak susceptibility and the propagation of CCP-aligned narratives. The new V4 Pro analysis focuses on capability and cost rather than safety, and stops short of policy recommendations.
Still, the framing is unmistakable. By publishing a quantified, government-branded measure of the US lead (eight months, and narrowing), CAISI is giving the Trump administration, Congress, and the AI industry a shared yardstick for arguments over export controls, federal procurement, and the pace of frontier model releases. For Chinese labs, the takeaway is more direct: progress in closing the gap will now be scored against an explicit US benchmark, not settled by vibes.