Chinese AI lab DeepSeek on Friday released preview versions of DeepSeek-V4-Pro and DeepSeek-V4-Flash, the long-awaited follow-up to the reasoning model that rattled Silicon Valley in January 2025. The Hangzhou-based firm published both models under an MIT license with open weights, pairing a 1 million token context window with pricing that lands well below Western frontier providers.
The release lands almost exactly a year after DeepSeek's earlier R1 model triggered a global reassessment of how much capital and compute a competitive frontier model actually requires. With V4, the company is again betting that efficiency and openness, not just raw capability, are the pressure points on the incumbents.
Two sizes, one architecture
V4-Pro carries 1.6 trillion total parameters with roughly 49 billion active per token, using a mixture-of-experts design. V4-Flash is a leaner 284 billion parameters with about 13 billion active. Both share the same 1 million token context and are available as open weights, with published file sizes of 865GB for Pro and 160GB for Flash.
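Those figures imply heavy sparsity: only a small slice of each model's weights fires per token. A minimal sketch of the arithmetic, using only the parameter counts DeepSeek disclosed:

```python
# Active-parameter fraction per token, from DeepSeek's published figures.
models = {
    "V4-Pro":   {"total": 1.6e12, "active": 49e9},
    "V4-Flash": {"total": 284e9,  "active": 13e9},
}

for name, p in models.items():
    frac = p["active"] / p["total"]
    print(f"{name}: {frac:.1%} of parameters active per token")
```

Roughly 3% of V4-Pro and 5% of V4-Flash is active on any given token, which is where the low per-token serving cost comes from.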
Performance, per DeepSeek's own disclosures, puts V4-Pro at the head of the open-weight pack — ahead of recent releases such as Kimi K2.6 and GLM-5.1 — while trailing closed frontier systems like GPT-5.4 and Gemini 3.1-Pro by what the team characterizes as roughly three to six months. The model "significantly leads other open-source models" on world-knowledge benchmarks, the company said in its WeChat announcement, naming Gemini 3.1-Pro as the primary exception.
Aggressive pricing and agent focus
DeepSeek is charging $0.14 per million input tokens and $0.28 per million output tokens for V4-Flash, and $1.74 / $3.48 per million for V4-Pro. That positions Flash as one of the cheapest serving options for a long-context model at this capability tier.
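To see what those rates mean for a long-context agent run, here is a rough cost sketch. Only the per-million-token prices come from DeepSeek's announcement; the session token counts are hypothetical:

```python
# Published rates in USD per million tokens: (input, output).
RATES = {
    "V4-Flash": (0.14, 0.28),
    "V4-Pro":   (1.74, 3.48),
}

def cost_usd(model, input_tokens, output_tokens):
    """Estimate one request's cost from the per-million-token rates."""
    rate_in, rate_out = RATES[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1e6

# Hypothetical agent session: a full 1M-token context, 20K tokens generated.
print(f"Flash: ${cost_usd('V4-Flash', 1_000_000, 20_000):.3f}")
print(f"Pro:   ${cost_usd('V4-Pro',   1_000_000, 20_000):.3f}")
```

On these assumptions a maxed-out context window costs well under $0.15 on Flash and under $2 on Pro per request, which is the economics the enterprise-agent pitch rests on.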
The company explicitly optimized the models for agentic coding tools, naming Claude Code, OpenClaw, OpenCode, and CodeBuddy as targets for compatibility. DeepSeek also reports large efficiency gains versus V3.2: V4-Pro uses about 27% of the per-token compute and 10% of the KV-cache footprint at 1M-token contexts, with V4-Flash cutting further to roughly 10% and 7%.
Why it matters
The preview ships into a market where Western frontier labs have leaned on ever-larger capital commitments — multi-gigawatt data centers, multi-billion-dollar chip deals — to justify premium pricing. An open-weight model with a 1M context and sub-dollar per-million-token economics reframes that conversation, particularly for enterprise agent workloads where context length and unit cost dominate the bill.
It also reinforces a trend captured in Stanford's 2026 AI Index, which documented that the measured performance gap between leading Chinese and US models has narrowed to a few percentage points. DeepSeek's decision to release V4 openly, rather than behind a closed API, ensures the pressure propagates: every competitor that prices against proprietary frontier models now has a credible, downloadable alternative to benchmark against.
A final V4 release date was not disclosed.