Alibaba's semiconductor unit T-Head unveiled the Zhenwu M890, an in-house AI accelerator it claims runs agentic inference workloads roughly three times faster than Nvidia's H20 — the throttled Hopper part Washington still permits for export to China. It arrives paired with Qwen 3.7-Max, a model Alibaba says can run autonomously for up to 35 hours and over 1,000 tool calls without performance degradation. BABA shares slipped premarket on the news, which landed just ahead of Nvidia's quarterly earnings.
The specs that matter
The M890 is built on T-Head's in-house PPU (Parallel Processing Unit) architecture with a Transformer core engine. The headline figures:
- 0.6 PFLOPS (600 TFLOPS) of FP16 compute, roughly A100-class raw throughput
- 144GB of HBM3, a 50% jump over the prior Zhenwu 810E's 96GB
- 800 GB/s interchip bandwidth
Read the 3x-H20 claim carefully: it targets agentic inference, where memory capacity and interconnect, not peak FLOPs, are the binding constraint. Nvidia engineered the H20 with deliberately limited compute and bandwidth to clear export rules. The M890's 144GB of HBM3 lets it hold a larger KV cache and longer context per accelerator, which is exactly the resource long-horizon agents run short of.
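To make the memory argument concrete, here is a rough back-of-the-envelope sketch of how much KV cache fits in 96GB versus 144GB of HBM. The model shape (64 layers, 8 KV heads, head dimension 128, FP16 cache) and the 64GB weight footprint are hypothetical placeholders, not Qwen 3.7-Max or M890 figures, and the estimate ignores activations, fragmentation, and runtime overhead.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache one token occupies across all layers."""
    # Factor of 2: each layer stores one key and one value vector per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def max_cached_tokens(hbm_gb, weights_gb, per_token_bytes):
    """Tokens of KV cache that fit once the model weights are resident."""
    free_bytes = (hbm_gb - weights_gb) * 10**9
    return max(0, free_bytes // per_token_bytes)


# Hypothetical model shape and weight footprint (assumptions for illustration).
per_token = kv_bytes_per_token(num_layers=64, num_kv_heads=8, head_dim=128)
WEIGHTS_GB = 64

tokens_96 = max_cached_tokens(96, WEIGHTS_GB, per_token)
tokens_144 = max_cached_tokens(144, WEIGHTS_GB, per_token)

print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"96GB part:  ~{tokens_96:,} cached tokens")
print(f"144GB part: ~{tokens_144:,} cached tokens")
print(f"Headroom ratio: {tokens_144 / tokens_96:.2f}x")
```

Under these assumed numbers, 50% more HBM yields about 2.5x the cacheable tokens, because the weights are a fixed cost and every extra gigabyte goes to the cache. The real ratio depends entirely on the model and serving stack in use.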
A real deployment, not a paper launch
T-Head says it has shipped over 560,000 Zhenwu units to date, with 400+ external customers across 20 industries, including automakers and financial-services firms. The M890 reaches developers through Alibaba Cloud's Bailian platform and the Panjiu AL128 server — 128 M890 accelerators per rack.
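As a quick sanity check on the rack-level figures, the per-chip specs quoted earlier can be multiplied out across a 128-accelerator Panjiu AL128. This is plain arithmetic on the published numbers. It says nothing about achievable utilization, and the variable names are only illustrative.

```python
# Per-accelerator figures as quoted for the M890 and the prior Zhenwu 810E.
M890_HBM_GB = 144
ZHENWU_810E_HBM_GB = 96
M890_FP16_TFLOPS = 600          # 0.6 PFLOPS expressed in TFLOPS
ACCELERATORS_PER_RACK = 128     # Panjiu AL128

# Generation-over-generation memory increase.
hbm_jump = (M890_HBM_GB - ZHENWU_810E_HBM_GB) / ZHENWU_810E_HBM_GB

# Aggregate rack totals (theoretical peaks, not measured throughput).
rack_hbm_gb = M890_HBM_GB * ACCELERATORS_PER_RACK
rack_fp16_tflops = M890_FP16_TFLOPS * ACCELERATORS_PER_RACK

print(f"HBM increase over 810E: {hbm_jump:.0%}")
print(f"Rack HBM: {rack_hbm_gb:,} GB")
print(f"Rack peak FP16: {rack_fp16_tflops / 1000:.1f} PFLOPS")
```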
A roadmap built to outpace the cadence
Alibaba also published a multi-year roadmap: the V900 in Q3 2027, promising another ~3x gain over the M890, and the J900 in Q3 2028. That is an aggressive annual rhythm aimed squarely at Nvidia's release cycle.
Why the timing
The launch is a self-sufficiency statement. US export controls have restricted sales of advanced American silicon to Chinese entities since 2022, and the Trump administration tightened them again in April 2025 to block even China-market parts like the H20. Pairing a domestic accelerator with a domestic frontier model gives Chinese builders a vertically integrated stack (model, chip, and cloud) that no longer depends on a throttled or smuggled Nvidia GPU.
What changes for builders
For teams operating inside China or hedging against export risk, the M890 plus Qwen 3.7-Max is a credible escape hatch from Nvidia dependence. The KV-cache headroom and the 35-hour autonomous-run claim map directly onto the long-horizon agent workloads that blow up memory budgets on current inference fleets. Treat the 3x figure as a vendor benchmark, though — validate it against your own agent traces and tool-calling patterns before re-architecting around it.
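One way to approach that validation is to replay recorded agent traces against both backends and compare end-to-end time per trace. The sketch below is a minimal illustration only: the trace format, the `Backend` callable, and the stub latency models are hypothetical stand-ins, and in practice each backend would wrap a timed call to a real inference endpoint.

```python
from statistics import median
from typing import Callable, Sequence

# One agent step: (prompt_tokens, output_tokens). A trace is a sequence of steps.
Step = tuple[int, int]
Trace = Sequence[Step]
# A backend maps one step to its latency in seconds (hypothetical interface).
Backend = Callable[[int, int], float]


def trace_seconds(trace: Trace, backend: Backend) -> float:
    """Total time to replay every step of one trace on one backend."""
    return sum(backend(prompt, output) for prompt, output in trace)


def median_speedup(traces: Sequence[Trace], baseline: Backend,
                   candidate: Backend) -> float:
    """Median per-trace speedup of candidate over baseline."""
    ratios = [trace_seconds(t, baseline) / trace_seconds(t, candidate)
              for t in traces]
    return median(ratios)


# Stub latency models with made-up coefficients, used only so the sketch runs.
def stub_baseline(prompt_tokens: int, output_tokens: int) -> float:
    return prompt_tokens * 0.0002 + output_tokens * 0.02


def stub_candidate(prompt_tokens: int, output_tokens: int) -> float:
    return prompt_tokens * 0.0001 + output_tokens * 0.01


# Placeholder traces; real ones would come from production agent logs.
traces = [[(2000, 100), (6000, 50)], [(500, 400)]]
speedup = median_speedup(traces, stub_baseline, stub_candidate)
print(f"Median per-trace speedup: {speedup:.2f}x")
```

Reporting the median across traces rather than a single aggregate keeps one unusually long run from dominating the result. Looking at the full distribution of ratios would show whether any gain holds across different tool-calling patterns.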



