How does the Zhenwu M890 compare to Nvidia's H20?

Alibaba claims roughly 3x the H20's performance on agentic inference. The M890 delivers 0.6 PFLOPS of FP16 compute (about A100-class) and carries 144GB of HBM3. Nvidia deliberately limited the H20's compute and bandwidth to clear export rules, so the claimed edge comes from memory capacity and interconnect, not peak FLOPS.

What is the chip roadmap?

T-Head plans the V900 in Q3 2027, promising another ~3x gain over the M890, followed by the J900 in Q3 2028 — putting Alibaba on a roughly annual accelerator cadence.

Can teams deploy on it today?

Yes, within Alibaba Cloud. It ships via the Bailian platform on Panjiu AL128 racks (128 M890 accelerators each), and T-Head says 400+ external customers across 20 industries already run 560,000+ Zhenwu units.

Alibaba's Zhenwu M890 Claims 3x Nvidia's H20, Ships With Qwen 3.7-Max

Alibaba's semiconductor unit T-Head has unveiled the Zhenwu M890, an in-house AI accelerator that it claims runs agentic inference workloads roughly three times faster than Nvidia's H20, the throttled Hopper part Washington still permits for export to China. It arrives paired with Qwen 3.7-Max, a model Alibaba says can run autonomously for up to 35 hours and more than 1,000 tool calls without performance degradation. BABA shares slipped premarket on the news, which landed just ahead of Nvidia's quarterly earnings.

The specs that matter

The M890 is built on T-Head's in-house PPU (Parallel Processing Unit) architecture with a Transformer core engine. The headline figures:

0.6 PFLOPS of FP16 compute, roughly A100-class raw throughput
144GB of HBM3, a 50% jump over the prior Zhenwu 810E's 96GB
800 GB/s interchip bandwidth

Read the 3x-H20 claim carefully: it targets agentic inference, where memory capacity and interconnect, not peak FLOPS, are the binding constraints. Nvidia engineered the H20 with deliberately limited compute and bandwidth to clear export rules. The M890's 144GB of HBM3 lets it hold a larger KV cache and a longer context per accelerator, which is exactly what long-horizon agents are starved of.
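To make the memory argument concrete, here is a rough back-of-the-envelope sketch in Python. It estimates how many tokens of KV cache fit in an accelerator's HBM once model weights are loaded. The model shape (64 layers, 8 KV heads, head dimension 128, FP16 cache) and the 70GB weight footprint are hypothetical values chosen for illustration, not figures from Alibaba or T-Head. The sketch also ignores activations, fragmentation, and runtime overhead.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache stored per token (one key and one value per layer)."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def max_cached_tokens(hbm_gb, weights_gb, num_layers, num_kv_heads, head_dim,
                      bytes_per_value=2):
    """Upper bound on cached tokens after weights are resident in HBM.

    Uses decimal gigabytes (10**9 bytes) and ignores all other memory use.
    """
    free_bytes = (hbm_gb - weights_gb) * 10**9
    if free_bytes <= 0:
        return 0
    per_token = kv_bytes_per_token(num_layers, num_kv_heads, head_dim,
                                   bytes_per_value)
    return int(free_bytes // per_token)


if __name__ == "__main__":
    # Hypothetical model shape and weight footprint (assumptions, not vendor data).
    shape = dict(num_layers=64, num_kv_heads=8, head_dim=128)
    weights_gb = 70
    # 144GB is the M890 figure; 96GB is the prior Zhenwu 810E figure from the article.
    for name, hbm_gb in [("144GB part", 144), ("96GB part", 96)]:
        tokens = max_cached_tokens(hbm_gb, weights_gb, **shape)
        print(f"{name}: about {tokens:,} tokens of KV cache")
```

Under these assumptions, the extra 48GB nearly triples the cache headroom because the weights consume a fixed share of memory first. The real ratio depends entirely on the model and serving stack in use.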

A real deployment, not a paper launch

T-Head says it has shipped over 560,000 Zhenwu units to date, with 400+ external customers across 20 industries, including automakers and financial-services firms. The M890 reaches developers through Alibaba Cloud's Bailian platform and the Panjiu AL128 server — 128 M890 accelerators per rack.
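For a sense of scale, the per-rack totals implied by those figures can be worked out directly. The sketch below multiplies the published per-accelerator numbers by 128. The results are theoretical peak aggregates under the assumption of perfect scaling, not measured rack performance, and the function name is just an illustrative helper.

```python
def rack_totals(accelerators, hbm_gb_each, fp16_pflops_each):
    """Aggregate HBM capacity (decimal TB) and peak FP16 compute for one rack."""
    return {
        "hbm_tb": accelerators * hbm_gb_each / 1000,
        "fp16_pflops": accelerators * fp16_pflops_each,
    }


if __name__ == "__main__":
    # Figures from the article: 128 M890s per AL128, 144GB HBM3 and 0.6 PFLOPS each.
    totals = rack_totals(accelerators=128, hbm_gb_each=144, fp16_pflops_each=0.6)
    print(f"HBM3 per rack: {totals['hbm_tb']:.3f} TB")
    print(f"Peak FP16 per rack: {totals['fp16_pflops']:.1f} PFLOPS")
```

That works out to roughly 18.4TB of HBM3 and 76.8 PFLOPS of peak FP16 per rack on paper.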

A roadmap built to outpace the cadence

Alibaba published a multi-year roadmap: the V900 in Q3 2027, promising another ~3x gain over the M890, and the J900 in Q3 2028. That is an aggressive annual rhythm aimed squarely at Nvidia's release cycle.

Why the timing

The launch is a self-sufficiency statement. US export controls have restricted sales of advanced American silicon to Chinese entities since 2022, and the Trump administration tightened them again in April 2025 to block even China-market parts like the H20. Pairing a domestic accelerator with a domestic frontier model gives Chinese builders a vertically integrated stack of model, chip, and cloud that no longer depends on a throttled or smuggled Nvidia GPU.

What changes for builders

For teams operating inside China or hedging against export risk, the M890 plus Qwen 3.7-Max is a credible escape hatch from Nvidia dependence. The KV-cache headroom and the 35-hour autonomous-run claim map directly onto the long-horizon agent workloads that blow up memory budgets on current inference fleets. Treat the 3x figure as a vendor benchmark, though — validate it against your own agent traces and tool-calling patterns before re-architecting around it.
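One way to run that validation is to replay the same agent traces on both backends and compare wall-clock times. The Python sketch below is a minimal illustration of that idea, not a vendor tool: the `TraceResult` record, its field names, and the 3.0 threshold are assumptions made for this example. It reports the geometric mean of per-trace speedups alongside the worst case, since a single average can hide traces that regress.

```python
from dataclasses import dataclass
from statistics import geometric_mean


@dataclass
class TraceResult:
    """Wall-clock time for one replayed agent trace on each backend (hypothetical schema)."""
    trace_id: str
    baseline_seconds: float   # e.g. the current inference fleet
    candidate_seconds: float  # e.g. the accelerator under evaluation


def summarize(results, claimed_speedup=3.0):
    """Compare measured per-trace speedups against a vendor-claimed figure."""
    if not results:
        raise ValueError("need at least one trace result")
    ratios = [r.baseline_seconds / r.candidate_seconds for r in results]
    geomean = geometric_mean(ratios)
    return {
        "geomean_speedup": geomean,
        "worst_speedup": min(ratios),
        "meets_claim": geomean >= claimed_speedup,
    }


if __name__ == "__main__":
    # Made-up timings purely to show the call shape.
    sample = [
        TraceResult("long-research-task", 5400.0, 2100.0),
        TraceResult("tool-heavy-task", 900.0, 520.0),
    ]
    print(summarize(sample))
```

Wall-clock time is only one axis. Cost per completed task and failure rate over long runs deserve the same treatment before any migration decision.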
