AMD has detailed the Ryzen AI Max 400 series, codenamed "Gorgon Halo", a refresh of its Strix Halo platform that pushes a single client SoC to 192GB of unified LPDDR5X memory. That is enough to hold a 300-billion-parameter model in FP4 (roughly 150GB of weights) without a discrete GPU or a cloud instance. AMD frames it as the first x86 client part capable of running a 300B+ LLM locally; until now, the only single-box option for a model that size was Apple's Mac Studio.
The memory math is the headline
For local inference, capacity and bandwidth — not raw FLOPS — are usually the binding constraint, and that's where this refresh moves:
- 192GB unified memory (LPDDR5X-8533), with up to 160GB allocatable as VRAM and 32GB reserved for the OS.
- 273 GB/s of bandwidth, about 7% higher than the 300-series, courtesy of new 24GB (192Gbit) LPDDR5X modules.
- A single contiguous pool that holds both model weights and the KV cache — the reason a sub-$4K box can suddenly fit models that previously demanded multi-GPU servers.
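The capacity and bandwidth figures above can be sanity-checked with back-of-envelope arithmetic. The sketch below is a rough estimate, not AMD's methodology: the function names are illustrative, the footprint counts weights only and ignores the per-block scale metadata that real 4-bit formats carry, and the 256-bit memory bus width is an assumption that the figures above do not state.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes)."""
    # (params_billion * 1e9 params) * (bits / 8 bytes) / 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def peak_bandwidth_gb_s(transfer_rate_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    # Each transfer moves bus_width_bits / 8 bytes; MT/s -> GB/s divides by 1000.
    return transfer_rate_mt_s * bus_width_bits / 8 / 1000


VRAM_LIMIT_GB = 160    # allocatable as VRAM, from the list above
BUS_WIDTH_BITS = 256   # ASSUMPTION: bus width is not given in the article

for bits in (16, 8, 4):
    weights = weight_footprint_gb(300, bits)
    headroom = VRAM_LIMIT_GB - weights
    verdict = "fits" if headroom >= 0 else "does not fit"
    print(f"300B @ {bits}-bit: {weights:.0f} GB of weights, {verdict} "
          f"(headroom {headroom:.0f} GB)")

print(f"LPDDR5X-8533 peak: {peak_bandwidth_gb_s(8533, BUS_WIDTH_BITS):.0f} GB/s")
```

At 4 bits the weights come to 150GB, which leaves about 10GB of the 160GB allocation for the KV cache and activations, so long contexts on a 300B model would be tight. Under the assumed bus width, LPDDR5X-8533 works out to the quoted 273 GB/s.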
The silicon underneath
The stack launches with three PRO SKUs: the Ryzen AI Max+ PRO 495 (16 Zen 5 cores / 32 threads, up to 5.2 GHz), the Max PRO 490 (12 cores, 5.0 GHz), and the Max PRO 485 (8 cores, 5.0 GHz). The flagship pairs its CPU with an RDNA 3.5 integrated GPU with up to 40 compute units clocked at up to 3.0 GHz, and an XDNA 2 NPU rated at up to 55 TOPS, up from 50 on the outgoing Max+ 395.
A direct shot at DGX Spark
This is AMD's first-party answer to Nvidia's DGX Spark. The Ryzen AI Halo developer box opens preorders in June at $3,999, initially shipping with the prior-gen Max+ 395 and 128GB, which undercuts Spark's roughly $4,700 (GB10, 128GB, 4TB, Linux-only). AMD's own benchmarks claim up to 14% higher tokens/sec than Spark on GLM 4.7 Flash (30B) and up to 4% higher on Qwen 3.6 (35B) under Linux; the box supports ROCm and runs both Linux and Windows. The trade-off: Spark's higher raw compute still produces faster prompt processing and lower time-to-first-token, especially as context length grows.
What changes for builders
OEM systems from ASUS, HP, and Lenovo are slated for Q3 2026. For teams running inference at the edge or on-prem — privacy-sensitive or air-gapped workloads, or anyone trying to escape per-token cloud bills — a box that holds a 300B model for under $4,000 reshapes the build-vs-rent math. The caveat is bandwidth: 273 GB/s is strong for a client chip but far below an HBM accelerator, so Gorgon Halo is a single-user inference and dev-prototyping tool, not a serving tier for concurrent traffic.
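The bandwidth caveat can be made concrete with a simple roofline-style bound. During decode, each generated token has to stream every active weight from memory at least once, so memory bandwidth divided by the bytes read per token gives a ceiling on single-stream tokens/sec. The sketch below is an idealized estimate, not a benchmark: the function name is illustrative, it ignores KV cache reads and any compute or software overhead, and the 30B-active mixture-of-experts case is a hypothetical model chosen for contrast.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float,
                         active_params_billion: float,
                         bits_per_weight: float) -> float:
    """Upper bound on single-stream decode speed when memory-bound."""
    # GB of weights that must be read from memory for each generated token.
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


BANDWIDTH_GB_S = 273  # quoted peak for Gorgon Halo

# Dense 300B model in FP4: all 150 GB of weights are read for every token.
dense = decode_ceiling_tok_s(BANDWIDTH_GB_S, 300, 4)

# HYPOTHETICAL mixture-of-experts model with 30B active parameters per token.
moe = decode_ceiling_tok_s(BANDWIDTH_GB_S, 30, 4)

print(f"dense 300B FP4 ceiling:     {dense:.2f} tok/s")
print(f"30B-active MoE FP4 ceiling: {moe:.1f} tok/s")
```

Under these assumptions a dense 300B model tops out below 2 tokens/sec, while a sparse model that activates a tenth of its weights per token has a ceiling ten times higher. Real throughput will land below either figure, which is why the platform suits a single user far better than concurrent serving.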



