How large a model can the Ryzen AI Max 400 actually hold?

With up to 192GB of unified memory and as much as 160GB addressable as VRAM, AMD says it can load a roughly 300B-parameter model quantized to FP4. AMD bills it as the first x86 client SoC able to do so; previously, Apple's Mac Studio was the only single-box option.

How does it stack up against Nvidia's DGX Spark on price and throughput?

AMD's Ryzen AI Halo dev box opens preorders in June at $3,999 versus roughly $4,700 for DGX Spark. AMD claims up to 14% higher tokens/sec on GLM 4.7 Flash 30B under Linux, but Spark's higher raw compute still wins prompt processing and time-to-first-token at long context.

What's the memory bandwidth, and is this a serving backend?

Bandwidth is 273 GB/s via LPDDR5X-8533 — generous for a client part but an order of magnitude below an HBM data-center GPU. It's a single-user inference and prototyping box, not a backend for concurrent production traffic.

AMD's Gorgon Halo Pushes 192GB Unified Memory, Runs 300B-Parameter Models on One Chip

AMD has detailed the Ryzen AI Max 400 series — codenamed "Gorgon Halo" — a refresh of its Strix Halo platform that pushes a single client SoC to 192GB of unified LPDDR5X memory, enough to hold a 300-billion-parameter model in FP4 without a discrete GPU or a cloud instance. AMD frames it as the first x86 client part capable of running a 300B+ LLM locally; until now, the only single-box alternative for that was Apple's Mac Studio.
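The FP4 claim is simple arithmetic: parameters times bits per weight, divided by eight, gives bytes of weights. A minimal sketch, assuming weights dominate the footprint and treating any quantization scale or metadata overhead as optional extra bits per weight (`weight_gb` and `overhead_bits` are illustrative names, not part of any AMD tool):

```python
# Back-of-the-envelope weight footprint for a quantized model.
# Assumption: weights dominate memory use; per-block scale/metadata overhead
# is modeled as extra bits per weight (0.0 by default, i.e. ignored).

def weight_gb(params_billion: float, bits_per_weight: float, overhead_bits: float = 0.0) -> float:
    """Approximate weight memory in GB (10**9 bytes)."""
    total_bits = params_billion * 1e9 * (bits_per_weight + overhead_bits)
    return total_bits / 8 / 1e9

VRAM_GB = 160  # AMD's stated maximum VRAM allocation

fp4 = weight_gb(300, 4)       # 150.0 GB of weights
headroom = VRAM_GB - fp4      # 10.0 GB left for KV cache, activations, runtime
print(f"300B @ FP4: {fp4:.1f} GB, headroom {headroom:.1f} GB")
# prints "300B @ FP4: 150.0 GB, headroom 10.0 GB"
```

The margin is thin. If the FP4 format carries per-block scales, for example one 8-bit scale per 32 weights (0.25 extra bits per weight), `weight_gb(300, 4, 0.25)` comes to about 159.4 GB, which leaves very little of the 160GB allocation free.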

The memory math is the headline

For local inference, capacity and bandwidth, not raw FLOPS, are usually the binding constraints, and both move with this refresh:

192GB unified memory (LPDDR5X-8533), with up to 160GB allocatable as VRAM and 32GB reserved for the OS.
273 GB/s of bandwidth, about 7% more than the 300-series, thanks to the faster 8533 MT/s data rate of the new 24GB (192Gbit) LPDDR5X modules, which also supply the added capacity.
A single contiguous pool that holds both model weights and the KV cache — the reason a sub-$4K box can suddenly fit models that previously demanded multi-GPU servers.
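Because weights and KV cache draw from the same pool, whatever the weights leave free caps both context length and concurrency. A minimal sketch of the standard KV cache estimate (one key and one value vector cached per layer, per KV head, per token). The model shape is hypothetical, not any specific model's published configuration, and the 10 GB budget assumes a 300B FP4 model's roughly 150 GB of weights inside the 160GB VRAM allocation, ignoring activations and runtime overhead:

```python
from dataclasses import dataclass

@dataclass
class AttnConfig:
    # Hypothetical transformer shape for illustration only.
    layers: int
    kv_heads: int        # with grouped-query attention, fewer KV heads than query heads
    head_dim: int
    bytes_per_elem: int  # 2 for an FP16/BF16 cache

def kv_bytes_per_token(cfg: AttnConfig) -> int:
    # Factor of 2: a key vector and a value vector per layer per KV head.
    return 2 * cfg.layers * cfg.kv_heads * cfg.head_dim * cfg.bytes_per_elem

def max_context_tokens(cfg: AttnConfig, free_gb: float, concurrent_seqs: int = 1) -> int:
    """Tokens of context per sequence that fit in the memory left after weights."""
    per_token = kv_bytes_per_token(cfg)
    return int(free_gb * 1e9 // (per_token * concurrent_seqs))

cfg = AttnConfig(layers=64, kv_heads=8, head_dim=128, bytes_per_elem=2)
print(kv_bytes_per_token(cfg))           # 262144 bytes (256 KiB) per token
print(max_context_tokens(cfg, 10.0))     # 38146 tokens for one sequence
print(max_context_tokens(cfg, 10.0, 8))  # 4768 tokens each for eight sequences
```

Eight concurrent sequences each get one-eighth of the context, one concrete way a single shared pool favors one user over many.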

The silicon underneath

The stack launches with three PRO SKUs: the Ryzen AI Max+ PRO 495 (16 Zen 5 cores / 32 threads, up to 5.2 GHz), the Max PRO 490 (12 cores, 5.0 GHz), and the Max PRO 485 (8 cores, 5.0 GHz). The flagship pairs that with an RDNA 3.5 integrated GPU running up to 40 compute units at 3.0 GHz, and an XDNA 2 NPU rated at up to 55 TOPS, up from 50 on the outgoing Max+ 395.

A direct shot at DGX Spark

This is AMD's first-party answer to Nvidia's DGX Spark. The Ryzen AI Halo developer box opens preorders in June at $3,999, initially shipping with the prior-generation Max+ 395 and 128GB, undercutting Spark's roughly $4,700 (GB10, 128GB, 4TB, Linux-only). AMD's own benchmarks, run under Linux, claim up to 14% higher tokens/sec than Spark on GLM 4.7 Flash (30B) and up to 4% on Qwen 3.6 (35B). Unlike Spark, the AMD box supports both Linux and Windows, with ROCm for GPU compute. The trade-off: Spark's higher raw compute still produces faster prompt processing and lower time-to-first-token, especially as context length grows.
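Why compute shows up in time-to-first-token: prompt processing (prefill) is largely compute-bound, so its cost is FLOPs divided by sustained throughput. A rough sketch using the common rule of thumb of about 2 FLOPs per active parameter per token, plus attention's quadratic term. Every model shape and TFLOPS figure below is a hypothetical placeholder, not a spec or measurement of either machine:

```python
# Rough prefill (prompt-processing) cost model. Model shape and TFLOPS
# values are hypothetical placeholders, not specs or benchmark results.

def prefill_flops(active_params: float, prompt_tokens: int, layers: int, d_model: int) -> float:
    # Dense matmuls: ~2 FLOPs (multiply + add) per active parameter per token.
    linear = 2.0 * active_params * prompt_tokens
    # Attention: Q·K^T and A·V each cost ~2·n²·d per layer, ~4·n²·d in total.
    # Causal masking roughly halves this in practice; ignored here.
    attention = 4.0 * layers * prompt_tokens**2 * d_model
    return linear + attention

def ttft_seconds(flops: float, sustained_tflops: float) -> float:
    # Time-to-first-token if prefill runs at a steady, compute-bound rate.
    return flops / (sustained_tflops * 1e12)

for n in (1_000, 8_000, 32_000):
    f = prefill_flops(active_params=3e9, prompt_tokens=n, layers=48, d_model=2048)
    slow, fast = ttft_seconds(f, 50.0), ttft_seconds(f, 100.0)
    print(f"{n:>6} tokens: {slow:6.2f}s vs {fast:6.2f}s (gap {slow - fast:.2f}s)")
```

The compute ratio fixes the ratio of the two TTFTs, but the absolute gap in seconds widens with prompt length, and the quadratic attention term makes long prompts disproportionately expensive. That is consistent with the gap AMD concedes at long context.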

What changes for builders

OEM systems from ASUS, HP, and Lenovo are slated for Q3 2026. For teams running inference at the edge or on-prem — privacy-sensitive or air-gapped workloads, or anyone trying to escape per-token cloud bills — a box that holds a 300B model for under $4,000 reshapes the build-vs-rent math. The caveat is bandwidth: 273 GB/s is strong for a client chip but far below an HBM accelerator, so Gorgon Halo is a single-user inference and dev-prototyping tool, not a serving tier for concurrent traffic.
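The bandwidth caveat can be made concrete. Peak bandwidth is bus width times data rate, and in single-stream decode each new token has to stream the active weights from memory, so bandwidth divided by bytes read per token bounds tokens/sec. A minimal sketch, assuming Gorgon Halo keeps the 256-bit memory interface of the Strix Halo parts it refreshes (256 bits at 8533 MT/s reproduces AMD's 273 GB/s figure); the 3B-active MoE model is hypothetical, and the ceiling ignores KV cache reads and caching effects:

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mtps: int) -> float:
    # Bytes per second = bus width in bytes x transfers per second.
    return bus_width_bits / 8 * data_rate_mtps * 1e6 / 1e9

def decode_tokens_per_sec_ceiling(bandwidth_gbs: float, bytes_read_per_token_gb: float) -> float:
    # Upper bound for one decode stream: every token re-reads the active weights.
    # Ignores KV cache traffic, on-chip caching, and compute/memory overlap.
    return bandwidth_gbs / bytes_read_per_token_gb

bw = peak_bandwidth_gbs(256, 8533)  # assumed 256-bit bus
print(f"peak bandwidth: {bw:.0f} GB/s")  # prints "peak bandwidth: 273 GB/s"
# Dense 300B model at FP4: ~150 GB of weights touched per token.
print(f"dense 300B FP4: <= {decode_tokens_per_sec_ceiling(bw, 150):.1f} tok/s")  # <= 1.8
# Hypothetical MoE with ~3B active params at FP4: ~1.5 GB per token.
print(f"3B-active MoE FP4: <= {decode_tokens_per_sec_ceiling(bw, 1.5):.0f} tok/s")  # <= 182
```

Real throughput lands below these ceilings, but the shape holds: decode speed scales with memory bandwidth rather than FLOPS, which is why a 273 GB/s part sits well behind multi-TB/s HBM accelerators for heavy traffic.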
