AMD's Gorgon Halo Pushes 192GB Unified Memory, Runs 300B-Parameter Models on One Chip

Michael Ouroumis, 2 min read
AMD has detailed the Ryzen AI Max 400 series — codenamed "Gorgon Halo" — a refresh of its Strix Halo platform that pushes a single client SoC to 192GB of unified LPDDR5X memory, enough to hold a 300-billion-parameter model in FP4 without a discrete GPU or a cloud instance. AMD frames it as the first x86 client part capable of running a 300B+ LLM locally; until now, the only single-box alternative for that was Apple's Mac Studio.

The memory math is the headline

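For local inference, capacity and bandwidth (not raw FLOPS) are usually the binding constraint, and that's where this refresh moves: unified memory rises to 192GB, up from the 128GB paired with the prior-gen Max+ 395, at 273 GB/s of memory bandwidth.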
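The capacity claim can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: it counts weight storage as parameters times bits per parameter, uses decimal gigabytes, and ignores quantization scale factors, KV cache, and whatever share of unified memory the OS keeps for itself. The function names and the `reserve_gb` parameter are hypothetical, not part of any AMD tooling.

```python
def weight_footprint_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes)."""
    # (params_billion * 1e9 params) * (bits / 8 bytes per param) / 1e9 bytes per GB
    return params_billion * bits_per_param / 8


def fits_in_memory(params_billion: float, bits_per_param: float,
                   memory_gb: float, reserve_gb: float = 0.0) -> bool:
    """True if the weights plus a hypothetical reserve fit in memory_gb.

    reserve_gb is an assumed allowance for KV cache, activations, and the OS.
    """
    needed = weight_footprint_gb(params_billion, bits_per_param) + reserve_gb
    return needed <= memory_gb


if __name__ == "__main__":
    # Compare a 300B-parameter model at 16, 8, and 4 bits against 192 GB.
    for bits in (16, 8, 4):
        gb = weight_footprint_gb(300, bits)
        print(f"300B at {bits}-bit: {gb:.0f} GB, "
              f"fits in 192 GB: {fits_in_memory(300, bits, 192)}")
```

Under these assumptions a 300B model at 4 bits needs about 150 GB for weights alone, which fits in 192GB with roughly 42 GB to spare but does not fit in 128GB. The same model at 8 bits would need about 300 GB.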

The silicon underneath

The stack launches with three PRO SKUs: the Ryzen AI Max+ PRO 495 (16 Zen 5 cores / 32 threads, up to 5.2 GHz), the Max PRO 490 (12 cores, 5.0 GHz), and the Max PRO 485 (8 cores, 5.0 GHz). The flagship pairs that with an RDNA 3.5 integrated GPU running up to 40 compute units at 3.0 GHz, and an XDNA 2 NPU rated at up to 55 TOPS, up from 50 on the outgoing Max+ 395.

A direct shot at DGX Spark

This is AMD's first-party answer to Nvidia's DGX Spark. The Ryzen AI Halo developer box opens preorders in June at $3,999, initially shipping with the prior-gen Max+ 395 and 128GB, which undercuts Spark's roughly $4,700 (GB10, 128GB, 4TB, Linux-only). AMD's own benchmarks claim up to 14% higher tokens/sec than Spark on GLM 4.7 Flash (30B) and up to 4% on Qwen 3.6 (35B) under Linux. The AMD box supports ROCm and runs both Linux and Windows. The trade-off: Spark's higher raw compute still produces faster prompt processing and lower time-to-first-token, especially as context length grows.

What changes for builders

OEM systems from ASUS, HP, and Lenovo are slated for Q3 2026. For teams running inference at the edge or on-prem — privacy-sensitive or air-gapped workloads, or anyone trying to escape per-token cloud bills — a box that holds a 300B model for under $4,000 reshapes the build-vs-rent math. The caveat is bandwidth: 273 GB/s is strong for a client chip but far below an HBM accelerator, so Gorgon Halo is a single-user inference and dev-prototyping tool, not a serving tier for concurrent traffic.
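To see why bandwidth is the caveat, a common rule of thumb treats single-stream token generation as memory-bound: each generated token requires reading every active weight once, so bandwidth divided by the weight footprint gives a rough ceiling on tokens per second. The sketch below applies that rule to the 273 GB/s figure. It is an assumption-laden estimate, not a benchmark: it assumes a dense model, ignores KV-cache reads and compute limits, and the function name is hypothetical.

```python
def decode_tokens_per_sec_ceiling(bandwidth_gb_s: float,
                                  active_params_billion: float,
                                  bits_per_param: float) -> float:
    """Rough upper bound on single-stream decode speed for a memory-bound model.

    Assumes every active weight is read from memory once per generated token.
    """
    # Weight bytes read per token, in decimal GB.
    gb_read_per_token = active_params_billion * bits_per_param / 8
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    # 273 GB/s is the bandwidth figure quoted in the article.
    for params in (30, 300):
        ceiling = decode_tokens_per_sec_ceiling(273, params, 4)
        print(f"{params}B dense at 4-bit: at most ~{ceiling:.1f} tokens/s")
```

By this estimate a dense 300B model at 4 bits tops out near 1.8 tokens per second at 273 GB/s, while a dense 30B model at 4 bits could reach about 18. Mixture-of-experts models that activate only a fraction of their parameters per token would land higher, since only the active weights count in the calculation.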
