AMD has unveiled the Instinct MI400, its most aggressive play yet for the AI accelerator market. The chip features 256GB of HBM4 memory — nearly double the 141GB on NVIDIA's current H200 — and AMD claims it matches H200 training performance on large language models at 40% lower cost. If the benchmarks hold up in production, the MI400 represents the first credible challenge to NVIDIA's dominance in AI training hardware.
Hardware Specifications
The MI400 is built on TSMC's 3nm process and uses AMD's CDNA 4 architecture. Key specifications:
- Memory: 256GB HBM4 with 8 TB/s bandwidth
- Compute: 2,800 TFLOPS FP8, 1,400 TFLOPS FP16
- Interconnect: 4th-generation Infinity Fabric with 900 GB/s chip-to-chip bandwidth
- Power: 700W TDP
- Networking: Native 800G Ethernet support
The memory capacity is the headline differentiator. At 256GB per chip, a single 8-chip node provides 2TB of HBM — enough to hold a 70-billion-parameter model, including its training state, entirely in memory without sharding across nodes. This reduces inter-node communication overhead, one of the primary performance bottlenecks in distributed training.
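A quick back-of-envelope check of that claim (my arithmetic, not an AMD figure): the common mixed-precision Adam recipe keeps roughly 16 bytes of training state per parameter, which puts a 70B model's weights, gradients, and optimizer state at about 1.1TB, leaving headroom for activations inside the node's 2TB.

```python
# Rough training-memory estimate, assuming mixed-precision Adam:
# FP16 weights (2 B) + FP16 gradients (2 B) + FP32 master weights (4 B)
# + FP32 Adam first/second moments (4 B + 4 B) = ~16 bytes per parameter.
BYTES_PER_PARAM = 16
params = 70e9                 # 70B-parameter model
node_hbm = 8 * 256e9          # 8 MI400 chips x 256 GB HBM4

state_tb = params * BYTES_PER_PARAM / 1e12
print(f"training state: {state_tb:.2f} TB of {node_hbm / 1e12:.2f} TB node HBM")
# -> training state: 1.12 TB of 2.05 TB node HBM
# The remaining ~0.9 TB must cover activations, which scale with batch
# size and sequence length, so the fit is plausible but not guaranteed.
```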
Benchmark Claims
AMD presented training throughput numbers on Llama-class models comparing the MI400 to NVIDIA's H200. On a 70B-parameter model, AMD showed the MI400 matching H200 throughput token-for-token in an 8-chip configuration. On a 405B-parameter model, the MI400 came within 10% of H200 throughput but required fewer total chips due to its higher memory capacity.
AMD also highlighted inference performance, where the MI400's memory capacity provides a clear advantage. Larger models can be served from fewer chips, reducing the total cost of ownership for inference-heavy workloads.
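The chip-count effect behind both of those claims follows from a simple memory floor: the minimum number of chips whose combined HBM can hold the model state at all. The sketch below is an illustration under assumed byte counts (16 bytes per parameter of mixed-precision training state, 2 bytes per parameter for FP16 inference weights), not a reproduction of AMD's configurations; it ignores activations, KV cache, and parallelism overheads, and uses the H200's 141GB capacity for comparison.

```python
import math

def min_chips(params, bytes_per_param, hbm_bytes):
    """Minimum chips whose combined HBM can hold the model state."""
    return math.ceil(params * bytes_per_param / hbm_bytes)

MI400_HBM, H200_HBM = 256e9, 141e9   # per-chip HBM capacity in bytes

# 405B training state at ~16 B/param (weights + grads + Adam state):
print("405B training :", min_chips(405e9, 16, MI400_HBM), "MI400 vs",
      min_chips(405e9, 16, H200_HBM), "H200")   # -> 26 MI400 vs 46 H200

# 405B inference weights at 2 B/param (FP16):
print("405B inference:", min_chips(405e9, 2, MI400_HBM), "MI400 vs",
      min_chips(405e9, 2, H200_HBM), "H200")    # -> 4 MI400 vs 6 H200
```

Real deployments need more chips than these floors for throughput and cache headroom, but the ratio is what drives the total-cost-of-ownership argument.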
NVIDIA has not yet responded to AMD's benchmark claims. Independent verification through MLCommons' MLPerf benchmarks is expected within the next quarter.
Software Ecosystem
Hardware performance is only half the equation. NVIDIA's CUDA ecosystem remains the primary reason most AI teams choose NVIDIA chips. AMD addressed this directly, announcing expanded ROCm support, including compatibility with PyTorch 3.0, JAX, and the Triton compiler. The company also announced partnerships with Hugging Face and vLLM to ensure popular inference frameworks run optimally on MI400 hardware.
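In practice, the porting story rests on the fact that ROCm builds of PyTorch expose AMD devices through the same torch.cuda namespace that CUDA code already targets, so most model code runs unchanged. A minimal sketch of how a script can confirm which backend it is running on (the framework-version claims above are AMD's, not verified here):

```python
import torch

# ROCm builds of PyTorch set torch.version.hip (it is None on CUDA
# builds), while AMD devices still appear under torch.cuda, so
# existing CUDA-targeted code typically needs no changes.
backend = "ROCm/HIP" if torch.version.hip else "CUDA"
print(f"backend={backend}, devices={torch.cuda.device_count()}")

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(4096, 4096, device=device, dtype=torch.float16)
y = x @ x  # dispatched to AMD's ROCm BLAS libraries or to cuBLAS
```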
"CUDA lock-in is real, but it's weakening," said Lisa Su, AMD CEO. "Every major framework now supports ROCm. The software gap has closed enough that price and performance can drive the decision."
Customer Commitments
Microsoft Azure and Oracle Cloud have confirmed they will offer MI400 cloud instances at launch. Meta, which has deployed AMD MI300X chips in its AI infrastructure, is evaluating the MI400 for future Llama model training runs.
Market Implications
NVIDIA controls approximately 80% of the AI accelerator market. The MI400 is unlikely to change that overnight, but it gives cloud providers and enterprises a credible alternative that can drive pricing pressure. If AMD delivers on its performance claims, NVIDIA may be forced to accelerate its own roadmap or adjust pricing — both of which benefit AI companies.
The MI400 begins shipping to hyperscaler partners in Q3 2026, with broader availability in Q4.