Amazon Web Services and NVIDIA announced a significantly expanded partnership at GTC 2026, with AWS committing to deploy more than one million NVIDIA GPUs across its cloud regions — a scale of AI infrastructure investment that underscores just how aggressively hyperscalers are racing to meet enterprise AI demand.
A Million GPUs and Counting
The headline number is staggering: over one million NVIDIA GPUs deployed across AWS regions starting in 2026. The commitment makes AWS one of the largest single customers for NVIDIA's AI accelerators and positions Amazon's cloud division to capture a growing share of enterprise AI workloads that require massive compute.
The deployment spans multiple NVIDIA GPU generations. Notably, AWS will be the first major cloud provider to offer NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs on Amazon EC2, giving developers access to the latest Blackwell architecture for inference and training workloads.
Nemotron Models Come to Bedrock
Beyond hardware, the partnership brings NVIDIA's open AI models directly into Amazon's managed AI platform. NVIDIA Nemotron 3 Super — a hybrid mixture-of-experts model with 120 billion total parameters but only 12 billion active per forward pass — is coming to Amazon Bedrock. The model is designed for complex multi-agent workloads, including software development, cybersecurity triage, and extended reasoning tasks.
The smaller Nemotron 3 Nano is already available on Amazon Bedrock and, separately, within Salesforce Agentforce running on NVIDIA's own infrastructure. Critically, developers will soon be able to fine-tune Nemotron models directly on Bedrock using reinforcement fine-tuning, lowering the barrier to customizing open models for enterprise-specific use cases.
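For developers, the practical upshot is that Nemotron should be reachable through the same Bedrock runtime APIs used for other hosted models. Below is a minimal sketch using the Bedrock Converse API via boto3; the model identifier is hypothetical, since AWS has not yet published Nemotron model IDs, and the region and inference settings are illustrative.

```python
# Minimal sketch: calling a Nemotron model on Amazon Bedrock via the Converse API.
# NOTE: the modelId below is a placeholder -- replace it with the identifier AWS
# lists in the Bedrock model catalog once Nemotron 3 Super is available.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="nvidia.nemotron-3-super-v1:0",  # hypothetical ID
    messages=[
        {
            "role": "user",
            "content": [{"text": "Triage these failing CI jobs and suggest likely root causes."}],
        }
    ],
    inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```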
Infrastructure Innovations
The collaboration extends into infrastructure optimization. AWS is integrating NVIDIA NIXL for interconnect acceleration to support disaggregated large language model inference on its Elastic Fabric Adapter network. The companies also highlighted 3x faster Apache Spark performance using Amazon EMR on Elastic Kubernetes Service with EC2 G7e instances — a meaningful improvement for data engineering pipelines that feed AI workloads.
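The announcement doesn't spell out the configuration behind the 3x figure, but NVIDIA-accelerated Spark generally runs through the RAPIDS Accelerator plugin, which offloads supported SQL and DataFrame operations to the GPU. The PySpark sketch below shows the general shape of such a job under that assumption; the plugin class and GPU resource settings are standard Spark/RAPIDS configuration, while the paths and sizing are illustrative only.

```python
# Illustrative sketch of GPU-accelerated Spark using the RAPIDS Accelerator plugin.
# Assumes the rapids-4-spark jar is available on the cluster (for example, baked
# into the EMR on EKS image or supplied at submit time); values are examples only.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("gpu-etl-sketch")
    # Route supported operators through the RAPIDS SQL plugin
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
    .config("spark.rapids.sql.enabled", "true")
    # Request one GPU per executor; vendor/discovery settings depend on the cluster manager
    .config("spark.executor.resource.gpu.amount", "1")
    .config("spark.task.resource.gpu.amount", "0.25")
    .getOrCreate()
)

# A typical ETL step the plugin can offload: scan, filter, aggregate, write
events = spark.read.parquet("s3://example-bucket/events/")  # hypothetical path
daily_counts = events.filter(events.status == "ok").groupBy("event_date").count()
daily_counts.write.mode("overwrite").parquet("s3://example-bucket/daily-counts/")
```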
The Bigger Picture
This partnership fits within NVIDIA CEO Jensen Huang's broader GTC narrative about the "inference inflection" — the idea that AI has crossed the threshold where it can do productive work at scale, and the bottleneck is now infrastructure supply rather than model capability.
For enterprises evaluating where to run AI workloads, the AWS-NVIDIA expansion means more GPU availability, tighter model integration, and a clearer path from prototyping on managed services to production-scale deployment. Industry analysts project total global AI spending to reach $2.5 trillion by the end of 2026, with AI infrastructure accounting for roughly $1.37 trillion of that total; as that spending accelerates, partnerships of this scale will determine which cloud platforms capture the next wave of enterprise AI adoption.



