Analyst(s): Ray Wang
Publication Date: July 25, 2025
AWS’s new EC2 P6e-GB200 UltraServers, powered by NVIDIA Grace Blackwell and designed for trillion-parameter AI workloads, combine liquid cooling, UltraClusters, and advanced interconnects to push GPU scalability to new heights.
What is Covered in this Article:
- Amazon Web Services launched EC2 P6e-GB200 UltraServers with NVIDIA Grace Blackwell Superchips for high-end AI workloads.
- The UltraServers support up to 72 NVIDIA Blackwell GPUs with 360 PFLOPS of FP8 compute and 13.4 TB of high-bandwidth memory.
- P6e-GB200 instances are AWS’s first large-scale liquid-cooled systems and are deployed in third-generation EC2 UltraClusters.
- The UltraServers are integrated with managed services like SageMaker HyperPod, Amazon EKS, and NVIDIA DGX Cloud.
The News: Amazon Web Services has made its EC2 P6e-GB200 UltraServers generally available, offering top-tier GPU power for AI workloads. These servers are built on NVIDIA’s GB200 NVL72 rack design, pairing Grace CPUs with Blackwell GPUs through Grace Blackwell Superchips, and deliver up to 360 petaflops of FP8 compute and 13.4 TB of 12-Hi HBM3e memory within a single NVLink domain.
They are part of EC2 UltraClusters, offering up to 28.8 Tbps networking via Elastic Fabric Adapter v4 (EFAv4), and are now available in the Dallas Local Zone through EC2 Capacity Blocks for ML. Customers can access them through SageMaker HyperPod, Amazon EKS, or NVIDIA DGX Cloud.
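For teams planning ahead, access runs through EC2 Capacity Blocks for ML, which reserves GPU capacity for a fixed window. Below is a minimal boto3 sketch of that reservation flow; the instance type string, dates, and counts are illustrative assumptions rather than confirmed values from the announcement.

```python
# Illustrative sketch: reserving P6e-GB200 capacity via EC2 Capacity Blocks for ML.
# The instance type name and date window are assumptions for illustration only;
# verify the exact UltraServer type names in the EC2 console or AWS docs.
from datetime import datetime, timedelta

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Search for available Capacity Block offerings inside a date window.
offerings = ec2.describe_capacity_block_offerings(
    InstanceType="u-p6e-gb200x36",  # assumed type name; verify before use
    InstanceCount=1,
    StartDateRange=datetime.utcnow() + timedelta(days=7),
    EndDateRange=datetime.utcnow() + timedelta(days=21),
    CapacityDurationHours=48,  # blocks are reserved for a fixed duration
)

for offer in offerings["CapacityBlockOfferings"]:
    print(offer["CapacityBlockOfferingId"], offer["StartDate"], offer["UpfrontFee"])

# Purchase a specific offering; this creates a capacity reservation
# that EC2 instances can target once the block begins.
if offerings["CapacityBlockOfferings"]:
    ec2.purchase_capacity_block(
        CapacityBlockOfferingId=offerings["CapacityBlockOfferings"][0][
            "CapacityBlockOfferingId"
        ],
        InstancePlatform="Linux/UNIX",
    )
```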
Can AWS’s New AI Infrastructure Offering Stand Out from Its Competitors?
Analyst Take: AWS’s new EC2 P6e-GB200 UltraServers deliver a dense, high-performance GPU platform purpose-built for training and serving powerful AI models. With 72 Blackwell GPUs linked in a single NVLink domain, these machines are built to handle trillion-parameter models. They combine NVIDIA Grace CPUs, liquid cooling, and EC2 UltraClusters into a highly integrated system. Support across HyperPod, EKS, and DGX Cloud gives users deployment flexibility without sacrificing performance. This launch represents AWS’s most advanced GPU offering to date and sets the stage for building next-generation AI systems.
In addition, AWS’s in-house liquid-cooling technology gives the cloud platform a system-level edge while driving down cooling costs. Owning the design lets AWS source individual components rather than rely on system-vendor solutions, giving it greater flexibility on pricing, architecture, and performance so its cooling strategy can be matched precisely to workload demands and cost targets.
Performance and Scalability for Trillion-Parameter Models
Each UltraServer includes up to 72 Blackwell GPUs connected with fifth-gen NVLink, forming one massive compute unit. This setup delivers 360 petaflops of FP8 compute and 13.4 TB of high-speed HBM3E memory – far beyond what P5en instances offer. The Grace Blackwell Superchip ties together two GPUs and one CPU through NVLink-C2C, speeding up data movement inside the server. With EFAv4’s 28.8 Tbps bandwidth, this system is built for massive-scale AI tasks, including reasoning models and mixture-of-experts architectures.
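Working those aggregates back to per-GPU figures is a quick sanity check; the snippet below uses only the numbers stated above.

```python
# Back-of-envelope per-GPU figures from the announced UltraServer aggregates.
gpus = 72
fp8_pflops_total = 360   # petaflops of FP8 across the NVLink domain
hbm_tb_total = 13.4      # TB of HBM3e across the NVLink domain

print(f"FP8 per GPU: {fp8_pflops_total / gpus:.1f} PFLOPS")  # ~5.0 PFLOPS
print(f"HBM per GPU: {hbm_tb_total * 1000 / gpus:.0f} GB")   # ~186 GB
```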
Deployment Options with Full AWS Integration
The UltraServers can scale across AWS’s biggest data centers through EC2 UltraClusters. AWS makes them easy to use through managed services like SageMaker HyperPod, which automatically replaces faulty nodes within the same NVLink domain. Amazon EKS handles node provisioning and lifecycle and adds topology-aware scheduling for NVLink-connected nodes. NVIDIA DGX Cloud provides access to NVIDIA’s full AI software stack and tools. These deployment paths make the new instances usable across ML pipelines, Kubernetes environments, and NVIDIA-optimized AI workflows.
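As a concrete illustration of the HyperPod path, the boto3 sketch below provisions a small HyperPod cluster; the ml.-prefixed instance type name, S3 lifecycle path, and IAM role are placeholders and assumptions to verify against SageMaker’s documentation.

```python
# Illustrative sketch: provisioning a SageMaker HyperPod cluster for P6e-GB200.
# Instance type, role ARN, and S3 paths are placeholders / assumptions.
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

sm.create_cluster(
    ClusterName="p6e-gb200-hyperpod",
    InstanceGroups=[
        {
            "InstanceGroupName": "gpu-workers",
            "InstanceType": "ml.u-p6e-gb200x36",  # assumed name; verify in docs
            "InstanceCount": 1,
            # Lifecycle scripts run on node creation; HyperPod's resiliency
            # features then replace faulty nodes within the same NVLink domain.
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/hyperpod-lifecycle/",  # placeholder
                "OnCreate": "on_create.sh",
            },
            "ExecutionRole": "arn:aws:iam::123456789012:role/HyperPodExecRole",  # placeholder
        }
    ],
)
```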
Liquid-Cooled Hardware and Infrastructure Efficiency
This is AWS’s first major rollout of liquid-cooled servers, using In-Row Heat Exchanger (IRHX) tech to cool high-density GPU racks. This technology reduces the need for airflow cooling and fits into AWS’s third-gen UltraClusters, which cut power use by up to 40% and slash cabling by over 80%. Built on the AWS Nitro System, these servers stay secure and isolated while allowing live updates. This setup is focused on performance, reliability, and operational efficiency at scale.
Workload Optimization Across Model Sizes and Architectures
The P6e-GB200 UltraServers are aimed at cutting-edge models that need high inference throughput, long context handling, or high concurrency. Their single large NVLink domain avoids the usual penalties of spreading a workload across multiple loosely coupled GPU clusters. With NVIDIA Dynamo, they support disaggregated serving, making them more efficient for inference-heavy tasks. Teams working on generative AI, video/image generation, or speech processing will benefit from steady latency and throughput. All of this makes the P6e-GB200 a strong pick for enterprises building the future of AI.
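Disaggregated serving splits inference into a prefill stage (prompt processing) and a decode stage (token generation) that scale independently. The toy Python sketch below illustrates only that control-flow idea; it is a conceptual stand-in, not NVIDIA Dynamo’s actual API.

```python
# Toy illustration of disaggregated LLM serving: prefill and decode run in
# separate worker pools. Conceptual sketch only, not Dynamo's real API.
import queue
import threading
import time

prefill_q: "queue.Queue[str]" = queue.Queue()
decode_q: "queue.Queue[tuple[str, str]]" = queue.Queue()

def prefill_worker():
    # Prompt processing is compute-bound; in a real system this pool would
    # run large-batch prefill and hand the KV cache to the decode pool.
    while True:
        prompt = prefill_q.get()
        kv_cache = f"<kv for '{prompt}'>"  # stand-in for the real KV cache
        decode_q.put((prompt, kv_cache))

def decode_worker():
    # Token generation is memory-bandwidth-bound; a separate pool lets it
    # scale independently of prefill for steadier latency under concurrency.
    while True:
        prompt, kv_cache = decode_q.get()
        print(f"decoded reply to '{prompt}' using {kv_cache}")

threading.Thread(target=prefill_worker, daemon=True).start()
threading.Thread(target=decode_worker, daemon=True).start()

prefill_q.put("Explain NVLink in one sentence.")
time.sleep(0.5)  # give the daemon workers time to finish before exit
```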
What to Watch:
- Deployment success may depend on alignment with NVLink-based architectures and on the availability of the new liquid-cooled infrastructure and associated technologies.
- EC2 Capacity Block scheduling and upfront pricing could influence workload migration and cost planning.
- Adoption will require tuning for EFAv4 and Nitro-based networking stacks to realize full throughput and reliability.
- UltraServer availability in other regions will be critical for global scaling of AI development workloads.
See the complete announcement on the launch of Amazon EC2 P6e-GB200 UltraServers accelerated by NVIDIA Grace Blackwell on the AWS blog.
Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of Futurum as a whole.
Other insights from Futurum:
Amazon Q1 FY 2025 Earnings Reflect Cloud Momentum, Operating Margin Gains
Oracle Database@AWS Rollout Delivers Multicloud Synergies
Application Security, Identity, & More at AWS re:Inforce 2025
Author Information
Ray Wang is the Research Director for Semiconductors, Supply Chain, and Emerging Technology at Futurum. His coverage focuses on the global semiconductor industry and frontier technologies. He also advises clients on global compute distribution, deployment, and supply chain. In addition to his main coverage and expertise, Wang also specializes in global technology policy, supply chain dynamics, and U.S.-China relations.
He has been quoted or interviewed regularly by leading media outlets across the globe, including CNBC, CNN, MarketWatch, Nikkei Asia, South China Morning Post, Business Insider, Science, Al Jazeera, Fast Company, and TaiwanPlus.
Prior to joining Futurum, Wang worked as an independent semiconductor and technology analyst, advising technology firms and institutional investors on industry development, regulations, and geopolitics. He also held positions at leading consulting firms and think tanks in Washington, D.C., including DGA–Albright Stonebridge Group, the Center for Strategic and International Studies (CSIS), and the Carnegie Endowment for International Peace.
