
Amazon EC2 G7e Goes GA With Blackwell GPUs. What Changes for AI Inference?


Analyst(s): Nick Patience
Publication Date: January 27, 2026

Amazon has announced the general availability of EC2 G7e instances, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. The new instances target generative AI inference and graphics workloads, offering higher GPU memory, bandwidth, and networking capabilities compared to the prior G6e generation.

What is Covered in this Article:

  • Amazon’s launch of EC2 G7e instances powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs
  • Performance and architectural improvements over the previous G6e instance family
  • Supported workloads, instance configurations, and deployment options
  • Regional availability and purchasing models for EC2 G7e instances

The News: Amazon announced the general availability of Amazon Elastic Compute Cloud (EC2) G7e instances, accelerated by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. The new G7e instances are optimized for generative AI and graphics-intensive workloads, delivering up to 2.3x higher inference performance than the prior G6e generation.

G7e instances support up to eight Blackwell GPUs with 96 GB of memory per GPU, up to 192 vCPUs, up to 1,600 Gbps of networking bandwidth, and up to 2,048 GiB of system memory. The instances are available today in the US East (N. Virginia) and US East (Ohio) regions and can be purchased as On-Demand or Spot Instances, or at discounted rates through Savings Plans.


Analyst Take: Amazon’s introduction of EC2 G7e instances marks the latest expansion of its GPU-accelerated compute portfolio, centered on higher inference performance and expanded memory capacity. G7e instances are positioned to support generative AI inference, spatial computing, scientific computing, and mixed graphics-and-AI workloads using NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. Compared to G6e, the new instances emphasize increased GPU memory, higher memory bandwidth, and improved interconnect and networking capabilities. Amazon states that these changes enable customers to run medium-sized models of up to 70B parameters with FP8 precision on a single GPU.

Increased GPU Memory and Bandwidth

G7e instances double GPU memory and deliver 1.85x higher GPU memory bandwidth compared to G6e instances, according to Amazon. Each Blackwell GPU provides 96 GB of memory, enabling larger models to run on a single GPU without sharding; Amazon specifically notes that this configuration supports medium-sized models of up to 70B parameters at FP8 precision. This increase in on-device memory reduces reliance on multi-GPU partitioning for certain inference workloads. As a result, G7e targets workloads that benefit from higher memory density per GPU rather than raw compute throughput alone.
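
To put the 70B FP8 claim in perspective, a back-of-the-envelope memory estimate is sketched below; the byte-per-parameter figure and runtime headroom factor are illustrative assumptions, not AWS-published numbers.

```python
# Rough check: do the weights of a 70B-parameter FP8 model fit in 96 GB?
# Assumptions (illustrative, not AWS figures): FP8 stores 1 byte per
# parameter, plus ~15% headroom for activations and KV cache.

params = 70e9                    # 70B parameters
weights_gb = params * 1 / 1e9    # FP8: 1 byte per param -> ~70 GB of weights
total_gb = weights_gb * 1.15     # assumed runtime headroom factor

print(f"weights ~{weights_gb:.0f} GB, with headroom ~{total_gb:.0f} GB "
      f"vs 96 GB per GPU -> fits: {total_gb <= 96}")
```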

Multi-GPU Scaling and Inter-GPU Communication

For workloads that exceed the capacity of a single GPU, G7e instances support NVIDIA GPUDirect Peer-to-Peer (P2P) over PCIe. Amazon highlights lower peer-to-peer latency for GPUs on the same PCIe switch and up to four times higher inter-GPU bandwidth compared to the L40S GPUs used in G6e instances. These improvements allow inference workloads to scale across multiple GPUs within a single node, supporting up to 768 GB of total GPU memory. Amazon positions this capability for larger models that require multi-GPU execution rather than single-GPU inference. The emphasis remains on reducing communication overhead within a node rather than across clusters.
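
As a concrete illustration, the short PyTorch sketch below probes whether CUDA peer-to-peer access is available between GPU pairs on a multi-GPU node; it is a generic CUDA capability check, not G7e-specific code, and assumes an instance with multiple visible GPUs (such as the eight-GPU g7e.48xlarge).

```python
# Probe CUDA peer-to-peer (P2P) access between all GPU pairs on one node.
# Generic PyTorch check; assumes a multi-GPU instance (e.g., g7e.48xlarge).
import torch

n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU {i} -> GPU {j}: P2P {'yes' if ok else 'no'}")
```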

Networking and Multi-Node Capabilities

G7e instances offer four times the networking bandwidth of G6e, enabling support for small-scale multi-node workloads. Multi-GPU configurations support NVIDIA GPUDirect RDMA with Elastic Fabric Adapter (EFA), reducing latency for GPU-to-GPU communication across nodes. Amazon also states that G7e supports NVIDIA GPUDirect Storage with Amazon FSx for Lustre, delivering up to 1.2 Tbps of throughput for faster model loading. These capabilities extend G7e beyond single-node inference into limited multi-node scenarios. However, Amazon frames these improvements as incremental enhancements rather than a shift toward large-scale distributed training.
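
For readers experimenting with multi-node setups, a minimal sketch of steering NCCL over EFA is shown below; the environment variables follow common aws-ofi-nccl usage, and the exact settings appropriate for G7e are an assumption to verify against AWS documentation.

```python
# Minimal sketch: point NCCL at EFA before initializing a multi-node job.
# Variable names follow common aws-ofi-nccl usage; the exact settings for
# G7e are an assumption, not confirmed by AWS for this instance type.
import os
import torch.distributed as dist

os.environ.setdefault("FI_PROVIDER", "efa")   # use the EFA libfabric provider
os.environ.setdefault("NCCL_DEBUG", "INFO")   # log which transport NCCL picks

# Rank and world size are expected from the launcher (e.g., torchrun).
dist.init_process_group(backend="nccl")
print(f"rank {dist.get_rank()} of {dist.get_world_size()} ready over NCCL")
```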

Instance Configurations and Deployment Options

Amazon offers six G7e instance sizes, ranging from the single-GPU g7e.2xlarge to the eight-GPU g7e.48xlarge configuration. At the high end, instances support 192 vCPUs, 2,048 GiB of system memory, and up to 15.2 TB of local NVMe SSD storage. G7e instances can be deployed using the AWS Management Console, CLI, or SDKs, and are supported on Amazon ECS, Amazon EKS, and AWS Parallel Computing Service, with Amazon SageMaker AI support coming soon. The breadth of configurations suggests Amazon is targeting a wide range of inference and graphics use cases rather than a narrow workload profile. Overall, G7e extends Amazon’s EC2 GPU lineup with higher memory density and networking capacity rather than redefining its compute strategy.
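
As a deployment illustration, launching the smallest G7e size programmatically might look like the boto3 sketch below; the AMI ID and key pair name are placeholders, not real values.

```python
# Launch a single-GPU g7e.2xlarge On-Demand instance with boto3.
# The AMI ID and key pair below are placeholders, not real values.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # G7e is in US East at GA

resp = ec2.run_instances(
    ImageId="ami-xxxxxxxxxxxxxxxxx",  # placeholder: a GPU-ready AMI
    InstanceType="g7e.2xlarge",       # smallest G7e size: one Blackwell GPU
    KeyName="my-key-pair",            # placeholder key pair name
    MinCount=1,
    MaxCount=1,
)
print(resp["Instances"][0]["InstanceId"])
```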

What to Watch:

  • Adoption of G7e instances for single-GPU versus multi-GPU inference workloads
  • Customer uptake of GPUDirect P2P and RDMA features for multi-GPU configurations
  • Expansion of G7e regional availability beyond the US East regions
  • Timeline for Amazon SageMaker AI support for G7e instances

See the complete blog on the general availability of Amazon EC2 G7e instances accelerated by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs on the Amazon website.

Declaration of generative AI and AI-assisted technologies in the writing process: This content has been generated with the support of artificial intelligence technologies. Due to the fast pace of content creation and the continuous evolution of data and information, The Futurum Group and its analysts strive to ensure the accuracy and factual integrity of the information presented. However, the opinions and interpretations expressed in this content reflect those of the individual author/analyst. The Futurum Group makes no guarantees regarding the completeness, accuracy, or reliability of any information contained herein. Readers are encouraged to verify facts independently and consult relevant sources for further clarification.

Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of Futurum as a whole.

Other insights from Futurum:

AWS European Sovereign Cloud Debuts with Independent EU Infrastructure

Amazon Q3 FY 2025 Earnings: AWS Reaccelerates, Retail and Ads Grow

AWS re:Invent 2025: Wrestling Back AI Leadership

Author Information

Nick Patience is VP and Practice Lead for AI Platforms at The Futurum Group. Nick is a thought leader on AI development, deployment, and adoption, an area he has researched for 25 years. Before Futurum, Nick was a Managing Analyst with S&P Global Market Intelligence, responsible for 451 Research’s coverage of Data, AI, Analytics, Information Security, and Risk. Nick became part of S&P Global through its 2019 acquisition of 451 Research, a pioneering analyst firm that Nick co-founded in 1999. He is a sought-after speaker and advisor, known for his expertise in the drivers of AI adoption, industry use cases, and the infrastructure behind its development and deployment. Nick also spent three years as a product marketing lead at Recommind (now part of OpenText), a machine learning-driven eDiscovery software company. Nick is based in London.

