Analyst(s): Nick Patience
Publication Date: January 27, 2026

Amazon has announced the general availability of EC2 G7e instances, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. The new instances target generative AI inference and graphics workloads, offering higher GPU memory, bandwidth, and networking capabilities compared to the prior G6e generation.

What is Covered in this Article:

Amazon’s launch of EC2 G7e instances powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs
Performance and architectural improvements over the previous G6e instance family
Supported workloads, instance configurations, and deployment options
Regional availability and purchasing models for EC2 G7e instances

The News: Amazon announced the general availability of Amazon Elastic Compute Cloud (EC2) G7e instances, accelerated by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. The new G7e instances are optimized for generative AI and graphics-intensive workloads, delivering up to 2.3x higher inference performance than the prior G6e generation.

G7e instances support up to eight Blackwell GPUs with 96 GB of memory per GPU, up to 192 vCPUs, up to 1,600 Gbps of networking bandwidth, and up to 2,048 GiB of system memory. The instances are available today in the US East (N. Virginia) and US East (Ohio) regions and can be purchased as On-Demand, Spot, or Savings Plan instances.

Amazon EC2 G7e Goes GA With Blackwell GPUs. What Changes for AI Inference?

Analyst Take: Amazon’s introduction of EC2 G7e instances marks the latest expansion of its GPU-accelerated compute portfolio, centered on higher inference performance and expanded memory capacity. G7e instances are positioned to support generative AI inference, spatial computing, scientific computing, and mixed graphics-and-AI workloads using NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. Compared to G6e, the new instances emphasize increased GPU memory, higher memory bandwidth, and improved interconnect and networking capabilities. Amazon states that these changes enable customers to run medium-sized models of up to 70B parameters with FP8 precision on a single GPU.

Increased GPU Memory and Bandwidth

G7e instances double per-GPU memory and deliver 1.85x higher GPU memory bandwidth compared to G6e instances, according to Amazon. Each Blackwell GPU provides 96 GB of memory, enabling larger models to run on a single GPU without sharding. Amazon specifically notes that this configuration supports medium-sized models of up to 70B parameters using FP8 precision. This increase in on-device memory reduces reliance on multi-GPU partitioning for certain inference workloads. As a result, G7e targets workloads that benefit from higher memory density per GPU rather than solely from raw compute throughput.
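The 70B-parameter figure can be sanity-checked with simple arithmetic. The sketch below is an illustration only, not Amazon's sizing method: it assumes 1 byte per parameter for FP8 and 2 bytes for FP16, counts model weights only, and ignores KV cache, activations, and runtime overhead, all of which reduce the headroom actually available. The function names are hypothetical.

```python
# Rough weight-memory estimate (weights only; KV cache and activations are ignored).
BYTES_PER_PARAM = {"fp8": 1, "fp16": 2}  # assumed storage cost per parameter
GPU_MEMORY_GB = 96                       # per-GPU memory cited for G7e


def weight_footprint_gb(params_billion: float, precision: str) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * BYTES_PER_PARAM[precision] / 1e9


def fits_on_one_gpu(params_billion: float, precision: str) -> bool:
    """True if the weights alone fit within a single GPU's memory."""
    return weight_footprint_gb(params_billion, precision) <= GPU_MEMORY_GB


for precision in ("fp8", "fp16"):
    size = weight_footprint_gb(70, precision)
    print(f"70B @ {precision}: ~{size:.0f} GB, fits on one GPU: {fits_on_one_gpu(70, precision)}")
```

Under these assumptions a 70B model needs roughly 70 GB at FP8, which fits within 96 GB, while the same model at FP16 needs roughly 140 GB and does not.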

Multi-GPU Scaling and Inter-GPU Communication

For workloads that exceed the capacity of a single GPU, G7e instances support NVIDIA GPUDirect Peer-to-Peer (P2P) over PCIe. Amazon highlights lower peer-to-peer latency for GPUs on the same PCIe switch and up to four times higher inter-GPU bandwidth compared to the L40S GPUs used in G6e instances. These improvements allow inference workloads to scale across multiple GPUs within a single node, supporting up to 768 GB of total GPU memory (eight GPUs at 96 GB each). Amazon positions this capability for larger models that require multi-GPU execution rather than single-GPU inference. The emphasis remains on reducing communication overhead within a node rather than across clusters.
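To make the single-node ceiling concrete, the following sketch estimates how many 96 GB GPUs a given set of weights would need. It is a back-of-the-envelope illustration: the 20% headroom reserved for KV cache and runtime overhead is an assumption chosen for the example, not a figure from Amazon or NVIDIA, and the function names are hypothetical.

```python
import math

GPU_MEMORY_GB = 96      # per-GPU memory cited for G7e
MAX_GPUS_PER_NODE = 8   # largest G7e configuration


def total_gpu_memory_gb(num_gpus: int) -> int:
    """Aggregate GPU memory across a node with num_gpus GPUs."""
    return num_gpus * GPU_MEMORY_GB


def min_gpus_for_weights(weights_gb: float, headroom_fraction: float = 0.2) -> int:
    """Smallest GPU count whose usable memory holds the weights.

    headroom_fraction is an assumed share of each GPU reserved for
    KV cache, activations, and runtime overhead.
    """
    usable_per_gpu = GPU_MEMORY_GB * (1 - headroom_fraction)
    return math.ceil(weights_gb / usable_per_gpu)


print(total_gpu_memory_gb(MAX_GPUS_PER_NODE))  # 8 x 96 GB
for weights_gb in (70, 140, 405):
    needed = min_gpus_for_weights(weights_gb)
    scope = "single node" if needed <= MAX_GPUS_PER_NODE else "multi-node"
    print(f"{weights_gb} GB of weights: {needed} GPU(s), {scope}")
```

With the assumed headroom, 140 GB of weights would spread across two GPUs and 405 GB across six, both still inside one eight-GPU node.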

Networking and Multi-Node Capabilities

G7e instances offer four times the networking bandwidth of G6e, enabling support for small-scale multi-node workloads. Multi-GPU configurations support NVIDIA GPUDirect RDMA with Elastic Fabric Adapter (EFA), reducing latency for GPU-to-GPU communication across nodes. Amazon also states that G7e supports NVIDIA GPUDirect Storage with Amazon FSx for Lustre, delivering up to 1.2 Tbps of throughput for faster model loading. These capabilities extend G7e beyond single-node inference into limited multi-node scenarios. However, Amazon frames these improvements as incremental enhancements rather than a shift toward large-scale distributed training.
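The throughput figure translates into load time with a unit conversion. The sketch below computes a best-case lower bound only: it assumes the full 1.2 Tbps is sustained end to end and ignores file system, deserialization, and protocol overhead, so real load times will be longer. The helper name is hypothetical.

```python
def transfer_seconds(size_gb: float, link_tbps: float) -> float:
    """Ideal transfer time for size_gb gigabytes over a link of link_tbps terabits/s."""
    gb_per_second = link_tbps * 1000 / 8  # terabits/s -> gigabits/s -> gigabytes/s
    return size_gb / gb_per_second


# 1.2 Tbps is 150 GB/s, so ~70 GB of FP8 weights takes under half a second at best.
print(f"{transfer_seconds(70, 1.2):.2f} s")
```

Even at a fraction of the peak rate, loading a model of this size would take seconds rather than minutes, which is the practical point behind the faster model loading claim.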

Instance Configurations and Deployment Options

Amazon offers six G7e instance sizes, ranging from a single-GPU g7e.2xlarge to the eight-GPU g7e.48xlarge configuration. At the high end, instances support 192 vCPUs, 2,048 GiB of system memory, and up to 15.2 TB of local NVMe SSD storage. G7e instances can be deployed using the AWS Management Console, AWS CLI, or AWS SDKs, and are supported on Amazon ECS, Amazon EKS, and AWS Parallel Computing Service, with Amazon SageMaker AI support coming soon. The breadth of configurations suggests Amazon is targeting a wide range of inference and graphics use cases rather than a narrow workload profile. Overall, G7e extends Amazon’s EC2 GPU lineup with higher memory density and networking capacity rather than redefining its compute strategy.
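As one illustration of the SDK route, the sketch below builds an EC2 RunInstances request for a single-GPU G7e instance using boto3, the AWS SDK for Python. It is a minimal example under stated assumptions: the AMI ID is a placeholder that must be replaced with a real GPU-enabled image, the helper names are hypothetical, and launching requires AWS credentials and available G7e capacity and quota in the chosen region.

```python
def build_launch_request(instance_type: str, ami_id: str, use_spot: bool = False) -> dict:
    """Build keyword arguments for the EC2 RunInstances API call."""
    request = {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
    }
    if use_spot:
        # Request Spot capacity instead of On-Demand.
        request["InstanceMarketOptions"] = {"MarketType": "spot"}
    return request


def launch(request: dict, region: str = "us-east-1") -> str:
    """Launch the instance and return its ID (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without the SDK

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**request)
    return response["Instances"][0]["InstanceId"]


# "ami-EXAMPLE" is a placeholder, not a real image ID.
request = build_launch_request("g7e.2xlarge", "ami-EXAMPLE", use_spot=True)
print(request)
```

Separating request construction from the API call keeps the configuration easy to inspect before anything is launched; us-east-1 corresponds to the US East (N. Virginia) region named in the announcement.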

What to Watch:

Adoption of G7e instances for single-GPU versus multi-GPU inference workloads
Customer uptake of GPUDirect P2P and RDMA features for multi-GPU configurations
Expansion of G7e regional availability beyond the US East regions
Timeline for Amazon SageMaker AI support for G7e instances

See the complete blog on the general availability of Amazon EC2 G7e instances accelerated by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs on the Amazon website.

Declaration of generative AI and AI-assisted technologies in the writing process: This content has been generated with the support of artificial intelligence technologies. Due to the fast pace of content creation and the continuous evolution of data and information, The Futurum Group and its analysts strive to ensure the accuracy and factual integrity of the information presented. However, the opinions and interpretations expressed in this content reflect those of the individual author/analyst. The Futurum Group makes no guarantees regarding the completeness, accuracy, or reliability of any information contained herein. Readers are encouraged to verify facts independently and consult relevant sources for further clarification.

Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of Futurum as a whole.

Other insights from Futurum:

AWS European Sovereign Cloud Debuts with Independent EU Infrastructure

Amazon Q3 FY 2025 Earnings: AWS Reaccelerates, Retail and Ads Grow

AWS re:Invent 2025: Wrestling Back AI Leadership

Author Information

Nick Patience

Nick Patience is VP and Practice Lead for AI Platforms at The Futurum Group. Nick is a thought leader on AI development, deployment, and adoption, an area he has researched for 25 years. Before Futurum, Nick was a Managing Analyst with S&P Global Market Intelligence, responsible for 451 Research’s coverage of Data, AI, Analytics, Information Security, and Risk. Nick became part of S&P Global through its 2019 acquisition of 451 Research, a pioneering analyst firm that Nick co-founded in 1999. He is a sought-after speaker and advisor, known for his expertise in the drivers of AI adoption, industry use cases, and the infrastructure behind its development and deployment. Nick also spent three years as a product marketing lead at Recommind (now part of OpenText), a machine learning-driven eDiscovery software company. Nick is based in London.
