Amazon EC2 G7e Goes GA With Blackwell GPUs. What Changes for AI Inference?

Analyst(s): Nick Patience
Publication Date: January 27, 2026

Amazon has announced the general availability of EC2 G7e instances, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. The new instances target generative AI inference and graphics workloads, offering higher GPU memory, bandwidth, and networking capabilities compared to the prior G6e generation.

What is Covered in this Article:

  • Amazon’s launch of EC2 G7e instances powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs
  • Performance and architectural improvements over the previous G6e instance family
  • Supported workloads, instance configurations, and deployment options
  • Regional availability and purchasing models for EC2 G7e instances

The News: Amazon announced the general availability of Amazon Elastic Compute Cloud (EC2) G7e instances, accelerated by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. The new G7e instances are optimized for generative AI and graphics-intensive workloads, delivering up to 2.3x higher inference performance than the prior G6e generation.

G7e instances support up to eight Blackwell GPUs with 96 GB of memory per GPU, up to 192 vCPUs, up to 1,600 Gbps of networking bandwidth, and up to 2,048 GiB of system memory. The instances are available today in the US East (N. Virginia) and US East (Ohio) regions and can be purchased as On-Demand or Spot instances, or through Savings Plans.

Analyst Take: Amazon’s introduction of EC2 G7e instances marks the latest expansion of its GPU-accelerated compute portfolio, centered on higher inference performance and expanded memory capacity. G7e instances are positioned to support generative AI inference, spatial computing, scientific computing, and mixed graphics-and-AI workloads using NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. Compared to G6e, the new instances emphasize increased GPU memory, higher memory bandwidth, and improved interconnect and networking capabilities. Amazon states that these changes enable customers to run medium-sized models of up to 70B parameters with FP8 precision on a single GPU.

Increased GPU Memory and Bandwidth

G7e instances double GPU memory and deliver 1.85x higher GPU memory bandwidth compared to G6e instances, according to Amazon. Each Blackwell GPU provides 96 GB of memory, enabling larger models to run on a single GPU without sharding. Amazon specifically notes that this configuration supports medium-sized models of up to 70B parameters using FP8 precision. This increase in on-device memory reduces reliance on multi-GPU partitioning for certain inference workloads. As a result, G7e targets workloads that benefit from higher memory density per GPU rather than solely raw compute throughput.
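
The single-GPU claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: it counts weight storage alone, treats 1 GB as 10^9 bytes, and uses bytes-per-parameter values (1 for FP8, 2 for FP16) that follow from the format widths rather than from Amazon's announcement. The function names are hypothetical, and a real deployment also needs room for the KV cache, activations, and runtime overhead.

```python
# Rough weight-memory estimate; ignores KV cache, activations, and runtime overhead.
GPU_MEMORY_GB = 96  # per-GPU memory on G7e, per Amazon

# Bytes per parameter implied by each format's bit width (illustrative assumption).
BYTES_PER_PARAM = {"fp8": 1, "fp16": 2}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate weight size in GB, taking 1 GB as 10**9 bytes."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits_on_one_gpu(params_billions: float, precision: str) -> bool:
    """True if the weights alone fit within a single GPU's memory."""
    return weight_memory_gb(params_billions, precision) <= GPU_MEMORY_GB


print(weight_memory_gb(70, "fp8"))   # weights for a 70B model at FP8, in GB
print(fits_on_one_gpu(70, "fp8"))    # weights fit, with roughly 26 GB to spare
print(fits_on_one_gpu(70, "fp16"))   # 140 GB of weights exceeds 96 GB
```

Under these assumptions, a 70B-parameter model at FP8 occupies about 70 GB, which is why it fits on one 96 GB GPU while the same model at FP16 does not.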

Multi-GPU Scaling and Inter-GPU Communication

For workloads that exceed the capacity of a single GPU, G7e instances support NVIDIA GPUDirect Peer-to-Peer (P2P) over PCIe. Amazon highlights lower peer-to-peer latency for GPUs on the same PCIe switch and up to four times higher inter-GPU bandwidth compared to the L40S GPUs used in G6e instances. These improvements allow inference workloads to scale across multiple GPUs within a single node, supporting up to 768 GB of total GPU memory. Amazon positions this capability for larger models that require multi-GPU execution rather than single-GPU inference. The emphasis remains on reducing communication overhead within a node rather than across clusters.
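
The 768 GB figure is simply eight GPUs at 96 GB each. As a rough sketch of how that translates into GPU counts, the snippet below estimates the smallest number of GPUs whose memory covers a model's weights. The `usable_fraction` reserve of 0.8 and the function names are assumptions for illustration, not figures from Amazon.

```python
import math

GPU_MEMORY_GB = 96      # per-GPU memory on G7e, per Amazon
MAX_GPUS_PER_NODE = 8   # largest G7e configuration, per Amazon


def total_node_memory_gb(num_gpus: int = MAX_GPUS_PER_NODE) -> int:
    """Aggregate GPU memory across one node."""
    return num_gpus * GPU_MEMORY_GB


def min_gpus_for_weights(weight_gb: float, usable_fraction: float = 0.8) -> int:
    """Smallest GPU count whose usable memory covers the weights.

    usable_fraction is an assumed share of each GPU left for weights after
    reserving space for KV cache and activations.
    """
    usable_per_gpu = GPU_MEMORY_GB * usable_fraction
    return math.ceil(weight_gb / usable_per_gpu)


print(total_node_memory_gb())       # 768
print(min_gpus_for_weights(140))    # 2, e.g. a 70B model at 2 bytes per parameter
print(min_gpus_for_weights(405))    # 6, e.g. a 405B model at 1 byte per parameter
```

In practice, serving frameworks that use tensor parallelism often constrain the GPU count further, so the number actually used may be rounded up from this estimate.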

Networking and Multi-Node Capabilities

G7e instances offer four times the networking bandwidth of G6e, enabling support for small-scale multi-node workloads. Multi-GPU configurations support NVIDIA GPUDirect RDMA with Elastic Fabric Adapter (EFA), reducing latency for GPU-to-GPU communication across nodes. Amazon also states that G7e supports NVIDIA GPUDirect Storage with Amazon FSx for Lustre, delivering up to 1.2 Tbps of throughput for faster model loading. These capabilities extend G7e beyond single-node inference into limited multi-node scenarios. However, Amazon frames these improvements as incremental enhancements rather than a shift toward large-scale distributed training.
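
To put the 1.2 Tbps figure in context, the sketch below converts it into a best-case transfer time for a model of a given size. It assumes the peak rate is sustained for the whole transfer and that 1 GB equals 8 gigabits; the function name is hypothetical, and real load times will be longer once file system, PCIe, and deserialization overheads are included.

```python
FSX_THROUGHPUT_TBPS = 1.2  # peak throughput cited by Amazon


def min_load_seconds(model_gb: float, throughput_tbps: float = FSX_THROUGHPUT_TBPS) -> float:
    """Lower bound on transfer time if the peak rate were sustained throughout."""
    gigabits = model_gb * 8               # 1 GB = 8 gigabits
    gbps = throughput_tbps * 1000         # 1 Tbps = 1,000 Gbps
    return gigabits / gbps


print(round(min_load_seconds(70), 2))    # 0.47 seconds for 70 GB of weights
print(round(min_load_seconds(768), 2))   # 5.12 seconds to fill all 768 GB
```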

Instance Configurations and Deployment Options

Amazon offers six G7e instance sizes, ranging from a single-GPU g7e.2xlarge to the eight-GPU g7e.48xlarge configuration. At the high end, instances support 192 vCPUs, 2,048 GiB of system memory, and up to 15.2 TB of local NVMe SSD storage. G7e instances can be deployed using AWS Management Console, CLI, or SDKs, and are supported on Amazon ECS, Amazon EKS, and AWS Parallel Computing Service, with Amazon SageMaker AI support coming soon. The breadth of configurations suggests Amazon is targeting a wide range of inference and graphics use cases rather than a narrow workload profile. Overall, G7e extends Amazon’s EC2 GPU lineup with higher memory density and networking capacity rather than redefining its compute strategy.
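
For readers who want to try the SDK route, here is a minimal sketch using the AWS SDK for Python (boto3) and the EC2 RunInstances call. It is not taken from Amazon's announcement: the helper names are hypothetical, the AMI ID is a placeholder you must supply, and a real launch would normally also specify networking, a key pair, and storage.

```python
def build_launch_request(ami_id: str, instance_type: str = "g7e.2xlarge") -> dict:
    """Keyword arguments for EC2 RunInstances that launch exactly one instance."""
    return {
        "ImageId": ami_id,              # placeholder; supply a GPU-ready AMI
        "InstanceType": instance_type,  # smallest G7e size named in the article
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch(ami_id: str, region: str = "us-east-1") -> str:
    """Launch one instance and return its ID. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the request builder works without the SDK

    ec2 = boto3.client("ec2", region_name=region)  # us-east-1 is US East (N. Virginia)
    response = ec2.run_instances(**build_launch_request(ami_id))
    return response["Instances"][0]["InstanceId"]


# Inspect the request without contacting AWS.
print(build_launch_request("ami-EXAMPLE"))
```

Calling `launch` starts a billable instance, so the request builder is kept separate to allow the parameters to be reviewed first.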

What to Watch:

  • Adoption of G7e instances for single-GPU versus multi-GPU inference workloads
  • Customer uptake of GPUDirect P2P and RDMA features for multi-GPU configurations
  • Expansion of G7e regional availability beyond the US East regions
  • Timeline for Amazon SageMaker AI support for G7e instances

See the complete blog on the general availability of Amazon EC2 G7e instances accelerated by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs on the Amazon website.

Declaration of generative AI and AI-assisted technologies in the writing process: This content has been generated with the support of artificial intelligence technologies. Due to the fast pace of content creation and the continuous evolution of data and information, The Futurum Group and its analysts strive to ensure the accuracy and factual integrity of the information presented. However, the opinions and interpretations expressed in this content reflect those of the individual author/analyst. The Futurum Group makes no guarantees regarding the completeness, accuracy, or reliability of any information contained herein. Readers are encouraged to verify facts independently and consult relevant sources for further clarification.

Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of Futurum as a whole.

Author Information

Nick Patience is VP and Practice Lead for AI Platforms at The Futurum Group. Nick is a thought leader on AI development, deployment, and adoption - an area he has researched for 25 years. Before Futurum, Nick was a Managing Analyst with S&P Global Market Intelligence, responsible for 451 Research’s coverage of Data, AI, Analytics, Information Security, and Risk. Nick became part of S&P Global through its 2019 acquisition of 451 Research, a pioneering analyst firm that Nick co-founded in 1999. He is a sought-after speaker and advisor, known for his expertise in the drivers of AI adoption, industry use cases, and the infrastructure behind its development and deployment. Nick also spent three years as a product marketing lead at Recommind (now part of OpenText), a machine learning-driven eDiscovery software company. Nick is based in London.
