Analyst(s): Brendan Burke
Publication Date: June 24, 2026
What Is Covered in This Article:
- Amazon EC2 G7 instances become the first major cloud offering accelerated by NVIDIA RTX PRO 4500 Blackwell GPUs, paired with custom Intel Xeon CPUs for inference and graphics.
- AWS deepens its collaboration with QuEra to bring Libra, the first fault-tolerant quantum computer, to Amazon Braket by 2028.
- AWS Outposts racks add bmn-cx3a, the first AMD-based instances with accelerated networking, delivering up to 800 Gbps of bare-metal throughput at the edge.
- AWS details a $200 billion 2026 AI infrastructure investment and a new RGN (Randomized Graph Networks) topology built to remove fabric bottlenecks for dense inference.
- Why AWS’s full stack of custom silicon positions it to ground agentic context at scale.
The Event—Major Themes & Vendor Moves: The AWS Summit in New York returned to the Javits Center in mid-June 2026 as the company’s flagship regional event of the year, drawing thousands of builders, customers, and partners. Swami Sivasubramanian, AWS VP of Agentic AI, delivered the keynote, and a parallel Analyst Forum exposed the data center and silicon layer beneath the headline agent announcements.
The narrative was agentic AI — Amazon Bedrock AgentCore additions, the new AWS Context knowledge-graph service, and autonomous agents in Amazon Quick. But the infrastructure disclosures were the foundation. AWS reiterated a $200 billion 2026 AI infrastructure investment spanning 39 regions and 123 availability zones, a 20-million-kilometer network backbone, and 4 GW of data center power added in 2025 that it plans to double by the end of 2027.
On compute, Amazon EC2 G7 instances reached general availability as the first major cloud offering on NVIDIA RTX PRO 4500 Blackwell GPUs, delivering up to 4.6x the AI inference of G6. AWS deepened its QuEra collaboration to deliver a Megaquop-scale fault-tolerant quantum computer, Libra, on Amazon Braket by 2028. And AWS Outposts gained bmn-cx3a, its first AMD-based instances with accelerated networking, at up to 800 Gbps bare metal for the edge.
AWS Summit NY 2026: Is AI Infrastructure AWS’s Real Agentic Moat?
Analyst Take: Amazon Web Services (AWS) used Summit New York 2026 to argue that AWS AI infrastructure is the moat for the agentic era, not just models or frameworks. The data center announcements made part of the case with the G7 Blackwell launch, the QuEra fault-tolerant quantum collaboration, the AMD-powered Outposts instances, and the RGN networking fabric. The enthusiasm for agentic applications like Amazon Quick and AWS AgentCore suggests that general-purpose hardware can be leveraged to scale agentic applications across large enterprises. AWS’s existing footprint of CPUs and storage points to a vertically integrated stack purpose-built for dense inference. By offering scaled services and continuing to invest in customer savings rather than AI hype, AWS stepped up to the challenge of agentic context as only its infrastructure can enable.

Agentic Context Needs an Infrastructure Foundation
Swami Sivasubramanian addressed the mass market with his keynote, pushing agents into the daily workflows of Slack lookups and calendar invites that slow down the entire workforce. The new AWS Context service stood out as an agent enabler. Personal knowledge graphs that continuously improve offer an extensible foundation for agentic surfaces like Amazon Quick.
Delivering AWS Context at enterprise scale leans on full-stack infrastructure. The core is a continuously updated, organization-wide knowledge graph, so AWS needs graph storage that stays low-latency under heavy concurrency. The graph also “learns from how your agents work” — ranking sources, remembering good join paths, resolving schema ambiguities, and propagating that across the org — which implies a feedback/ranking pipeline running over usage telemetry, with GPU inference reserved for the LLM-driven relationship inference and AI-assisted curation steps. There’s “no infrastructure to provision” for the customer precisely because AWS absorbs a demanding mix of graph databases, permission-aware serving, S3/Iceberg storage, telemetry-driven learning loops, and CPU-dominant agentic compute on its side.
The “context layer for agents” is a convergent hyperscaler bet, not an AWS-only one. Where AWS can win is upstream of the data catalog. If ~80% of an agent loop is CPU-bound, the cost driver is the serving tier, not the graph. Graviton’s ~40% better price-performance versus x86, plus S3 as the cheapest large-scale storage spine, gives AWS a structural per-query advantage on the part of the bill that actually grows with agent usage. That edge will only work if the AWS Context service fee itself is modest, and unbundled consumption pricing has a habit of looking cheap in a POC and surprising you at scale
The GPU Crowd Showed Up, but Agentic AI Is Turning Into a CPU-Heavy Workload
The AI frontier was out in force. Presentations and booths from Anthropic, OpenAI, NVIDIA, Weights & Biases, and more did nothing to deny AWS’s claim to be the best place to run GPUs. Yet CPU co-optimization may make the cloud an inference winner. AWS entered the agentic CPU fray at the Summit’s Analyst Forum, acknowledging that agentic AI has fundamentally changed the compute profile of inference, and it skews heavily toward the CPU. As AWS describes the agentic use case, a user request is picked up by an orchestrator, which fires an LLM call to a reasoning model to plan the next steps, then issues a cascade of database queries, API calls, retrieval steps, code execution, and guardrail checks — evaluating each result before deciding whether to loop again or return an answer.
By AWS’s own accounting, a single request triggers only one to two LLM calls — the GPU-bound part — but generates roughly five to 15 total executions, and those orchestration, tool-use, and guardrail steps are CPU-centric. AWS estimates that about 80% of agentic compute lands on the CPU, not the accelerator. That logic fits the new Graviton 5 CPU, announced the week before the Summit, with up to 5x the local cache and 25% better fabric performance than the prior generation, and BF16 and vector extension support tuned for the ML and agentic pathways. Pairing the Blackwell-based G7 with custom Intel Xeon, and routing the heavy orchestration tail to Graviton at roughly 40% better price-performance than x86, lets AWS serve the whole agentic loop on co-optimized silicon rather than burning GPU cycles on CPU work. If most agentic tokens are really CPU tokens, the cheapest CPU wins the workload.

RGN: A Flat Fabric for Heterogeneous Systems
AWS’s breakthrough networking topology RGN (Randomized Graph Networks) has also been rolling out in new data centers this year to counteract the trend of increasing networking hardware content in AI data centers. Replacing the traditional fat-tree design with a randomized graph that operates within a building, RGN is claimed to cut in-building network power by roughly 40% and lower the cost of building and operating the network by about 27%. New data centers get it by default, and AWS folds it into existing sites as it refreshes aging racks and reclaims power. It’s a well-timed innovation to remove fabric bottlenecks for dense inference. AWS effectively stumbled onto random paths in EC2 simulations, met deep internal skepticism that a non-deterministic fabric could be operated at scale, and then proved out that graph theory pushes more throughput through random paths than a structured tree, with a custom routing protocol delivering sub-second convergence.

The more important point for the agentic era is what a flat RGN fabric enables: heterogeneous systems on one network. Rather than the rigid, delicate backend GPU networks, much of the industry runs protocols that don’t interoperate well across multiple GPU types or over distance — AWS leans on Elastic Fabric Adapter to stitch CPU, storage, and GPU workloads into a single VPC with RDMA performance across all of them. RGN is positioned as the best network for CPUs, storage, and everything that isn’t a dedicated GPU cluster, with oversubscription similar to the fabric it replaces. GPU clusters stay one-to-one, non-blocking on the multi-cluster network, and can shift between training and inference. A flatter network of mixed CPUs, accelerators, and storage is exactly the substrate a CPU-heavy agentic loop needs.
What to Watch:
- Whether Graviton 5 plus G7 makes AWS the inference cost leader.
- Whether randomized graph fabric’s 40% power and ~27% cost claims hold across regions and translate into lower customer pricing.
- Watch Libra roadmap progress and competition from AWS’s own Ocelot cat-qubit work, IBM, and Google.
- Whether 800 Gbps bare-metal networking pulls edge and latency-sensitive workloads onto on-prem AWS Outposts.
- Whether the data center buildout keeps pace with agentic demand.
You can read the full roundup of announcements at AWS’s website.
Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of Futurum as a whole.
Other Insights From Futurum:
Will QuEra’s Neutral Atoms Deliver Fault-Tolerant Quantum on AWS by 2028?
AWS Graviton5 Reframes the CPU as Agentic AI Infrastructure
Is Anthropic’s $100 Billion Pact for AWS Silicon a Bargain in a Supply-Constrained Market?
Author Information
Brendan is Research Director, Semiconductors, Supply Chain, and Emerging Tech. He advises clients on strategic initiatives and leads the Futurum Semiconductors Practice. He is an experienced tech industry analyst who has guided tech leaders in identifying market opportunities spanning edge processors, generative AI applications, and hyperscale data centers.
Before joining Futurum, Brendan consulted with global AI leaders and served as a Senior Analyst in Emerging Technology Research at PitchBook. At PitchBook, he developed market intelligence tools for AI, highlighted by one of the industry’s most comprehensive AI semiconductor market landscapes encompassing both public and private companies. He has advised Fortune 100 tech giants, growth-stage innovators, global investors, and leading market research firms. Before PitchBook, he led research teams in tech investment banking and market research.
Brendan is based in Seattle, Washington. He has a Bachelor of Arts Degree from Amherst College.
