AWS Summit NY 2026: Is AI Infrastructure AWS’s Real Agentic Moat?

Analyst(s): Brendan Burke
Publication Date: June 24, 2026

What Is Covered in This Article:

  • Amazon EC2 G7 instances become the first major cloud offering accelerated by NVIDIA RTX PRO 4500 Blackwell GPUs, paired with custom Intel Xeon CPUs for inference and graphics.
  • AWS deepens its collaboration with QuEra to bring Libra, the first fault-tolerant quantum computer, to Amazon Braket by 2028.
  • AWS Outposts racks add bmn-cx3a, the first AMD-based instances with accelerated networking, delivering up to 800 Gbps of bare-metal throughput at the edge.
  • AWS details a $200 billion 2026 AI infrastructure investment and a new RGN (Randomized Graph Networks) topology built to remove fabric bottlenecks for dense inference.
  • Why AWS’s full stack of custom silicon positions it to ground agentic context at scale.

The Event—Major Themes & Vendor Moves: The AWS Summit in New York returned to the Javits Center in mid-June 2026 as the company’s flagship regional event of the year, drawing thousands of builders, customers, and partners. Swami Sivasubramanian, AWS VP of Agentic AI, delivered the keynote, and a parallel Analyst Forum exposed the data center and silicon layer beneath the headline agent announcements.

The narrative was agentic AI — Amazon Bedrock AgentCore additions, the new AWS Context knowledge-graph service, and autonomous agents in Amazon Quick. But the infrastructure disclosures were the foundation. AWS reiterated a $200 billion 2026 AI infrastructure investment spanning 39 regions and 123 availability zones, a 20-million-kilometer network backbone, and 4 GW of data center power added in 2025 that it plans to double by the end of 2027.

On compute, Amazon EC2 G7 instances reached general availability as the first major cloud offering on NVIDIA RTX PRO 4500 Blackwell GPUs, delivering up to 4.6x the AI inference of G6. AWS deepened its QuEra collaboration to deliver a Megaquop-scale fault-tolerant quantum computer, Libra, on Amazon Braket by 2028. And AWS Outposts gained bmn-cx3a, its first AMD-based instances with accelerated networking, at up to 800 Gbps bare metal for the edge.

Analyst Take: Amazon Web Services (AWS) used Summit New York 2026 to argue that its AI infrastructure, not just models or frameworks, is the moat for the agentic era. The data center announcements made part of the case: the G7 Blackwell launch, the QuEra fault-tolerant quantum collaboration, the AMD-powered Outposts instances, and the RGN networking fabric. The enthusiasm for agentic offerings like Amazon Quick and Amazon Bedrock AgentCore suggests that general-purpose hardware can be leveraged to scale agentic applications across large enterprises. AWS’s existing footprint of CPUs and storage points to a vertically integrated stack purpose-built for dense inference. By offering scaled services and continuing to invest in customer savings rather than AI hype, AWS stepped up to the challenge of agentic context in a way that only its infrastructure can enable.

Agentic Context Needs an Infrastructure Foundation

Swami Sivasubramanian addressed the mass market with his keynote, pushing agents into the daily workflows of Slack lookups and calendar invites that slow down the entire workforce. The new AWS Context service stood out as an agent enabler. Personal knowledge graphs that continuously improve offer an extensible foundation for agentic surfaces like Amazon Quick.

Delivering AWS Context at enterprise scale leans on full-stack infrastructure. The core is a continuously updated, organization-wide knowledge graph, so AWS needs graph storage that stays low-latency under heavy concurrency. The graph also “learns from how your agents work” — ranking sources, remembering good join paths, resolving schema ambiguities, and propagating that across the org — which implies a feedback/ranking pipeline running over usage telemetry, with GPU inference reserved for the LLM-driven relationship inference and AI-assisted curation steps. There’s “no infrastructure to provision” for the customer precisely because AWS absorbs a demanding mix of graph databases, permission-aware serving, S3/Iceberg storage, telemetry-driven learning loops, and CPU-dominant agentic compute on its side.

The “context layer for agents” is a convergent hyperscaler bet, not an AWS-only one. Where AWS can win is upstream of the data catalog. If ~80% of an agent loop is CPU-bound, as AWS estimates, the cost driver is the serving tier, not the graph. Graviton’s ~40% better price-performance versus x86, plus S3 as the cheapest large-scale storage spine, gives AWS a structural per-query advantage on the part of the bill that actually grows with agent usage. That edge will only work if the AWS Context service fee itself is modest, and unbundled consumption pricing has a habit of looking cheap in a POC and surprising you at scale.
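
To put a rough number on that per-query advantage, here is a back-of-the-envelope sketch in Python. It rests on assumptions that are ours rather than AWS’s: that cost scales linearly with compute, that "40% better price-performance" means 1.4x the work per dollar, and that the ~80% CPU share carries over to the bill. The function name and parameters are illustrative only.

```python
def blended_saving(cpu_share: float, price_perf_gain: float) -> float:
    """Fraction of total compute spend saved when only the CPU portion
    moves to a processor with `price_perf_gain` better price-performance.

    Assumes cost is proportional to compute and GPU spend is unchanged.
    """
    if not 0.0 <= cpu_share <= 1.0:
        raise ValueError("cpu_share must be between 0 and 1")
    if price_perf_gain <= -1.0:
        raise ValueError("price_perf_gain must be greater than -1")
    # 40% more work per dollar means the same work costs 1 / 1.4 as much.
    cpu_cost_ratio = 1.0 / (1.0 + price_perf_gain)
    return cpu_share * (1.0 - cpu_cost_ratio)


if __name__ == "__main__":
    saving = blended_saving(cpu_share=0.80, price_perf_gain=0.40)
    print(f"Blended saving: {saving:.1%}")  # Blended saving: 22.9%
```

Under those assumptions the blended saving is about 23% of compute spend, before any service fee. That is why the size of the AWS Context fee matters: a fee in that range would absorb the advantage.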

The GPU Crowd Showed Up, but Agentic AI Is Turning Into a CPU-Heavy Workload

The AI frontier was out in force. Presentations and booths from Anthropic, OpenAI, NVIDIA, Weights & Biases, and more did nothing to deny AWS’s claim to be the best place to run GPUs. Yet CPU co-optimization may make the cloud an inference winner. AWS entered the agentic CPU fray at the Summit’s Analyst Forum, acknowledging that agentic AI has fundamentally changed the compute profile of inference, and it skews heavily toward the CPU. As AWS describes the agentic use case, a user request is picked up by an orchestrator, which fires an LLM call to a reasoning model to plan the next steps, then issues a cascade of database queries, API calls, retrieval steps, code execution, and guardrail checks — evaluating each result before deciding whether to loop again or return an answer.

By AWS’s own accounting, a single request triggers only one to two LLM calls — the GPU-bound part — but generates roughly five to 15 total executions, and those orchestration, tool-use, and guardrail steps are CPU-centric. AWS estimates that about 80% of agentic compute lands on the CPU, not the accelerator. That logic fits the new Graviton 5 CPU, announced the week before the Summit, with up to 5x the local cache and 25% better fabric performance than the prior generation, and BF16 and vector extension support tuned for the ML and agentic pathways. Pairing the Blackwell-based G7 with custom Intel Xeon, and routing the heavy orchestration tail to Graviton at roughly 40% better price-performance than x86, lets AWS serve the whole agentic loop on co-optimized silicon rather than burning GPU cycles on CPU work. If most agentic tokens are really CPU tokens, the cheapest CPU wins the workload.
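
As a sanity check on those figures, the short Python sketch below bounds the share of executions that are not LLM calls, using only the ranges quoted above (one to two LLM calls, five to 15 total executions). It counts steps, not compute time, so it is a cross-check on AWS’s 80% estimate rather than a derivation of it. The function name is hypothetical.

```python
from fractions import Fraction


def non_llm_share_bounds(llm_calls, total_executions):
    """Return (min, max) share of executions that are not LLM calls.

    Both arguments are inclusive (low, high) ranges per request.
    """
    llm_lo, llm_hi = llm_calls
    tot_lo, tot_hi = total_executions
    if llm_lo < 0 or llm_hi < llm_lo or tot_lo <= 0 or tot_hi < tot_lo:
        raise ValueError("ranges must be ordered and totals positive")
    if tot_lo < llm_hi:
        raise ValueError("total executions must cover the LLM calls")
    # Fewest non-LLM steps: most LLM calls inside the shortest loop.
    lowest = 1 - Fraction(llm_hi, tot_lo)
    # Most non-LLM steps: fewest LLM calls inside the longest loop.
    highest = 1 - Fraction(llm_lo, tot_hi)
    return lowest, highest


if __name__ == "__main__":
    low, high = non_llm_share_bounds(llm_calls=(1, 2), total_executions=(5, 15))
    # Prints: Non-LLM steps: 60% to 93% of executions
    print(f"Non-LLM steps: {float(low):.0%} to {float(high):.0%} of executions")
```

By step count alone, 60% to 93% of executions fall outside the LLM call, so an 80% CPU share sits inside the range the quoted numbers allow.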

RGN: A Flat Fabric for Heterogeneous Systems

AWS’s breakthrough networking topology RGN (Randomized Graph Networks) has also been rolling out in new data centers this year to counteract the trend of increasing networking hardware content in AI data centers. Replacing the traditional fat-tree design with a randomized graph that operates within a building, RGN is claimed to cut in-building network power by roughly 40% and lower the cost of building and operating the network by about 27%. New data centers get it by default, and AWS folds it into existing sites as it refreshes aging racks and reclaims power. It’s a well-timed innovation to remove fabric bottlenecks for dense inference. AWS effectively stumbled onto random paths in EC2 simulations, met deep internal skepticism that a non-deterministic fabric could be operated at scale, and then proved out that graph theory pushes more throughput through random paths than a structured tree, with a custom routing protocol delivering sub-second convergence.
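
The intuition that randomized links can beat a structured layout can be illustrated with a toy model. The Python sketch below is not RGN and does not model a fat-tree: it compares a structured ring lattice (a stand-in we chose for simplicity) against the same graph with half of its links randomly rewired, keeping every node at four links. Fewer hops per path means each flow occupies fewer links, which is one reason a randomized fabric can carry more traffic on the same hardware. All names, sizes, and the seed are assumptions.

```python
import random
from collections import deque


def ring_lattice(n):
    """Structured baseline: node i links to i±1 and i±2 (degree 4, n >= 5)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in (1, 2):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def randomize_chords(adj, swaps, seed):
    """Rewire the i±2 chords with degree-preserving swaps.

    The i±1 ring is never touched, so the result stays connected.
    """
    rng = random.Random(seed)
    n = len(adj)
    new = {u: set(vs) for u, vs in adj.items()}
    chords = [(i, (i + 2) % n) for i in range(n)]
    for _ in range(swaps):
        a, b = rng.sample(range(len(chords)), 2)
        (u, v), (x, y) = chords[a], chords[b]
        # Proposed swap: links (u, v) and (x, y) become (u, x) and (v, y).
        if u == x or v == y or x in new[u] or y in new[v]:
            continue  # would create a self-loop or a duplicate link
        new[u].remove(v)
        new[v].remove(u)
        new[x].remove(y)
        new[y].remove(x)
        new[u].add(x)
        new[x].add(u)
        new[v].add(y)
        new[y].add(v)
        chords[a], chords[b] = (u, x), (v, y)
    return new


def mean_hops(adj):
    """Average shortest-path hop count over all ordered node pairs."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    structured = ring_lattice(64)
    randomized = randomize_chords(structured, swaps=500, seed=7)
    print(f"structured mean hops: {mean_hops(structured):.2f}")
    print(f"randomized mean hops: {mean_hops(randomized):.2f}")
```

With 64 nodes the structured lattice averages about 8.4 hops between nodes, and the rewired version should come in well below that. A production fabric also has to solve routing and convergence on such a graph, which is where AWS’s custom routing protocol comes in.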

The more important point for the agentic era is what a flat RGN fabric enables: heterogeneous systems on one network. Much of the industry runs rigid, delicate backend GPU networks on protocols that don’t interoperate well across multiple GPU types or over distance. AWS instead leans on Elastic Fabric Adapter to stitch CPU, storage, and GPU workloads into a single VPC with RDMA performance across all of them. RGN is positioned as the best network for CPUs, storage, and everything that isn’t a dedicated GPU cluster, with oversubscription similar to the fabric it replaces. GPU clusters stay one-to-one and non-blocking on the multi-cluster network, and can shift between training and inference. A flatter network of mixed CPUs, accelerators, and storage is exactly the substrate a CPU-heavy agentic loop needs.

What to Watch:

  • Whether Graviton 5 plus G7 makes AWS the inference cost leader.
  • Whether randomized graph fabric’s 40% power and ~27% cost claims hold across regions and translate into lower customer pricing.
  • Whether the Libra roadmap stays on track against competition from AWS’s own Ocelot cat-qubit work, IBM, and Google.
  • Whether 800 Gbps bare-metal networking pulls edge and latency-sensitive workloads onto on-prem AWS Outposts.
  • Whether the data center buildout keeps pace with agentic demand.

You can read the full roundup of announcements at AWS’s website.

Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of Futurum as a whole.

Other Insights From Futurum:

Will QuEra’s Neutral Atoms Deliver Fault-Tolerant Quantum on AWS by 2028?

AWS Graviton5 Reframes the CPU as Agentic AI Infrastructure

Is Anthropic’s $100 Billion Pact for AWS Silicon a Bargain in a Supply-Constrained Market?

Author Information

Brendan Burke, Research Director

Brendan is Research Director, Semiconductors, Supply Chain, and Emerging Tech. He advises clients on strategic initiatives and leads the Futurum Semiconductors Practice. He is an experienced tech industry analyst who has guided tech leaders in identifying market opportunities spanning edge processors, generative AI applications, and hyperscale data centers. 

Before joining Futurum, Brendan consulted with global AI leaders and served as a Senior Analyst in Emerging Technology Research at PitchBook. At PitchBook, he developed market intelligence tools for AI, highlighted by one of the industry’s most comprehensive AI semiconductor market landscapes encompassing both public and private companies. He has advised Fortune 100 tech giants, growth-stage innovators, global investors, and leading market research firms. Before PitchBook, he led research teams in tech investment banking and market research.

Brendan is based in Seattle, Washington. He has a Bachelor of Arts Degree from Amherst College.

