AWS Graviton5 Reframes the CPU as Agentic AI Infrastructure

AWS makes its most powerful custom Arm processor generally available, positioning 192-core density and formally verified isolation as an essential orchestration fabric for multi-agent workloads at hyperscale.

What Is Covered in This Article:

  • AWS Graviton5 general availability via M9g and M9gd instances
  • The CPU’s repositioning as agentic AI orchestration infrastructure
  • Meta’s commitment to tens of millions of Graviton cores
  • Formally verified security through the Nitro Isolation Engine
  • Real-time context platforms validating latency and throughput gains

The News: AWS announced that Graviton5-powered Amazon EC2 M9g and M9gd instances are now generally available. First previewed at re:Invent 2025, Graviton5 features 192 cores, a 5x larger L3 cache, DDR5-8800 memory, PCIe Gen 6, and up to 33% lower inter-core communication latency compared to its predecessor.

Early access customers reported significant gains: ClickHouse achieved a 36% performance boost with zero code changes, Honeycomb saw 36% better throughput per core across a six-month production test, and HubSpot observed query durations drop by up to 60% on MySQL databases. Meta is deploying Graviton at scale, starting with tens of millions of cores to support its agentic AI efforts, making Meta one of the largest Graviton customers in the world. Tacnode, an AI-native real-time data platform, benchmarked M9g against its current Graviton4 fleet and reported 20-30% throughput improvement and P99 tail latency more than halving under load.

Xiaowei Jiang, Tacnode CEO and Chief Architect, stated: “Graviton5’s higher memory bandwidth is a particularly good match for Tacnode’s bandwidth-sensitive mixed read/write paths at scale. Graviton5 will become the default compute tier for Tacnode on AWS.”

Analyst Take: AWS Graviton5 represents a deliberate architectural thesis that agentic AI workloads require CPU infrastructure optimized for massive concurrency rather than single-thread peak performance. The general availability of M9g and M9gd instances, combined with Meta’s commitment to tens of millions of cores, signals that the industry’s largest AI builders view high-core-count Arm CPUs as essential scaffolding for orchestrating autonomous agent fleets. Tacnode’s benchmarks, showing P99 tail latency more than halving under load, validate that the gains translate directly into the stability profile that high-stakes agentic decisioning demands. This positions Graviton5 not as a GPU substitute but as the coordination fabric that keeps accelerators fed and agents responsive across concurrent execution environments. The central question is whether AWS can convert this architectural advantage into agentic workload gravity.

192 Cores Redefine Concurrency Economics for Agent Orchestration

The decision to pack 192 cores into a single socket with 33% lower inter-core latency reflects AWS’s view that agentic AI creates a fundamentally different compute profile than traditional cloud workloads. Agents that reason, generate code, and coordinate multi-step tasks require processors capable of sustaining large numbers of lightweight, latency-sensitive threads simultaneously without degrading under concurrent load. Graviton5’s 5x larger L3 cache and DDR5-8800 memory support reduce the data-fetch penalties that throttle concurrent workloads on lower-density architectures, while the move to PCIe Gen 6 ensures I/O does not become a secondary bottleneck as agent fleets scale.

AWS explicitly frames this generation around the shift from AI that answers questions to AI that takes actions: running code, using tools, evaluating results, and orchestrating multi-step tasks. The CPU is positioned as the substrate on which these operations execute concurrently. The 25% compute improvement over Graviton4 compounds with the architectural changes to create a generation that targets workload density rather than peak per-core speed. CPU selection criteria for agentic AI workloads are shifting from peak compute throughput to sustained concurrent density and inter-core communication efficiency.
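A back-of-envelope capacity model makes the density argument concrete. The sketch below assumes each agent step is CPU-bound and occupies one core for a fixed service time, and that the instance runs below a chosen utilization ceiling. The 5 ms service time, 2 steps per second per agent, and 75% ceiling are illustrative assumptions, not AWS figures; only the 192-core count comes from the announcement.

```python
def max_steps_per_second(cores, service_time_ms):
    """Upper bound on CPU-bound steps per second when each core runs one step at a time."""
    if cores <= 0 or service_time_ms <= 0:
        raise ValueError("cores and service_time_ms must be positive")
    return cores * 1000.0 / service_time_ms


def agents_supported(cores, service_time_ms, steps_per_agent_per_second, utilization=0.75):
    """Agents one instance can sustain while staying under the utilization ceiling."""
    if steps_per_agent_per_second <= 0:
        raise ValueError("steps_per_agent_per_second must be positive")
    usable_capacity = max_steps_per_second(cores, service_time_ms) * utilization
    return int(usable_capacity // steps_per_agent_per_second)


if __name__ == "__main__":
    # Hypothetical workload: 5 ms of CPU per agent step, 2 steps per second per agent.
    for cores in (96, 192):
        print(cores, "cores:", agents_supported(cores, 5, 2), "agents")
```

Under these assumptions capacity scales linearly with core count. Real workloads will deviate from this as memory bandwidth, cache contention, and inter-core communication come into play, which is where the cache and latency changes described above matter.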

Meta’s Core Commitment Validates the CPU-as-AI-Infrastructure Thesis

Meta’s pledge to deploy tens of millions of Graviton cores for agentic AI represents one of the largest known CPU procurement commitments for AI-specific workloads. This commitment reframes the conventional narrative that AI infrastructure investment flows exclusively toward GPUs and accelerators, revealing a complementary demand for CPU fabric capable of managing context, coordinating tool use, and sustaining the reasoning loops that sit between inference calls.

Meta’s decision suggests that orchestrating agent fleets at scale requires dedicated CPU resources optimized for the concurrency, memory bandwidth, and low-latency characteristics that agentic workloads demand in ways that general-purpose processors cannot deliver at equivalent density. The breadth of Graviton’s existing footprint, powering over 350 instance types serving more than 120,000 customers across eight years of continuous investment, means Meta joins an established ecosystem rather than pioneering an unproven platform. AWS positions this customer concentration as evidence that the agentic CPU demand pattern has moved from speculative to structural across the industry.

Real-Time Context Platforms Expose the Latency Demands of Agentic Decisioning

Tacnode’s qualification benchmarks provide direct evidence that Graviton5’s architectural improvements translate into measurable gains for the most demanding agentic AI workloads. Tacnode Context Lake, an AI-native real-time data platform providing agents with millisecond-fresh context for fraud and risk decisions, reported 20-30% throughput improvements across standard workloads and P99 tail latency reductions of more than 50% on its most demanding context-serving paths. CEO and Chief Architect Xiaowei Jiang confirmed that the DDR5-8800 specification delivers a practical advantage for data-intensive agent orchestration.
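The distinction between throughput gains and tail-latency gains is easier to see with numbers. The following sketch computes the mean and a nearest-rank P99 over two sets of synthetic latency samples. The samples are invented for illustration and are not Tacnode's measurements; the helper names are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Return the mean and the P99 of a list of latencies in milliseconds."""
    return {"mean": sum(samples) / len(samples), "p99": percentile(samples, 99)}


# Synthetic latencies in ms: 98 fast requests plus 2 slow outliers in each run.
before = [10.0] * 98 + [200.0] * 2
after = [9.0] * 98 + [80.0] * 2

if __name__ == "__main__":
    print("before:", summarize(before))
    print("after: ", summarize(after))
```

In this synthetic case the mean improves by roughly a quarter while the P99 drops from 200 ms to 80 ms, a pattern in which the slowest requests improve far more than the average. For decisioning workloads with a fixed response budget, the P99 figure is the one that determines how often a request misses that budget.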

The decision to make Graviton5 the default compute tier for Tacnode on AWS demonstrates that data infrastructure vendors are treating this generation as a step-change rather than an incremental upgrade. Platforms such as Tacnode sit at the intersection of agent orchestration and real-time decision-making, precisely where latency variance translates directly into operational risk for high-stakes use cases such as fraud prevention. These early proof points will still need to carry over to multi-agent workflows to show that CPU-side orchestration can keep pace with GPU-based inference in autonomous systems.

Formally Verified Security Establishes a New Isolation Standard for Multi-Tenant AI

The Nitro Isolation Engine introduces formal verification into production cloud security, establishing a new baseline for workload isolation in multi-tenant environments running sensitive AI operations. This approach moves beyond conventional testing to mathematically demonstrate that the hypervisor behaves as intended across all possible states, not merely in specific test cases, eliminating categories of security vulnerabilities that sample-based testing cannot rule out. AWS describes the Nitro Isolation Engine as a purpose-built component responsible for mediating all access to virtual machine memory, CPU register state, and I/O devices through a minimal set of APIs, reducing attack surface by architectural constraint rather than through layered mitigation.
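The difference between checking specific test cases and checking all possible states can be illustrated with a toy model. The sketch below is not AWS's method or a model of the Nitro Isolation Engine; it is a hypothetical two-tenant, two-page system with invented map and unmap operations, explored exhaustively by breadth-first search to confirm an isolation invariant holds in every reachable state. Production formal verification typically relies on proof assistants rather than brute-force enumeration.

```python
from collections import deque
from itertools import product

TENANTS = ("A", "B")  # hypothetical tenants
PAGES = (0, 1)        # hypothetical memory pages


def next_states(state, guarded=True):
    """Yield every state reachable in one API call from `state`, a frozenset of (tenant, page)."""
    mapped_pages = {page for _, page in state}
    for tenant, page in product(TENANTS, PAGES):
        if (tenant, page) in state:
            yield state - {(tenant, page)}   # unmap an existing mapping
        elif not guarded or page not in mapped_pages:
            yield state | {(tenant, page)}   # map; the guard rejects pages already in use


def isolated(state):
    """Invariant: no page is mapped by more than one tenant."""
    pages = [page for _, page in state]
    return len(pages) == len(set(pages))


def check_all_states(guarded=True):
    """Explore every reachable state; return (invariant_holds, states_seen, counterexample)."""
    start = frozenset()
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if not isolated(state):
            return False, len(seen), state
        for nxt in next_states(state, guarded):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True, len(seen), None


if __name__ == "__main__":
    print("guarded API:  ", check_all_states(guarded=True))
    print("unguarded API:", check_all_states(guarded=False))
```

With the guard in place, the search visits all nine reachable states and finds no violation. With the guard removed, it returns a concrete state in which both tenants map the same page. A small API surface keeps this kind of state space tractable, which is one reason a minimal interface and formal assurance tend to go together.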

The Nitro Isolation Engine represents a structural claim about the security standard required for autonomous agents that access sensitive data and execute consequential actions across organizational boundaries in shared infrastructure. For regulated industries where agentic AI adoption has been constrained by governance and isolation concerns, formally proven separation of tenant workloads addresses a specific barrier to deployment at scale. The competitive pressure this places on other hyperscalers to demonstrate equivalent isolation assurances could reshape procurement criteria for AI workloads in financial services, healthcare, and government sectors.

What to Watch:

  • Whether Meta’s deployment translates into publicly disclosed agentic AI products or remains an infrastructure-layer investment.
  • How Intel and AMD respond with competitive core density, inter-core latency, and agentic workload benchmarks for their next-generation server processors.
  • Whether the Nitro Isolation Engine’s formal verification approach becomes a procurement requirement for regulated-industry AI deployments.
  • How real-time context infrastructure vendors such as Tacnode shape purchasing criteria around tail latency predictability rather than average throughput.
  • The degree to which competing hyperscalers accelerate their own custom CPU programs in response to AWS’s agentic AI positioning.

Read the full announcement on the AWS website.


Declaration of generative AI and AI-assisted technologies in the writing process: This content has been generated with the support of artificial intelligence technologies. Due to the fast pace of content creation and the continuous evolution of data and information, The Futurum Group and its analysts strive to ensure the accuracy and factual integrity of the information presented. However, the opinions and interpretations expressed in this content reflect those of the individual author/analyst. The Futurum Group makes no guarantees regarding the completeness, accuracy, or reliability of any information contained herein. Readers are encouraged to verify facts independently and consult relevant sources for further clarification.
Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the individual analyst, informed by data and other information that might have been provided for validation, and are not those of Futurum as a whole.
Read the full Futurum Group Disclosure.

Author Information

Brendan Burke, Research Director

Brendan is Research Director, Semiconductors, Supply Chain, and Emerging Tech. He advises clients on strategic initiatives and leads the Futurum Semiconductors Practice. He is an experienced tech industry analyst who has guided tech leaders in identifying market opportunities spanning edge processors, generative AI applications, and hyperscale data centers. 

Before joining Futurum, Brendan consulted with global AI leaders and served as a Senior Analyst in Emerging Technology Research at PitchBook. At PitchBook, he developed market intelligence tools for AI, highlighted by one of the industry’s most comprehensive AI semiconductor market landscapes encompassing both public and private companies. He has advised Fortune 100 tech giants, growth-stage innovators, global investors, and leading market research firms. Before PitchBook, he led research teams in tech investment banking and market research.

Brendan is based in Seattle, Washington. He has a Bachelor of Arts Degree from Amherst College.
