Google Debuts Ironwood TPU to Drive Inference-Focused AI Architecture at Scale

Analyst(s): Daniel Newman
Publication Date: April 23, 2025

Google launched Ironwood, its seventh-generation TPU, at Cloud Next ’25. Designed specifically for inference, Ironwood scales to 9,216 chips delivering 42.5 exaflops of performance and introduces major improvements in efficiency, memory, and interconnect.

What is Covered in this Article:

  • Google announced Ironwood, its seventh-generation TPU, at Cloud Next ’25
  • Ironwood is built specifically for inference, unlike prior TPUs, which support both training and inference
  • At full scale (9,216 chips), Ironwood delivers 42.5 exaflops of compute
  • Each chip includes 192 GB of HBM, 7.2 TBps memory bandwidth, and 4,614 TFLOPs
  • Ironwood offers 2x the performance-per-watt of Trillium and is nearly 30x more efficient than TPU v2

The News: Google unveiled Ironwood, its seventh-generation Tensor Processing Unit (TPU), at the Cloud Next ’25 conference. Ironwood is the first TPU designed specifically for inference workloads, marking a departure from prior designs that combined training and inference capabilities.

Each Ironwood chip delivers 4,614 TFLOPs of compute performance and includes 192 GB of High Bandwidth Memory (HBM) and 7.2 TBps of memory bandwidth. When scaled to a 9,216-chip pod configuration, Ironwood delivers a total of 42.5 exaflops – more than 24 times the compute power of the El Capitan supercomputer, which offers 1.7 exaflops. Google claims Ironwood offers double the energy efficiency of its previous Trillium TPU (TPU v6) and nearly 30x the efficiency of the original TPU v2 from 2018.

Analyst Take: As the first TPU designed specifically for inference, Google’s launch of Ironwood marks a notable transition in how the company addresses the compute demands of AI at scale. The chip reflects a clear focus on inference optimization, supporting models that not only respond to prompts but proactively generate insights. Ironwood introduces architectural improvements across compute, memory, interconnect, and efficiency, and is integrated into Google Cloud’s AI Hypercomputer architecture alongside the Pathways software stack. As with previous TPUs, Ironwood is designed to accelerate machine learning workloads, particularly deep learning, and aims to deliver superior performance-per-dollar compared to general-purpose GPUs or CPUs, helping reduce infrastructure costs or expand compute capacity within existing budgets.

Google has already used previous TPU generations to train its Gemini models, a fact it revealed last year. With Ironwood, the company is doubling down on inference as it sees scalable infrastructure for generative video, language, text, and agentic models as key to its internal product roadmap and a critical test of value for enterprise cloud customers. Whether customers will use TPUs the way Google does—or continue favoring alternatives like NVIDIA—remains an open and important question.

Beyond technical enhancements, Ironwood strengthens Google’s positioning amid intensifying competition in the AI silicon market. NVIDIA continues to lead with its Blackwell Ultra chips and its upcoming Rubin and Feynman architectures, and its GPUs power models like OpenAI’s GPT series. Amazon is ramping up with Trainium3, and Microsoft has begun deploying Maia for inferencing alongside NVIDIA hardware. In this context, Ironwood provides Google with an alternative that enables deeper control of cost, performance, and workload alignment, particularly for inference-heavy use cases.

Inference Optimization as the Core Design Principle

Ironwood is the first TPU purpose-built for inference. This shift aligns with what Google describes as the “age of inference,” where AI systems transition from reactive models to “thinking models” that retrieve, interpret, and generate information collaboratively. According to Amin Vahdat, VP/GM of ML, Systems & Cloud AI at Google Cloud, the importance of inference has risen significantly, and Ironwood is positioned to meet this demand with large-scale synchronous compute capability. The chip’s high throughput and low-latency interconnects aim to manage workloads such as large language models (LLMs), mixture of experts (MoEs), and advanced reasoning tasks.

Significant Scale and Performance Improvements

Ironwood delivers a peak of 4,614 TFLOPs per chip, scaling up to 42.5 exaflops in a full 9,216-chip configuration. This performance is over 24 times that of El Capitan, currently the world’s most powerful supercomputer. Each chip supports 192 GB of HBM – six times more than Trillium – and offers 7.2 TBps of bandwidth, a 4.5x increase. Ironwood’s performance is further enhanced by Google’s Inter-Chip Interconnect (ICI), which now supports 1.2 TBps of bidirectional communication. These specifications make Ironwood particularly well-suited for agentic AI applications, such as enterprise AI assistants, autonomous infrastructure management, and intelligent support systems, where models must operate independently and respond with contextually rich outputs.
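
The pod-level figures follow directly from the per-chip specifications. As a quick back-of-the-envelope check, the short Python sketch below aggregates Google’s published per-chip numbers to the full 9,216-chip pod; the unit conversions and the El Capitan comparison are illustrative arithmetic, not new benchmark data.

```python
# Back-of-the-envelope aggregation of Google's published Ironwood figures.
# The per-chip specs are Google's stated numbers; the pod-level math is
# illustrative arithmetic only.
PER_CHIP_TFLOPS = 4_614       # peak compute per chip (TFLOPs)
PER_CHIP_HBM_GB = 192         # HBM capacity per chip (GB)
POD_CHIPS = 9_216             # chips in a full pod

pod_exaflops = PER_CHIP_TFLOPS * POD_CHIPS / 1e6   # 1 exaflop = 1,000,000 TFLOPs
pod_hbm_pb = PER_CHIP_HBM_GB * POD_CHIPS / 1e6     # GB -> PB (decimal)

print(f"Pod compute: {pod_exaflops:.1f} exaflops")             # ~42.5
print(f"Pod HBM:     {pod_hbm_pb:.2f} PB")                     # ~1.77
print(f"vs. El Capitan at 1.7 exaflops: ~{pod_exaflops / 1.7:.0f}x")  # ~25x, i.e., 'more than 24x'
```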

Efficiency and Thermal Performance as Differentiators

Power efficiency was a central focus in Ironwood’s design. Google states that Ironwood offers 2x the performance per watt relative to Trillium and is nearly 30x more power-efficient than TPU v2. It also uses advanced liquid cooling to sustain continuous, heavy workloads, allowing twice the performance of standard air-cooled systems. These improvements address a growing constraint in AI infrastructure: energy availability. With AI models becoming larger and more resource-intensive, Google’s ability to scale capacity per watt could serve as a meaningful differentiator.
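
To make the stated ratios concrete, the sketch below works through what they imply at a fixed power envelope. The 2x (versus Trillium) and roughly 30x (versus TPU v2) figures are Google’s claims; the normalized throughput baseline and power budget are hypothetical, illustrative values.

```python
# Illustrative arithmetic on Google's stated efficiency ratios; the baseline
# throughput and power figures below are hypothetical, not measured values.
ironwood_vs_trillium = 2.0     # stated performance-per-watt gain over Trillium
ironwood_vs_tpu_v2 = 30.0      # stated (approximate) gain over TPU v2

# Taken together, the two stated ratios imply Trillium at roughly 15x TPU v2 per watt.
trillium_vs_tpu_v2 = ironwood_vs_tpu_v2 / ironwood_vs_trillium
print(f"Implied Trillium vs. TPU v2: ~{trillium_vs_tpu_v2:.0f}x per watt")

# At a fixed power envelope, serving capacity scales with performance per watt:
# the same facility power serves ~2x the inference volume on Ironwood vs. Trillium.
trillium_capacity = 1.0        # normalized inference throughput per MW (hypothetical)
print(f"Ironwood capacity per MW (normalized): {trillium_capacity * ironwood_vs_trillium:.1f}")
```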

Software and System-Level Integration

Ironwood is integrated into Google’s AI Hypercomputer architecture, which brings together custom silicon, low-latency networking, and orchestration through Pathways—the machine learning runtime developed by Google DeepMind. Pathways enables inference workloads to scale beyond a single Ironwood pod, allowing hundreds of thousands of chips to be composed into large distributed systems. Ironwood also features an enhanced SparseCore accelerator for processing ultra-large embeddings, broadening its use beyond traditional AI to include ranking, recommendation, financial, and scientific workloads. Overall, Ironwood reflects a 10x generational performance leap and showcases the results of more than a decade of investment in Google’s TPU development. Its architectural scale and coordination make it a foundational component for powering Google’s own Gemini models, including Gemini 2.5 and Gemini Flash, within the AI Hypercomputer architecture that integrates custom silicon and the Pathways software stack.
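
Pathways itself is Google’s internal runtime rather than something customers program against directly, but the underlying idea of composing many accelerators into one logical system is visible in the open-source JAX stack that targets TPUs. Below is a minimal, illustrative sketch (not Google’s Pathways or Ironwood code) of sharding a single inference step across whatever accelerator devices are visible; the tensor shapes and the toy forward function are assumptions for demonstration.

```python
# A minimal, illustrative JAX sketch of sharding one inference step across the
# visible devices (TPU cores on a TPU VM, otherwise CPU). Shapes and the toy
# forward pass are hypothetical; this is not Google's Pathways runtime.
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh over all visible devices and name the axis "batch".
mesh = Mesh(mesh_utils.create_device_mesh((jax.device_count(),)), axis_names=("batch",))

batch, d_model = 32, 1024                 # batch assumed divisible by the device count
x = jnp.ones((batch, d_model))            # stand-in activations for a batch of requests
w = jnp.ones((d_model, d_model))          # stand-in weights

# Shard the batch dimension across devices; replicate the weights on every device.
x = jax.device_put(x, NamedSharding(mesh, P("batch", None)))
w = jax.device_put(w, NamedSharding(mesh, P(None, None)))

@jax.jit
def forward(x, w):
    # Stand-in for one feed-forward step of a served model.
    return jax.nn.relu(x @ w)

y = forward(x, w)
print(y.shape, y.sharding)                # the output stays sharded across the mesh
```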

While Amazon and Microsoft continue advancing their own chip development with Trainium and Maia, Google’s sustained investment in its TPU roadmap—now in its seventh generation with Ironwood—positions it to effectively support inference-heavy, energy-efficient, and memory-intensive workloads. At the same time, Google is working to make its software stack more accessible to developers, including those accustomed to NVIDIA’s CUDA-based toolchains, which remain dominant in AI development. Google has emphasized its continued alignment with NVIDIA, but TPUs offer a viable alternative—and potentially one that could shift share over time, especially as the overall AI infrastructure market expands.

What to Watch:

  • Ironwood remains limited to Google Cloud, constraining broader enterprise experimentation and multi-cloud deployment flexibility
  • Competing hyperscalers like Microsoft and AWS may accelerate the development of inference-optimized silicon (e.g., Inferentia, Azure Maia)
  • The adoption of Gemini models and Pathways runtime will play a pivotal role in determining Ironwood’s customer traction
  • Enterprises must compare the total cost of ownership versus NVIDIA’s offerings, especially in mixed training/inference environments
  • Uptake of the Agent2Agent (A2A) protocol and multi-agent orchestration will be essential to making Ironwood relevant for real-world business use cases

See the complete blog on the launch of the Ironwood TPU at Google Cloud Next ’25 on the Google website.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other insights from The Futurum Group:

Google Cloud Next 2025: The Yellow Brick Road to AI Transformation

At Google Cloud Next, Google Brings its Databases to Bear on Agentic AI Opportunity

Does Salesforce and Google’s Partnership Raise the Bar for AI Agent Capability?

Image Credit: Google Cloud

Author Information

Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.

From the leading edge of AI to global technology policy, Daniel makes the connections between business, people, and tech that are required for companies to benefit most from their technology investments. Daniel is a top-five globally ranked industry analyst, and his ideas are regularly cited or shared in television appearances on CNBC and Bloomberg, in the Wall Street Journal, and across hundreds of other outlets around the world.

A seven-time best-selling author, most recently of “Human/Machine,” Daniel is also a Forbes and MarketWatch (Dow Jones) contributor.

An MBA and former graduate adjunct faculty member, Daniel is an Austin, Texas transplant after 40 years in Chicago. His speaking takes him around the world each year as he shares his vision of the role technology will play in our future.
