Google Cloud’s TPU v5e Accelerates the AI Compute War

The News: On August 29, as part of Google Cloud Next ’23, Google Cloud announced the preview of its next-generation AI chip, the Cloud TPU v5e. TPUs, or Tensor Processing Units, were invented by Google and designed specifically for AI workloads, covering both training and inference. The first Google TPUs were made available for use outside Google in 2018.

Here are some of the other pertinent details:

  • Google Cloud says TPU v5e delivers up to 2x higher training performance per dollar and 2.5x higher inference performance per dollar for large language models (LLMs) and generative AI models when compared to Google Cloud TPU v4.
  • Google Cloud says TPU v5e will cost less than half as much as TPU v4.
  • TPU v5e is versatile, supporting eight different virtual machine configurations, from one chip to more than 250 chips. This versatility enables configurations for a wide range of LLM and generative AI model sizes.
  • TPU v5e chips are now powering (as opposed to being in preview) Google Cloud’s Kubernetes service, Cloud TPUs in GKE, and Google Cloud’s managed AI service, Vertex AI.
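
One way to read the first two bullets together: performance per dollar is throughput divided by price, so the implied ratio of per-chip throughput is the performance-per-dollar ratio multiplied by the price ratio. The following is a minimal sketch of that arithmetic, not a benchmark. It assumes the quoted "up to" figures hold at the same time and treats "less than half" as exactly 0.5; both are simplifying assumptions, and the function name is purely illustrative.

```python
def implied_throughput_ratio(perf_per_dollar_ratio: float, price_ratio: float) -> float:
    """Relative per-chip throughput (new / old) implied by two vendor ratios.

    perf_per_dollar = throughput / price, therefore
    throughput_new / throughput_old = perf_per_dollar_ratio * price_ratio.
    """
    if perf_per_dollar_ratio <= 0 or price_ratio <= 0:
        raise ValueError("ratios must be positive")
    return perf_per_dollar_ratio * price_ratio


# Ratios quoted for TPU v5e relative to TPU v4.
# ASSUMPTION: price ratio taken as exactly 0.5 ("less than half").
training = implied_throughput_ratio(2.0, 0.5)
inference = implied_throughput_ratio(2.5, 0.5)
print(f"training: {training:.2f}x, inference: {inference:.2f}x")
# prints "training: 1.00x, inference: 1.25x"
```

Under these assumptions, most of the claimed gain comes from price rather than raw per-chip speed. Actual results depend on the model, the VM configuration, and real pricing.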

Read the full post on the introduction of Google Cloud’s TPU v5e chip on the Google Cloud blog.

Analyst Take: Current AI workloads are big and expensive, putting pressure on chipmakers and cloud providers to find cheaper, better, faster ways to enable AI. A market ecosystem is emerging to address this, from chips designed specifically for AI, such as TPUs, language processing units (LPUs), neural processing units (NPUs), and edge-focused silicon, to redesigned data centers and the possible resurrection of on-premises compute.

Business leaders understandably worry about cost because they need it to understand ROI, profit margins, and so on. AI compute costs are nebulous right now for two reasons: the technology is still experimental and being refined so that it can scale, and a slew of players want to handle enterprise AI compute. Where this goes and how it ends up is hard to predict.

Google Cloud is a key player in the entire AI ecosystem stack, particularly in AI compute. The latest Google Cloud TPU will have an impact on AI compute economics. Here’s how:

More Efficient, Cheaper Compute

While the world marvels at what generative AI can do, CIOs and other IT leaders are scrambling to find reasonable ways to run the massive compute workloads that AI outputs require. Google’s TPUs continue to get faster, use less power, and become more affordable, both relative to previous iterations and, importantly, relative to NVIDIA’s GPUs. As noted, Google Cloud says TPU v5e delivers up to 2x higher training performance per dollar and 2.5x higher inference performance per dollar for LLMs and generative AI models when compared to Google Cloud TPU v4. In April of this year, Google said that TPU v4 outperformed TPU v3 by 2.1x and improved performance per watt by 2.7x. While these are not necessarily apples-to-apples performance comparisons, the point here is that Google Cloud’s TPUs are getting better and better. If Google has the inventory and makes TPUs readily available, enterprises running their own AI compute workloads might have a more efficient and economical path to lower AI workload costs. Additionally, some enterprises might be more interested in Google Cloud’s managed AI compute options, Cloud TPUs in GKE and Vertex AI, for the same economic reasons.
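
To make the workload-cost point concrete: for a fixed amount of work, spend scales with the inverse of performance per dollar. The sketch below applies the quoted "up to" multipliers to a hypothetical baseline spend. The baseline figure and function name are illustrative assumptions, not numbers from Google Cloud.

```python
def relative_cost(perf_per_dollar_gain: float) -> float:
    """Cost of a fixed workload relative to baseline, given a perf-per-dollar multiplier."""
    if perf_per_dollar_gain <= 0:
        raise ValueError("gain must be positive")
    return 1.0 / perf_per_dollar_gain


# HYPOTHETICAL baseline spend on TPU v4 for some fixed workload.
baseline_spend = 100_000.0

# Quoted upper-bound multipliers for TPU v5e versus TPU v4.
for label, gain in [("training", 2.0), ("inference", 2.5)]:
    new_spend = baseline_spend * relative_cost(gain)
    saving = 1.0 - relative_cost(gain)
    print(f"{label}: ${new_spend:,.0f} ({saving:.0%} lower)")
```

Because the vendor figures are stated as "up to," these are best-case savings; a real estimate would need measured throughput for the specific model and current list prices.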

Uneasy Partnership with NVIDIA

Ironically, in the TPU announcement, co-authors Amin Vahdat, VP/GM ML, Systems, and Cloud AI, and Mark Lohmeyer, VP/GM Compute and ML Infrastructure, go on to discuss Google Cloud and NVIDIA’s ongoing partnership, in this case the new A3 virtual machines:

“Today, we’re thrilled to announce that A3 VMs will be generally available next month. Powered by NVIDIA’s H100 Tensor Core GPUs, which feature the Transformer Engine to address trillion-parameter models, A3 VMs are purpose-built to train and serve especially demanding gen AI workloads and LLMs. Combining NVIDIA GPUs with Google Cloud’s leading infrastructure technologies provides massive scale and performance and is a huge leap forward in supercomputing capabilities, with 3x faster training and 10x greater networking bandwidth compared to the prior generation. A3 is also able to operate at scale, enabling users to scale models to tens of thousands of NVIDIA H100 GPUs.”

It will be interesting to see how this partnership evolves. NVIDIA already appears to be moving to protect its dominance in AI compute, as noted in our previous research note on the emergence of CoreWeave. CoreWeave is gaining significant investment and customers because it is a cloud provider built specifically for AI workloads rather than general-purpose ones, and it is proving to be highly efficient with NVIDIA hardware. NVIDIA exclusively provides GPU firepower for CoreWeave.

In the near term, enterprises will likely invest in a range of AI compute options because there is no proven path yet. AI compute must become more efficient and cheaper for AI applications to flourish.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other insights from The Futurum Group:

Groq Ushers In a New AI Compute Paradigm: The Language Processing Unit

AMD and Hugging Face Team Up to Democratize AI Compute – Shrewd Alliance Could Lead to AI Compute Competition, Lower AI Cost

CoreWeave Secures $2.3 Billion in Debt Financing, Challenges for AI Compute

Author Information

Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting technology business and holds a Bachelor of Science from the University of Florida.
