Google Cloud’s TPU v5e Accelerates the AI Compute War

The News: On August 29, as part of Google Cloud Next ‘23, Google Cloud announced the preview of its next-generation AI chip, the Cloud TPU v5e. TPUs, or Tensor Processing Units, were invented by Google and designed specifically for AI workloads, both training and inference. The first Google TPUs were made available for use outside of Google in 2018.

Here are some of the other pertinent details:

  • Google Cloud says TPU v5e delivers up to 2x higher training performance per dollar and 2.5x higher inference performance per dollar for large language models (LLMs) and generative AI models when compared to Google Cloud TPU v4.
  • Google Cloud says TPU v5e will cost less than half of TPU v4.
  • TPU v5e is versatile, supporting eight different virtual machine configurations, from one chip to more than 250 chips. This versatility enables configurations for a wide range of LLM and generative AI model sizes.
  • TPU v5e chips are already powering (rather than being in preview for) Cloud TPUs in GKE, Google Cloud’s Kubernetes service, and Vertex AI, Google Cloud’s managed AI service.
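To make the bullet points above concrete, the sketch below shows how a performance-per-dollar claim combines raw throughput with price. All figures in the code are made-up placeholders for illustration, not Google Cloud’s actual pricing or benchmark numbers.

```python
# Hypothetical illustration of "performance per dollar."
# All numbers below are placeholders, NOT Google Cloud's real
# pricing or benchmark results.

def perf_per_dollar(throughput: float, hourly_price: float) -> float:
    """Work completed per hour (e.g., examples processed) per dollar spent."""
    return throughput / hourly_price

# Placeholder figures for two chip generations.
v4_throughput, v4_price = 100.0, 3.22    # baseline generation
v5e_throughput, v5e_price = 96.0, 1.56   # cheaper chip, similar throughput

improvement = perf_per_dollar(v5e_throughput, v5e_price) / perf_per_dollar(
    v4_throughput, v4_price
)

# A chip can win on perf/$ even with lower raw throughput,
# as long as its price drops faster than its throughput does.
print(f"perf/$ improvement: {improvement:.2f}x")
```

The point of the sketch is that “up to 2x higher training performance per dollar” can come as much from the price side of the ratio (v5e costing less than half of v4) as from raw chip speed.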

Read the full post on the introduction of Google Cloud’s TPU v5e chip on the Google Cloud blog.

Analyst Take: Current AI workloads are big and expensive, putting pressure on chipmakers and cloud providers to find cheaper, better, faster ways to enable AI. A market ecosystem is emerging to address this, from chips designed specifically for AI, such as TPUs, language processing units (LPUs), neural processing units (NPUs), and edge-focused silicon, to redesigned data centers and a possible resurrection of on-premises compute.

Business leaders understandably worry about cost as they try to understand ROI, profit margins, and the like. AI compute costs are nebulous right now for two reasons: the technology is still experimental and being refined so it can scale, and a slew of players want to handle enterprise AI compute. Where this market goes and how it shakes out is hard to predict.

Google Cloud is a key player in the entire AI ecosystem stack, particularly in AI compute. The latest Google Cloud TPU will have an impact on AI compute economics. Here’s how:

More Efficient, Cheaper Compute

While the world marvels at what generative AI can do, CIOs and other IT leaders are scrambling to find reasonable ways to run the massive compute workloads that AI outputs require. Google’s TPUs continue to get faster, use less power, and become more affordable than previous iterations and, importantly, than NVIDIA’s GPUs. As noted, Google Cloud says TPU v5e delivers up to 2x higher training performance per dollar and 2.5x higher inference performance per dollar for LLMs and generative AI models when compared to Google Cloud TPU v4. In April of this year, Google said that TPU v4 outperformed TPU v3 by 2.1x while delivering 2.7x better performance per watt. While these are not necessarily apples-to-apples comparisons, the point is that Google Cloud’s TPUs keep getting better. If Google has the inventory and makes TPUs readily available, enterprises running their own AI compute workloads might have a more efficient and economical path to lower AI workload costs. Additionally, some enterprises might be more interested in Google Cloud’s managed AI compute options, Cloud TPUs in GKE and Vertex AI, for the same economic reasons.

Uneasy Partnership with NVIDIA

Ironically, in the TPU announcement, co-authors Amin Vahdat, VP/GM ML, Systems, and Cloud AI, and Mark Lohmeyer, VP/GM Compute and ML Infrastructure, go on to discuss Google Cloud and NVIDIA’s ongoing partnership – in this case, the new A3 virtual machines:

“Today, we’re thrilled to announce that A3 VMs will be generally available next month. Powered by NVIDIA’s H100 Tensor Core GPUs, which feature the Transformer Engine to address trillion-parameter models, A3 VMs are purpose-built to train and serve especially demanding gen AI workloads and LLMs. Combining NVIDIA GPUs with Google Cloud’s leading infrastructure technologies provides massive scale and performance and is a huge leap forward in supercomputing capabilities, with 3x faster training and 10x greater networking bandwidth compared to the prior generation. A3 is also able to operate at scale, enabling users to scale models to tens of thousands of NVIDIA H100 GPUs.”

It will be interesting to see how this partnership evolves. Already, there appear to be moves by NVIDIA to protect its AI compute dominance with GPUs, as noted in our previous research note on the emergence of CoreWeave. That company is gaining significant investment and customers because it is a cloud provider built specifically for AI workloads, not generalized ones, and is proving to be highly efficient with NVIDIA hardware. NVIDIA exclusively provides the GPU firepower for CoreWeave.

In the near term, enterprises will likely invest in a range of AI compute options because there is not a proven path. AI compute must become more efficient and cheaper for AI applications to flourish.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other insights from The Futurum Group:

Groq Ushers In a New AI Compute Paradigm: The Language Processing Unit

AMD and Hugging Face Team Up to Democratize AI Compute – Shrewd Alliance Could Lead to AI Compute Competition, Lower AI Cost

CoreWeave Secures $2.3 Billion in Debt Financing, Challenges for AI Compute

Author Information

Mark comes to The Futurum Group from Omdia’s Artificial Intelligence practice, where his focus was on natural language and AI use cases.

Previously, Mark worked as a consultant and analyst providing custom and syndicated qualitative market analysis with an emphasis on mobile technology and identifying trends and opportunities for companies like Syniverse and ABI Research. He has been cited by international media outlets including CNBC, The Wall Street Journal, Bloomberg Businessweek, and CNET. Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting technology business and holds a Bachelor of Science from the University of Florida.

