The News: On August 29, as part of Google Cloud Next ‘23, Google Cloud announced the preview of its next-generation AI chip, the Cloud TPU v5e. TPUs, or Tensor Processing Units, were invented by Google and are designed specifically for AI workloads, both training and inference. The first Google TPUs were made available to customers outside Google in 2018.
Here are some of the other pertinent details:
- Google Cloud says TPU v5e delivers up to 2x higher training performance per dollar and 2.5x higher inference performance per dollar for large language models (LLMs) and generative AI models when compared to Google Cloud TPU v4.
- Google Cloud says TPU v5e will cost less than half the price of TPU v4.
- TPU v5e is versatile, supporting eight different virtual machine configurations, from one chip to more than 250 chips. This versatility enables configurations for a wide range of LLM and generative AI model sizes.
- TPU v5e chips are already powering (rather than being in preview for) Google Cloud’s Kubernetes offering, Cloud TPUs in GKE, and Google Cloud’s managed AI service, Vertex AI.
Read the full post on the introduction of Google Cloud’s TPU v5e chip on the Google Cloud blog.
Google Cloud’s TPU v5e Accelerates the AI Compute War
Analyst Take: Current AI workloads are big and expensive, putting pressure on chipmakers and cloud providers to find cheaper, better, faster ways to enable AI. A market ecosystem is emerging to address this, from chips designed specifically for AI, such as TPUs, language processing units (LPUs), neural processing units (NPUs), and edge-focused silicon, to redesigned data centers and the possible resurrection of on-premises compute.
Business leaders understandably worry about cost as they try to understand ROI, profit margins, and the like. AI compute costs are completely nebulous right now for two reasons: the technology is still experimental and being refined so that it can scale, and a slew of players want to handle enterprise AI compute. Where this goes and how it ends up is hard to predict.
Google Cloud is a key player in the entire AI ecosystem stack, particularly in AI compute. The latest Google Cloud TPU will have an impact on AI compute economics. Here’s how:
More Efficient, Cheaper Compute
While the world marvels at what generative AI can do, CIOs and other IT leaders are scrambling to find reasonable ways to run the massive compute workloads that AI outputs require. Google’s TPUs continue to get faster, use less power, and cost less than previous iterations and, importantly, than NVIDIA’s GPUs. As noted, Google Cloud says TPU v5e delivers up to 2x higher training performance per dollar and 2.5x higher inference performance per dollar for LLMs and generative AI models when compared to Google Cloud TPU v4. In April of this year, Google said that TPU v4 outperformed TPU v3 by 2.1x while delivering 2.7x better performance per watt. While these are not necessarily apples-to-apples comparisons, the point is that Google Cloud’s TPUs keep getting better. If Google has the inventory and makes TPUs readily available, enterprises running their own AI compute workloads might have a more efficient and economical path to lower AI workload costs. Additionally, some enterprises might be more interested in Google Cloud’s managed AI compute options, Cloud TPUs in GKE and Vertex AI, for the same economic reasons.
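As a rough illustration of what those price-performance claims could mean for a budget, here is a minimal back-of-the-envelope sketch in Python. Only the 2x training and 2.5x inference performance-per-dollar multipliers come from Google Cloud’s stated claims; the baseline spend figures are hypothetical placeholders, not actual TPU pricing.

```python
# Back-of-the-envelope comparison of TPU v4 vs. TPU v5e spend for a fixed
# amount of work, using only the relative performance-per-dollar multipliers
# Google Cloud cites. Baseline dollar figures are hypothetical placeholders.

TRAIN_PERF_PER_DOLLAR_GAIN = 2.0   # v5e vs. v4, training (Google Cloud's claim)
INFER_PERF_PER_DOLLAR_GAIN = 2.5   # v5e vs. v4, inference (Google Cloud's claim)

# Hypothetical monthly spend on TPU v4 for a fixed workload (assumed numbers).
v4_training_spend = 100_000   # USD
v4_inference_spend = 250_000  # USD

# The same workload on v5e: cost scales inversely with performance per dollar.
v5e_training_spend = v4_training_spend / TRAIN_PERF_PER_DOLLAR_GAIN
v5e_inference_spend = v4_inference_spend / INFER_PERF_PER_DOLLAR_GAIN

total_v4 = v4_training_spend + v4_inference_spend
total_v5e = v5e_training_spend + v5e_inference_spend

print(f"TPU v4 total:  ${total_v4:,.0f}")
print(f"TPU v5e total: ${total_v5e:,.0f}")
print(f"Savings:       ${total_v4 - total_v5e:,.0f} ({1 - total_v5e / total_v4:.0%})")
```

Under these assumed numbers, the same monthly workload would drop from $350,000 on TPU v4 to $150,000 on TPU v5e, a savings of roughly 57 percent, which is the kind of math IT leaders will be running as they weigh their AI compute options.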
Uneasy Partnership with NVIDIA
Ironically, in the TPU announcement, co-authors Amin Vahdat, VP/GM, ML, Systems, and Cloud AI, and Mark Lohmeyer, VP/GM, Compute and ML Infrastructure, go on to talk about Google Cloud and NVIDIA’s ongoing partnership, in this case the new A3 virtual machines:
“Today, we’re thrilled to announce that A3 VMs will be generally available next month. Powered by NVIDIA’s H100 Tensor Core GPUs, which feature the Transformer Engine to address trillion-parameter models, A3 VMs are purpose-built to train and serve especially demanding gen AI workloads and LLMs. Combining NVIDIA GPUs with Google Cloud’s leading infrastructure technologies provides massive scale and performance and is a huge leap forward in supercomputing capabilities, with 3x faster training and 10x greater networking bandwidth compared to the prior generation. A3 is also able to operate at scale, enabling users to scale models to tens of thousands of NVIDIA H100 GPUs.”
It will be interesting to see how this partnership evolves. Already, it appears NVIDIA is moving to protect its AI compute dominance with GPUs, as noted in our previous research note on the emergence of CoreWeave. That company is gaining significant investment and customers because it is a cloud provider built specifically for AI workloads rather than generalized ones, and it is proving to be highly efficient with NVIDIA hardware. NVIDIA is the exclusive provider of GPU firepower for CoreWeave.
In the near term, enterprises will likely invest in a range of AI compute options because there is not yet a proven path. AI compute must become more efficient and cheaper for AI applications to flourish.
Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.
Other insights from The Futurum Group:
Groq Ushers In a New AI Compute Paradigm: The Language Processing Unit
CoreWeave Secures $2.3 Billion in Debt Financing, Challenges for AI Compute
Author Information
Mark comes to The Futurum Group from Omdia’s Artificial Intelligence practice, where his focus was on natural language and AI use cases.
Previously, Mark worked as a consultant and analyst providing custom and syndicated qualitative market analysis with an emphasis on mobile technology, identifying trends and opportunities for companies like Syniverse and ABI Research. He has been cited by international media outlets including CNBC, The Wall Street Journal, Bloomberg Businessweek, and CNET. Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting the technology business and holds a Bachelor of Science from the University of Florida.