The News: On September 11, Intel announced the results of the Gaudi2 Intel chip tested in the MLCommons MLPerf Inference performance benchmark for GPT-J. GPT-J is an open source AI model from Eleuther AI, developed as an alternative to OpenAI’s GPT-3. The MLPerf tests are the most widely used and recognized machine learning (ML) benchmark tests.
Here are some of the pertinent details of Gaudi2’s performance:
- Gaudi2 inference throughput on GPT-J-99 and GPT-J-99.9 is 78.58 queries per second in the server scenario and 84.08 samples per second in the offline scenario.
- Gaudi2 delivers compelling performance versus NVIDIA’s H100, which shows only a slight edge of 1.09x (server) and 1.28x (offline) over Gaudi2.
- Gaudi2 outperforms NVIDIA’s A100 by 2.4x (server) and 2x (offline); a quick arithmetic check of these figures appears after this list.
- The Gaudi2 submission employed FP8 and reached 99.9% accuracy on this new data type.
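For readers who want to sanity-check the ratios, here is a minimal back-of-the-envelope sketch in Python. The Gaudi2 throughputs and the relative ratios come from the bullets above; the implied H100 and A100 figures are derived from them and are not official MLPerf numbers.

```python
# Back-of-the-envelope check of the relative performance figures above.
gaudi2 = {"server": 78.58, "offline": 84.08}  # queries/sec and samples/sec, per Intel's submission

# Stated ratios: H100 leads Gaudi2 by 1.09x (server) and 1.28x (offline);
# Gaudi2 leads A100 by 2.4x (server) and 2x (offline).
h100_advantage = {"server": 1.09, "offline": 1.28}
gaudi2_vs_a100 = {"server": 2.4, "offline": 2.0}

for scenario in ("server", "offline"):
    implied_h100 = gaudi2[scenario] * h100_advantage[scenario]
    implied_a100 = gaudi2[scenario] / gaudi2_vs_a100[scenario]
    print(f"{scenario}: Gaudi2 {gaudi2[scenario]:.2f}, "
          f"implied H100 ~{implied_h100:.1f}, implied A100 ~{implied_a100:.1f}")
```

Running this puts the implied H100 throughput at roughly 85.7 (server) and 107.6 (offline), and the implied A100 at roughly 32.7 and 42.0, consistent with the stated ratios.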
Read the full blog post on the Gaudi2 MLPerf GPT-J performance test on the Intel website.
Intel Gaudi2: An Alternative to GPUs in the AI War?
Analyst Take: Current AI workloads are big and expensive, putting pressure on chipmakers and cloud providers to find cheaper, better, and faster ways to enable AI. A market ecosystem is emerging to address this challenge, ranging from purpose-built AI chips such as Tensor Processing Units (TPUs), Language Processing Units (LPUs), and Neural Processing Units (NPUs) to reimagined central processing units (CPUs).
Business leaders understandably worry about AI compute costs and struggle with ROI, profit margins, and similar considerations. AI compute costs are nebulous right now for two reasons: the technology is still experimental and being refined so that it will scale, and a slew of players want to handle enterprise AI compute. Predicting where this market goes and how it shakes out is tricky.
Intel is one of the best-known chipmakers in the world. However, CPUs, the chips Intel has long mastered, have so far not been the chips that run AI workloads. Intel has been investing heavily to develop chips that run AI workloads more efficiently, and the Gaudi line of dedicated AI accelerators is Intel’s cornerstone play. Intel will have an impact on AI compute economics. Here is how.
More Efficient, Cheaper Compute
While the world marvels at what generative AI can do, CIOs and other IT leaders are scrambling to find reasonable ways to run the massive compute workloads that AI output requires. Most experts believe AI workloads will shift over time from mostly AI training, which to date has required graphics processing units (GPUs), to mostly AI inference, which CPUs and other non-GPU chips handle more efficiently. If that shift comes to fruition, it bodes well for the Intel Gaudi line: the performance of these early Gaudi chips is close to NVIDIA’s H100 and ahead of its A100. Perhaps more importantly, the Gaudi chips are less expensive and run more efficiently than NVIDIA’s GPUs. This price-performance balance is the Intel strategy.
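To make that price-performance logic concrete, here is a minimal sketch of the break-even arithmetic, assuming only the offline throughput ratio reported above. The article cites no actual chip prices, so none are used; the break-even point is expressed purely as a price ratio.

```python
# When does a slower-but-cheaper accelerator win on price-performance?
# Perf-per-dollar favors Gaudi2 whenever its price ratio to the H100 is
# below its performance ratio: price_g2 / price_h100 < perf_g2 / perf_h100.
h100_offline_advantage = 1.28                      # from the MLPerf figures above
gaudi2_relative_perf = 1 / h100_offline_advantage  # ~0.78 of H100 offline throughput

print(f"Gaudi2 offline throughput is ~{gaudi2_relative_perf:.0%} of the H100's,")
print(f"so it wins on perf-per-dollar at any price below ~{gaudi2_relative_perf:.0%} "
      "of the H100's price. (No actual prices are cited in this article.)")
```

In other words, if Gaudi2 costs meaningfully less than about 78% of an H100 while delivering roughly 78% of its offline throughput, the price-performance math favors Intel.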
However, Intel, like all chipmakers these days, is challenged to meet demand. Intel executives told The Futurum Group and a few other analysts that Gaudi2 chips are available through the company’s OEM partners and that production is ramping up, but they do feel the crunch of accelerated demand.
Perhaps more intriguing is the potential of the next-generation Gaudi chip, Gaudi3, which is slated to debut sometime next year. Intel believes this chip will further improve AI inference performance and make Intel even more competitive on price-performance against GPU options.
In the near term, enterprises will likely invest in a range of AI compute options because no proven path exists yet. AI compute must become more efficient and cheaper for AI applications to flourish.
Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually, based on data and other information that might have been provided for validation, and are not those of The Futurum Group as a whole.
Other insights from The Futurum Group:
Google Cloud’s TPU v5e Accelerates the AI Compute War
Groq Ushers In a New AI Compute Paradigm: The Language Processing Unit
Author Information
Mark comes to The Futurum Group from Omdia’s Artificial Intelligence practice, where his focus was on natural language and AI use cases.
Previously, Mark worked as a consultant and analyst providing custom and syndicated qualitative market analysis, with an emphasis on mobile technology, identifying trends and opportunities for companies like Syniverse and ABI Research. He has been cited by international media outlets including CNBC, The Wall Street Journal, Bloomberg Businessweek, and CNET. Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting the technology business, and he holds a Bachelor of Science from the University of Florida.