The News: On January 4, Databricks published a technical blog post outlining the results of its performance testing of Intel’s Gaudi 2 AI accelerators. In describing itself, Databricks says its customers “rely on the Databricks Data Intelligence Platform to unify and democratize data, analytics and AI.” The post’s authors, Abhi Venigalla and Daya Khudia, said the project grew out of Databricks customers coming to them for advice and help on how to train custom AI models.
“One lever we have to address this challenge is machine learning (ML) hardware optimization; to that end, we have been working tirelessly to ensure our LLM stack can seamlessly support a variety of ML hardware platforms (e.g., NVIDIA, AMD). Today, we are excited to discuss another major player in the AI training and inference market: the Intel Gaudi family of AI Accelerators! These accelerators are available via AWS (first-generation Gaudi), the Intel Developer Cloud (Gaudi 2), and for on-premise implementations, Supermicro and WiWynn (Gaudi and Gaudi 2 accelerators).” It should be noted that Databricks has not tested NVIDIA’s latest AI accelerator, the GH200 Grace Hopper Superchip.
Here are the key details of the tests:
- For large language model (LLM) training workloads, Intel’s Gaudi 2 accelerator has the second best training performance per chip that Databricks has tested, only bested by NVIDIA’s H100.
- For LLM inference workloads, Intel’s Gaudi 2 accelerator matches NVIDIA’s H100 system in decoding latency, the most expensive phase of LLM inference.
- Based on public pricing, Databricks found Intel’s Gaudi 2 accelerator has the best training and inference performance per dollar of the systems tested.
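The performance-per-dollar comparison boils down to simple arithmetic: divide a system’s measured throughput by its hourly price. The sketch below illustrates the calculation only; the throughput and price figures are hypothetical placeholders, not Databricks’ or Intel’s numbers.

```python
# Back-of-the-envelope performance-per-dollar calculation.
# All figures below are hypothetical placeholders, not published test results.

def tokens_per_dollar(throughput_tokens_per_sec: float, hourly_price_usd: float) -> float:
    """Tokens processed per dollar of accelerator time."""
    tokens_per_hour = throughput_tokens_per_sec * 3600
    return tokens_per_hour / hourly_price_usd

# Two made-up systems: a faster, pricier one and a slower, cheaper one.
system_a = tokens_per_dollar(throughput_tokens_per_sec=3000, hourly_price_usd=10.0)
system_b = tokens_per_dollar(throughput_tokens_per_sec=2500, hourly_price_usd=6.0)
print(f"System A: {system_a:,.0f} tokens/$ | System B: {system_b:,.0f} tokens/$")
```

As the placeholder example shows, a chip that trails on raw throughput can still lead on performance per dollar if it is priced aggressively, which is the crux of Databricks’ Gaudi 2 finding.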
AI Compute Relief? Intel Gaudi 2 Databricks Testing Indicates Yes
Analyst Take: Current AI workloads are big and expensive, putting pressure on chipmakers and cloud providers to find cheaper, better, and faster ways to enable AI. A market ecosystem is emerging to address this challenge, ranging from purpose-built AI chips such as tensor processing units (TPUs), language processing units (LPUs), and neural processing units (NPUs) to reimagined central processing units (CPUs).
Business leaders understandably worry about AI compute costs and struggle with ROI, profit margins, and similar considerations. AI compute costs are nebulous right now for two reasons: the technology is still experimental and being refined so that it will scale, and a slew of players are vying to handle enterprise AI compute. Determining where this market goes and how it ends up is tricky.
Intel has been investing heavily to develop chips that will run AI workloads more efficiently, and the Gaudi line is Intel’s cornerstone play. Intel will have an impact on AI compute economics. Here are our thoughts.
Intel and AMD’s AI Chips Will Expand AI Use in 2024, Possibly Driving AI Compute Costs Lower
Some tech experts will likely nitpick Databricks’ testing regimen and challenge the performance results. Regardless, it is clear that both Intel and AMD are bringing viable AI accelerators to market to augment NVIDIA’s standard-bearing AI accelerators. NVIDIA’s hardware has been in short supply, which has limited the expansion of operationalized AI applications. If Intel and AMD have adequate supply, real-life use of AI will explode in 2024, triggering the productivity gains, new revenue, and other benefits that have been envisioned for AI use cases. Further, if the principles of supply, demand, and competition hold true, the cost of these AI accelerators will start to drop, which will help lower AI compute costs. However, more significant AI compute cost reductions will come from the better price/performance of this new generation of AI accelerators compared with the legacy systems in use today.
Open Source Interoperability Drives Efficiencies
Databricks noted an efficiency that is quickly taking hold—the growth of open source, interoperable software to manage the AI compute stack: “Thanks to the interoperability of PyTorch and open source libraries (e.g., DeepSpeed, Composer, StreamingDataset, LLM Foundry, Habana Optimum), users can run the same LLM workloads on NVIDIA, AMD, or Intel or even switch between the platforms.”
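To make that interoperability claim concrete, below is a minimal sketch of what hardware-agnostic device selection can look like in PyTorch. It is an illustrative, assumption-laden example, not code from Databricks’ stack: it assumes the Habana PyTorch bridge (habana_frameworks.torch) is installed on Gaudi machines and that AMD GPUs are exposed through ROCm builds of PyTorch under the “cuda” backend.

```python
# Minimal sketch of hardware-agnostic device selection in PyTorch.
# Assumption: on Gaudi systems the Habana PyTorch bridge
# (habana_frameworks.torch) is installed; importing it registers the "hpu"
# backend. NVIDIA GPUs (and AMD GPUs via ROCm builds of PyTorch) appear
# through the "cuda" backend. Exact module names may vary by software release.
import torch


def pick_device() -> torch.device:
    """Return an accelerator device if one is present, otherwise CPU."""
    try:
        import habana_frameworks.torch.core  # noqa: F401  (Gaudi-only package)
        return torch.device("hpu")
    except ImportError:
        pass
    if torch.cuda.is_available():  # NVIDIA, or AMD exposed via ROCm
        return torch.device("cuda")
    return torch.device("cpu")


# The model code stays the same regardless of vendor; only the device
# the tensors live on changes.
device = pick_device()
model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)
print(device, model(x).shape)
```

The point is that the modeling code itself does not change across vendors; libraries like the ones Databricks lists build on the same abstraction, which is why switching platforms can be largely a configuration change.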
Space-Race Technological Progress Points to Further Efficiency Gains
Considering the typical product development cycle for silicon, the speed at which this remarkably better new generation of AI accelerators is arriving might someday be compared with the technological marvel that was the space race of the 1960s. Chip manufacturers are committed to and investing heavily in making AI compute workloads efficient and cheap enough to support widespread operationalized AI applications. Note what the Databricks authors had to say about Intel Gaudi 3, expected to reach the market sometime in 2024: “Looking ahead to the Intel Gaudi 3, we expect the same interoperability but with even higher performance. The projected public information specs suggest that Intel Gaudi 3 should have more FLOP/s and memory bandwidth than all the major competitors (NVIDIA H100, AMD MI300X). Given the great training and inference utilization numbers we already see today on Gaudi 2, we are very excited about Gaudi 3 and look forward to profiling it when it arrives.”
Conclusion
This bit of news is promising for Intel following a year of questions surrounding its ability to compete against NVIDIA in the so-called “AI chip” race and fears of a potentially dangerous miss, especially given the size of the market opportunity. Gaudi 2’s performance also speaks to Intel’s ability to execute and should help validate the fundamentals of the company’s product roadmap, particularly at this critical juncture.
Intel Gaudi 2’s stellar performance in testing points to the wider availability of a growing range of highly efficient, purpose-built AI accelerators. This key trigger should lead to an explosion of operationalized AI in 2024.
Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.
Other Insights from The Futurum Group:
Intel Gaudi2: A CPU Alternative to GPUs in the AI War?
Intel AI Everywhere: Ambitious Vision for the Tech Titan
Google Cloud’s TPU v5e Accelerates the AI Compute War
Author Information
Mark comes to The Futurum Group from Omdia’s Artificial Intelligence practice, where his focus was on natural language and AI use cases.
Previously, Mark worked as a consultant and analyst providing custom and syndicated qualitative market analysis with an emphasis on mobile technology and identifying trends and opportunities for companies like Syniverse and ABI Research. He has been cited by international media outlets including CNBC, The Wall Street Journal, Bloomberg Businessweek, and CNET. Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting technology business and holds a Bachelor of Science from the University of Florida.
Olivier Blanchard has extensive experience managing product innovation, technology adoption, digital integration, and change management for industry leaders in the B2B, B2C, B2G sectors, and the IT channel. His passion is helping decision-makers and their organizations understand the many risks and opportunities of technology-driven disruption, and leverage innovation to build stronger, better, more competitive companies.