The News: Intel introduced the Intel Gaudi 3 accelerator to bring performance, openness, and choice to enterprise generative AI (GenAI), and unveiled a suite of new open scalable systems, next-gen products, and strategic collaborations aimed at accelerating GenAI adoption. Read the full press release on the Intel website.
Intel Vision 2024: Intel Unleashes Gaudi 3-Led Enterprise AI Strategy
Analysts' Take: Intel unleashed a comprehensive AI strategy for enterprises, with open, scalable systems built to work across the full continuum of AI segments. Spearheading the new and refreshed Intel enterprise AI proposition is the launch of the Intel Gaudi 3 accelerator. Intel emphasized that, with only 10% of enterprises successfully moving GenAI projects into production last year, its latest offerings address the challenges businesses face in scaling AI initiatives.
The Intel Gaudi 3 AI accelerator is built to power AI systems with up to tens of thousands of accelerators connected through the common standard of Ethernet. Intel Gaudi 3 promises fourfold more AI compute for BF16 and a 1.5x increase in memory bandwidth over its predecessor. The accelerator can deliver sizable improvements in AI training and inference for global enterprises looking to deploy GenAI at scale.
Intel Gaudi 3 is designed to offer open, community-based software and industry-standard Ethernet networking. It allows enterprises to scale flexibly from a single node to clusters, super-clusters, and mega-clusters with thousands of nodes, supporting inference, fine-tuning, and training at the largest scale. Intel Gaudi 3 will be available to OEMs, such as Dell Technologies, Hewlett Packard Enterprise (HPE), Lenovo, and Supermicro, in Q2 2024.
Ron Westfall's Take: From my view, Intel is solidly positioned to win immediate mindshare and capture new market share across the enterprise AI ecosystem through its emphasis on sales and marketing differentiators against major rival NVIDIA. Compared with the NVIDIA H100, Intel Gaudi 3 is projected to deliver 50% faster time-to-train on average across Meta's Llama2 models with 7B (billion) and 13B parameters, and OpenAI's GPT-3 175B parameter model. Plus, Intel Gaudi 3 is projected to outperform the H100 by 50% on average for inference throughput and by 40% on average for inference power efficiency across Llama 7B and 70B parameter models and TII's Falcon 180B parameter model.
Moreover, I find that Intel Gaudi 3 is positioned to compete against NVIDIA's recently launched Blackwell B200 AI accelerator, although real-world testing of both offerings is pending. Integral to its differentiation, Gaudi 3 consists of two identical silicon dies joined by a high-bandwidth connection. Each die has a central region of 48 MB of cache memory, surrounded by four matrix multiplication engines and 32 programmable tensor processor cores, which are in turn encircled by memory connections and capped with media processing and network infrastructure at one end.
The Gaudi 3 power efficiency advantages are critical in meeting customer power challenges across datacenter environments, including scenarios where large language models (LLMs) are tasked with producing longer outputs. The Gaudi architecture uses large matrix math engines, 512 bits across, that require less memory bandwidth than competing architectures, such as NVIDIA H100/H200/B200 and AMD MI300, which rely on smaller engines to execute the same calculations.
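To illustrate why larger matrix engines ease bandwidth pressure, consider a back-of-envelope sketch (my own illustration, not Intel's published methodology): for an N x N matrix-multiply tile, compute grows as N cubed while data movement grows as N squared, so arithmetic intensity, and with it data reuse, scales with N.

```python
# Back-of-envelope illustration (not vendor data): arithmetic intensity
# of an N x N matrix-multiply tile. Compute scales as 2*N^3 FLOPs while
# data moved scales as 3*N^2 elements (two inputs plus one output), so
# a larger engine does more math per byte fetched from memory.

BYTES_PER_ELEMENT = 2  # BF16

def arithmetic_intensity(n: int) -> float:
    """FLOPs per byte for one N x N x N matrix-multiply tile."""
    flops = 2 * n**3                            # multiply-accumulate count
    bytes_moved = 3 * n**2 * BYTES_PER_ELEMENT  # A and B in, C out
    return flops / bytes_moved

for n in (64, 128, 256, 512):
    print(f"tile {n:>3}: {arithmetic_intensity(n):6.1f} FLOPs/byte")
```

The doubling pattern in the output captures the general principle: each doubling of engine width roughly halves the memory traffic required per unit of compute.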
For memory bandwidth, Gaudi 3 uses the less costly HBM2E high-bandwidth memory, in contrast to the HBM3/HBM3E implementations being used by rivals. Some back-of-envelope math indicates that Intel is poised to take advantage of ongoing memory price stratification, which suggests that 141 GB of HBM3E is valued at around $25,000, a key factor in driving the street price of NVIDIA's H200 north of $40,000. In contrast, 96 GB of HBM2E comes in at about $10,600. Bear in mind that these are guesstimates and do not provide an exacting "apples to apples" comparison; however, the upshot is that Intel Gaudi 3, by virtue of HBM2E's price advantages, can draw strong interest through sheer cost savings, including across key metrics such as price-performance.
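Working through the per-gigabyte arithmetic behind those guesstimates (same caveats apply: these are street-price estimates, not vendor-confirmed figures):

```python
# Rough cost-per-GB comparison using the price guesstimates cited above.
hbm3e_gb, hbm3e_cost = 141, 25_000   # H200-class HBM3E estimate
hbm2e_gb, hbm2e_cost = 96, 10_600    # Gaudi 3-class HBM2E estimate

hbm3e_per_gb = hbm3e_cost / hbm3e_gb  # ~ $177/GB
hbm2e_per_gb = hbm2e_cost / hbm2e_gb  # ~ $110/GB

print(f"HBM3E: ${hbm3e_per_gb:,.0f}/GB, HBM2E: ${hbm2e_per_gb:,.0f}/GB")
print(f"HBM2E discount per GB: {1 - hbm2e_per_gb / hbm3e_per_gb:.0%}")
```

That works out to roughly a 38% per-gigabyte discount, which compounds across every accelerator in a large cluster.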
Russ Fellows' Take: Intel is providing companies a choice of hardware vendors for training and for operating their critical AI workloads. While competing AI accelerators often use proprietary technologies and have limited availability, Intel Gaudi offers a credible alternative without excessive cost or lead times for AI training and inferencing workloads using popular LLMs and multimodal models.
The Gaudi line fits within Intel's broader AI strategy, termed AI Everywhere, an acknowledgement that in the rapidly approaching future AI will run on nearly every device: handhelds, embedded systems, edge devices, and computers using Intel processors, along with the datacenter AI workloads that Intel's Gaudi line of accelerators targets.
In the fast-moving world of AI, it is imperative for companies to have alternative choices for developing and running new GenAI applications, including LLMs and other emerging GenAI workloads. Intel's Gaudi lineup of AI accelerators provides an alternative to existing proprietary accelerators that often have limited availability.
Based upon previously published reports, and upcoming Futurum Group Lab Insight Reports, we have found that companies that want to deploy production inferencing or develop fine-tuned LLMs and Retrieval Augmented Generation (RAG) models need access to leading AI accelerators.
The current Intel Gaudi 2 accelerator has industry-leading price-performance based upon current prices and benchmark results from MLCommons. To date, Intel is one of only three vendors to have published results for MLCommons Datacenter AI Inferencing and AI Training workloads with its Gaudi line of AI hardware accelerators. We expect the upcoming Gaudi 3 to maintain this price-performance leadership, offering a solid option for companies focused on exploring and productizing customized GenAI applications.
Intel Enterprise AI Ecosystem Credentials Strengthened by Extensive Customers and Partnerships
Intel is putting sales and marketing emphasis on its strategy for open, scalable AI systems, consisting of hardware, software, frameworks, and tools. We see Intel's approach as conducive to catalyzing an extensive, open ecosystem of AI players offering solutions that fulfill enterprise-specific GenAI needs. As such, this approach helps ensure that enterprises can collaborate with the ecosystem partners and solutions they already know and trust.
Intel's partners and customers include equipment manufacturers, database providers, systems integrators, software suppliers, service providers, and other specialists, encompassing NAVER, Bosch, IBM, Ola/Krutrim, NielsenIQ (an Advent International portfolio company), Seekr, IFF, CtrlS Group, Bharti Airtel, Landing AI, Roboflow, and Infosys. For example, IBM is using 5th Gen Intel Xeon processors for its watsonx.data data store and coordinating with Intel to validate the watsonx platform for Intel Gaudi accelerators. Infosys is working to bring Intel technologies, including 4th and 5th Gen Intel Xeon processors, Intel Gaudi 2 AI accelerators, and Intel Core Ultra, to Infosys Topaz, an AI-first set of services, solutions, and platforms aimed at accelerating business value through GenAI technologies.
Intel also announced collaborations with Google Cloud, Thales, and Cohesity to use Intel's confidential computing capabilities in their cloud instances, including Intel Trust Domain Extensions (Intel TDX), Intel Software Guard Extensions (Intel SGX), and Intel's attestation service. From our view, "in Intel customers can trust" by running their AI models and algorithms in a trusted execution environment (TEE) and using Intel's trust services to independently verify the trustworthiness of such TEEs.
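As a conceptual illustration of that attestation flow, the toy sketch below shows a TEE producing a signed "quote" over its workload measurement and a relying party verifying it before trusting the environment. All names and the HMAC signing are hypothetical stand-ins; real Intel TDX/SGX attestation relies on hardware-rooted keys and Intel's attestation service, not this toy scheme.

```python
# Toy attestation flow (illustrative only, not the Intel TDX/SGX API):
# the TEE measures its workload and signs the measurement; the relying
# party checks the signature and the expected measurement before
# trusting results or releasing secrets.
import hashlib
import hmac

HW_ROOT_KEY = b"simulated-hardware-root-key"  # stand-in for a hardware key
EXPECTED = hashlib.sha256(b"approved-model-runtime-v1").hexdigest()

def make_quote(workload: bytes) -> tuple[str, str]:
    """TEE side: measure the workload and sign the measurement."""
    measurement = hashlib.sha256(workload).hexdigest()
    signature = hmac.new(HW_ROOT_KEY, measurement.encode(), "sha256").hexdigest()
    return measurement, signature

def verify_quote(measurement: str, signature: str) -> bool:
    """Relying party: verify the signature and the approved measurement."""
    expected_sig = hmac.new(HW_ROOT_KEY, measurement.encode(), "sha256").hexdigest()
    return hmac.compare_digest(signature, expected_sig) and measurement == EXPECTED

m, s = make_quote(b"approved-model-runtime-v1")
print("attested:", verify_quote(m, s))  # True only for the approved runtime
```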
To further stimulate broad ecosystem support of an open platform for enterprise AI, Intel is collaborating with Anyscale, Articul8, DataStax, Domino, Hugging Face, KX Systems, MariaDB, MinIO, Qdrant, Red Hat, Redis, SAP, VMware, Yellowbrick, and Zilliz. The ecosystem-wide effort aims to develop open, multivendor GenAI systems that deliver ease of deployment, performance, and value, enabled by RAG. RAG augments open LLMs with massive proprietary enterprise data sources running on standard cloud infrastructure, playing an integral role in accelerating GenAI adoption across enterprises.
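For readers unfamiliar with the pattern, here is a minimal sketch of the RAG loop described above; the corpus, word-overlap scoring, and prompt assembly are toy stand-ins, not any specific vendor's stack:

```python
# Minimal RAG sketch: retrieve the most relevant enterprise documents,
# then build a grounded prompt for an LLM. Toy corpus and scoring.
from collections import Counter

CORPUS = [
    "Q3 supply agreement: HBM2E pricing locked through 2025.",
    "Travel policy: all international trips require VP approval.",
    "Gaudi cluster runbook: scale-out uses a standard Ethernet fabric.",
]

def score(query: str, doc: str) -> int:
    """Toy relevance score: count of shared lowercase words."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def build_prompt(query: str, k: int = 2) -> str:
    """Retrieve top-k documents and prepend them as grounding context."""
    top = sorted(CORPUS, key=lambda doc: score(query, doc), reverse=True)[:k]
    context = "\n".join(top)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# In production the prompt would go to an LLM endpoint; here we just print it.
print(build_prompt("What fabric does the Gaudi cluster scale-out use?"))
```

Production systems replace the word-overlap scorer with embeddings in a vector database (the Qdrant, Redis, and Zilliz partners above all play here) and send the assembled prompt to the model.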
The Intel Xeon 6 Processor Factor: Datacenter, Edge, and Cloud Advances
To further boost its enterprise AI proposition, Intel unveiled new Intel Xeon 6 processors targeted at datacenter, edge, and cloud environments. Intel Xeon 6 processors with new Efficient-cores (E-cores), code-named Sierra Forest, will deliver 2.4x better performance per watt and 2.7x better rack density compared with 2nd Gen Intel Xeon processors. As a result, customers can replace older systems at a ratio of nearly 3-to-1, lower energy consumption substantially, fulfill sustainability goals, and improve business outcomes.
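The 3-to-1 claim follows directly from the density figure; a quick illustrative calculation (my arithmetic, not Intel's sizing guidance):

```python
# If new racks deliver 2.7x the density, the same work needs ~1/2.7 of
# the systems, i.e. a replacement ratio approaching 3-to-1.
old_systems = 300
density_gain = 2.7
new_systems = old_systems / density_gain
print(f"{old_systems} old -> {new_systems:.0f} new "
      f"(about {density_gain:.1f}-to-1)")
```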
Plus, Intel Xeon 6 processors with Performance-cores (P-cores), code-named Granite Rapids, will offer increased AI performance and are targeted to launch soon after the E-core processors. Intel Xeon 6 processors with P-cores incorporate software support for the MXFP4 data format, which can reduce next-token latency by up to 6.5x versus 4th Gen Intel Xeon processors using FP16, with the ability to run 70B parameter Llama-2 models.
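MXFP4 is the OCP Microscaling 4-bit floating-point format, in which each block of 32 values shares one power-of-two scale and each element is stored as a 4-bit E2M1 value. The sketch below is my own conceptual illustration of that block structure, not Intel's kernel code, and it simplifies the spec's scale selection:

```python
# Conceptual MXFP4-style block quantization: 32 values share a
# power-of-two scale; each element snaps to the 4-bit E2M1 grid.
import math

FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # E2M1 magnitudes

def quantize_block(block: list[float]) -> tuple[int, list[float]]:
    """Return (shared exponent, dequantized values) for one 32-value block."""
    amax = max(abs(v) for v in block) or 1.0
    exp = math.ceil(math.log2(amax / 6.0))  # scale so |v|/scale <= 6.0
    scale = 2.0 ** exp
    out = []
    for v in block:
        mag = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))
        out.append(math.copysign(mag * scale, v))
    return exp, out

values = [0.02 * i - 0.3 for i in range(32)]
exp, q = quantize_block(values)
print("shared exponent:", exp)
print("max abs error:", max(abs(a - b) for a, b in zip(values, q)))
```

The payoff is storage and bandwidth: 4 bits per weight plus one shared scale per 32 elements, versus 16 bits per weight for FP16, which is a big part of where the next-token latency gains come from in memory-bound token generation.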
The Client, Edge, Connectivity, and UEC Dimension
Further bolstering its overall enterprise AI proposition, Intel spotlighted client, edge, and connectivity portfolio offerings and capabilities. Specifically, Intel Core Ultra processors are powering new capabilities for productivity, security, and content creation. From my perspective, this can help stimulate organizations to refresh their PC fleets and further justify their AI investment priorities. The case grows stronger as the next-generation Intel Core Ultra client processor family, code-named Lunar Lake and launching in 2024, will deliver more than 100 platform tera operations per second (TOPS) and more than 45 neural processing unit (NPU) TOPS for next-generation AI PCs. Impressively, Intel expects to ship 40 million AI PCs in 2024, with more than 230 designs spanning ultra-thin PCs to handheld gaming devices.
Intel announced new edge silicon across the Intel Core Ultra, Intel Core, and Intel Atom processor families and the Intel Arc graphics processing unit (GPU) family, targeting key markets such as retail, industrial manufacturing, and healthcare. All new additions to Intel's edge AI portfolio will be available in Q2 2024 and will be supported by the Intel Tiber Edge Platform in 2024.
Notably, Intel unveiled the Intel Tiber portfolio of business solutions to streamline the deployment of enterprise software and services, including for GenAI, with full rollout planned for Q3 2024. Intel Tiber, effectively a re-brand of the Edge Platform announced at MWC24, aims to provide a unified experience that makes it easier for enterprise customers and developers to find solutions that fit their needs, kindle innovation, and unlock value without compromising on security, compliance, or performance.
Through the Ultra Ethernet Consortium (UEC), Intel is cultivating open Ethernet networking for AI fabrics, introducing an array of AI-optimized Ethernet solutions. Designed to transform large scale-up and scale-out AI fabrics, these offerings can enable training and inferencing for increasingly vast models, with sizes expanding by an order of magnitude in each generation. The lineup includes the Intel AI NIC, AI connectivity chiplets for integration into XPUs, Gaudi-based systems, and a range of soft and hard reference AI interconnect designs for Intel Foundry.
I anticipate that the UEC specifications due in 2024 can help assure that the Ethernet standard and technology will be ready to host and scale massive AI workloads, providing enduring competitive advantages over proprietary NVIDIA-backed InfiniBand implementations. Together with Ethernet-related breakthroughs, such as RDMA over Converged Ethernet (RoCE) advances, including RoCEv2 progress, as well as lossless Ethernet improvements in advanced flow control, congestion handling, hashing, buffering, and advanced flow telemetry that strengthen switch capabilities, UEC's openness and flexibility can deliver cost and complexity advantages over InfiniBand.
Key Takeaways: Intel Is Ready for Enterprise AI Prime Time
We believe that the array of new Intel Gaudi, Xeon, and Core Ultra portfolio offerings immediately bolsters the Intel enterprise AI proposition, bringing AI innovation everywhere throughout enterprise PC, datacenter, cloud, and edge environments. As such, Intel can meet the swiftly evolving demands of the AI era, including GenAI workload optimization, by providing a unified set of agile solutions adapted to ensuring that customers and partners can attain business outcome gains from their expanding AI/GenAI investments.
Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually, based in part on data and other information that might have been provided for validation, and are not those of The Futurum Group as a whole.
Other insights from The Futurum Group:
Intel Q4 and FY 2023 Results: Transformation Progress Continues
Intel 5th Gen Xeon Scalable Processors Make Breakthroughs
Intel Enters the AI PC Race With Its NPU-Powered Core Ultra Processor
Author Information
Ron is an experienced, customer-focused research expert and analyst, with over 20 years of experience in the digital and IT transformation markets, working with businesses to drive consistent revenue and sales growth.
He is a recognized authority at tracking the evolution of and identifying the key disruptive trends within the service enablement ecosystem, including a wide range of topics across software and services, infrastructure, 5G communications, Internet of Things (IoT), Artificial Intelligence (AI), analytics, security, cloud computing, revenue management, and regulatory issues.
Prior to his work with The Futurum Group, Ron worked with GlobalData Technology creating syndicated and custom research across a wide variety of technical fields. His work with Current Analysis focused on the broadband and service provider infrastructure markets.
Ron holds a Master of Arts in Public Policy from the University of Nevada, Las Vegas and a Bachelor of Arts in political science/government from William and Mary.
Russ brings over 25 years of diverse experience in the IT industry to his role at The Futurum Group. As a partner at Evaluator Group, he built the highly successful lab practice, including IOmark benchmarking.
Prior to Evaluator Group he worked as a Technology Evangelist and Storage Marketing Manager at Sun Microsystems. He was previously a technologist at Solbourne Computers in their test department and later moved to Fujitsu Computer Products. He started his tenure at Fujitsu as an engineer and later transitioned into IT administration and management.
Russ possesses a unique perspective on the industry through his experience as both a product marketer and an IT consumer.
A Colorado native, Russ holds a Bachelor of Science in Applied Math and Computer Science from the University of Colorado Boulder, as well as a Master of Business Administration in International Business and Information Technology from the University of Colorado Denver.