Deciding When to Use Intel Xeon CPUs for AI Inference, AI Field Day

Introduction

Intel presented the capabilities of Intel Xeon CPUs for AI inference at AI Field Day, filling out a complete day with a series of Intel partner presentations on the same theme. Intel has been building workload-specific acceleration into its CPU designs for over a decade. The 4th Generation Xeon Scalable CPUs introduced an AI-specific accelerator, Advanced Matrix Extensions (AMX), alongside several other built-in accelerators, and the 5th Generation carries this forward. This is part of the evidence that Intel is committed to letting customers run AI on their CPUs rather than requiring add-in accelerator cards for every AI use.

Ronak Shah presented this continuing vision at AI Field Day 4, where delegates wanted to understand the decision points for choosing between older Xeon CPUs, 5th Generation Xeon Scalable CPUs, or an off-CPU accelerator such as an NVIDIA GPU. Ronak was very clear that not every AI use case suits Intel Xeon CPUs for AI inference and that the decision is not clear-cut. The rule of thumb seems to be that large language models (LLMs) with over 20 billion parameters will seldom deliver acceptable performance on CPUs. Smaller models and non-LLM AI can often run inference on Intel Xeon CPUs and deliver the required latency, as the sketch below illustrates.
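
To make that rule of thumb concrete, here is a minimal, hypothetical sizing helper in Python. The function name and both thresholds are illustrative assumptions drawn from the discussion above, not published Intel guidance:

```python
# Hypothetical helper illustrating the CPU-vs-accelerator rule of thumb.
# The 20-billion-parameter cutoff and the latency budget are illustrative
# assumptions, not Intel-published sizing guidance.

def pick_inference_target(param_count: int, latency_budget_ms: float) -> str:
    """Suggest where to run inference for a model of a given size."""
    TWENTY_BILLION = 20_000_000_000
    if param_count > TWENTY_BILLION:
        # Large LLMs seldom deliver acceptable performance on CPUs alone.
        return "gpu"
    if latency_budget_ms < 10:
        # Very tight latency budgets may still justify an accelerator.
        return "gpu"
    # Smaller models and non-LLM workloads often fit on Xeon CPUs.
    return "cpu"

print(pick_inference_target(7_000_000_000, latency_budget_ms=100))   # cpu
print(pick_inference_target(70_000_000_000, latency_budget_ms=100))  # gpu
```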

The AI Pipeline CPU-GPU Sandwich

Ronak outlined Intel’s view of an AI pipeline. It starts with training data preparation, a CPU-dominated task that mostly involves moving data and extract-transform-load (ETL) work. The next phase is model training, almost always a GPU-dominated task where the massive parallelism of a GPU can be kept continuously loaded. The third stage is inference, deploying the AI model to do its job. Ronak sees many production uses of Intel Xeon CPUs for AI inference, mainly when the AI is part of a complete business application. This use of CPU for data preparation, GPU for training, and CPU for inference is what I’m calling the AI pipeline CPU-GPU sandwich.
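
As a rough illustration of the sandwich, here is a minimal PyTorch sketch that prepares synthetic data on the CPU, trains a toy model on a GPU when one is available, and moves the trained model back to the CPU for inference. The model and data are placeholders, not a real workload:

```python
# A minimal PyTorch sketch of the "CPU-GPU sandwich": data prep on CPU,
# training on GPU (when available), inference back on CPU.
import torch
import torch.nn as nn

# Stage 1: data preparation runs on the CPU (ETL, feature building).
features = torch.randn(1024, 16)          # stand-in for prepared training data
labels = torch.randint(0, 2, (1024,))

# Stage 2: training is GPU-dominated; fall back to CPU if no GPU is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(5):                         # a token training loop
    optimizer.zero_grad()
    loss = loss_fn(model(features.to(device)), labels.to(device))
    loss.backward()
    optimizer.step()

# Stage 3: inference moves back to the CPU alongside the business application.
model = model.to("cpu").eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 16))  # served from shared Xeon servers
```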

One of the big benefits of Intel Xeon CPUs for AI inference is that you already have them. There is no need to build specialized infrastructure just for AI; the AI application can live alongside other applications on your shared computing platform. It is essential to recognize that generative AI is not the only player in the game; most production use of AI involves much smaller models, which are ideally suited to CPUs. Notably, Intel says the AMX accelerator speeds machine vision use cases by up to two orders of magnitude compared with earlier Xeon Scalable generations that lack AMX. In many production use cases, using Intel Xeon CPUs for AI inference makes sense.
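
For readers who want to try AMX on a small vision model, a minimal sketch using the Intel Extension for PyTorch follows. It assumes intel_extension_for_pytorch and torchvision are installed; on 4th and 5th Generation Xeon Scalable CPUs, the bfloat16 path dispatches to AMX automatically, and resnet18 is just an illustrative small model:

```python
# A hedged sketch of putting AMX to work on a small vision model via the
# Intel Extension for PyTorch (ipex). On Xeons with AMX, bfloat16 kernels
# use the AMX tile instructions automatically; resnet18 is a placeholder.
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet18(weights=None).eval()
model = ipex.optimize(model, dtype=torch.bfloat16)  # enable bf16/AMX paths

image_batch = torch.randn(8, 3, 224, 224)           # placeholder input
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    output = model(image_batch)
```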

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other Insights from The Futurum Group:

Intel’s AI Everywhere Event Unveils Strategic Moves in the Era of AI

Intel Developer Cloud: Driving AI Chip Design, Filling AI Workload Gap

Intel 5th Gen Xeon Scalable Processors Make Breakthroughs

Author Information

Alastair has made a twenty-year career out of helping people understand complex IT infrastructure and how to build solutions that fulfil business needs. Much of his career has included teaching official training courses for vendors, including HPE, VMware, and AWS. Alastair has written hundreds of analyst articles and papers exploring products and topics around on-premises infrastructure and virtualization and getting the most out of public cloud and hybrid infrastructure. Alastair has also been involved in community-driven, practitioner-led education through the vBrownBag podcast and the vBrownBag TechTalks.
