VMware Private AI at AI Field Day: CPUs If You Can, GPUs If You Must

Introduction

VMware presented Private AI at AI Field Day, showing how to run AI on-premises and get the best out of the CPUs you already have in your data center. If I had to summarize the presentation in one sentence, it would be this: you can run large language model (LLM) inference in virtual machines (VMs) on Intel Sapphire Rapids CPUs; you don’t always need GPUs. The vital part is that Sapphire Rapids (4th Generation Xeon Scalable) CPUs added the Advanced Matrix Extensions (AMX) instructions, which let the CPU do matrix math efficiently.

This matrix math is precisely what GPUs do at large scale, so Intel adding these instructions to a CPU is a big deal for AI workloads. VMware’s Earl Ruby explained that, because the instructions are so new, the infrastructure team faces a few hurdles in ensuring vSphere VMs can access AMX. Those hurdles are familiar territory for seasoned vSphere administrators: a minimum ESXi version of 8.0 U2, VM hardware version 20, and a Linux guest kernel preferably at 5.19 or later. The good news is that support for Intel AMX is already present in many popular AI tools, such as PyTorch, so the AI development team does not need to do anything special to benefit from it.

One element of optimization is essential: quantization, which converts the LLM from floating-point math to lower-precision integer math. Quantization balances the precision required against the resource cost of answering the question: less precise math costs less CPU and memory, but at the cost of a less accurate answer.
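To make the trade-off concrete, here is a minimal sketch of the idea behind quantization: mapping float values onto signed 8-bit integers with a scale and zero-point, then mapping them back. Real LLM quantization (such as the int8 paths in PyTorch that AMX accelerates) is considerably more involved; the function names here are mine, for illustration only.

```python
def quantize(values, num_bits=8):
    """Affine-quantize a list of floats to signed integers."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid zero scale
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map the integers back to approximate floats."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.2, -0.3, 0.0, 0.5, 2.1]
q, scale, zp = quantize(weights)
approx = dequantize(q, scale, zp)
# The round trip is close but not exact: that small loss of precision
# is the accuracy traded away for cheaper integer math.
```

Each integer now fits in one byte instead of four, and the matrix math runs on integer units, which is exactly where AMX shines; the residual error in `approx` is the accuracy cost the paragraph above describes.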

Earl showed an interesting comparison: running the Llama 2 7-billion-parameter model as a chatbot on Intel Ice Lake CPUs (no AMX) versus Sapphire Rapids with AMX. Both ran in VMs without any GPU. The chatbot on Sapphire Rapids ran approximately 8x faster than on Ice Lake. Without AMX, the Ice Lake CPU didn’t deliver a responsive chatbot, and we would have needed to add a GPU for acceptable performance. On Sapphire Rapids with AMX, the chatbot was perfectly usable. Would the chatbot have run faster if the Sapphire Rapids VM had a GPU? Undoubtedly. Would the GPU’s additional cost and power consumption have delivered better value than using AMX in the CPU? That depends entirely on the value of lower latency to your application and your business.

Earl also led us through hurdles in getting the AMX instructions available to pods running in VMware Tanzu Kubernetes. Again, these are relatively well-known hurdles for a seasoned vSphere and Tanzu administrator. Like the ESXi hurdles, these Tanzu hurdles will disappear over time as software versions in use catch up with the AMX features added in Sapphire Rapids.
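Whether the workload lands in a plain VM or a Tanzu pod, a quick sanity check is to confirm the guest kernel actually reports the AMX feature flags, which on x86 Linux surface in /proc/cpuinfo as amx_tile, amx_int8, and amx_bf16. A small sketch (the helper names are mine, not from the presentation):

```python
# The three CPUID feature flags Linux reports for Intel AMX.
AMX_FLAGS = {"amx_tile", "amx_int8", "amx_bf16"}

def amx_flags_present(cpuinfo_text):
    """Return the set of AMX flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return AMX_FLAGS & set(line.split(":", 1)[1].split())
    return set()

def guest_has_amx(path="/proc/cpuinfo"):
    """True if the guest kernel exposes all three AMX feature flags."""
    with open(path) as f:
        return amx_flags_present(f.read()) == AMX_FLAGS
```

If the flags are missing inside the guest, the cause is usually one of the hurdles above: an older ESXi build, a VM hardware version below 20, or a guest kernel that predates AMX support.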

As we have seen before, Intel adds built-in accelerators to its CPUs to remove the need for add-in card accelerators. I’m old enough to remember when SSL offload cards were essential for web servers; then came the AES-NI instructions, and now cryptography, including SSL, is baked into the CPU. Over time, I expect to see more and more AI use cases that the CPU can fulfil alone. Running mixed workloads on a shared pool of servers has always been a value proposition for VMware, and allowing AI onto that same pool without requiring additional hardware is a clear win.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other Insights from The Futurum Group:

VMware Explore: Making Moves with Multi-Cloud and Private AI

VMware VCF and Tanzu Post Broadcom: Lessons and Evolution

Broadcom Redefines VMware

Author Information

Alastair has made a twenty-year career out of helping people understand complex IT infrastructure and how to build solutions that fulfil business needs. Much of his career has included teaching official training courses for vendors, including HPE, VMware, and AWS. Alastair has written hundreds of analyst articles and papers exploring products and topics around on-premises infrastructure and virtualization and getting the most out of public cloud and hybrid infrastructure. Alastair has also been involved in community-driven, practitioner-led education through the vBrownBag podcast and the vBrownBag TechTalks.
