Qualcomm NPU: A Key to Unlocking On-Device Generative AI?

The News: On February 8, Qualcomm Senior Vice President and General Manager of Technology Planning & Edge Solutions Durga Malladi published a blog post called “What is an NPU? And why is it key to unlocking on-device generative AI?” The post briefly summarizes a deeper whitepaper the company published, “Unlocking on-device generative AI with an NPU and heterogeneous computing.” The whitepaper is an in-depth look at Qualcomm’s latest on-device computing architecture, designed to enable generative AI applications, and it also provides a glimpse of pragmatic on-device generative AI use cases.

Here are the key details:

  • Qualcomm has significant experience building compute for on-device applications and has been designing on-device AI compute since 2015. That experience helped the company quickly understand generative AI’s compute demands and potential use cases.
  • The company’s latest neural processing units (NPUs) have been designed from the ground up for generative AI.
  • Because of the diverse requirements and computational demands of generative AI, different processors are needed. A heterogeneous computing architecture with processing diversity makes it possible to use each processor’s strengths: an AI-centric, custom-designed NPU along with the CPU and GPU, each excelling in different task domains.
  • Integrating processors into SoCs is key to on-device AI. This integration in chip design provides many benefits, including improvements in peak performance, power efficiency, performance per area, chip size, and cost.
  • The CPU and GPU are general-purpose processors. Designed for flexibility, they are very programmable and have ‘day jobs’ running the operating system, games, and other applications, which limits their available capacity for AI workloads at any point in time. The NPU is built specifically for AI — AI is its day job. It trades off some ease of programmability for peak performance, power efficiency, and area efficiency to run the large number of multiplications, additions, and other operations required in machine learning.
  • Applying a system approach to this heterogeneous computing solution is essential since heterogeneous computing encompasses the entire SoC, which has three layers — the diverse processors, the system architecture, and the software. The holistic view allows Qualcomm architects to evaluate constraints, requirements, and dependencies between each of these layers and then make the most appropriate choices for the SoC and end-product usage, such as designing the shared memory subsystem or deciding what data types each processor should support.
    • Since Qualcomm custom designs the entire system, they can make the appropriate design tradeoffs and use that insight to deliver a more synergistic solution.
  • Qualcomm sees on-device generative AI use cases in three categories:
    • On-demand use cases are triggered by a user, require an immediate response, and include photo/video capture, image generation/editing, code generation, audio recording transcription/summarization, and text (email, document, etc.) creation/summarization. This includes creating a custom image while texting on your phone, generating a meeting summary on your PC, or using voice to locate the nearest gas station while driving your car.
    • Sustained use cases run for a longer period and include speech recognition, gaming and video super resolution, video call audio/video processing, and real-time translation. This includes using your phone as a real-time conversation interpreter while on business travel overseas or running super resolution on every frame while gaming on your PC.
    • Pervasive use cases constantly run in the background and include always-on predictive AI assistants, AI personalization based on contextual awareness, and advanced text auto-complete. This includes your phone suggesting a meeting with a colleague based on your conversation, or your tutor assistant on your PC adjusting study material based on your answers to questions.
  • These AI use cases share two key challenges. First, their demanding and diverse computational requirements are difficult to meet in power- and thermally constrained devices using general-purpose CPUs or GPUs, which serve multiple needs on the platform. Second, the use cases are constantly evolving, so implementing them in purely fixed-function hardware can be impractical. As a result, a heterogeneous computing architecture that pairs an AI-centric, custom-designed NPU with the CPU and GPU, each applied where it is strongest, is the practical answer (see the sketch after this list).
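To make the heterogeneous routing idea concrete, here is a minimal sketch of how an application layer might pick a processor for an AI workload. The Processor, Workload, and pick_processor names are hypothetical illustrations of the principle, not Qualcomm’s actual SDK or scheduler:

```python
# A minimal sketch of heterogeneous workload routing. All names here are
# hypothetical illustrations, not Qualcomm's actual SDK or scheduler.
from dataclasses import dataclass
from enum import Enum, auto


class Processor(Enum):
    CPU = auto()  # most programmable; also runs the OS and apps
    GPU = auto()  # parallel compute; shared with graphics workloads
    NPU = auto()  # dedicated engine for multiply/accumulate-heavy AI


@dataclass
class Workload:
    name: str
    tensor_heavy: bool      # dominated by matrix multiplies and adds?
    sustained: bool         # runs continuously (e.g., per-frame)?
    needs_custom_ops: bool  # requires flexibility the NPU may lack?


def pick_processor(w: Workload) -> Processor:
    """Route a workload to the processor whose strengths fit it best."""
    if w.needs_custom_ops:
        return Processor.CPU  # fall back to the most programmable core
    if w.tensor_heavy and w.sustained:
        return Processor.NPU  # best performance per watt for steady AI load
    if w.tensor_heavy:
        return Processor.GPU  # bursty parallel work suits the GPU
    return Processor.CPU


for w in [
    Workload("real-time translation", True, True, False),
    Workload("one-shot image edit", True, False, False),
    Workload("custom preprocessing", False, False, True),
]:
    print(f"{w.name} -> {pick_processor(w).name}")
```

The routing mirrors the whitepaper’s framing: sustained, multiply/accumulate-heavy work belongs on the NPU, while the CPU remains the fallback for anything that needs maximum programmability.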

Read the blog post, “What is an NPU? And why is it key to unlocking on-device generative AI?” here.

You can download the whitepaper from the blog post.

Analyst Take: One of the biggest stories of the generative AI era to date has been the massive compute that generative AI workloads require and the scarcity of data center compute resources to meet that demand. With that in mind, on-device processing for generative AI use cases seemed challenging. But then in October Qualcomm introduced powerful new AI-focused SoCs – Snapdragon X Elite and Snapdragon 8 Gen 3 – and became the first device chip maker to demonstrate generative AI on-device and share a bit about its potential. Now, Qualcomm is sharing more detail about how on-device AI compute can work and about the best types of on-device generative AI use cases. Here are my thoughts on the findings.

Experience Matters

Since the beginning, processors for mobile devices have had to fit a constrained form factor. Experienced mobile chip makers like Qualcomm have long worked, out of necessity, to extract the most compute from the least power. That puts them in a unique position to lead in addressing the on-device compute challenge that generative AI presents. Qualcomm and other mobile processor makers have been building not only CPUs but also GPUs and NPUs for years, and integrating all of them into SoCs. Their explanation of the need for heterogeneous computing makes logical sense given their heritage of addressing similar challenges.

Parsing Duties Within an SoC

From the whitepaper: Applying a system approach to this heterogeneous computing solution is essential since heterogeneous computing encompasses the entire SoC, which has three layers — the diverse processors, the system architecture, and the software. The holistic view allows Qualcomm architects to evaluate constraints, requirements, and dependencies between each of these layers and then make the most appropriate choices for the SoC and end-product usage, such as designing the shared memory subsystem or deciding what data types each processor should support.

The diversity of compute capabilities being built into mobile devices (smartphones, laptops, tablets) is the key to efficient on-device generative AI. It does not exist in the same form for cloud-based generative AI.
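As a toy illustration of one system-level decision the quoted passage names, deciding what data types each processor should support, here is a small sketch; the capability table is an assumption for illustration, not Qualcomm’s actual silicon support:

```python
# A toy capability table for one decision the quoted passage names:
# which data types each processor should support. The values below are
# assumptions for illustration, not Qualcomm's actual silicon support.
SUPPORTED_DTYPES = {
    "CPU": {"fp32", "fp16", "int8"},  # flexible, general-purpose paths
    "GPU": {"fp32", "fp16"},          # float-oriented graphics/compute
    "NPU": {"fp16", "int8", "int4"},  # low-precision inference focus
}


def processors_for(model_dtype: str) -> list[str]:
    """List the processors that could host a model quantized to model_dtype."""
    return [p for p, dts in SUPPORTED_DTYPES.items() if model_dtype in dts]


print(processors_for("int4"))  # -> ['NPU'] under the assumed table
```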

Segmented Use Cases Draw a Clearer Picture

As developers think about on-device generative AI, it will help to think of applications in terms of the segments Qualcomm outlined – on-demand, sustained, and pervasive. The segmentation also shows an understanding of the types of tasks users have come to expect from their mobile devices. In the segment descriptions, there is a strong sense that most of the use cases lean toward the utility of particular devices: the always-with-us nature of smartphones (photo/video capture, interpretation/translation, speech recognition) and the ubiquitous work tools that are laptops and tablets (audio recording transcription/summarization, text summarization, video super resolution, video call audio/video processing, real-time translation). Many, including AI assistants, span both.
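For developers mapping their own features onto these segments, a small data model can make the taxonomy actionable. The field names and classification rule below are my own framing of Qualcomm’s descriptions, not anything taken from the whitepaper:

```python
# A small data model of the three segments. Field names and the
# classification rule are an illustrative framing of Qualcomm's
# descriptions, not an API from the whitepaper.
from dataclasses import dataclass
from enum import Enum


class Segment(Enum):
    ON_DEMAND = "user-triggered, needs an immediate response"
    SUSTAINED = "runs for a longer period, e.g., per-frame processing"
    PERVASIVE = "constantly runs in the background"


@dataclass
class UseCase:
    name: str
    always_on: bool     # runs with no user trigger at all?
    long_running: bool  # stays active for an extended session?


def classify(uc: UseCase) -> Segment:
    if uc.always_on:
        return Segment.PERVASIVE
    if uc.long_running:
        return Segment.SUSTAINED
    return Segment.ON_DEMAND


for uc in [
    UseCase("image generation", always_on=False, long_running=False),
    UseCase("real-time translation", always_on=False, long_running=True),
    UseCase("predictive AI assistant", always_on=True, long_running=True),
]:
    print(f"{uc.name}: {classify(uc).name}")
```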

Conclusion

The Qualcomm whitepaper goes into far more detail about why the company’s SoC processors and approach are strong choices. Setting the vendor pitch aside, the company does a good job of explaining how on-device generative AI can become a reality by solving the computational challenges, and it sheds light on the types of on-device generative AI applications that might make sense.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other Insights from The Futurum Group:

With Snapdragon, Qualcomm Sets the Pace for On-Device AI

On-Device AI – The AI Moment, Episode 4

On-Device AI, Part 2 | The AI Moment, Episode 6

Author Information

Mark comes to The Futurum Group from Omdia’s Artificial Intelligence practice, where his focus was on natural language and AI use cases.

Previously, Mark worked as a consultant and analyst providing custom and syndicated qualitative market analysis with an emphasis on mobile technology and identifying trends and opportunities for companies like Syniverse and ABI Research. He has been cited by international media outlets including CNBC, The Wall Street Journal, Bloomberg Businessweek, and CNET. Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting the technology business and holds a Bachelor of Science from the University of Florida.
