
Qualcomm-Meta Llama 2 Could Unleash LLM Apps at the Edge

The News: On July 18, Qualcomm announced that it is working with Meta to implement Llama 2-based AI capabilities on smartphones and PCs starting in 2024. The two companies are working to optimize the execution of Meta’s newest LLM directly on device, without relying solely (note that word) on cloud services. The vision is to enable the creation of powerful generative AI use cases and applications. Developers can start creating applications for these devices today, leveraging the Qualcomm AI Stack, a set of tools designed to process AI more efficiently on Snapdragon.

Read the full announcement on the Qualcomm website.

The announcement is further proof of Qualcomm’s investment in, and vision for, AI: that a significant portion of AI applications will run on edge devices that leverage both local and cloud compute.

Read the Qualcomm whitepaper, “The future of AI is hybrid” here.

Analyst Take: The idea that LLM applications could be run on compute-cost efficient edge devices is enough to make most AI application developers’ imaginations run wild. This capability would bring lots of potential new use cases and business opportunities. Qualcomm and Meta are building the pathway to LLM apps at the edge. Here is a look at the why, the how, and the impact LLMs at the edge could have via Qualcomm-Llama 2.

The Market Drivers for Edge AI

Moving AI compute to the edge has two big potential advantages over cloud AI compute: lower latency and lower cost. If an edge device can handle an AI workload locally, there is no cloud compute cost for that workload. Latency drops because the request no longer has to make a network round trip to a cloud data center. When you consider the compute cost for AI, especially for generative AI and LLMs, moving it offline to local compute has massive appeal and opens up a lot more AI opportunities.

The Market Barriers for Edge AI

The market barriers for edge AI are:

  • Compute and memory constraints – Edge devices have limited processing power and memory, which makes it very hard to run large AI apps.
  • Asymmetry – Edge devices vary in size, shape, capabilities, and limitations. That makes it difficult for application developers to build AI applications that will run on a broad range of devices.
  • Security and privacy – Most edge devices are connected devices and are therefore exposed to cyber-attacks.
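To put the memory constraint in rough numbers, the sketch below estimates how much storage the model weights alone would need at different numeric precisions. It assumes the publicly listed Llama 2 sizes (7B, 13B, and 70B parameters) and ignores activations, the attention cache, and runtime overhead, so real requirements would be higher. The function name and the table of sizes are illustrative, not part of any Qualcomm or Meta tooling.

```python
# Back-of-the-envelope estimate: weight memory = parameters x bits per parameter.
# Assumption: only weights are counted; activations, KV cache, and runtime
# overhead are ignored, so these figures are lower bounds.

LLAMA2_PARAMS = {"7B": 7e9, "13B": 13e9, "70B": 70e9}  # published Llama 2 sizes


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for name, params in LLAMA2_PARAMS.items():
        row = ", ".join(
            f"{bits}-bit: {weight_memory_gb(params, bits):.1f} GB"
            for bits in (16, 8, 4)
        )
        print(f"Llama 2 {name} -> {row}")
```

Under these assumptions, the smallest model needs about 14 GB for weights at 16-bit precision, which is more memory than many phones have in total. That gap is why the compression techniques discussed later in this piece matter.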

Concept: The Lightweight LLM

Some LLM players have thought about the promise of edge AI and the compute challenge it presents. Their solution has been to find creative ways to build LLMs that use less compute but deliver similar results. Google created the Gecko edition of the PaLM 2 model with that idea in mind. Meta’s Llama models are another example.

Under the Hood of Making LLMs Lightweight

Lightweight LLMs leverage model compression to optimize for edge devices. There are three main techniques: pruning, knowledge distillation, and quantization.

  • Pruning – A technique that removes redundant and inconsequential parts of a model, such as connections, neurons, channels, or layers.
  • Knowledge distillation – A technique where a smaller model (the student) is trained to mimic the behavior of a larger model (the teacher), often on a smaller data set.
  • Quantization – A technique where the numerical precision of the model’s weights and activations is reduced (for example, from 32-bit floating point to 8-bit integers) without significantly impacting the model’s overall accuracy.
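Two of these techniques can be illustrated on a toy list of weights. The following is a minimal sketch, not how Qualcomm or Meta implement compression: it shows magnitude-based pruning (one common pruning criterion) and symmetric 8-bit quantization with a single scale per tensor. All function names are hypothetical, and the code uses plain Python lists rather than a real model. Knowledge distillation is left out because it needs a training loop.

```python
# Toy illustration of pruning and quantization on a flat list of weights.
# Assumptions: magnitude-based pruning and symmetric per-tensor int8
# quantization; production toolchains use more elaborate variants.


def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Indices sorted by absolute value; the first k are the least important.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer representation."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.01, 0.9]
    pruned = prune_by_magnitude(weights, sparsity=0.25)
    q, scale = quantize_int8(pruned)
    print("pruned:     ", pruned)
    print("int8 values:", q)
    print("restored:   ", dequantize(q, scale))
```

Each restored weight differs from the original by at most half the scale factor, which is the sense in which precision is reduced without the values drifting far. Whether overall model accuracy holds up depends on the model and task and has to be measured.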

Bringing It Back to Hybrid

While leveraging lightweight LLM models locally can make an impact at the edge, Qualcomm’s concept of hybrid is the approach that makes the most sense for generative AI/LLM apps at the edge. AI compute loads that make sense to process locally are processed locally, while other, likely larger, AI compute loads are processed in the cloud. Edge AI then benefits from some of the lower cost and latency of local compute but is still able to leverage the more robust compute power of the cloud to deliver potent LLM apps.
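The hybrid idea can be pictured as a simple routing decision. The sketch below is purely illustrative and is not drawn from Qualcomm’s whitepaper or the Qualcomm AI Stack: the class names, the token-count threshold, and the offline fallback rule are all assumptions chosen to show the shape of the logic.

```python
from dataclasses import dataclass

# Hypothetical hybrid router: small requests run on device, larger ones go to
# the cloud. The threshold and the offline rule are illustrative assumptions.


@dataclass
class Request:
    prompt_tokens: int  # size of the request, used here as a proxy for load


@dataclass
class DeviceProfile:
    max_local_tokens: int  # assumed budget the on-device model handles well
    online: bool           # whether a cloud connection is available


def route(request: Request, device: DeviceProfile) -> str:
    """Return "local" or "cloud" for a single request."""
    if not device.online:
        # Without connectivity, the only option is on-device compute.
        return "local"
    if request.prompt_tokens <= device.max_local_tokens:
        # Small enough for the device: no cloud cost, no network round trip.
        return "local"
    # Larger loads go to the more robust compute in the cloud.
    return "cloud"


if __name__ == "__main__":
    phone = DeviceProfile(max_local_tokens=512, online=True)
    print(route(Request(prompt_tokens=200), phone))
    print(route(Request(prompt_tokens=4000), phone))
```

A real system would likely weigh more signals than request size, such as battery state, model quality requirements, or data sensitivity, but the basic split between local and cloud processing would look similar.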

Conclusions

At first blush, the idea of embedding Llama 2 in edge devices seems far-fetched, but if you consider the model compression techniques available to make LLMs more lightweight, combined with the hybrid edge/cloud approach, the path to unleashing a new wave of generative AI apps at the edge has real potential. The end of 2024 will be a time to gauge how the idea will work. By then, there should be enough market adoption of the Llama 2-powered Qualcomm devices to get a sense of Edge AI direction.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other insights from The Futurum Group:

Generative AI Investment Accelerating: $1.3 Billion for LLM Inflection

Not Nothing: Nothing 2 Powered by the Qualcomm Snapdragon 8 Gen 1 SoC

Qualcomm Snapdragon Wear 4100+ Platform: Helping Keep Children Safe

Author Information

Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting technology business and holds a Bachelor of Science from the University of Florida.
