
Qualcomm-Meta Llama 2 Could Unleash LLM Apps at the Edge

The News: On July 18, Qualcomm announced that it is working with Meta to bring Llama 2-based AI capabilities to smartphones and PCs starting in 2024. The two companies are working to optimize the execution of Meta’s newest LLM directly on device, without relying solely on cloud services. The vision is to enable powerful new generative AI use cases and applications. Developers can start creating applications for these devices today, leveraging the Qualcomm AI Stack, a set of tools designed to process AI more efficiently on Snapdragon.

Read the full announcement on the Qualcomm website.

The announcement is further proof of Qualcomm’s investment in, and vision for, AI – that a significant portion of AI applications will run on edge devices that leverage both local and cloud compute.

Read the Qualcomm whitepaper, “The future of AI is hybrid” here.


Analyst Take: The idea that LLM applications could run on cost-efficient edge devices is enough to make most AI application developers’ imaginations run wild. The capability would open up many new use cases and business opportunities, and Qualcomm and Meta are building the pathway to LLM apps at the edge. Here is a look at the why, the how, and the impact LLMs at the edge could have via Qualcomm and Llama 2.

The Market Drivers for Edge AI

Moving AI compute to the edge has two big potential advantages over cloud AI compute – lower latency and lower cost. If an edge device can handle an AI workload locally, there is no cloud compute cost, and latency drops because there is no network round trip to a data center. Given the compute cost of AI, especially for generative AI and LLMs, moving inference to local hardware has massive appeal and opens up many more AI opportunities.

The Market Barriers for Edge AI

The market barriers for edge AI are:

  • Compute and memory constraints – Edge devices have limited processing power and memory, which makes it very hard to run large AI applications.
  • Asymmetry – Edge devices vary widely in size, shape, capabilities, and limitations, which makes it difficult for application developers to build AI applications that will run on a broad range of devices.
  • Security and privacy – Most edge devices are connected devices and are therefore exposed to cyber-attacks.

Concept: The Lightweight LLM

Some LLM players have considered the promise of edge AI and the compute challenge it presents. Their solution has been to build LLMs that use less compute but deliver similar results in creative ways. Google created the Gecko edition of its PaLM 2 model with that idea in mind; Meta’s Llama models take a similar approach.

Under the Hood of Making LLMs Lightweight

Lightweight LLMs leverage model compression to optimize for edge devices. There are three main techniques: knowledge distillation, quantization, and pruning. Each is defined below, followed by a short code sketch.

  • Pruning – A technique that removes redundant or inconsequential parameters, such as individual connections, neurons, channels, or layers.
  • Knowledge distillation – A technique where a smaller “student” model is trained to mimic the behavior of a larger “teacher” model, often using a smaller data set.
  • Quantization – A technique where the numerical precision of the model’s weights and activations is reduced (for example, from 32-bit floating point to 8-bit integers) without significantly impacting the model’s overall accuracy.
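
To make two of these techniques concrete, here is a minimal sketch using PyTorch’s built-in pruning and dynamic quantization utilities. It is purely illustrative; neither Meta nor Qualcomm has published the exact compression pipeline used to fit Llama 2 onto Snapdragon devices.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Tiny stand-in network; a production LLM has billions of parameters.
model = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 256),
)

# Pruning: zero out the 30% of first-layer weights with the smallest
# magnitude (L1 unstructured pruning), then make the change permanent.
prune.l1_unstructured(model[0], name="weight", amount=0.3)
prune.remove(model[0], "weight")

# Quantization: store weights as 8-bit integers and quantize activations
# on the fly at inference time, shrinking the model roughly 4x.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized)  # the Linear layers are now dynamically quantized
```

The same ideas scale up: a pruned, quantized LLM trades a small amount of accuracy for a model that fits in an edge device’s memory and power budget.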

Bringing It Back to Hybrid

While leveraging lightweight LLM models locally can make an impact at the edge, Qualcomm’s concept of hybrid is the approach that makes the most sense for generative AI/LLM apps at the edge. AI workloads that make sense to process locally are processed locally, while other, likely larger workloads are processed in the cloud. Edge AI then benefits from the lower cost and latency of local compute while still being able to leverage the more robust compute power of the cloud to deliver potent LLM apps.
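
In code, the routing decision at the heart of the hybrid approach might look something like the sketch below. The function names, token threshold, and routing heuristic are all hypothetical stand-ins; Qualcomm has not published how its hybrid orchestration actually decides where a workload runs.

```python
def estimate_tokens(prompt: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return len(prompt) // 4

def run_on_device(prompt: str) -> str:
    # Hypothetical stand-in for a compressed Llama 2 running locally.
    return f"[on-device] response to: {prompt[:40]}"

def run_in_cloud(prompt: str) -> str:
    # Hypothetical stand-in for a call to a hosted, full-size model.
    return f"[cloud] response to: {prompt[:40]}"

def answer(prompt: str, max_local_tokens: int = 512) -> str:
    # Small, latency-sensitive requests stay on device: no network
    # round trip and no per-token cloud bill.
    if estimate_tokens(prompt) <= max_local_tokens:
        return run_on_device(prompt)
    # Larger jobs fall back to the heavier cloud model.
    return run_in_cloud(prompt)

print(answer("Summarize my last three text messages."))
```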

Conclusions

At first blush, the idea of embedding Llama 2 in edge devices seems far-fetched, but given the model compression techniques available to make LLMs more lightweight, combined with the hybrid edge/cloud approach, the path to unleashing a new wave of generative AI apps at the edge has real potential. The end of 2024 will be the time to gauge how the idea is working; by then, there should be enough market adoption of Llama 2-powered Qualcomm devices to get a sense of where edge AI is headed.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other insights from The Futurum Group:

Generative AI Investment Accelerating: $1.3 Billion for LLM Inflection

Not Nothing: Nothing 2 Powered by the Qualcomm Snapdragon 8 Gen 1 SoC

Qualcomm Snapdragon Wear 4100+ Platform: Helping Keep Children Safe

Author Information

Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting the technology business, and holds a Bachelor of Science from the University of Florida.

