Qualcomm-Meta Llama 2 Could Unleash LLM Apps at the Edge

The News: On July 18, Qualcomm announced that it is working with Meta to bring Llama 2-based AI capabilities to smartphones and PCs starting in 2024. The two companies are working to optimize the execution of Meta’s newest LLM directly on device, without relying solely on cloud services. The vision is to enable the creation of powerful generative AI use cases and applications. Developers can start creating applications for these devices today, leveraging the Qualcomm AI Stack, a set of tools designed to process AI more efficiently on Snapdragon.

Read the full announcement on the Qualcomm website.

The announcement is further proof of Qualcomm’s investment and vision for AI – that a significant portion of AI applications will run on edge devices that leverage both local and cloud compute.

Read the Qualcomm whitepaper, “The future of AI is hybrid” here.

Analyst Take: The idea that LLM applications could be run on compute-cost efficient edge devices is enough to make most AI application developers’ imaginations run wild. This capability would bring lots of potential new use cases and business opportunities. Qualcomm and Meta are building the pathway to LLM apps at the edge. Here is a look at the why, the how, and the impact LLMs at the edge could have via Qualcomm-Llama 2.

The Market Drivers for Edge AI

Moving AI compute to the edge has two big potential advantages over cloud AI compute – lower latency and lower cost. If an edge device can handle an AI workload locally, there is no cloud compute cost, and latency drops because there is no round trip to a cloud data center. When you consider the compute cost for AI, especially for generative AI and LLMs, moving it to local compute has massive appeal and opens up many more AI opportunities.

The Market Barriers for Edge AI

The market barriers for edge AI are:

  • Compute and memory constraints – Edge devices have limited compute and memory, which makes it very hard to run large AI models.
  • Asymmetry – Edge devices vary widely in form factor, capabilities, and limitations, which makes it difficult for application developers to build AI applications that will run on a broad range of devices.
  • Security and privacy – Most edge devices are connected devices and are therefore exposed to cyberattacks.

Concept: The Lightweight LLM

Some LLM players have considered the promise of edge AI and the compute challenges it presents. Their solution has been to build LLMs that use less compute but deliver comparable results in creative ways. Google created the Gecko edition of its PaLM 2 model with that idea in mind; Meta’s Llama models are another example.

Under the Hood of Making LLMs Lightweight

Lightweight LLMs leverage model compression to optimize for edge devices. There are three main techniques: knowledge distillation, quantization, and pruning.

  • Pruning – A technique that removes redundant or inconsequential parameters, such as connections, neurons, channels, or layers.
  • Knowledge distillation – A technique where a smaller model is trained to mimic the behavior of a larger model on a smaller data set.
  • Quantization – A technique where the numerical precision of the model’s weights and activations is reduced without significantly impacting the model’s overall accuracy.
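To make the quantization technique concrete, here is a minimal sketch of symmetric post-training quantization of a weight tensor to int8. This is an illustrative toy, not the actual scheme Qualcomm or Meta use; real deployments typically use per-channel scales and calibration data.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float weights onto the int8 range [-127, 127] with one shared scale."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

Each weight now occupies 1 byte instead of 4, a 4x memory saving, while the reconstruction error stays bounded by roughly half the quantization step.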

Bringing It Back to Hybrid

While leveraging lightweight LLM models locally can make an impact at the edge, Qualcomm’s concept of hybrid is the approach that makes the most sense for generative AI/LLM apps at the edge. AI compute loads that make sense to process locally are processed locally, while other, likely larger, loads are processed in the cloud. Edge AI then benefits from the lower cost and latency of local compute while still being able to leverage the more robust compute power of the cloud to deliver potent LLM apps.
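The hybrid split described above can be sketched as a simple dispatch policy. The function names, the word-count heuristic, and the threshold below are illustrative assumptions, not part of the Qualcomm/Meta announcement; a production router would weigh device capability, battery, connectivity, and task type.

```python
def run_on_device(prompt: str) -> str:
    # Stand-in for invoking a local, compressed Llama 2 model.
    return f"[edge] {prompt[:30]}"

def run_in_cloud(prompt: str) -> str:
    # Stand-in for calling a hosted full-size model endpoint.
    return f"[cloud] {prompt[:30]}"

def dispatch(prompt: str, max_edge_words: int = 64) -> str:
    """Keep small generative requests on device; offload heavier ones."""
    if len(prompt.split()) <= max_edge_words:
        return run_on_device(prompt)
    return run_in_cloud(prompt)

print(dispatch("Summarize my last text message"))  # stays local
print(dispatch(" ".join(["word"] * 200)))          # offloaded to cloud
```

The design point is that the routing decision happens on the device, so the cheap, low-latency path is the default and the cloud is the exception.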

Conclusions

At first blush, the idea of embedding Llama 2 in edge devices seems far-fetched, but when you consider the model compression techniques available to make LLMs more lightweight, combined with the hybrid edge/cloud approach, the path to unleashing a new wave of generative AI apps at the edge has real potential. By the end of 2024, there should be enough market adoption of Llama 2-powered Qualcomm devices to gauge the direction of edge AI.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other insights from The Futurum Group:

Generative AI Investment Accelerating: $1.3 Billion for LLM Inflection

Not Nothing: Nothing 2 Powered by the Qualcomm Snapdragon 8 Gen 1 SoC

Qualcomm Snapdragon Wear 4100+ Platform: Helping Keep Children Safe

Author Information

Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting technology business and holds a Bachelor of Science from the University of Florida.

