Qualcomm-Meta Llama 2 Could Unleash LLM Apps at the Edge

The News: On July 18, Qualcomm announced that it is working with Meta to implement Llama 2-based AI capabilities on smartphones and PCs starting in 2024. The two companies are working to optimize the execution of Meta’s newest LLM directly on device, without relying solely on cloud services. The goal is to enable powerful generative AI use cases and applications. Developers can start creating applications for these devices today, leveraging the Qualcomm AI Stack, a set of tools designed to process AI more efficiently on Snapdragon.

Read the full announcement on the Qualcomm website.

The announcement is further proof of Qualcomm’s investment in and vision for AI – that a significant portion of AI applications will run on edge devices that leverage both local and cloud compute.

Read the Qualcomm whitepaper, “The future of AI is hybrid” here.

Analyst Take: The idea that LLM applications could run on cost-efficient edge devices is enough to make most AI application developers’ imaginations run wild. This capability would bring many potential new use cases and business opportunities. Qualcomm and Meta are building the pathway to LLM apps at the edge. Here is a look at the why, the how, and the impact LLMs at the edge could have via Qualcomm-Llama 2.

The Market Drivers for Edge AI

Moving AI compute to the edge has two big potential advantages over cloud AI compute – lower latency and lower cost. If an edge device can handle an AI workload locally, there is no cloud compute cost. Latency drops because requests no longer make a network round trip to a cloud data center. Given the compute cost of AI, especially for generative AI and LLMs, moving workloads to local compute has massive appeal and opens up many more AI opportunities.

The Market Barriers for Edge AI

The market barriers for edge AI are:

  • Compute and memory constraints – Edge devices have far less compute and memory than cloud servers, which makes it very hard to run large AI apps.
  • Asymmetry – Edge devices vary in size and shape, capabilities, and limitations. That makes it difficult for application developers to build AI applications that will run on a broad range of devices.
  • Security and privacy – Most edge devices are connected devices and are therefore exposed to cyberattacks.
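
To put the compute and memory constraint in rough numbers, here is a minimal back-of-the-envelope sketch. It assumes the publicly stated Llama 2 sizes (7B, 13B, and 70B parameters) and counts weight storage only; activations, caches, and runtime overhead are ignored, so real requirements will be higher. The function name weight_memory_gb is illustrative and does not come from any Qualcomm or Meta tool.

```python
# Rough, weight-only memory estimate for an LLM at different numeric precisions.
# Assumption: parameter counts are the approximate published Llama 2 sizes.
LLAMA2_PARAMS = {"7B": 7e9, "13B": 13e9, "70B": 70e9}


def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 10^9 bytes)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    for name, params in LLAMA2_PARAMS.items():
        sizes = ", ".join(
            f"{bits}-bit: {weight_memory_gb(params, bits):.1f} GB"
            for bits in (16, 8, 4)
        )
        print(f"Llama 2 {name} -> {sizes}")
```

Under these assumptions, the 7B model needs about 14 GB for its weights at 16-bit precision and about 3.5 GB at 4-bit precision, which is why smaller models and lower precision matter so much for phones and PCs.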

Concept: The Lightweight LLM

Some LLM players have thought about the promise of edge AI and the compute challenge it presents. Their solution has been to find creative ways to build LLMs that use less compute but deliver similar results. Google created the Gecko edition of the PaLM 2 model with that idea in mind. Meta’s Llama models are another example.

Under the Hood of Making LLMs Lightweight

Lightweight LLMs leverage model compression to optimize for edge devices. There are three main techniques: knowledge distillation, quantization, and pruning.

  • Pruning – A technique that removes redundant and inconsequential parameters, such as connections, neurons, channels, or layers.
  • Knowledge distillation – A technique where a smaller “student” model is trained to mimic the behavior of a larger “teacher” model.
  • Quantization – A technique where the numerical precision of the model’s weights and activations is reduced (for example, from 16-bit floating point to 8-bit integers) without significantly impacting the model’s overall accuracy.
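
The three techniques above can be illustrated with a small, toy-scale sketch. This is not how Qualcomm or Meta compress Llama 2; it assumes simple textbook variants (magnitude-based pruning, a temperature-softened KL-divergence distillation loss, and symmetric 8-bit quantization) applied to plain Python lists. All function names are hypothetical.

```python
import math


def prune_by_magnitude(weights, sparsity):
    """Pruning: zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    drop = set(smallest)
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Distillation: KL divergence from the teacher's softened outputs to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def quantize_int8(weights):
    """Quantization: map floats to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from their integer codes."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.82, -0.05, 0.31, -1.27, 0.002, 0.64]
    print("pruned:", prune_by_magnitude(weights, sparsity=0.5))
    print("distillation loss:", distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
    codes, scale = quantize_int8(weights)
    print("int8 codes:", codes, "restored:", dequantize(codes, scale))
```

In this toy case, each restored weight differs from the original by at most half the quantization step, which is the sense in which precision is reduced without a large loss of accuracy. Production toolchains use more elaborate schemes (per-channel scales, calibration data, structured pruning), so treat this only as an intuition aid.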

Bringing It Back to Hybrid

While leveraging lightweight LLM models locally can make an impact at the edge, Qualcomm’s concept of hybrid is the approach that makes the most sense for generative AI/LLM apps at the edge. AI compute loads that make sense to process locally are processed locally, while other, likely larger AI compute loads are processed in the cloud. Edge AI then benefits from the lower cost and latency of local compute while still being able to draw on the more robust compute power of the cloud to deliver potent LLM apps.
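
As a rough illustration of the hybrid idea, the sketch below routes each request either to an on-device model or to the cloud. It is a hypothetical policy, not Qualcomm’s design: the Request fields, the route function, and the 512-token budget are all assumptions chosen for the example.

```python
from dataclasses import dataclass

# Assumption: the largest prompt the on-device model is expected to handle well.
LOCAL_TOKEN_BUDGET = 512


@dataclass
class Request:
    prompt_tokens: int   # estimated size of the prompt
    online: bool = True  # whether a network connection is available


def route(request: Request, local_token_budget: int = LOCAL_TOKEN_BUDGET) -> str:
    """Return "local" or "cloud" for a request under a simple size-based policy."""
    if not request.online:
        # Without connectivity, the device is the only option.
        return "local"
    if request.prompt_tokens <= local_token_budget:
        # Small jobs stay on device to avoid cloud cost and the network round trip.
        return "local"
    # Larger jobs go to the cloud, where more compute is available.
    return "cloud"


if __name__ == "__main__":
    for req in (Request(100), Request(5000), Request(5000, online=False)):
        print(req, "->", route(req))
```

A real router would likely weigh more signals, such as battery level, model quality requirements, and the sensitivity of the data involved, but the basic split between small local jobs and large cloud jobs is the core of the hybrid argument.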

Conclusions

At first blush, the idea of embedding Llama 2 in edge devices seems far-fetched, but if you consider the model compression techniques available to make LLMs more lightweight, combined with the hybrid edge/cloud approach, the path to unleashing a new wave of generative AI apps at the edge has real potential. The end of 2024 will be a good time to gauge how well the idea works. By then, there should be enough market adoption of Llama 2-powered Qualcomm devices to get a sense of the direction of edge AI.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other insights from The Futurum Group:

Generative AI Investment Accelerating: $1.3 Billion for LLM Inflection

Not Nothing: Nothing 2 Powered by the Qualcomm Snapdragon 8 Gen 1 SoC

Qualcomm Snapdragon Wear 4100+ Platform: Helping Keep Children Safe

Author Information

Mark comes to The Futurum Group from Omdia’s Artificial Intelligence practice, where his focus was on natural language and AI use cases.

Previously, Mark worked as a consultant and analyst providing custom and syndicated qualitative market analysis with an emphasis on mobile technology and identifying trends and opportunities for companies like Syniverse and ABI Research. He has been cited by international media outlets including CNBC, The Wall Street Journal, Bloomberg Businessweek, and CNET. Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting technology business and holds a Bachelor of Science from the University of Florida.
