The Future of AI is Hybrid: Look No Further than Your Devices to Scale Generative AI

The News: Qualcomm envisions the future of AI as hybrid, with on-device AI playing a key role in enabling generative AI to scale. Read the Qualcomm blog here.

Analyst Take: A hybrid AI architecture distributes and coordinates AI workloads among cloud and edge devices, rather than processing in the cloud only. The cloud and edge devices, including smartphones, automobiles, personal computers, and Internet of Things (IoT) devices, work together to provide more powerful, efficient, and highly optimized AI.
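To make the idea of distributing workloads concrete, here is a minimal Python sketch of how a single inference request might be routed between a device and the cloud. It is an illustration only: the Request and Device fields, the capacity limit, and the routing policy are all hypothetical and are not taken from Qualcomm's blog or any real product.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """A single inference request (all fields are illustrative assumptions)."""
    model_params_billions: float  # size of the model the request needs
    needs_fresh_cloud_data: bool  # e.g. a query that depends on live web data


@dataclass
class Device:
    """An edge device such as a smartphone, PC, vehicle, or IoT device."""
    max_on_device_params_billions: float  # hypothetical capacity limit
    is_online: bool


def route(request: Request, device: Device) -> str:
    """Return "device" or "cloud" for one request.

    Hypothetical policy: run locally when the model fits and no cloud-only
    data is needed; otherwise use the cloud if a connection exists.
    """
    fits = request.model_params_billions <= device.max_on_device_params_billions
    if fits and not request.needs_fresh_cloud_data:
        return "device"
    if device.is_online:
        return "cloud"
    if fits:
        # Offline: degrade gracefully by answering locally without fresh data.
        return "device"
    raise RuntimeError("model does not fit on the device and no connection is available")
```

For example, route(Request(1.0, False), Device(7.0, True)) keeps the request on the device, while a request for a 100-billion-parameter model on the same device is sent to the cloud. A real system would likely weigh further factors such as latency, battery, and privacy, which this sketch leaves out.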

Massive generative AI models with billions of parameters place substantial demands on computing infrastructure. As such, both AI training, which learns the parameters for an AI model, and AI inference, which executes the model, have been limited to cloud implementations for massive and intricate models. However, I see that changing rapidly now.

I anticipate that the scale of AI inference will be dramatically higher than that of AI training. While training an individual model requires significant resources, large generative AI models are expected to be trained only a few times a year. By contrast, the cost of inference with such models grows with the number of daily active users and how often they use the model. Running all inference in the cloud therefore results in costs that can prove unsustainable at scale.
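The scaling argument can be sketched as simple arithmetic. The Python function below is a rough model under stated assumptions, not a figure from Qualcomm or any cloud provider: it assumes a flat cost per cloud query and that queries served on the device add no cloud cost. The parameter names and the numbers in the usage note are hypothetical placeholders.

```python
def daily_cloud_inference_cost(
    daily_active_users: int,
    queries_per_user_per_day: float,
    cost_per_cloud_query: float,
    on_device_share: float = 0.0,
) -> float:
    """Estimate the daily cloud bill for inference.

    Assumptions (hypothetical): every cloud query costs the same amount,
    and the share of queries handled on the device costs the cloud nothing.
    """
    if not 0.0 <= on_device_share <= 1.0:
        raise ValueError("on_device_share must be between 0 and 1")
    total_queries = daily_active_users * queries_per_user_per_day
    cloud_queries = total_queries * (1.0 - on_device_share)
    return cloud_queries * cost_per_cloud_query


# Placeholder inputs chosen only to show the shape of the relationship.
cloud_only = daily_cloud_inference_cost(1_000_000, 10, 0.01)
hybrid = daily_cloud_inference_cost(1_000_000, 10, 0.01, on_device_share=0.5)
```

With these placeholder inputs, the cloud-only cost grows linearly with both the user count and the usage frequency, and moving half of the queries onto devices halves the cloud portion. Training cost does not appear in the function because, as noted above, it is incurred only a few times a year rather than per query.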

Hybrid AI provides the answer, much as traditional computing evolved from mainframes and thin clients to today’s mix of cloud infrastructure and smart devices, including PCs and smartphones. Hybrid AI is essential to the affordable scaling of the consumer and enterprise use cases that are emerging from generative AI. Foundation models, such as the general-purpose large language models (LLMs) Generative Pre-trained Transformer 4 (GPT-4) and Language Model for Dialogue Applications (LaMDA), have attained breakthrough levels of language comprehension, generation capability, and breadth of knowledge. Most of these models are very large, with more than 100 billion parameters.

Google, for instance, continues to enhance LaMDA so that Google Bard, which uses AI to make web search results more conversational, contextual, and informative, can draw on information from across the Internet to provide deeper, more contextual query results for users. At Google I/O 2023, Google introduced PaLM 2, the company’s next-generation language model. Compared with its predecessor, PaLM 2 is trained more heavily on multilingual text, spanning more than 100 languages, to improve its understanding, generation, and translation of nuanced text such as idioms, poems, and riddles.

Generative AI Use Cases Rising Across Device Categories

From my perspective, a hybrid AI architecture can enable generative AI to deliver both enhanced and entirely new user experiences. For instance, with over 10 billion searches daily, and mobile accounting for over 60% of searches, the expansion of generative AI will drive a considerable increase in the computing capacity required, especially for queries originating from smartphones. The growing popularity of chat as a search interface, along with generative AI-based search, is poised to boost the overall number of queries. As chat improves, the smartphone can also become a more capable digital assistant.

Users can now communicate naturally and get more informative responses, thanks to accurate on-device personas and LLMs that comprehend text, voice, images, video, and other emerging inputs. Models that run on smartphones and perform language processing, image understanding, text-to-text generation, and more will likely remain in high demand for quite some time.

AI is already used in a wide array of IoT market segments, including retail, security, energy, supply chain, and asset management. Generative AI can benefit these segments by improving the customer and workforce experience. In retail, for example, store managers can better plan for off-cycle sales opportunities tied to upcoming events such as major sporting events and cultural festivals.

Additionally, I expect that operations teams throughout the energy and utilities segment can use generative AI to plan for corner-case load scenarios and better predict spikes in energy demand, as well as the potential for grid degradation and breakdowns. Generative AI can also enhance customer service in areas such as billing and outage updates.

Key Takeaways: The Future of AI is Hybrid

Just as organizations are expanding their adoption of hybrid cloud, the digital ecosystem is quickly embracing hybrid AI to meet the surging resource and scaling demands of AI, including generative AI. Remarkably, Qualcomm notes that AI models with more than 1 billion parameters are already running on phones. Equally important, it reports performance and accuracy levels comparable to those in the cloud. In my view, the hybrid AI approach already extends to all the major AI applications and device segments, including smartphones, IoT, laptops, and vehicles, and will only continue to expand. The future is now for hybrid AI.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually, based on data and other information that might have been provided for validation, and are not those of The Futurum Group as a whole.

Other insights from The Futurum Group:

Qualcomm Revenue in Q2 Hits $9.27B, Beating Analyst Estimates

Qualcomm Uplifts WiFi 7 through Mesh Networking Performance Optimization

Qualcomm Snapdragon 8 Gen 2 Powers ASUS ROG 7 Mobile Gaming Phones

Author Information

Ron is an experienced, customer-focused research expert and analyst, with over 20 years of experience in the digital and IT transformation markets, working with businesses to drive consistent revenue and sales growth.

Ron holds a Master of Arts in Public Policy from the University of Nevada, Las Vegas and a Bachelor of Arts in political science/government from William and Mary.
