Large Language Model AI Needs to Be Invisible and Cheaper

There is no doubt that large language model (LLM) AI is revolutionizing computers' ability to augment or replace human effort. The challenge is turning that revolution into business value. ChatGPT’s attention-grabbing ability to write college-level essays and Sora’s ability to generate life-like videos are very different from using LLMs within a business application. To deliver widespread business value, LLMs must integrate easily into the tools used to build business applications, and the cost of running LLM inference within those applications must be controlled to maximize that value.

Many organizations are trying to integrate LLMs into their applications and finding that months of work are required to gain any value from an LLM, let alone transform their applications. The primary issue is that the organization’s data and business processes must be integrated with the foundation LLM. The currently available technologies are a collection of incredibly powerful science projects that require significant tuning for each use case within an organization. This mix of projects is natural in the early stages of a new application class, and more will spring up over the next few years. The usual maturity curve will apply: over time, a few projects will rise to the top as the most useful, and these will become easier to implement. Just as machine learning (ML)-based AI for video has become a core component of some applications, LLMs will become a feature rather than a product.

The cost to run LLM-based inference is a barrier to some use cases; LLM inference is resource-intensive and often requires GPUs installed in application servers. The high cost means that LLMs are only used where there is a high return. For broader use, the cost must come down. We are already seeing quantization used to reduce resource requirements, and as LLMs grow larger, we will need more techniques to reduce resource use. One such development is Intel's addition of a matrix math accelerator (Advanced Matrix Extensions, or AMX) to the latest Xeon Scalable CPUs, which reduces the need for GPUs to deliver the LLM inference performance that business value depends on.
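To make the quantization point concrete, here is a minimal sketch of how lower numeric precision shrinks model weights. It is an illustration only, not how any particular inference framework does it: it assumes simple symmetric per-tensor int8 quantization with a single scale factor, and the function names and sample weights are hypothetical. Each weight drops from 4 bytes (float32) to 1 byte (int8), a 4x reduction, at the cost of a rounding error of at most half the scale factor per weight.

```python
from array import array

def quantize_int8(weights):
    """Hypothetical helper: symmetric per-tensor int8 quantization."""
    # One scale maps the largest-magnitude weight onto the int8 value 127.
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round each weight to the nearest step and clamp to [-127, 127].
    q = array('b', (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale

def dequantize_int8(q, scale):
    """Hypothetical helper: recover approximate float32 weights."""
    return array('f', (v * scale for v in q))

# Illustrative float32 weights (not from any real model).
weights = array('f', [0.12, -0.5, 0.33, 0.9, -0.07, 0.0, 0.45, -0.88])
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

fp32_bytes = len(weights) * weights.itemsize  # 4 bytes per weight
int8_bytes = len(q) * q.itemsize              # 1 byte per weight
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(fp32_bytes, int8_bytes)             # 32 8
print(max_error <= scale / 2 + 1e-6)      # True
```

By the same arithmetic, a hypothetical 7-billion-parameter model needs about 14 GB for its weights at 16-bit precision but about 7 GB at 8-bit, before counting activations and other runtime memory. Production schemes typically use finer-grained scales (per channel or per block) to keep accuracy loss small.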

I hope we see another seismic shift in AI, with LLMs becoming more applicable because they are easier and cheaper to integrate into business applications. I doubt we will see the future of intelligent assistant robots, flying cars, and unlimited leisure. But it would be nice if LLMs could make everyday applications more straightforward to use and more insight-driven.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.


Author Information

Alastair has made a twenty-year career out of helping people understand complex IT infrastructure and how to build solutions that fulfil business needs. Much of his career has included teaching official training courses for vendors, including HPE, VMware, and AWS. Alastair has written hundreds of analyst articles and papers exploring products and topics around on-premises infrastructure and virtualization and getting the most out of public cloud and hybrid infrastructure. Alastair has also been involved in community-driven, practitioner-led education through the vBrownBag podcast and the vBrownBag TechTalks.
