Groq Ushers In a New AI Compute Paradigm: The Language Processing Unit

The News: On August 8, startup AI compute provider Groq announced that it now runs the large language model (LLM) Llama 2 (70 billion parameters) at more than 100 tokens per second per user on a Groq Language Processing Unit (LPU). Tokenization is the process LLMs use to break text into smaller, manageable units; it lets them process text more efficiently by reducing memory requirements and compute complexity. The more tokens per second per user an LLM can process, the faster it returns results to users and the less AI compute the application requires.

Here are the pertinent details:

  • According to Groq, in similar tests ChatGPT generates output at 40-50 tokens per second and Bard at roughly 70 tokens per second on typical GPU-based computing systems.
  • For context, at 100 tokens per second per user, a user could generate a 4,000-word essay in just over a minute.
  • Groq executives told The Futurum Group that the company’s LPU system not only improves AI compute efficiency but also improves the developer experience, making it easier for developers to build LLM-based applications.
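To make the throughput figures above concrete, here is a rough back-of-envelope sketch of how per-user tokens per second translates into wall-clock generation time. The ~1.35 tokens-per-word ratio is an assumption (a typical average for English text under BPE-style tokenizers), not a figure published by Groq; with it, 100 tokens per second works out to roughly a minute for a 4,000-word essay, in the same ballpark as the article's claim.

```python
# Back-of-envelope: wall-clock time to generate a 4,000-word essay
# at various per-user throughputs. TOKENS_PER_WORD is an assumed
# average for English BPE tokenization, not a vendor-published number.

TOKENS_PER_WORD = 1.35

def generation_seconds(words: int, tokens_per_second: float) -> float:
    """Estimate seconds to generate `words` of output at a given throughput."""
    tokens = words * TOKENS_PER_WORD
    return tokens / tokens_per_second

if __name__ == "__main__":
    essay_words = 4_000
    # Throughput figures as reported in Groq's comparison.
    for name, tps in [("Groq LPU (claimed)", 100),
                      ("ChatGPT (per Groq)", 45),
                      ("Bard (per Groq)", 70)]:
        secs = generation_seconds(essay_words, tps)
        print(f"{name}: ~{secs:.0f} s ({secs / 60:.1f} min)")
```

Doubling per-user throughput halves generation time, which is why tokens per second per user is the headline metric for interactive LLM applications.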

Read the full Press Release on Groq’s token test here.

In related news, on March 13 Groq reported that it had LLaMA running in production on one of its LPU systems within three days. The rapid ramp-up is noteworthy because Meta originally developed LLaMA for NVIDIA GPUs, so the quick port demonstrates a ready-to-use alternative to legacy GPU technology.

Read the Press Release detailing Groq’s rapid spin-up of LLaMA on an LPU system here.

Analyst Take: With continued concern about the cost and availability of AI compute, Groq’s token test and ability to bring AI compute into production rapidly could mean real competition for NVIDIA’s GPU business and could thwart CPU makers’ designs on capturing the AI inference compute market. Is the AI compute business about to be disrupted? The answers to the following key questions will determine the outcome.

Who are the likely buyers for Groq LPUs?

According to Groq executives The Futurum Group spoke with, there are three types of customers: 1) hyperscalers and data centers, 2) Global 3000 enterprises, and 3) everybody else.

For hyperscalers and data centers, the appeal is that AI inference runs more efficiently, allowing them to reduce their dependence on GPUs and the related power and operational costs. As generative AI applications become more popular, AI compute costs must come down for applications to scale, and price elasticity for AI compute will increasingly come into play for cloud providers. Groq’s LPUs could help keep those costs in line.

Groq executives told The Futurum Group they believe the Global 3000 represents a significant market for LPUs. Enterprises are increasingly telling Groq they prefer complete control over their proprietary data, and many are weighing expanded on-premises data centers against working solely with cloud providers. In theory, some enterprises could experiment with the Groq option, since it represents a smaller capital investment and appealing speed to market.

Enterprises outside the Global 3000 must rely on cloud providers for AI compute and are interested in the potentially lower cost of running AI inference. They are also perhaps the market that most urgently wants the development velocity Groq promises.

Can Groq handle the potential demand for LPUs?

It is unclear how much production capacity Groq has for its LPUs. A 2021 Forbes article about the company pegged its headcount at fewer than 250 employees. If Groq’s claims are well-founded, it may find itself in a situation similar to its competitor NVIDIA: working to keep up with market demand for its chips. It may also seek a larger partner or become an acquisition target.

How will NVIDIA counter?

If Groq’s LPUs are a viable option for AI inference, and if Groq does not run into significant production-capacity issues, then NVIDIA will likely see the company as a legitimate competitive threat. In the short term, NVIDIA might put pressure on hyperscalers and data center vendors that are not developing their own GPUs, perhaps by offering aggressive pricing, contracts, and value-added elements of the AI stack such as developer tools. In the long term, nearly every chipmaker is thinking about the best designs for AI compute loads, and there will be competition for the next generation. NVIDIA knows this and is investing significantly in next-generation AI compute as well.

Note: Daniel Newman, CEO of The Futurum Group, is an investor in Groq.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. 

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other insights from The Futurum Group:

GroqDay: Groq Sets its Sights on Accelerating the Speed of Artificial Intelligence

Groq Goes LLaMa

The Cost of The Next Big Thing – Artificial Intelligence

Author Information

Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.

From the leading edge of AI to global technology policy, Daniel makes the connections between business, people and tech that are required for companies to benefit most from their technology investments. Daniel is a top 5 globally ranked industry analyst and his ideas are regularly cited or shared in television appearances by CNBC, Bloomberg, Wall Street Journal and hundreds of other sites around the world.

Daniel is a seven-time best-selling author whose most recent book is “Human/Machine.” He is also a Forbes and MarketWatch (Dow Jones) contributor.

An MBA and former graduate adjunct faculty member, Daniel is an Austin, Texas, transplant after 40 years in Chicago. His speaking engagements take him around the world each year as he shares his vision of the role technology will play in our future.

Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting technology business and holds a Bachelor of Science from the University of Florida.
