Groq Ushers In a New AI Compute Paradigm: The Language Processing Unit

The News: On August 8, startup AI compute provider Groq announced that it now runs the large language model (LLM) Llama 2 (70 billion parameters) at more than 100 tokens per second per user on a Groq Language Processing Unit (LPU). Tokenization is the process LLMs use to break text into smaller, manageable units; it lets them process text more efficiently by reducing memory requirements and compute complexity. The more tokens per second per user an LLM can process, the faster it returns results to users and the less AI compute the application requires.

Here are the pertinent details:

  • According to Groq, in similar tests ChatGPT generates output at 40-50 tokens per second and Bard at roughly 70 tokens per second on typical GPU-based computing systems.
  • For context, at 100 tokens per second per user, a user could generate a 4,000-word essay in just over a minute.
  • Groq executives told The Futurum Group that the company’s LPU system not only improves AI compute efficiency but also improves the developer experience, making it easier for developers to build LLM-based applications.
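To make the throughput figures above concrete, here is a rough back-of-envelope sketch of how per-user tokens per second translates into wall-clock generation time. The ~1.35 tokens-per-word ratio is an assumption (a typical average for English text under BPE-style tokenizers), not a figure published by Groq; with it, 100 tokens per second works out to roughly a minute for a 4,000-word essay, in the same ballpark as the article's claim.

```python
# Back-of-envelope: wall-clock time to generate a 4,000-word essay
# at various per-user throughputs. TOKENS_PER_WORD is an assumed
# average for English BPE tokenization, not a vendor-published number.

TOKENS_PER_WORD = 1.35

def generation_seconds(words: int, tokens_per_second: float) -> float:
    """Estimate seconds to generate `words` of output at a given throughput."""
    tokens = words * TOKENS_PER_WORD
    return tokens / tokens_per_second

if __name__ == "__main__":
    essay_words = 4_000
    # Throughput figures as reported in Groq's comparison.
    for name, tps in [("Groq LPU (claimed)", 100),
                      ("ChatGPT (per Groq)", 45),
                      ("Bard (per Groq)", 70)]:
        secs = generation_seconds(essay_words, tps)
        print(f"{name}: ~{secs:.0f} s ({secs / 60:.1f} min)")
```

Doubling per-user throughput halves generation time, which is why tokens per second per user is the headline metric for interactive LLM applications.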

Read the full Press Release on Groq’s token test here.

In related news, on March 13 Groq reported that it had LLaMA running in production on one of its LPU systems within three days. The rapid ramp-up is noteworthy because Meta originally developed LLaMA for NVIDIA GPUs, so the quick port demonstrates a ready-to-use alternative to legacy GPU technology.

Read the Press Release detailing Groq’s rapid spin-up of LLaMA on an LPU system here.

Analyst Take: With continued concern about the cost and availability of AI compute, Groq’s token test and ability to bring AI compute into production rapidly could mean real competition for NVIDIA’s GPU business and could thwart CPU makers’ designs on capturing the AI inference compute market. Is the AI compute business about to be disrupted? The answers to the following key questions will determine the outcome.

Who are the likely buyers for Groq LPUs?

According to Groq executives The Futurum Group spoke with, there are three types of customers: 1) hyperscalers and data centers, 2) Global 3000 enterprises, and 3) everybody else.

For hyperscalers and data centers, the appeal is that AI inference runs more efficiently, allowing them to reduce their dependence on GPUs and the related power and operational costs. As generative AI applications become more popular, AI compute costs must come down for applications to scale, and price elasticity for AI compute will increasingly come into play for cloud providers. Groq’s LPUs could help keep those costs in line.

Groq executives told The Futurum Group they believe the Global 3000 represents a significant market for LPUs. Enterprises are increasingly telling Groq they prefer complete control over their proprietary data, and many are weighing expanded on-premises data centers against working solely with cloud providers. In theory, some enterprises could experiment with the Groq option, since it represents a smaller capital investment and appealing speed to market.

Enterprises outside the Global 3000 must rely on cloud providers for AI compute and are interested in the potentially lower cost of running AI inference. They are also perhaps the market that most urgently wants the development velocity Groq promises.

Can Groq handle the potential demand for LPUs?

It is unclear how much production capacity Groq has for its LPUs. A 2021 Forbes article about the company pegged its headcount at fewer than 250 employees. If Groq’s claims are well-founded, it may find itself in a situation similar to its competitor NVIDIA: working to keep up with market demand for its chips. It may also seek a larger partner or become an acquisition target.

How will NVIDIA counter?

If Groq’s LPUs are a viable option for AI inference, and if Groq does not run into significant production-capacity issues, then NVIDIA will likely see the company as a legitimate competitive threat. In the short term, NVIDIA might put pressure on hyperscalers and data center vendors that are not developing their own GPUs, perhaps by offering aggressive pricing, contracts, and value-added elements of the AI stack such as developer tools. In the long term, nearly every chipmaker is thinking about the best designs for AI compute loads, and there will be competition for the next generation. NVIDIA knows this and is investing significantly in next-generation AI compute as well.

Note: Daniel Newman, CEO of The Futurum Group, is an investor in Groq.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. 

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other insights from The Futurum Group:

GroqDay: Groq Sets its Sights on Accelerating the Speed of Artificial Intelligence

Groq Goes LLaMa

The Cost of The Next Big Thing – Artificial Intelligence

Author Information

Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.

From the leading edge of AI to global technology policy, Daniel makes the connections between business, people and tech that are required for companies to benefit most from their technology investments. Daniel is a top 5 globally ranked industry analyst and his ideas are regularly cited or shared in television appearances by CNBC, Bloomberg, Wall Street Journal and hundreds of other sites around the world.

Daniel is a seven-time best-selling author whose most recent book is “Human/Machine.” He is also a Forbes and MarketWatch (Dow Jones) contributor.

An MBA and former graduate adjunct faculty member, Daniel is an Austin, Texas, transplant after 40 years in Chicago. His speaking engagements take him around the world each year as he shares his vision of the role technology will play in our future.

Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting technology business and holds a Bachelor of Science from the University of Florida.
