Groq Ushers In a New AI Compute Paradigm: The Language Processing Unit

The News: On August 8, AI compute startup Groq announced that it now runs the Llama 2 large language model (70 billion parameters) at more than 100 tokens per second per user on a Groq Language Processing Unit (LPU). Tokenization is the process large language models (LLMs) use to break text into smaller, manageable units called tokens; it allows LLMs to process text more efficiently by reducing memory requirements and compute complexity. The more tokens per second per user an LLM system can process, the faster it returns results to users and the less AI compute the application requires.
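
To make the token concept concrete, here is a minimal sketch using OpenAI's open-source tiktoken tokenizer, purely for illustration; Llama 2 ships its own SentencePiece-based tokenizer, so its token counts will differ.

```python
# Minimal illustration of tokenization: turning text into the integer
# token IDs an LLM actually processes. Uses OpenAI's open-source
# tiktoken library purely for illustration; Llama 2 uses a different
# (SentencePiece-based) tokenizer with different counts.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Groq runs Llama 2 at more than 100 tokens per second per user."
token_ids = enc.encode(text)

print(f"{len(text.split())} words -> {len(token_ids)} tokens")
print(token_ids[:8])                  # first few token IDs
assert enc.decode(token_ids) == text  # decoding round-trips the text
```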

Here are the pertinent details:

  • According to Groq, in similar tests, ChatGPT loads at 40-50 tokens per second, and Bard at 70 tokens per second on typical GPU-based computing systems.
  • Context for 100 tokens per second per user: a user could generate a 4,000-word essay in just over a minute (a back-of-the-envelope check appears after this list).
  • Groq executives told Futurum Group that not only does the company’s LPU system improve AI compute efficiency, it also improves the developer’s user experience, making it easier for developers to create LLM-based apps.
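
As a sanity check on the essay figure above, here is a back-of-the-envelope sketch; the tokens-per-word ratio is an assumption (English prose averages very roughly 1.3 to 1.5 tokens per word, depending on the tokenizer and the text).

```python
# Back-of-the-envelope check on "a 4,000-word essay in just over a
# minute" at 100 tokens/second/user. TOKENS_PER_WORD is an assumed
# average for English prose; real ratios vary by tokenizer and text.
TOKENS_PER_WORD = 1.4
THROUGHPUT_TPS = 100   # Groq's claimed tokens per second per user

words = 4_000
tokens = words * TOKENS_PER_WORD
seconds = tokens / THROUGHPUT_TPS

print(f"{words} words ~ {tokens:.0f} tokens ~ {seconds:.0f} s "
      f"({seconds / 60:.1f} min) at {THROUGHPUT_TPS} tokens/s")
# 4000 words ~ 5600 tokens ~ 56 s, i.e., roughly a minute --
# consistent with the article's claim.
```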

Read the full Press Release on Groq’s token test here.

In related news, on March 13 Groq reported that it had gotten LLaMA into production on one of its LPU systems in just three days. The rapid ramp-up is noteworthy because Meta originally developed LLaMA for NVIDIA GPUs; the port demonstrates a ready-to-use alternative to legacy GPU technology.

Read the Press Release detailing Groq’s rapid spin-up of LLaMA on an LPU system here.

Analyst Take: With continued concern about the cost and availability of AI compute, Groq’s token test and ability to bring AI compute into production rapidly could mean real competition for NVIDIA’s GPU business and could thwart CPU makers’ designs on capturing the AI inference compute market. Is the AI compute business about to be disrupted? The answers to the following key questions will determine the outcome.

Who are the likely buyers for Groq LPUs?

According to Groq executives The Futurum Group spoke with, there are three types of customers: 1) hyperscalers/data centers, 2) Global 3000 enterprises, and 3) everybody else.

For hyperscalers and data centers, the appeal is that AI inference runs more efficiently, allowing them to reduce their dependence on GPUs and the related power and operational costs. As generative AI applications become more popular, AI compute costs must come down for applications to reach scale, and price elasticity for AI compute will increasingly come into play for cloud providers. Groq’s LPUs could help keep those costs in line.

Groq executives told The Futurum Group they believe the Global 3000 represents a significant market for LPUs. Enterprises increasingly tell Groq they prefer to have complete control over their proprietary data, and many are contemplating expanding on-premises data centers rather than working solely with cloud data center providers. In theory, some enterprises could experiment with the Groq option, as it represents a smaller capital investment and appealing speed to market.

Enterprises outside the Global 3000 must rely on cloud providers for AI compute, and they are interested in the potentially lower cost of running AI inference. They are also perhaps the market that most urgently wants the development velocity Groq promises.

Can Groq handle the potential demand for LPUs?

It is unclear how much production capacity Groq has for its LPUs. A 2021 Forbes article about the company pegged its headcount at fewer than 250 employees. If Groq’s claims are well-founded, the company might find itself in a situation similar to its competitor NVIDIA: working to keep up with market demand for its chips. It may seek a larger manufacturing partner, or it could be acquired.

How will NVIDIA counter?

If Groq’s LPUs are a viable option for AI inference, and if Groq does not run into significant production-capacity issues, then NVIDIA will likely see the company as a legitimate competitive threat. In the short term, NVIDIA might put pressure on hyperscalers and data center vendors that are not developing their own GPUs, perhaps by offering aggressive pricing, contracts, and value-added elements of the AI stack such as developer tools. In the long term, nearly all chipmakers are thinking about the best designs for AI compute loads, and there will be competition for the next generation. NVIDIA knows this and is investing significantly in next-generation AI compute as well.

Note: Daniel Newman, CEO of The Futurum Group, is an investor in Groq.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. 

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other insights from The Futurum Group:

GroqDay: Groq Sets its Sights on Accelerating the Speed of Artificial Intelligence

Groq Goes LLaMa

The Cost of The Next Big Thing – Artificial Intelligence

Author Information

Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.

From the leading edge of AI to global technology policy, Daniel makes the connections between business, people, and tech that companies need to benefit most from their technology investments. Daniel is a top-five globally ranked industry analyst, and his ideas are regularly cited or shared in television appearances and coverage by CNBC, Bloomberg, The Wall Street Journal, and hundreds of other outlets around the world.

A seven-time best-selling author, most recently of “Human/Machine,” Daniel is also a Forbes and MarketWatch (Dow Jones) contributor.

An MBA and former graduate adjunct faculty member, Daniel is an Austin, Texas transplant after 40 years in Chicago. His speaking engagements take him around the world each year as he shares his vision of the role technology will play in our future.

Mark comes to The Futurum Group from Omdia’s Artificial Intelligence practice, where his focus was on natural language and AI use cases.

Previously, Mark worked as a consultant and analyst providing custom and syndicated qualitative market analysis with an emphasis on mobile technology, identifying trends and opportunities for companies like Syniverse and ABI Research. He has been cited by international media outlets including CNBC, The Wall Street Journal, Bloomberg Businessweek, and CNET. Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting the technology business, and he holds a Bachelor of Science from the University of Florida.
