Groq Meta LLAMA-2 70B Parameters 100 tps Milestone

The Six Five team discusses Groq’s milestone of running Llama-2 70B at more than 100 tokens per second.

If you are interested in watching the full episode you can check it out here.

Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we ask that you do not treat us as such.

Transcript:

Daniel Newman: Groq made a pretty big announcement about a hundred tokens per user per second. Pat, what does that mean?

Patrick Moorhead: Yeah, so good question. So first of all, Groq is a company that was founded by the folks who did the Google TPU. So smart cookies. And in my vernacular, they’re creating an ASIC to tackle first inference and then training. As we’ve talked about many times on this show, an ASIC is more efficient than a GPU at doing certain things. And then the challenge is putting a programmatic layer on top of the ASIC to make it programmable. And then there’s Llama 2. So Llama 2 is an open source model that came out of Meta that everybody but trillion-dollar companies can take advantage of for free. And essentially it’s all the rage, right? Open models, right? Because we don’t want one company to own the model.

And what do I mean by closed models, right? So OpenAI and ChatGPT is a closed model. Bard is a closed system as well. So now, in the enterprise world at least, everybody’s saying, “Hey, it’s about a combination of proprietary and open models distributed through somebody like a Hugging Face.” And then there’s the 70 billion parameter model, where, according to them, and I can’t find any data that says otherwise, it’s the fastest performance on Llama 2 70 billion parameters at over a hundred tokens per second per user. And the reason tokens are important is that tokens determine the amount of data that can go into the prompt or into the grounding.
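As a rough illustration of what “tokens per second per user” means in practice, here is a minimal Python sketch, not Groq’s benchmark methodology, that simply times one user’s streamed generation and divides the token count by elapsed wall-clock time. The fake_stream generator is hypothetical and just stands in for a real model’s streaming API.

# Minimal sketch of a per-user tokens-per-second measurement (illustrative only).
import time

def tokens_per_second(token_stream):
    """token_stream: any iterable that yields generated tokens one at a time."""
    start = time.perf_counter()
    count = 0
    for _ in token_stream:
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")

# Hypothetical stand-in for a real streaming inference API.
def fake_stream(n_tokens=500, seconds_per_token=0.01):  # roughly 100 tokens/sec
    for _ in range(n_tokens):
        time.sleep(seconds_per_token)
        yield "token"

print(f"{tokens_per_second(fake_stream()):.1f} tokens per second for one user")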

So this has a lot to do with the pricing as well. So the cool part is that the cost is just extraordinarily lower to do this. And Dan, you hit this on the NVIDIA piece. Groq says that on a workload like this you get 3x lower total cost of ownership from inception, which is really great value, right? Those are comparisons using 80-node systems: an NVIDIA A100 SuperPOD is $27 million, an H100 SuperPOD is $39 million, and a Groq 80-node system is $18 million. So again, competition is good. Dan, that’s a theme on our show. We say it every day. Competition matters. And one final thing, current silicon is 14 nanometer. Imagine when they get to four or five nanometer; performance and power should be amazing.
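For reference, the system prices quoted above work out to the ratios below. This is just arithmetic on the figures cited in the conversation; it reflects acquisition cost only, not the power, cooling, and operating costs that feed the full 3x TCO claim.

# Back-of-the-envelope ratios from the quoted system prices (acquisition cost only).
system_cost_musd = {
    "NVIDIA A100 SuperPOD, 80 nodes": 27,
    "NVIDIA H100 SuperPOD, 80 nodes": 39,
    "Groq 80-node system": 18,
}

groq = system_cost_musd["Groq 80-node system"]
for name, cost in system_cost_musd.items():
    print(f"{name}: ${cost}M ({cost / groq:.2f}x the Groq system price)")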

Daniel Newman: Absolutely. So I’m going to keep us moving. I’ll just say, in the press release I did comment on availability, Pat. I mean, you can actually buy these things. I just want to point that out. These are actually available, and I’d be surprised if people didn’t want to capitalize on that.

Author Information

Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.

From the leading edge of AI to global technology policy, Daniel makes the connections between business, people and tech that are required for companies to benefit most from their technology investments. Daniel is a top-five globally ranked industry analyst, and his ideas are regularly cited or shared in television appearances by CNBC, Bloomberg, the Wall Street Journal, and hundreds of other sites around the world.

A seven-time best-selling author whose most recent book is “Human/Machine,” Daniel is also a Forbes and MarketWatch (Dow Jones) contributor.

An MBA and former graduate adjunct faculty member, Daniel is an Austin, Texas transplant after 40 years in Chicago. His speaking takes him around the world each year as he shares his vision of the role technology will play in our future.
