Groq Meta LLAMA-2 70B Parameters 100 tps Milestone

The Six Five team discusses Groq’s milestone of running Llama-2 70B at more than 100 tokens per second.

If you are interested in watching the full episode you can check it out here.

Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we ask that you do not treat us as such.

Transcript:

Daniel Newman: Groq made a pretty big announcement about a hundred tokens per user per second. Pat, what does that mean?

Patrick Moorhead: Yeah, so good question. So first of all, Groq is a company that was founded by the folks that did the Google TPU. So smart cookies. And in my vernacular, they’re creating an ASIC to tackle first inference and then training. As we talked about many times on this show, an ASIC is more efficient than a GPU at doing certain things. And then the challenge is putting a programmatic layer on top of the ASIC to make it programmable. And then there’s Llama 2. So Llama 2 is an open source model that came out of Meta that everybody but trillion-dollar companies can take advantage of for free. And essentially it’s all the rage, right? Open models, right? Because we don’t want one company to have their model.

And what do I mean by closed models, right? So OpenAI’s ChatGPT is a closed model. Bard is a closed system as well. So now, you have in the enterprise world at least everybody’s saying, “Hey, it’s about a combination of proprietary and open models distributed through somebody like a Hugging Face.” And then there’s the 70 billion parameter model, where, according to them, and I can’t find any data that says otherwise, it’s the fastest performance on Llama 2 70 billion parameter at over a hundred tokens per second per user. And the reason tokens are important is that tokens determine the amount of data that can go into the prompt or into the grounding.
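To put the per-user rate in concrete terms, here is a minimal sketch of the arithmetic. It assumes a steady 100 tokens per second and ignores any startup latency; the function name and the response lengths are hypothetical and chosen only for illustration.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream output_tokens at a steady per-user rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Per-user rate quoted in the discussion (treated here as a constant).
QUOTED_TOKENS_PER_SECOND = 100.0

# Hypothetical response lengths, not figures from the webcast.
for response_tokens in (100, 500, 1000):
    seconds = generation_seconds(response_tokens, QUOTED_TOKENS_PER_SECOND)
    print(f"{response_tokens} tokens take about {seconds:.1f} s")
```

At that rate, a 500-token answer would stream in roughly five seconds.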

So this has a lot to do with the pricing as well. So the cool part is that the cost is just extraordinarily lower to do this. And Dan, you hit this on the NVIDIA piece. Groq says that on a workload like this you get 3x lower total cost of ownership from the inception, which is really great value, right? Those are comparisons where an 80-node NVIDIA A100 SuperPOD is $27 million, an H100 SuperPOD is $39 million, and a Groq 80-node system is $18 million. So again, competition is good. Dan, that’s a theme on our show. We say it every day. Competition matters. And one final thing, current silicon is 14 nanometer. Imagine when they get to four nanometer or five nanometer, performance and power should be amazing.
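The system prices quoted above can be compared directly. The sketch below uses only the three dollar figures from the discussion; the dictionary and function names are hypothetical. It compares purchase prices only, so it is not a total cost of ownership model.

```python
# System prices in millions of US dollars, as quoted in the discussion.
SYSTEM_PRICE_MUSD = {
    "NVIDIA A100 SuperPOD": 27.0,
    "NVIDIA H100 SuperPOD": 39.0,
    "Groq 80-node system": 18.0,
}


def price_ratio(system: str, baseline: str = "Groq 80-node system") -> float:
    """How many times more the given system costs than the baseline."""
    return SYSTEM_PRICE_MUSD[system] / SYSTEM_PRICE_MUSD[baseline]


for name in SYSTEM_PRICE_MUSD:
    print(f"{name}: {price_ratio(name):.2f}x the Groq system price")
```

On purchase price alone, the ratios come out to about 1.5x for the A100 system and about 2.2x for the H100 system. The 3x total cost of ownership figure therefore presumably rests on inputs beyond these prices, which the discussion does not break down.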

Daniel Newman: Absolutely. So I’m going to keep running. I’ll just say in the press release I did comment on availability, Pat. I mean, you can actually buy these things. I just want to point that out. These are actually available, and I’d be surprised if people wouldn’t want to capitalize on that.

Author Information

Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.

From the leading edge of AI to global technology policy, Daniel makes the connections between business, people and tech that are required for companies to benefit most from their technology investments. Daniel is a top 5 globally ranked industry analyst and his ideas are regularly cited or shared in television appearances by CNBC, Bloomberg, Wall Street Journal and hundreds of other sites around the world.

A 7x best-selling author, including his most recent book “Human/Machine,” Daniel is also a Forbes and MarketWatch (Dow Jones) contributor.

An MBA and former graduate adjunct faculty member, Daniel is an Austin, Texas transplant after 40 years in Chicago. His speaking takes him around the world each year as he shares his vision of the role technology will play in our future.
