Groq Goes LLaMa

The Six Five team discusses Groq going LLaMa.

If you are interested in watching the full episode you can check it out here.

Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we ask that you do not treat us as such.


Patrick Moorhead: Groq is going LLaMA. Let’s reintroduce our audience to Groq and we’re going to have to explain what this LLaMA thing is too.

Daniel Newman: So LLaMA is Meta’s large language model, which had been in the works and was released last month by Meta Platforms, Facebook’s parent. And just like, oh, what’s going on with ChatGPT? It’s Meta’s iteration and attempt to power bots and generate human-like responses. So Groq is another company we work with. It’s a chip startup focused on AI. The company’s kind of ethos is bringing the cost of compute down to zero, which is very interesting, because the cost of compute with Generative AI is going which way, Pat?

Patrick Moorhead: Well, the cost per query is going down over time, but people want more, so overall it’s going up.

Daniel Newman: So the cost per query on GPT is significantly higher. It makes me think about a rollercoaster going up where it’s like tic tic tic… What you’re getting is mass adoption of generative capabilities. We knew with Bing that if everybody went tomorrow and started using Bing, the amount of demand on Microsoft’s data centers and compute would be exponential. And, yes, you’re absolutely right, Pat, over time the market will figure out how to do it for less. But, overall, the amount of compute resources required to do a Generative AI query is substantially higher than for a traditional search query.

The amount of compute using GPUs, by the way, which is what most Generative AI runs on, is enormous, and it’s mostly being trained on NVIDIA. I think they have about 90% of that market right now. And GPUs are relatively inefficient. They suck a lot of power. And in a world where sustainability is one of the underlying governance priorities of every business, we want to be water-positive. We’ve heard that from AWS. We’ve heard it from Microsoft. We want to lower our carbon footprint. Well, we’ve got about 1% of the world’s energy right now being consumed by data centers, and that number is going up.

So, anyways, that’s kind of a runaround of what’s going on here with Groq. Well, the interesting thing is that this rapid growth in compute utilization means we’re going to see all these challenges as costs go up. How do you deliver to your customers? What are Microsoft, Salesforce, and Google going to spend? How do they build out their data centers to support this? And a company like Groq becomes kind of interesting because it has very unique capabilities in software and compiling to be able to take a model like LLaMA – and this is what it did – it took the model off the NVIDIA GPUs, recompiled the code, and started running it on its own chips. And the findings were that it could do it more efficiently and with lower power utilization.

And this raises an interesting question: are there other chip players besides NVIDIA that have a chance to really be influential and, potentially, be disruptive? We know AMD is leaning hard in on AI. I’ve been in a lot of conversations with Intel, which has Habana Gaudi and its open-source oneAPI approach. But these startups, companies like Groq, Cerebras, and SambaNova, are trying to build very specific, application-specific chips that could, potentially, be disruptive to NVIDIA and run a model more efficiently. The hard part, Pat, is that when you have CUDA and all these developers building on it, moving models from one hardware set to another is really difficult.

And so Groq did this in just a few days, and that’s kind of the really interesting thing. I talked to their CEO about it. The ability to use their compiler to move a model from one piece of hardware to another, without tons of developers having to optimize code, was really, really interesting and should be exciting to the market, because we have to solve those two problems, Pat. We have to solve the cost problem. I mean, NVIDIA’s going to make a fortune on this Generative AI movement, and we’ve seen its stock rip because of it, but there needs to be a challenger here.

Because the other side of it, Pat, the sustainability side, is that we need to look at doing it more efficiently. I know you and I are all about measurable sustainability. We talk about this all the time. Not doing it for the sake of greenwashing and marketing. Do it for the sake of the fact that we really have a challenge of creating enough energy to support all this growth. So using inefficient chips to do things like Generative AI long term is not the answer. Either the GPUs need to become more efficient or we need to look at this custom silicon, these ASICs, that could, potentially, run these large models at a lower cost.

Patrick Moorhead: Good analysis there, Daniel. And Groq is one of the players that I do think is going to be left standing, it looks like. I mean, it appears to me they’ve managed their cash and their investments well. And one of the biggest problems, if you talk to end users, people who try to use this, is the software; they would like a more flexible software infrastructure and they do want more competition. The benefit of a GPU is that, because these models change so much, its programmability wins out, and the trade-off against being the most efficient kind of rears its head, right? And that’s one of the benefits. And, heck, people even do training and inference on CPUs. Actually, people do more training and inference on CPUs than they do on GPUs, and that’s when the data center is dark and they’re trying to use spare resources. So different strokes for different folks.

At some point these models will change. And I don’t know… I keep thinking it’s going to be five years, but look at the growth, look at the size of these models; this didn’t come out of nowhere. In fact, the industry had been talking about these large models, natural language models, forever. So I’d like to see Groq roll out some customers on this as well. I do applaud them, though, for doing this disclosure and giving information out. The company doesn’t typically disclose information like this, but I hope it gets them some attention and gets other people evaluating and using their silicon.

Daniel Newman: It’s the moment, Pat. I mean, if companies like these aren’t talking in this moment, when is the moment? And it was nice to see Reuters… I think it was Stephen Nellis maybe that picked it up, covered it? Yeah, it was-

Patrick Moorhead: Yeah.

Daniel Newman: Yeah. I’m just saying it was good to see someone pick it up because, like I said, it’s almost as if this last month Microsoft and NVIDIA were the only two companies in this space, and there are other companies that we need to pay attention to.

Author Information

Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.

From the leading edge of AI to global technology policy, Daniel makes the connections between business, people and tech that are required for companies to benefit most from their technology investments. Daniel is a top 5 globally ranked industry analyst and his ideas are regularly cited or shared in television appearances by CNBC, Bloomberg, Wall Street Journal and hundreds of other sites around the world.

A 7x best-selling author, his most recent book is “Human/Machine.” Daniel is also a Forbes and MarketWatch (Dow Jones) contributor.

An MBA and former graduate adjunct faculty member, Daniel is an Austin, Texas transplant after 40 years in Chicago. His speaking takes him around the world each year as he shares his vision of the role technology will play in our future.

