The Six Five Team discusses recently published benchmarks from Hugging Face on Intel Habana Gaudi 2 compared to the NVIDIA H100 and A100.
If you are interested in watching the full episode, you can check it out here.
Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we ask that you do not treat us as such.
Transcript:
Patrick Moorhead: Hugging Face actually published some benchmarks that show that Intel Habana Gaudi 2 leads the Nvidia A100 and H100 on a very specific model. Dan, why don’t you kick this one off?
Daniel Newman: And there you have it. That was it. That was all I had. But no, I mean, look, first of all, you ended where we’re kind of picking up, but in the near term we’ve got this gold rush to train, and this gold rush to train is largely a train that only goes to one station, and that’s Jensen Huang’s kitchen. That kitchen is going to be so awesome by next year, I don’t know, but I think there’s a printing press being created in the Nvidia corporate headquarters.
But having said that, in all seriousness, whether it’s talking to Thomas Kurian about the capabilities of the TPU, whether it’s Intel, whether it’s AMD’s MI offerings, whether it’s Groq’s accelerators, whether it’s Lattice Semi’s FPGAs with vision capabilities, there are a lot of semiconductors that can do AI. But right now the market’s impression is that there’s really only one, and that’s because there’s this gold rush to train large models and to build foundational models.
Because right now, the ability to use AI in your business with your unique data sets is sort of the next frontier of opportunity for productivity and efficiency gains. There’s also this kind of overwhelming impression, and this is where I think we can have a little bit of a convo, a debate, that Nvidia is the only company that can do it, and that basically there’s a reason companies are waiting three to six quarters, depending on who they are, to get their hands on an A100 or an H100. And it has to do with a combination of the fact that Nvidia has what’s really considered to be a full stack of capabilities, the programming developer ecosystem around CUDA, but also just the fact that it’s sort of the universal and most capable, most powerful option.
But the truth is, there are a couple of trend lines going on that are important to note. One is that, while training is the immediate frontier of opportunity, longer term, the ability to accelerate workloads, and even to do that on traditional general-purpose compute, is actually a very large opportunity around inference. The other is that when you’re training very specific workloads, there is something to be argued that an ASIC, a semi built very specifically to accelerate a certain type of workload, could end up outperforming, and that’s what Hugging Face…
So this wasn’t an Intel piece, but it’s a partnership and a relationship. Intel and Hugging Face have very publicly been out there that they have this relationship, but effectively announced that when they were training these vision-language models, these very specific kinds of models, the Habana Gaudi 2, which is the ASIC from Intel, actually performed substantially better than both the A100 and the company’s newest and most powerful GPU, the H100.
And so while this, again, Pat, I think is a little apples and oranges, because obviously when you’re buying Nvidia, you could argue that you don’t know in all cases what you’re going to do with it, so you want the most powerful general capability to do all things AI. But with many companies building out specific foundational models, specific language models, the idea of being able to train more efficiently, and by the way, considerably more price-efficiently, becomes very interesting. So this whole BridgeTower on Habana Gaudi 2 result I think brings up a really interesting debate, Pat, and it’s kind of two debates for me. And by the way, we had kind of a similar conversation around Groq with the LPU, as they call it, right? An LPU, a language processing unit. One is: what is the capacity and aptitude for companies to go down the path of using an ASIC, a chip built very specifically for that workload?
And two, what are the constraints, meaning what are the reasons, knowing that these are actually available today and can be utilized right now in instances both in the public cloud and purchased for on-prem, that companies aren’t more thoughtfully considering the utilization of this technology from both an economics and a capability standpoint, Pat? And so to me, like I said, it’s early days, but I think what we’re starting here is a real conversation about the fact that there is a very powerful market position around the Nvidia products, but there are competitive offerings in other forms that in many cases, for specific kinds of workloads, could become very compelling. So rather than droning on, I just want to put that out there, bounce it to you, and maybe go back and forth a bit on this one.
Patrick Moorhead: Yeah, I like to get back to the basics. I’ve been around chips for over 30 years, and one thing has always been true: there’s a continuum of efficiency and programmability. The more efficient, the less programmable, and going from left to right, you have the CPU, the GPU, the FPGA and the ASIC. The challenge with the ASIC is always, again, like you said, how do I program that? And what they do is put the flexibility in the software to be able to run different workloads, but once you get it there, it’s going to be a heck of a lot more efficient than a GPU or a CPU.
To be clear, Nvidia does have ASIC blocks on its GPU, right? They have transformer engines, they have some… Heck, Intel Xeon has four different ways to accelerate AI. So it’s really this continuum. It’s not ASIC good, GPU bad, or GPU good, ASIC bad. The GPU has taken advantage of the flexibility that hyperscalers in particular want, to be able to go to the next best thing. I mean, heck, a year ago we were still talking about recommendation engines and visual AI and object detection and recognition and self-driving cars. Right now, in this generative age, we’re doing some of the most wackadoodle stuff out there with the GPU. How do you think all of these initial LLMs were trained? They were trained on the A100, not the H100. The H100 is just a beast of a device that cranks out foundational models and is a lot more flexible. And that’s the key: flexibility.
One thing that interested me in this one as well was that this wasn’t inference, and it wasn’t training; it was fine-tuning. I also found it interesting that these weren’t Intel people, these were Hugging Face people, so that gives it a tremendous amount of credibility. But what all the listeners and viewers need to understand is that Habana Gaudi 2 won’t have the same level of advantage over Nvidia or even AMD in all use cases. This is a very specific use case using a very specific model, which was very similar to what we saw with Groq using Llama-2-70b. I also don’t think this can claim to be a large model, given the size. This is not one of these 70-billion-parameter models; it’s almost a billion parameters in total. But anyways, read the notes, read the show notes. Dan, any final comments?
Daniel Newman: No, I mean, look, it’s an interesting inflection. There’s a lot of market concern and question right now about whether there are other companies set to benefit from this AI gold rush. The disproportionate amount of revenue that’s gone in one direction, the train that only goes to one station, does bring up some relevant discussion points about healthy competitive ecosystems, about the need for alternative routes for enterprises, hyperscalers, and small businesses to be able to benefit from AI. And in the longer run, how much do people that are running Salesforce with some sort of attrition-risk algorithm care about what hardware it’s being done on? I think over time, companies are going to look for efficiencies, especially on the inferencing side. So I think it’s an interesting debate and a conversation to keep having, and I don’t expect it’ll be the last time we have it.
Patrick Moorhead: Yeah, Dan, I mean, graphics used to be done on a CPU, and then they put fixed functions in to do 2D graphics. We used to have an accelerator to use with spreadsheets and to crunch numbers, right? It was a plug-in chip, a math accelerator, and then it got sucked into the processor. So historically these things should calm down, but until they do, GPUs are going to have an operational advantage.
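Editor’s note: For readers who want a concrete sense of the fine-tuning workflow Patrick references above, the sketch below shows the general shape of fine-tuning a Hugging Face Transformers model on Habana Gaudi hardware with the optimum-habana library. This is a minimal illustration under assumptions, not the BridgeTower benchmark script: the stand-in model, toy dataset, and hyperparameters are invented for the sake of a self-contained example, and it presumes a Gaudi machine with the Habana software stack and optimum-habana installed.

```python
# Minimal sketch of fine-tuning on Habana Gaudi via optimum-habana.
# NOT the Hugging Face BridgeTower benchmark; the model, dataset, and
# hyperparameters below are placeholders chosen for brevity.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.habana import GaudiConfig, GaudiTrainer, GaudiTrainingArguments

model_name = "bert-base-uncased"  # stand-in model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tiny in-memory dataset so the example is self-contained.
texts = ["the chip trained quickly", "the run was slow"]
labels = [1, 0]
encodings = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

class ToyDataset(torch.utils.data.Dataset):
    def __len__(self):
        return len(labels)
    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in encodings.items()}
        item["labels"] = torch.tensor(labels[idx])
        return item

# Gaudi-specific pieces: a GaudiConfig published on the Hub (mixed precision,
# fused ops) and training arguments that enable the HPU backend and lazy mode.
gaudi_config = GaudiConfig.from_pretrained("Habana/bert-base-uncased")
args = GaudiTrainingArguments(
    output_dir="./gaudi-demo",
    use_habana=True,
    use_lazy_mode=True,
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = GaudiTrainer(
    model=model,
    gaudi_config=gaudi_config,
    args=args,
    train_dataset=ToyDataset(),
)
trainer.train()
```

The actual benchmark run was more involved, but the takeaway is that the Gaudi path closely mirrors the standard Transformers Trainer workflow, with the Gaudi-specific configuration swapped in.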
Author Information
Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.
From the leading edge of AI to global technology policy, Daniel makes the connections between business, people and tech that are required for companies to benefit most from their technology investments. Daniel is a top 5 globally ranked industry analyst, and his ideas are regularly cited or shared in television appearances on CNBC and Bloomberg, in the Wall Street Journal, and across hundreds of other sites around the world.
A seven-time best-selling author, most recently of “Human/Machine,” Daniel is also a Forbes and MarketWatch (Dow Jones) contributor.
An MBA and former graduate adjunct faculty member, Daniel is an Austin, Texas transplant after 40 years in Chicago. His speaking takes him around the world each year as he shares his vision of the role technology will play in our future.