The Six Five team discusses NVIDIA announces Mistral NeMo 12B NIM.
If you are interested in watching the full episode you can check it out here.
Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we ask that you do not treat us as such.
Transcript:
Patrick Moorhead: NVIDIA announces Mistral NeMo 12B NIM. What is that gobbledygook? So first of all, Mistral is a model company. We all know what NVIDIA is, and they co-developed a 12-billion parameter NVIDIA inference microservice together. Well, that’ll be out later, but you can get the model as a hosted service today. So essentially, what they did is they came together, and this model was trained on the NVIDIA DGX Cloud AI platform, and it leveraged NVIDIA TensorRT-LLM and the NVIDIA NeMo development platform to do this. So what does all this mean or, actually, let me give you some of the deets here. You can run this model locally. This is targeted for enterprises. It’s very small. You can even run it on classic, what would be considered NVIDIA accelerators for machine learning, not for large language models. So you can run this thing on an L40S. You can run this on a consumer RTX 4090, even an RTX 4500. It is distributed via Hugging Face under an Apache 2.0 license. It’s available now as a service from ai.nvidia.com, and the NIM is expected soon.
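Since the model is available as a hosted service today, calling it looks like any OpenAI-style chat-completion request. This is a minimal sketch, and the endpoint URL and model id below are assumptions based on NVIDIA's hosted API conventions, not something confirmed in the episode; it only sends a request if an API key is present in the environment.

```python
import json
import os
import urllib.request

# Assumed endpoint and model id for NVIDIA's hosted, OpenAI-compatible API;
# check the catalog at ai.nvidia.com / build.nvidia.com for the real values.
API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL_ID = "nv-mistralai/mistral-nemo-12b-instruct"  # hypothetical id

def build_request(prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload for the hosted model."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.2,
    }

payload = build_request("Summarize this support ticket in two sentences.")

# Only call out over the network if NVIDIA_API_KEY is set.
api_key = os.environ.get("NVIDIA_API_KEY")
if api_key:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

The same request shape would work against a locally deployed NIM once it ships, by swapping the URL for the local container's endpoint.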
So what can this model do or, actually, what’s the benefit of having a smaller model with higher accuracy? First of all is you don’t have to run something on a $30,000 card. You can run it more like on a $5,000 card. And what can you do with this? This is for chatbots, conversational agents, multilingual translation, code generation and summarization, and basically reasoning and world knowledge type of stuff. So this might be something you would want to use for customer service or if you wanted to put a front end on human resources. So pretty cool, but first and foremost, by the way, it’s FP8 as well, which means it takes less resources. Obviously, there’s a trade-off to dial in. FP8 is not as, let’s say, accurate as FP16, but it uses around half of the resources. Net-net, we talked about software being the real biggest moat that NVIDIA has. I’m convinced that somebody can create very competitive hardware. We’ve seen it from AMD and I’m expecting that from Intel, but when you look at the entire solution and going from low-level drivers to libraries, to machine learning frameworks, to LLM models deployed over NIM, you have a very, very large moat.
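The "half the resources" claim can be checked with back-of-the-envelope arithmetic: at FP16 a weight takes two bytes, at FP8 one byte, so a 12-billion-parameter model's weights drop from roughly 24 GB to roughly 12 GB. A quick sketch of that math (weights only; activations and the KV cache add real overhead on top):

```python
PARAMS = 12e9  # Mistral NeMo: 12 billion parameters

def weight_gb(params: float, bytes_per_param: int) -> float:
    """Approximate weight memory in GB (using 1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

fp16_gb = weight_gb(PARAMS, 2)  # 2 bytes per parameter at FP16
fp8_gb = weight_gb(PARAMS, 1)   # 1 byte per parameter at FP8

# FP8 halves the weight footprint, which is why the model can fit on a
# 24 GB consumer card like the RTX 4090 instead of a datacenter GPU.
print(fp16_gb, fp8_gb)
```

This is why quantization, not just parameter count, decides which card a model lands on.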
Daniel Newman: Yeah, Pat, it’s such a large moat that I’ve had to talk to a number of media outlets about it. Their ability to out-innovate the market by years is creating this vacuum of pressure, but I mean, is it really their fault for getting it right? I don’t know. I mean, look, in the end, we need to be able to deploy models that can commingle public data and private data, and they need to be able to do so efficiently to create text and chat and generative content and assets. And the bottom line is that they’ve done it in a way that’s more effective and efficient, and this is just one example of that. This is the way for these complex, high-technical-debt enterprises that are full of data and want to be able to write software to a GPU to create an application to benefit from AI. This is the package, dude. This is what we’ve got here. So look, the net-net is what you just said. I mean, look, they’re doing a lot of things right. They’re making it easy, they’re making it accessible. By the way, they’re creating forces of stickiness that are going to outlast the innovation of competition.
Author Information
Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.
From the leading edge of AI to global technology policy, Daniel makes the connections between business, people and tech that are required for companies to benefit most from their technology investments. Daniel is a top 5 globally ranked industry analyst and his ideas are regularly cited or shared in television appearances by CNBC, Bloomberg, Wall Street Journal and hundreds of other sites around the world.
A 7x best-selling author, his most recent book is “Human/Machine.” Daniel is also a Forbes and MarketWatch (Dow Jones) contributor.
An MBA and Former Graduate Adjunct Faculty, Daniel is an Austin Texas transplant after 40 years in Chicago. His speaking takes him around the world each year as he shares his vision of the role technology will play in our future.