NVIDIA Announces Mistral NeMo 12B NIM

The Six Five team discusses NVIDIA’s announcement of the Mistral NeMo 12B NIM.

If you are interested in watching the full episode you can check it out here.

Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we ask that you do not treat us as such.

Transcript:

Patrick Moorhead: NVIDIA announces Mistral NeMo 12B NIM. What is that gobbledygook? So first of all, Mistral is a model company. We all know what NVIDIA is, and they co-developed a 12-billion-parameter model together, packaged as an NVIDIA inference microservice (NIM). Well, the NIM will be out later, but you can get the model on the AI service today. So essentially, what they did is they came together, and this model was trained on the NVIDIA DGX Cloud AI platform, and it leveraged NVIDIA TensorRT-LLM and the NVIDIA NeMo development platform to do this. So what does all this mean or, actually, let me give you some of the deets here. You can run this model locally. This is targeted at enterprises. It’s very small. You can even run it on what would be considered classic NVIDIA accelerators for machine learning, not for large language models. So you can run this thing on an L40S. You can run this on a consumer RTX 4090, even an RTX 4500. It is distributed via Hugging Face under an Apache 2.0 license. It’s available now as a service from ai.NVIDIA.com, and the NIM is expected soon.

So what can this model do or, actually, what’s the benefit of having a smaller model with higher accuracy? First of all, you don’t have to run something on a $30,000 card. You can run it on more like a $5,000 card. And what can you do with this? This is for chatbots, conversational agents, multilingual translation, code generation and summarization, and basically reasoning and world knowledge type of stuff. So this might be something you would want to use for customer service or if you wanted to put a front end in human resources. So pretty cool, but first and foremost, by the way, it’s FP8 as well, which means it takes fewer resources. Obviously, you want to dial. That’s not as, let’s say, accurate as FP16, but it uses around half of the resources. Net-net, we talked about software being the real biggest moat that NVIDIA has. I’m convinced that somebody can create very competitive hardware. We’ve seen it from AMD and I’m expecting that from Intel, but when you look at the entire solution, going from low-level drivers to libraries, to machine learning frameworks, to LLM models deployed over NIM, you have a very, very large moat.
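Editor’s note: the “around half of the resources” point can be sanity-checked with simple arithmetic. The sketch below is illustrative only and is not from the webcast. It assumes 1 byte per parameter for FP8 and 2 bytes for FP16, uses the round 12-billion figure quoted above, and counts model weights only (no activations, KV cache, or runtime overhead). The function and variable names are hypothetical.

```python
# Rough weight-memory estimate at two precisions.
# Assumptions: FP16 = 2 bytes/parameter, FP8 = 1 byte/parameter;
# weights only, ignoring activations, KV cache, and runtime overhead.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weight_memory_gb(num_params: int, precision: str) -> float:
    """Approximate weight storage in GB (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


params = 12_000_000_000  # the "12-billion parameter" figure from the transcript

fp16_gb = weight_memory_gb(params, "fp16")
fp8_gb = weight_memory_gb(params, "fp8")

print(f"FP16 weights: ~{fp16_gb:.0f} GB")
print(f"FP8 weights:  ~{fp8_gb:.0f} GB")
print(f"FP8 / FP16 ratio: {fp8_gb / fp16_gb:.2f}")
```

Under these assumptions the weights alone drop from roughly 24 GB to roughly 12 GB, which is consistent with the claim that a model of this size can fit on a single workstation-class or consumer GPU. Real-world memory use will be higher once inference overhead is included.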

Daniel Newman: Yeah, Pat, it’s such a large moat, as I’ve had to talk to a number of media outlets about it, that their ability to out-innovate the market by years is creating this vacuum of pressure, but I mean, is it really their fault for getting it right? I don’t know. I mean, look, in the end, we need to be able to deploy models that can commingle public data and private data, and they need to be able to do so efficiently to create text and chat and generative content and assets. And the bottom line is that they’ve done it in a way that’s more effective and efficient, and this is just one example of that. This is the way for these complex, high-technical-debt enterprises that are full of data and that want to be able to write software to a GPU to create an application to benefit from AI. This is the package, dude. This is what we’ve got here. So look, the net-net is what you just said. I mean, look, they’re doing a lot of things right. They’re making it easy, they’re making it accessible. By the way, they’re creating forces of stickiness that are going to outlast the innovation of competition.

Author Information

Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.

From the leading edge of AI to global technology policy, Daniel makes the connections between business, people and tech that are required for companies to benefit most from their technology investments. Daniel is a top 5 globally ranked industry analyst and his ideas are regularly cited or shared in television appearances by CNBC, Bloomberg, Wall Street Journal and hundreds of other sites around the world.

A seven-time best-selling author, including his most recent book, “Human/Machine,” Daniel is also a Forbes and MarketWatch (Dow Jones) contributor.

An MBA and former graduate adjunct faculty member, Daniel is an Austin, Texas transplant after 40 years in Chicago. His speaking takes him around the world each year as he shares his vision of the role technology will play in our future.
