NVIDIA Announces Mistral NeMo 12B NIM

The Six Five team discusses NVIDIA's announcement of the Mistral NeMo 12B NIM.

If you are interested in watching the full episode you can check it out here.

Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we ask that you do not treat us as such.

Transcript:

Patrick Moorhead: NVIDIA announces Mistral NeMo 12B NIM. What is that gobbledygook? So first of all, Mistral is a model company. We all know what NVIDIA is, and they co-developed a 12-billion parameter NVIDIA inference microservice together. Well, that’ll be out later, but you can get it on the AI service today. So essentially, what they did is they came together, and this model was trained on the NVIDIA DGX Cloud AI platform, and it leveraged NVIDIA TensorRT-LLM and the NVIDIA NeMo development platform to do this. So what does all this mean or, actually, let me give you some of the deets here. You can run this model locally. This is targeted for enterprises. It’s very small. You can even run it on classic, what would be considered NVIDIA accelerators for machine learning, not for large language models. So you can run this thing on an L40S. You can run this on a consumer RTX 4090, even an RTX 4500. It is distributed via Hugging Face with an Apache 2.0 license. It’s available now as a service from ai.nvidia.com, and the NIM is expected soon.

So what can this model do or, actually, what’s the benefit of having a smaller model with higher accuracy? First of all, you don’t have to run something on a $30,000 card. You can run it more like on a $5,000 card. And what can you do with this? This is for chatbots, conversational agents, multilingual translation, code generation and summarization, and basically reasoning and world knowledge type of stuff. So this might be something you would want to use for customer service or if you wanted to put a front end in human resources. So pretty cool, but first and foremost, by the way, it’s FP8 as well, which means it takes fewer resources. Obviously, you want to dial that in. It’s not as, let’s say, accurate as FP16, but it uses around half of the resources. Net-net, we talked about software being the real biggest moat that NVIDIA has. I’m convinced that somebody can create very competitive hardware. We’ve seen it from AMD and I’m expecting that from Intel, but when you look at the entire solution and going from low-level drivers to libraries, to machine learning frameworks, to LLM models deployed over NIM, you have a very, very large moat.
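Editor's note: the "around half of the resources" remark can be illustrated with back-of-the-envelope arithmetic. The sketch below is not from the webcast; it assumes that weight memory is simply parameter count times bytes per parameter (2 bytes for FP16, 1 byte for FP8) and ignores activations, the KV cache, and runtime overhead, so real requirements will be higher. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


# "12-billion parameter" model, per the transcript.
PARAMS = 12e9

fp16_gb = weight_memory_gb(PARAMS, 2)  # FP16 stores 2 bytes per parameter
fp8_gb = weight_memory_gb(PARAMS, 1)   # FP8 stores 1 byte per parameter

print(f"FP16 weights: ~{fp16_gb:.0f} GB")
print(f"FP8 weights:  ~{fp8_gb:.0f} GB")
print(f"FP8 / FP16 ratio: {fp8_gb / fp16_gb:.2f}")
```

Under these assumptions the weights alone drop from roughly 24 GB to roughly 12 GB, which is consistent with the point that a smaller, lower-precision model can fit on a less expensive card.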

Daniel Newman: Yeah, Pat, it’s such a large moat, as I’ve had to tell a number of media outlets, that their ability to out-innovate the market by years is creating this vacuum of pressure, but I mean, is it really their fault for getting it right? I don’t know. I mean, look, in the end, we need to be able to deploy models that can commingle public data and private data, and they need to be able to do so efficiently to create text and chat and generative content and assets. And the bottom line is that they’ve done it in a way that’s more effective and efficient, and this is just one example of that. This is the way for these complex, high-technical-debt enterprises that are full of data and want to be able to write software to a GPU to create an application to benefit from AI. This is the package, dude. This is what we’ve got here. So look, the net-net is what you just said. I mean, look, they’re doing a lot of things right. They’re making it easy, they’re making it accessible. By the way, they’re creating forces of stickiness that are going to outlast the innovation of competition.

Author Information

Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.

From the leading edge of AI to global technology policy, Daniel makes the connections between business, people and tech that are required for companies to benefit most from their technology investments. Daniel is a top 5 globally ranked industry analyst and his ideas are regularly cited or shared in television appearances by CNBC, Bloomberg, Wall Street Journal and hundreds of other sites around the world.

A 7x best-selling author whose most recent book is “Human/Machine,” Daniel is also a Forbes and MarketWatch (Dow Jones) contributor.

An MBA and former graduate adjunct faculty member, Daniel is an Austin, Texas transplant after 40 years in Chicago. His speaking takes him around the world each year as he shares his vision of the role technology will play in our future.
