NVIDIA Announces Mistral NeMo 12B NIM
The Six Five team discusses NVIDIA announces Mistral NeMo 12B NIM.

If you are interested in watching the full episode you can check it out here.

Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we ask that you do not treat us as such.

Transcript:

Patrick Moorhead: NVIDIA announces Mistral NeMo 12B NIM. What is that gobbledygook? So first of all, Mistral is a model company. We all know what NVIDIA is, and together they co-developed a 12-billion-parameter model delivered as an NVIDIA inference microservice, or NIM. The NIM itself comes later, but you can use the model as a service today. So essentially, they came together, this model was trained on the NVIDIA DGX Cloud AI platform, and it leveraged NVIDIA TensorRT-LLM and the NVIDIA NeMo development platform. So what does all this mean? Actually, let me give you some of the deets here. You can run this model locally. It's targeted at enterprises. It's very small. You can even run it on what would classically be considered NVIDIA accelerators for machine learning, not for large language models. So you can run this thing on an L40S. You can run it on a consumer RTX 4090, or even an RTX 4500. It is distributed via Hugging Face under an Apache 2.0 license. It's available now as a service from ai.nvidia.com, and the NIM is expected soon.
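To make "available now as a service" concrete, here is a minimal Python sketch of calling a hosted model through an OpenAI-compatible chat-completions endpoint, which is the style of API NVIDIA's hosted catalog exposes. The base URL, model identifier, and environment-variable name below are assumptions for illustration; check NVIDIA's API catalog for the actual values.

```python
import json
import os
import urllib.request

# Assumed endpoint and model id for illustration; verify against
# NVIDIA's API catalog before use.
BASE_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL_ID = "nv-mistralai/mistral-nemo-12b-instruct"


def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,
    }


def chat(prompt: str) -> str:
    """POST the prompt to the hosted endpoint (requires an API key)."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={
            # Hypothetical env var name for the key.
            "Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Usage would be a single call such as `chat("Summarize this ticket for a support agent.")`; because the API shape follows the OpenAI convention, existing chatbot front ends can often be pointed at such an endpoint by swapping the base URL and model name.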

So what can this model do? Or, actually, what's the benefit of a smaller model with higher accuracy? First of all, you don't have to run it on a $30,000 card; you can run it on something more like a $5,000 card. And what can you do with it? This is for chatbots, conversational agents, multilingual translation, code generation and summarization, and basically reasoning and world-knowledge type of stuff. So this might be something you would use for customer service, or if you wanted to put a front end on human resources. So pretty cool. And, by the way, it's FP8 as well, which means it takes fewer resources. There's a dial here: FP8 is not as accurate as FP16, let's say, but it uses around half the resources. Net-net, we've talked about software being the biggest moat that NVIDIA has. I'm convinced that somebody can create very competitive hardware. We've seen it from AMD, and I'm expecting it from Intel. But when you look at the entire solution, going from low-level drivers to libraries, to machine learning frameworks, to LLM models deployed over NIM, you have a very, very large moat.
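The "half the resources" point is easy to sanity-check with back-of-the-envelope math: an FP8 weight takes one byte and an FP16 weight takes two, so a 12-billion-parameter model's weights shrink from roughly 24 GB to roughly 12 GB, which is why it can fit on a 24 GB card like the RTX 4090. A small sketch of that arithmetic (weights only, ignoring activations and KV cache, which add real overhead at inference time):

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 12e9  # Mistral NeMo: 12 billion parameters

fp16 = weight_memory_gb(PARAMS, 2.0)  # FP16: 2 bytes per parameter
fp8 = weight_memory_gb(PARAMS, 1.0)   # FP8: 1 byte per parameter

print(f"FP16 weights: ~{fp16:.0f} GB")  # ~24 GB
print(f"FP8 weights:  ~{fp8:.0f} GB")   # ~12 GB
```

The same halving applies to memory bandwidth per token, which is usually the bottleneck for LLM inference, so FP8 tends to roughly double throughput as well as fit on cheaper cards.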

Daniel Newman: Yeah, Pat, it’s such a large moat, as I’ve had to tell a number of media outlets, that their ability to out-innovate the market by years is creating this vacuum of pressure. But, I mean, is it really their fault for getting it right? I don’t know. Look, in the end, we need to be able to deploy models that can commingle public data and private data, and they need to do so efficiently to create text and chat and generative content and assets. And the bottom line is that they’ve done it in a way that’s more effective and efficient, and this is just one example of that. Think about these complex, high-technical-debt enterprises that are full of data and want to be able to write software to a GPU to create an application that benefits from AI. This is the package, dude. This is what we’ve got here. So look, the net-net is what you just said. They’re doing a lot of things right. They’re making it easy, they’re making it accessible. And, by the way, they’re creating forces of stickiness that are going to outlast the innovation of the competition.

Author Information

Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.

From the leading edge of AI to global technology policy, Daniel makes the connections between business, people and tech that are required for companies to benefit most from their technology investments. Daniel is a top 5 globally ranked industry analyst and his ideas are regularly cited or shared in television appearances by CNBC, Bloomberg, Wall Street Journal and hundreds of other sites around the world.

A seven-time best-selling author, his most recent book is “Human/Machine.” Daniel is also a Forbes and MarketWatch (Dow Jones) contributor.

An MBA holder and former graduate adjunct faculty member, Daniel is an Austin, Texas transplant after 40 years in Chicago. His speaking takes him around the world each year as he shares his vision of the role technology will play in our future.
