The Six Five team discusses IBM’s ML Inference Card.

If you are interested in watching the full episode you can check it out here.

Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we do not ask that you treat us as such.

Transcript:

Patrick Moorhead: IBM comes out with an ML Inference Card. So, we should be surprised but we shouldn’t be surprised, Daniel, at this, in that we saw when the z16 chip came out, it had an integrated AI block called Telum. Okay.

So essentially, what the company has done is they have taken that block which is very scalable and they talked about this.

They made a much bigger chip and then they put it on a PCI Express card that you could put in basically any server out there.

There wasn’t as much information about the chip and the card as I would have liked. And I actually had to ask IBM a couple of questions. I never saw the word inference in the blog. And man, I looked at it, but I did see training in there twice. So, I was wondering, “Hey, wait a second, Telum was inference, real-time inference, low-latency inference. Is this their training play that you could run in a Power or in an x86 system?” But the answer is no. This is absolutely an inference card.

And the key here is, as we’ve seen, and sometimes I think it’s better to be later than first, the industry has gone from a very high degree of precision, 32-bit, to 16-bit to 8-bit to 4-bit, where you don’t need all of the accuracy to do good inference.
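To make the precision point concrete, here is a minimal sketch of what dropping to fewer bits does to a set of model weights. It is illustrative only: the weight values are made up, the `quantize`, `dequantize`, and `max_error` helpers are hypothetical names, and the symmetric integer scheme shown is a common textbook approach, not a description of how IBM's card represents numbers.

```python
def quantize(values, bits):
    """Symmetric uniform quantization to signed integer codes of the given width."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8-bit, 7 for 4-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0  # real-valued size of one integer step
    # Round each value to the nearest step and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


def max_error(values, bits):
    """Largest absolute difference between original and round-tripped values."""
    codes, scale = quantize(values, bits)
    restored = dequantize(codes, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


# Made-up weights, for illustration only.
weights = [0.82, -0.41, 0.07, -0.93, 0.25, 0.60]

for bits in (16, 8, 4):
    print(f"{bits}-bit max round-trip error: {max_error(weights, bits):.5f}")
```

Each step down in bit width makes the rounding error larger, but each value also takes less memory and bandwidth. Whether the remaining accuracy is good enough depends on the model, which is the trade-off being described here.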

So, this is a low bit-rate inference card. I don’t know how many watts it’s at, so I don’t know how small the form factor is. It could be at the edge. I don’t know if it needs passive or active cooling. So, they left a lot of questions out there. But I think the big story here is that IBM Research is doing things that are surprising us all.

You and I spent, gosh, three days between Yorktown and where was it? Yorktown, you and I both have been to Poughkeepsie for Z. What was the third city we went to?

Daniel Newman: Albany.

Patrick Moorhead: Yeah, Albany NanoTech. So, this came out from the research group, not the product groups. So, I think we’re going to have to see exactly how this is productized in the future. But I think it shows the very high capabilities of the IBM Research team that you and I have spent a lot of time with. Why didn’t they tell us about this when we were on site?

Man, they even stealth us, Daniel.

But listen, you can’t tell everybody everything. But we’re going to have to get these folks on the Six Five Pod to tell us more about this and what they’re going to do with it.

Daniel Newman: I know we’re talking to Rob Thomas next week about some Watson ML. Maybe we could sneak in an AIU ask, but I’m not sure he’d be the right person because it’s in research.

Pat, I’m going to give you a paragraph out of my impending research note. When will the AIU chip be ready for enterprise use? How much will it cost? Is it a work in progress or already in mass production? Those questions weren’t clear from the initial AIU announcement. What is clear is that IBM recognizes it’s beyond time to change the way AI computing happens.

Now, a little bit market-y maybe when I say that, but what I walked away with, like you, was that there’s a lot of questions yet. But I do like very much that IBM is continuing to plant its flag. It’s leaning into semiconductor manufacturing, design, and research. And by the way, with the recent passage of the CHIPS and Science Act, we know that IBM has its hand up in saying, “Hey, we’re another company with really tremendous engineering talent, manufacturing capabilities or research to support those capabilities, intellectual property.”

So, over the year, Pat, we’ve seen the two-nanometer announcement come out from IBM. We’re seeing the AIU, which means we’ve needed another acronym, by the way. I feel like this is important that we add this to the GPU, VPU, DPU, CPU.

Patrick Moorhead: One more.

Daniel Newman: What?

Patrick Moorhead: QPU.

Daniel Newman: Quantum Processing Unit, nice. But like I said, I look at this more as IBM really, like I said, planting a flag, raising its hand, clearly articulating its intent to participate in a more meaningful way with its intellectual property, democratizing it and making it available to the market. And in an era of US-based semiconductor design and manufacture being more in demand than ever before, and IBM’s obvious improved performance based on our third topic, it’s not a terrible time for IBM to make sure the world knows it’s also making big contributions in semiconductor technology.

So, that’s where I saw it. It’s interesting. It’s exciting. There’s so many more questions than answers right now. But Pat, it wouldn’t be the Six Five if we didn’t put a little speculation and analysis around what was sort of a loose but exciting and interesting press release.

Patrick Moorhead: Yeah, probably the most important thing, which I forgot in my diatribe, was software. What is the middleware that it’s going to be using, right? Does it use CUDA? Does it use oneAPI? Is it going to use what AMD is creating with its combination with Xilinx? We don’t even know what middleware this runs. But no, a lot of questions and look at us, we just fell right in.

Author Information

Daniel Newman

Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.

From the leading edge of AI to global technology policy, Daniel makes the connections between business, people and tech that are required for companies to benefit most from their technology investments. Daniel is a top 5 globally ranked industry analyst and his ideas are regularly cited or shared in television appearances by CNBC, Bloomberg, Wall Street Journal and hundreds of other sites around the world.

A 7x best-selling author, including his most recent book “Human/Machine,” Daniel is also a Forbes and MarketWatch (Dow Jones) contributor.

An MBA and former graduate adjunct faculty member, Daniel is an Austin, Texas transplant after 40 years in Chicago. His speaking takes him around the world each year as he shares his vision of the role technology will play in our future.
