IBM ML Inference Card

The Six Five team discusses IBM’s ML Inference Card.

If you are interested in watching the full episode you can check it out here.

Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we do not ask that you treat us as such.


Patrick Moorhead: IBM comes out with an ML Inference Card. So, we should be surprised but we shouldn’t be surprised at this, Daniel, given what we saw when the Z16 chip came out: it had an integrated AI block called Telum. Okay.

So essentially, what the company has done is they have taken that block which is very scalable and they talked about this.

They made a much bigger chip and then they put it on a PCI Express card that you could put in basically any server out there.

There wasn’t as much information about the chip and the card as I would have liked. And I actually had to ask IBM a couple questions. I never saw the word inference in the blog. And man, I looked at it, but I did see training in there twice. So, I was wondering, “Hey, wait a second, Telum was inference, real-time inference, low-latency inference. Is this their training play that you could run in a Power or in an x86 system?” But the answer is no. This is absolutely an inference card.

And the key here is as we’ve seen and sometimes I think it’s better to be later than first, the industry has gone from a very high degree of precision, 32-bit to 16-bit to 8-bit to 4-bit where you don’t need all of the accuracy to do good inference.
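Editor's note: the precision trade-off described here can be illustrated with a minimal sketch. It assumes plain symmetric uniform quantization over a made-up list of weights; the function names and values are hypothetical and say nothing about how IBM's card actually represents numbers. It only shows that rounding error grows as the bit width shrinks, yet stays bounded by half a quantization step.

```python
def quantize(values, bits):
    """Symmetric uniform quantization to signed integers of the given bit width."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate floating-point values."""
    return [x * scale for x in q]


def max_error(values, bits):
    """Largest absolute difference between original and restored values."""
    q, scale = quantize(values, bits)
    restored = dequantize(q, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -0.41, 0.07, -0.95, 0.33, 0.5]

for bits in (16, 8, 4):
    print(f"{bits}-bit max error: {max_error(weights, bits):.6f}")
```

Whether a 4-bit error of this size is acceptable depends on the model and the task, which is the judgment call being described in the discussion.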

So, this is a low-bit-width inference card. I don’t know how many watts it’s at, so I don’t know how small the form factor is. It could be at the edge. I don’t know if it needs passive or active cooling. So, they left a lot of questions out there. But I think the big story here is that IBM Research is doing things that are surprising us all.

You and I spent, gosh, three days between Yorktown and where was it? Yorktown, you and I both have been to Poughkeepsie for Z. What was the third city we went to?

Daniel Newman: Albany.

Patrick Moorhead: Yeah, Albany NanoTech. So, this came out from the research group, not the product groups. So, I think we’re going to have to see exactly how this is productized in the future. But I think it shows the very high capabilities of the IBM Research team that you and I have spent a lot of time with. Why didn’t they tell us about this when we were on site?

Man, they even stealthed us, Daniel.

But listen, you can’t tell everybody everything. But we’re going to have to get these folks on the Six Five Pod to tell us more about this and what they’re going to do with it.

Daniel Newman: I know we’re talking to Rob Thomas next week about some Watson ML. Maybe we could sneak in an AIU ask, but I’m not sure he’d be the right person because it’s in research.

Pat, I’m going to give you a paragraph out of my impending research note. When will the AIU chip be ready for enterprise use? How much will it cost? Is it a work in progress or already in mass production? Those questions weren’t clear from the initial AIU announcement. What is clear is that IBM recognizes it’s beyond time to change the way AI computing happens.

Now, a little bit market-y maybe when I say that, but what I walked away with, like you, was that there are a lot of questions yet. But I do like very much that IBM is continuing to plant its flag. It’s leaning into semiconductor manufacturing, design, and research. And by the way, with the recent passage of the CHIPS and Science Act, we know that IBM has its hand up in saying, “Hey, we’re another company with really tremendous engineering talent, manufacturing capabilities or research to support those capabilities, intellectual property.”

So, over the year, Pat, we’ve seen the two-nanometer announcement come out from IBM. We’re seeing the AIU, and we needed another acronym, by the way. I feel like it’s important that we add this to the GPU, VPU, DPU, CPU.

Patrick Moorhead: One more.

Daniel Newman: What?

Patrick Moorhead: QPU.

Daniel Newman: Quantum Processing Unit, nice. But like I said, I look at this more as IBM really planting a flag, raising its hand, clearly articulating its intent to participate in a more meaningful way with its intellectual property, democratizing it and making it available to the market. And in an era of US-based semiconductor design and manufacture being more in demand than ever before, and given IBM’s obvious improved performance based on our third topic, it’s not a terrible time for IBM to make sure the world knows it’s also making big contributions in semiconductor technology.

So, that’s where I saw it. It’s interesting. It’s exciting. There’s so many more questions than answers right now. But Pat, it wouldn’t be the Six Five if we didn’t put a little speculation and an analysis around what was sort of a loose but exciting and interesting press release.

Patrick Moorhead: Yeah, probably the most important thing, which I forgot in my diatribe, was software. What is the middleware that it’s going to be using, right? Does it use CUDA? Does it use oneAPI? Is it going to use what AMD is creating with its combination with Xilinx? We don’t even know what middleware this runs. But no, a lot of questions and look at us, we just fell right in.

Author Information

Daniel is the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.

From the leading edge of AI to global technology policy, Daniel makes the connections between business, people and tech that are required for companies to benefit most from their technology investments. Daniel is a top 5 globally ranked industry analyst and his ideas are regularly cited or shared in television appearances by CNBC, Bloomberg, Wall Street Journal and hundreds of other sites around the world.

A 7x Best-Selling Author including his most recent book “Human/Machine.” Daniel is also a Forbes and MarketWatch (Dow Jones) contributor.

An MBA and Former Graduate Adjunct Faculty, Daniel is an Austin Texas transplant after 40 years in Chicago. His speaking takes him around the world each year as he shares his vision of the role technology will play in our future.

