
AI Acceleration News from IBM at Hot Chips – Infrastructure Matters, Episode 52

On this episode of Infrastructure Matters, Steven Dickens is joined by IBM's Chris Berry, Distinguished Engineer, Processor Development, and Susan Eickhoff, Director, IBM Z Processor Development, for a conversation on the latest AI acceleration innovations unveiled by IBM at the Hot Chips conference.

Their discussion covers:

  • The key features of IBM’s new processor developments.
  • The impact of AI acceleration on business applications.
  • The role of IBM’s technology in advancing computational power.
  • Strategies for integrating IBM’s advancements into existing IT infrastructure.
  • Future directions for AI and processor development at IBM.

Learn more about the Spyre Accelerator and Telum Processor from IBM.

Watch the video below, and be sure to subscribe to our YouTube channel so you never miss an episode.

Or listen to the audio here:

Or grab the audio on your favorite audio platform below:

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this webcast. The author does not hold any equity positions with any company mentioned in this webcast.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Transcript:

Steven Dickens: We are here at IBM today and we’re talking all things processors. Susan and Chris, welcome to the show.

Susan Eickhoff: Thank you.

Chris Berry: Thank you for having us.

Steven Dickens: So let’s get started. Ladies first. Susan, tell us a little bit about what you do for IBM.

Susan Eickhoff: Yep. So I’m the director of IBM Z processor development here in Poughkeepsie, New York, and I’m also the overall product owner for Telum 2, the mainframe processor that we’re announcing now. I started that about four years ago, leading a cross-discipline team of 800 global engineers as we went through concept, high-level design, and implementation, and now we get to announce. So it’s exciting.

Steven Dickens: Fantastic. So Chris, tell me what you do for IBM.

Chris Berry: I design processors. I’m a distinguished engineer. I help lead and make decisions on what we put into our next-generation mainframe processor chips.

Steven Dickens: So Susan, you mentioned it: Telum 2, the second generation of the Telum processor. Telum was a seminal moment for IBM, the first time you had launched the processor and started to message it ahead of the system. We’re now onto the second generation. Tell us a little bit about Telum 2 and the journey IBM’s been on with the development. Just set the scene for us, if you will.

Susan Eickhoff: Yeah, so a lot of folks know the IBM Z mainframe and some don’t, but regardless, you interact with it daily throughout your life. Every time you swipe a credit card, it probably runs through a mainframe; 70% of financial transactions run through it. Sometimes I feel like I’m giving a little IBM sales pitch saying that, but it’s cool, right? What we work on, our technology, matters. We use it every day.

Steven Dickens: It’s impactful and meaningful.

Susan Eickhoff: Exactly. So we run the high-volume, high-transaction, always-need-to-be-available systems, in commerce and airlines and banking and credit cards. That sets the stage for what we do and for the pillars we looked at for Telum, for Telum 2, and going forward. So a couple of those areas: like I said, given the industries we support, you’ve got to be reliable. Chris, I think you quoted the other day that a mainframe is down what, one hour every 11,400 hours-

Chris Berry: Yes. 11,400 years.

Susan Eickhoff: Sorry, years. Which is pretty impressive.

Steven Dickens: That puts in context the availability claim that you guys have got.

Susan Eickhoff: Exactly.

Steven Dickens: It’s hard to frame that, but that puts it in real context.

Chris Berry: Yes.

Susan Eickhoff: Yeah. So looking at that: the reliability, the scalability we need, the security we need, which is super important, and performance, that’s always one of the pillars we’re looking at as we design. Sustainability is a big one lately, certainly with society at large, but also coming back from our clients. There’s only more and more data, and “process it faster and faster” is at odds with “do it greener, with less of a footprint.” So that’s pervasive in what we do: where we can cut down on power, where we can consolidate workloads. And then the third one, which is a really fun one, is with all this data going through, how can you gain insight, run some analytics, and put some intelligence on that data? You could take that off the mainframe to do it, but one of the things we’re known for is security, so you want to keep that data on there and get some intelligence from it.

Steven Dickens: So semiconductors are a hot topic in 2024. NVIDIA’s blown up. Intel, AMD, capturing the headlines. Lots of people are talking semiconductors; they’re on the nightly news almost every day with AI capturing the public’s attention. I don’t think a lot of people think IBM and semiconductors, but that’s obviously not true. Tell us a little bit more about how we should be thinking about IBM and semiconductors.

Chris Berry: They should be thinking about it like they go hand in hand. We’ve been in the semiconductor industry since its inception. We’ve got our research team working on new advanced semiconductor nodes. You see announcements all the time in two nanometer, three nanometer, whatever the next node is going to be.

Steven Dickens: And that’s cutting edge, two nanometers is as far ahead as anybody is.

Chris Berry: Correct. So we’ve got our research team doing all of that work. We’ve got our research team working on packaging and advanced packaging and chip stacking and all sorts of other areas. We’ve got our internal team that’s designing the chips on those nodes and figuring out how we put more stuff on a chip than we’ve ever done before. And then we’ve also got our tools teams, our packaging teams that do design for us, our card teams, everything. We are completely embedded in the semiconductor industry, and we build chips and put them in our systems. They’re the foundation of our systems.

Steven Dickens: And Susan, we were talking about it off camera, the size of your team. I think that surprised me at least. But just picking up on what Chris was saying there, the size of that team and the breadth of the scope.

Susan Eickhoff: Yeah, I mean, like I say, we’ve got 800 hardware developers on the Telum 2 processor alone, across logic design, verification, and physical design. When you look at the whole portfolio of chips we do, we’ve got 1,200 global hardware developers working on that. And like Chris said, that’s just the hardware development part. So all the tool support, and the technology and packaging, and, and, and … We’ve got a several-thousand-person global team.

Steven Dickens: So we’re in 2024. What are we, 10 minutes in? I think I mentioned AI, but I went past it really quickly. It’d be wrong to record something in 2024 and not talk about AI. We were talking about it off camera; you guys are doing a lot in this space. Chris, tell us a little bit about what you’re doing with Telum 2 from an AI perspective, because not a lot of people think mainframe, think AI. That’s changing: 200-plus use cases, lots of customer adoption now. Tell us a little bit more about what you’re doing in AI with Telum 2.

Chris Berry: So I’m going to take a step back and say what we already did in Telum. We put an on-chip accelerator in Telum. It was dedicated; we didn’t have it distributed in the cores. We had a dedicated on-chip accelerator, and we’ve added to that. We’ve added support for different data types. We’ve added processing horsepower to it. It’s a better design than it was before. We’ve also got the remote capability, where the cores on one chip can access the AIU, the AI accelerator is a better way to say that, on another processor throughout the drawer. And that’s just the tip of the iceberg. We’re also putting another chip onto a set of PCIe cards and building that out into our IO subsystem in the mainframe. That takes it another order of magnitude of compute horsepower.

Steven Dickens: So we’ve got the wafer here with us. This is where we geek out. We were talking about it off camera. I mean just to put some of this in context. What was it, 24 miles of copper-

Chris Berry: 24 miles of wire.

Steven Dickens: In 18 different layers.

Chris Berry: Yes.

Steven Dickens: I got that right, guys; you’d be impressed. But tell us a little more, Chris, there’s some deeply nerdy, geeky stuff here. We were joking around: we’re recording this, talking about Telum 2, and we’ve got to nerd out for a moment. So give us the speeds and feeds. What am I holding in my hands? What does this mean? What can this thing do?

Chris Berry: So what you’re holding in your hand, that’s Telum 2: five nanometers, 43 billion transistors on that thing. Like you said, 24 miles of wire. Speeds and feeds: it’s running at over five gigahertz, 5.5 gigahertz. It is absurd how fast these things are running and how much we put into them. As I mentioned when we were talking off camera, there’s the data processing unit that we’ve added onto the chip. It is amazing how much content, how much capability we can put onto these processors these days. Eight cores, 360 megabytes of distributed L2 cache, PCIe interfaces, whatever. It’s just insane.

Steven Dickens: That cache number, when we were talking about it and prepping for this, I think that for me is one of the things that stands out. There’s lots you can do with that, but maybe just double-click on that cache nest and the size of it. I know you’ve increased it from Telum to Telum 2, but give me a little bit more on that, if you don’t mind.

Chris Berry: So for the workloads that we focus on, transaction processing, databases, they really benefit from those big cache numbers. I mean, when AMD put that 3D chip out, the V-Cache I think it’s referred to, they saw significant benefit on some workloads. The workloads that are our bread and butter really benefit from that enormous cache. And so yes, we’ve grown that cache 40% to get to that 360 megabytes in Telum 2.
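(For context on that growth: IBM’s earlier Telum disclosures, rather than this conversation, put the first Telum’s on-chip cache at 256 MB, and 256 MB grown by 40% is roughly 360 MB, which matches the Telum 2 figure.)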

Steven Dickens: So this processor has obviously got a second life. It sits within the mainframe system, it’ll be in the next mainframe, but it also sits in Linux-only systems from IBM, the LinuxONE family. Chris, from a chip design and processor perspective, some of the cache we were talking about, put that in context for me and give me a view of what that means from a LinuxONE perspective.

Chris Berry: So from a LinuxONE perspective, we think about total cost of ownership. We’re talking about consolidation ratios of 30 to 1, 25 to 1, 40 to 1, depending on the workload. Translating that to what we do in the hardware: again, the caches are how you can consolidate all of that workload, along with the sustainability Susan mentioned a little bit ago. You’re talking about a processor, a set of cores, and an infrastructure within the system that allow these cores to run at 90%, 99% utilization for sustained periods of time. That’s not what you see in the x86 space, and that’s why we can get those big consolidation ratio numbers. Then you get into the memory and the IO subsystem and everything else we have in these systems, and you can really take advantage of that. And again, total cost of ownership: it reduces software licenses and a variety of other things. There’s really a value proposition there for the clients.

Steven Dickens: I think that’s the payoff for me. It’s the case of yes, you’ve got a fantastic chip architecture, it obviously sits in that system. You bring the availability and the reliability that you talked about, but you can translate that directly to some of those cost savings.

Chris Berry: Correct.

Steven Dickens: Per-core licensed software is going to be cheaper when it’s run on a LinuxONE, purely because of this consolidation ratio. You mentioned some of the numbers there; they’re going to vary depending on the workload, and it’s maybe not orders of magnitude, but it’s 10X-plus consolidation ratios, and that really translates.

Chris Berry: Yes. I’m often surprised by the consolidation ratios that we’re capable of. The MongoDB Citi example, I think, was a 30-to-1 ratio. That’s what we quote. I still can’t quite wrap my head around how it is that good.

Steven Dickens: Yeah. Well you should be able to. You designed the chip.

Chris Berry: I get it. But you’re still surprised by those numbers. That’s how impressive they are.
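To put rough numbers on the consolidation economics being described here, a minimal back-of-the-envelope sketch follows. The core counts and the per-core license fee are hypothetical illustrations, not IBM or Citi figures, and as noted above the real ratios vary by workload.

```python
# Back-of-the-envelope illustration of consolidation economics.
# All numbers are hypothetical, not IBM or Citi figures.

x86_cores = 600            # assumed size of a distributed x86 estate
consolidation_ratio = 30   # e.g., the roughly 30:1 MongoDB figure quoted above
linuxone_cores = x86_cores // consolidation_ratio  # 20 cores after the move

license_per_core = 10_000  # assumed annual per-core software license fee

before = x86_cores * license_per_core      # $6,000,000 per year
after = linuxone_cores * license_per_core  # $200,000 per year
print(f"Annual per-core licensing: ${before:,} -> ${after:,}")
```

The point of the arithmetic is simply that per-core license costs scale with the core count, so a 30:1 core reduction flows straight through to the software bill.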

Steven Dickens: So Chris, one of the fascinating things you mentioned when we were talking about the processor is the various caches. That number is so unusual, so different, so out of the realm of what we see with some of the other processors. Tell us why that’s such an important part of the IBM story with Telum 2.

Chris Berry: 360 megabytes on one chip, 36 megabytes per core. It’s all about our workloads: transaction processing, database management. Having that pool of cache to load in all that data and then access it as you’re processing a transaction or figuring out whether something is fraud or not. Having access to all that data is super important for the bread-and-butter workloads that make the mainframe what it is.

Steven Dickens: So one of the other things we talked about, you were talking about the accelerator. I think it’s worthwhile digging into that. This is something new that’s coming with Telum 2. You guys are calling it Spyre. Susan, I think you mentioned it really quickly, but I want to go back to it. What’s Spyre? Why is it important? How does it fit into the context of 2024 and what we’re seeing with AI? Tell me a little bit more.

Susan Eickhoff: Yeah, so as you know, we introduced our AI accelerator on Telum three years ago. And as clients, frankly, play with that and see what they can do with it, and we’ve got over 200 use cases of what they’re doing, two trends have come back: we need more compute power, and we need to be able to host different models. So we’ve got, as you say, Spyre. That’s our PCIe-attached, dedicated AI inferencing accelerator, in a 75-watt form factor, developed in Samsung 5 nanometer. And the two things it can go after: like I said, more AI compute power, so 300 TOPS per chip there, but it also supports generative AI. So we can now run the watsonx Code Assistant, for example, on IBM’s Granite models.

What we can also do, because we have the AI accelerator we mentioned before on Telum 2, is run those two in concert, this notion of ensemble AI. So the transaction’s running through the mainframe, and you can get the low-latency, lower-energy-footprint AI compute from the accelerator on Telum 2. And if you get a low enough confidence score there, you can go out to Spyre, where you can host the larger model and get a more accurate answer. So like I said, running those in concert together in this ensemble AI, you get the most accurate, lowest-latency, lowest-energy-footprint answer.

Steven Dickens: I think that’s fascinating for me, this concept of ensemble. You’re building capability into Telum 2, and you’ve also built the connectivity there with the card and the accelerator. Thinking about it from a fraud use case, you can do some scoring, decide “Hey, I need to do more,” and do that within the transaction because of the way you’ve designed the system.

Susan Eickhoff: Exactly.
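To make the ensemble AI flow Susan describes concrete, here is a minimal sketch of the routing logic. The function names, the confidence threshold, and the scoring interfaces are hypothetical illustrations, not IBM APIs; the idea is simply that a compact, low-latency model runs first on the on-chip accelerator, and the work is escalated to a larger model hosted on Spyre only when the first pass is not confident.

```python
# Minimal sketch of the "ensemble AI" pattern: a fast first-pass model on the
# on-chip accelerator, with escalation to a larger model on the PCIe-attached
# Spyre card when confidence is low. All names here are hypothetical.

CONFIDENCE_THRESHOLD = 0.90  # assumed cut-off; would be tuned per workload


def score_on_telum(txn: dict) -> tuple[bool, float]:
    """Stand-in for a compact fraud model running inline with the transaction."""
    return False, 0.72  # (is_fraud, confidence): dummy values for illustration


def score_on_spyre(txn: dict) -> tuple[bool, float]:
    """Stand-in for a larger, more accurate model hosted on the Spyre card."""
    return False, 0.98  # dummy values for illustration


def ensemble_score(txn: dict) -> bool:
    # First pass: lowest latency and energy, runs alongside the transaction.
    is_fraud, confidence = score_on_telum(txn)
    if confidence >= CONFIDENCE_THRESHOLD:
        return is_fraud  # confident answer; no need to escalate
    # Low confidence: escalate to the bigger model for a more accurate answer.
    is_fraud, _ = score_on_spyre(txn)
    return is_fraud


print(ensemble_score({"amount": 250.00, "merchant": "example"}))
```

In the fraud scenario Steven raises, the escalation happens inside the transaction path, which is what the connectivity between Telum 2 and the Spyre card is designed to allow.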

Steven Dickens: So this system’s got a second life. It sits within the mainframe, but it also sits within IBM’s Linux-only systems, the LinuxONE. Tell us a little bit, Chris, about what we can expect from this Telum 2 processor and the AI accelerator when we see them in a Linux-only system, LinuxONE.

Chris Berry: So LinuxONE is all about total cost of ownership, consolidation of workload onto that system and Telum 2 just accelerates that. Improved performance, improved capacity, more memory, all of that.

Steven Dickens: We talked about the caches.

Chris Berry: Correct, the caches, exactly. So you end up with these massive consolidation ratios, 20 to 1, 30 to 1, whatever it happens to be, because the processors are able to sustain utilization at 90%, 99% in some cases, and you don’t have to have peak-demand capacity baked in. You don’t have to have idle compute sitting around, so you can really onboard a tremendous amount of workload onto these systems. And then your software licenses go down, your energy costs go down, a whole host of things get better. And again, total cost of ownership goes down. That’s really the value proposition of LinuxONE.

Steven Dickens: So Chris, one of the examples of what you’ve just said there is what Citi is doing with their MongoDB estate. Tell us a little bit more.

Chris Berry: It’s that consolidation play. They take workload that they’ve got running out in a data center somewhere, and they figure out that if they put it on LinuxONE, it’s more efficient: more compute-efficient, less floor space, less energy. And they’re taking advantage of that processing, those caches, that memory, really leveraging what we bring to the table with LinuxONE.

Steven Dickens: So let’s start to bring this home. It’s been a fantastic conversation, and always great to get hands-on with the wafers and the chips. Maybe Susan, start us off: give us a couple of key takeaways people should be thinking about with Telum 2.

Susan Eickhoff: Yeah, back to the three pillars we talked about at the top of the episode. Certainly the reliability, scalability, security, and performance, that’s always something we focus on. Second to none already, but always improving. Sustainability has been a big one for us as we designed Telum 2; outright power savings are another benefit of all that consolidation, a lower energy footprint there. And then the last one is the AI capabilities we have on here. Chris talked about it with Telum 2, but we’ve got the Spyre chip there as well, for even more AI compute power and to be able to run new models. So I’d say all three of those.

Steven Dickens: And Chris, what would be those takeaways from you?

Chris Berry: It’s the same takeaways, but I want to emphasize that Spyre chip. I mentioned a bit ago the amount of compute we’re talking about putting into the system: 30 petaOps in one of the test systems we’re putting together. We’re talking about an enormous amount of compute in a mainframe, and the ability for our clients to take advantage of that AI in terms of what they want to do with it. We always talk about, like we were off camera, the banking transaction examples as sort of the bread and butter, but with the array of use cases our clients are coming up with and exploring, I’m looking forward to finding out what they come up with as they explore that space with all of that horsepower in their hands.
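(For scale: at the 300 TOPS per Spyre chip quoted earlier, 30 petaOps corresponds to roughly 100 Spyre chips working together; that is a rough arithmetic check on the figures in this conversation, not a configuration IBM specified.)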

Steven Dickens: Well, it’s been a fantastic chance to go deep, and for me to nerd out. Really appreciate you taking the time to chat with us. We’ve been with IBM today going deep on all things Telum. There’s lots more to find out, so please click on the link in the comments below, and we’ll see you next time. Thank you very much for watching.

Author Information

Regarded as a luminary at the intersection of technology and business transformation, Steven Dickens is the Vice President and Practice Leader for Hybrid Cloud, Infrastructure, and Operations at The Futurum Group. With a distinguished track record as a Forbes contributor and a ranking among the Top 10 Analysts by ARInsights, Steven's unique vantage point enables him to chart the nexus between emergent technologies and disruptive innovation, offering unparalleled insights for global enterprises.

Steven's expertise spans a broad spectrum of technologies that drive modern enterprises. Notable among these are open source, hybrid cloud, mission-critical infrastructure, cryptocurrencies, blockchain, and FinTech innovation. His work is foundational in aligning the strategic imperatives of C-suite executives with the practical needs of end users and technology practitioners, serving as a catalyst for optimizing the return on technology investments.

Over the years, Steven has been an integral part of industry behemoths including Broadcom, Hewlett Packard Enterprise (HPE), and IBM. His exceptional ability to pioneer multi-hundred-million-dollar products and to lead global sales teams with revenues in the same echelon has consistently demonstrated his capability for high-impact leadership.

Steven serves as a thought leader in various technology consortiums. He was a founding board member and former Chairperson of the Open Mainframe Project, under the aegis of the Linux Foundation. His role as a Board Advisor continues to shape the advocacy for open source implementations of mainframe technologies.
