Insights from Lenovo and Micron to Deliver Extreme Performance Computing – Six Five On The Road at SC24

Think supercomputing is a big deal? Try EXTREME computing. Host David Nicholson is joined by Lenovo’s Patrick Caporale, Distinguished Engineer and Chief I/O Architect, and Micron’s Praveen Vaidyanathan, Vice President and General Manager, Micron Compute Products Group, on this episode of Six Five On The Road at SC24. They discuss the cutting-edge collaboration between Lenovo and Micron in advancing extreme performance computing.

Their discussion covers:

  • The necessity of high-performance servers for AI and HPC workloads and the role of innovative memory and cooling technologies in meeting these demands
  • The collaboration between Lenovo and Micron, highlighting improved cooling technologies and the introduction of MRDIMMs to enhance memory bandwidth
  • Insights into the development of emerging memory, storage, and GPU technologies aimed at providing unprecedented AI and HPC performance
  • How Lenovo’s liquid-cooling technology (Neptune) differentiates from traditional air cooling solutions, supporting the integration of Micron’s advanced memory technologies
  • The industry trends surrounding liquid cooling beyond AI and HPC, and how these innovations are paving the way for future computing environments

Learn more at DDR5 DRAM | Micron Technology Inc.

Watch the video below at Six Five Media at SC24, and be sure to subscribe to our YouTube channel so you never miss an episode.

Or listen to the audio here:

Disclaimer: Six Five On The Road is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.

Transcript:

David Nicholson: Welcome to SC24, the Supercomputing Conference here in Atlanta, Georgia. I’m David Nicholson with Six Five On The Road. If you were to boil down what supercomputing is all about, SC24, high-performance computing, AI, it all comes down first and foremost to the data that needs to be stored and processed. That processing generates heat, and that heat is generated because of massive power consumption. It all needs to be cooled. It’s all kind of like plumbing, but we’re going to make it sound a little more technical than that. I have two gentlemen who are very, very deep into all of this moving of data and cooling of systems. Praveen, how are you?

Praveen Vaidyanathan: We’re doing very good, David. Very nice to see you.

David Nicholson: Good to see you, from Micron. And from Lenovo, Patrick. Welcome, gentlemen.

Patrick Caporale: Excellent. Good to see you today.

David Nicholson: Yeah, good to have both of you here. I want to start with memory. What is Micron doing? What’s the latest and greatest when it comes to the kind of AI workloads and things that you’re seeing out there, and what’s Micron doing to address them?

Praveen Vaidyanathan: Yeah, maybe I’ll start with the context of why we are here at Supercomputing. We are here on the exhibit floor, and it’s such a great environment to be in. We were here last year, and we walked away with some very key takeaways about what the industry is doing to innovate and provide new ways of solving compute challenges in the semiconductor industry. Even yesterday, I was talking to a bunch of university students; it’s an incredible ecosystem that comes to Supercomputing every year, with amazing ideas about how we advance technology and innovation. And in the last 24 to 48 hours, there have been several announcements about new solutions to solve compute challenges. But what I find interesting, deeply embedded in all these storylines, is the very strong coupling between compute and memory. We gravitate towards that, and the way we think about it at Micron is memory-centric computing.

What does that mean for the world? How do we operate in a memory-centric computing world? Operating in that world also requires a very close collaboration with people who understand systems and who can deliver solutions, which is where our engagement with Lenovo comes in. It’s an incredible partnership, and thank you, Patrick, for being with us today to deliver these solutions. Against that background, the things Micron is working on for AI and high-performance computing really fall along three vectors. The first is bandwidth and performance: how do we deliver more and more of that? The second is capacity: how do we provide more capacity per square millimeter? The third, and you touched upon it, is power efficiency. It’s so important, as we scale performance over the next five to ten years, that we do it in a power-efficient manner. These are the three vectors we use to design our memory solutions.

One of the solutions we are working on very closely with Lenovo is called the Multiplexed Rank DIMM, or MRDIMM, which today provides roughly 40% higher bandwidth than conventional main memory solutions, which is incredible. Usually it takes two to three years, generation over generation, to deliver that kind of performance improvement, and this innovation lets us deliver it now. We are working very closely with Lenovo to make sure it solves problems for the AI and HPC industry.
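
For a sense of where a number like 40% can come from, here is a back-of-the-envelope sketch, assuming a first-generation MRDIMM running at 8800 MT/s against a conventional DDR5 RDIMM at 6400 MT/s. The transfer rates are illustrative assumptions for this sketch, not figures quoted in the conversation:

```python
# Rough per-DIMM peak bandwidth comparison (illustrative numbers only).
# A DDR5 DIMM moves 8 bytes of data per transfer (64-bit data bus).
BYTES_PER_TRANSFER = 8

def peak_bandwidth_gbps(transfer_rate_mts: float) -> float:
    """Peak bandwidth in GB/s for a given transfer rate in MT/s."""
    return transfer_rate_mts * 1e6 * BYTES_PER_TRANSFER / 1e9

rdimm = peak_bandwidth_gbps(6400)    # conventional DDR5 RDIMM (assumed 6400 MT/s)
mrdimm = peak_bandwidth_gbps(8800)   # assumed first-gen MRDIMM rate of 8800 MT/s

print(f"RDIMM : {rdimm:.1f} GB/s")            # ~51.2 GB/s
print(f"MRDIMM: {mrdimm:.1f} GB/s")           # ~70.4 GB/s
print(f"Uplift: {(mrdimm / rdimm - 1):.0%}")  # ~38%, i.e. roughly 40%
```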

Patrick Caporale: From a Lenovo perspective, we look at it starting from a customer-centric viewpoint. When we look at workloads such as HPC that really benefit from memory optimization, the bandwidth is especially critical; the data sets and the memory access patterns are really important in that HPC environment. We’ve seen trends in the industry before: when we moved from DDR4 to DDR5, we saw increases in the number of memory channels available, and that is certainly critical as we build out systems. But the latest technology available from Micron is really important because within that same physical envelope, meaning a DIMM socket, we can now deliver, as Praveen said, 40% higher bandwidth.

For us, power is going to come along with that performance, and power translates to heat; that was your executive summary. Lenovo systems are on our sixth generation of warm water cooling, known as our Neptune technology. That’s really critical because in a lot of our systems we can now provide water directly to those DIMMs, capture that heat, and deliver true efficiency and total-cost-of-ownership benefits to customers, which is back to that customer-centric view. So not only do customers get the workload benefits, they get all the other TCO benefits of Lenovo and Micron pairing here with the MRDIMM technology.

David Nicholson: Praveen, you touched a little bit on this idea of doing what you can to decrease the amount of power that’s consumed, therefore decreasing the amount of heat that has to be dissipated by our friends at Lenovo. What are some of the technologies that go into that? I have a specific question: to the extent that you can leverage high-bandwidth memory, does that in and of itself decrease the amount of power consumed, because you’re not having to move data around the system as much? Or do I have that wrong?

Praveen Vaidyanathan: Yeah, I think that’s a good way to think about it. One of the trends in the industry is that the closer you move memory to compute, the less distance you need to move data, and that naturally reduces the amount of energy you spend. But the context to think about is that there is power, and there is efficient use of power, and you’ve got to work on both. You want to reduce your consumption, but then whatever you do consume, are you using it efficiently? Those are the two areas we work on from a memory subsystem perspective. On high-bandwidth memory, one of the innovations we’ve worked on is that our HBM intrinsically consumes about 30% less power than other products out there in the HBM space, and that is a value proposition. Now, once you do that, you also want to use that power efficiently.

The higher performance allows you to complete tasks quickly. One of the concepts, and Lenovo really latched onto this, is thinking about task energy: how much energy do you spend to complete a task? If I can walk from here to the end of this hall while spending less energy, then maybe I can walk one more hall. Making that the focus, together with the cooling solutions we are working on with Lenovo, really helps manage this: for the same power dissipation, you manage your thermals, things run cooler, and you’re able to improve your performance.
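
To make the task-energy idea concrete, here is a minimal sketch: energy per task is average power multiplied by time to completion, so finishing sooner can save energy even at slightly higher draw. All numbers are hypothetical, chosen only to illustrate the concept:

```python
# Minimal "task energy" model: energy = average power x time to complete.
# All figures below are hypothetical, purely to illustrate the concept.

def task_energy_wh(avg_power_w: float, runtime_hours: float) -> float:
    """Energy consumed by one task, in watt-hours."""
    return avg_power_w * runtime_hours

baseline = task_energy_wh(avg_power_w=800, runtime_hours=10.0)  # slower memory
faster   = task_energy_wh(avg_power_w=820, runtime_hours=7.3)   # higher bandwidth

# Even at slightly higher power, finishing sooner wins on energy per task.
print(f"baseline: {baseline:.0f} Wh, faster: {faster:.0f} Wh")
print(f"energy saved per task: {(1 - faster / baseline):.0%}")  # ~25%
```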

David Nicholson: Of course, when you talk about reducing the amount of power consumed per task, you’re talking about all sorts of green things, not the least of which is the green money that’s saved on power.

Praveen Vaidyanathan: Exactly.

David Nicholson: Talk about the differences between, or the limits of, air cooling in an environment like this compared to what you’re doing with direct liquid cooling technology. Then I’ve got a specific question; I’m curious about racks today. How much power are we potentially pumping into a single 42U rack in a data center today?

Patrick Caporale: Yeah, great question.

David Nicholson: What does that look like?

Patrick Caporale: So let me take the first part first. Water is simply a more efficient conductor of heat than air. When you look at the ability to extract heat off the element that’s creating the performance, in this case MRDIMMs, but also across the system, CPUs, accelerators, everything else generating heat, we can show tangible benefits of up to, say, 40% savings in your data center when you stop spending power on spinning fans. To move air, you need fans that do nothing but physically push a volume of air across components to cool them: hundreds of watts of fan power just to cool the many more hundreds of watts of things inside the servers.

But if you start using water, you can actually reduce the amount of power needed for all those fans, and in some cases, as we can demonstrate in our Lenovo systems, get rid of fans entirely. That’s the first and foremost savings when you compare air versus water, and then you have other tangible benefits, like how cool your data center needs to be. Our warm water cooling technology allows for up to a 45-degree-Celsius water inlet temperature, so really warm water captures that heat and is really efficient. When you talk about power at the rack level, we’re demonstrating solutions in our HPC servers that can support up to 180 kilowatts in one rack. That’s really a demonstration of what we’re talking about: using our Neptune warm water cooling technology to extract and cool 180 kilowatts in one rack.
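
As a sanity check on what 180 kilowatts per rack implies for the water loop, here is a rough sketch using the heat-capacity relation Q = m_dot * c_p * delta_T. The fully captured heat load and the 10-degree rise are assumptions for illustration (consistent with the rise discussed later in the conversation), not Lenovo’s published Neptune specifications:

```python
# How much water does it take to carry 180 kW out of one rack?
# Q = m_dot * c_p * delta_T  =>  m_dot = Q / (c_p * delta_T)
# Assumed values for illustration; not published Neptune specifications.

HEAT_LOAD_W = 180_000   # 180 kW rack, assumed fully water-cooled
CP_WATER = 4186         # specific heat of water, J/(kg*K)
DELTA_T = 10.0          # assumed inlet-to-outlet rise, degrees C

mass_flow_kg_s = HEAT_LOAD_W / (CP_WATER * DELTA_T)  # ~4.3 kg/s
liters_per_min = mass_flow_kg_s * 60                 # ~258 L/min (1 kg of water ~ 1 L)

print(f"required flow: {mass_flow_kg_s:.1f} kg/s, about {liters_per_min:.0f} L/min")
```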

David Nicholson: If you’re kind of an EV or solar nerd like me, you hear a number of like that, and it sounds insane. For the viewers at home, let’s take a 100 kilowatts, that’s 130 horsepower. That would be an engine in a car running at red line in that rack about the size of a refrigerator. It’s insane amounts of power. The other thing is it’s a zero-sum game within the rack, meaning there’s a finite amount of power that can be dropped into a given rack. If you’re saving power by not having to spin fans, that means that you have more power for all the stuff that we really want to do with memory, right?

Praveen Vaidyanathan: That’s exactly where the collaboration comes in. If you are all working in a finite zero-sum environment, then how do we distribute the power in the best way possible? What we hear from partners like Lenovo is exactly that, “Hey, if you guys can consume less memory, then I can use it to run faster.” So it’s a distribution of the budget that we have within the system. I think we are all focused on if you can all reduce your budget, what naturally happens is let’s put more compute, let’s put more memory and get to that, improve the overall performance of the system within that same power budget, and I think we both play a role in being able to deliver to that ask from the industry.

David Nicholson: So looking towards the future, I think back to when I was a kid a hundred years ago. We had a computer that shipped with 4K of RAM. We upgraded it to 16K because we knew that was all we would ever need for anything you could ever do with a computer. What are the capacities we’re talking about today in terms of these DIMMs? In general, when we say there is X amount of memory in a 42U rack of equipment, how much memory are we talking about today? What does the future look like in terms of speeds and capacities, and what does the future of all of this look like to you from a memory perspective?

Praveen Vaidyanathan: Sure. The future looks very bright, and not just on a thermal imager; it definitely looks bright in terms of solving problems. There are some incredible challenges and incredible innovation going on to solve those challenges. But to put some numbers to what you’re talking about: today, a typical single-socket system has about 12 channels of memory, and you can get up to 256 gigabytes per slot. Multiply that by 12 and, if my math is right, that’s about three terabytes for a single socket, six terabytes for a dual socket, and you just scale that up. So you’re talking about tens of terabytes of memory in a rack.
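
Praveen’s arithmetic checks out; here is a quick sketch of how those per-socket figures scale up, where the servers-per-rack count is a hypothetical chosen only to show the order of magnitude:

```python
# Main-memory capacity scaling, following the numbers in the conversation.
CHANNELS_PER_SOCKET = 12
GB_PER_SLOT = 256

per_socket_tb = CHANNELS_PER_SOCKET * GB_PER_SLOT / 1024  # 3 TB per socket
dual_socket_tb = per_socket_tb * 2                        # 6 TB per server

SERVERS_PER_RACK = 16  # hypothetical dense-rack count, for illustration only
rack_tb = dual_socket_tb * SERVERS_PER_RACK               # ~96 TB

print(f"single socket: {per_socket_tb:.0f} TB")
print(f"dual socket:   {dual_socket_tb:.0f} TB")
print(f"per rack:      {rack_tb:.0f} TB (tens of terabytes and up)")
```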

In terms of the future, beyond just capacity, I don’t know what the right solutions will be going forward, but I know that one of the focus areas for all of us in this industry, including Micron, is power efficiency. In the context of Supercomputing, we have to be thinking about power efficiency, and I know Patrick alluded to that as well. I think a power-efficiency-centric memory and compute solution focus is what the future looks like.

David Nicholson: It makes a lot of sense. From a cooling perspective, you brought this relatively warm water in, you’ve dissipated heat from the system, and now you have hotter water. What does the future look like when it comes to what you can do with that hot water that might be interesting?

Patrick Caporale: There’s a lot of things that you can do. You can actually use that to, maybe, in certain times of the year, use that to heat certain areas of a data center or other properties around, use reclaimed water to do other areas too. We only may have a 10 to 12 degree C rise in that water, so 45 in, you got 55 to 57, you do need to bring that down, but that’s not quite a lot. But you can reclaim that in many, many ways and bring back green, not just green data centers, but even things in the community and the space around the data center, how can you use that heated water and then just bring that down slightly back to 45 degree C. You don’t need to bring it down too much.

But then again, that customer-centric view. Really, I think Praveen said it: every watt we can save matters. We work in partnership with Micron, we’re in the labs together looking at everything we can save, and if MRDIMMs provide that benefit in the workload to the customer, it’s far better to spend the power there than on a spinning fan that doesn’t actually move any data itself; the MRDIMMs are going to do a great deal. So again, warm water cooling, that reclaimed heat, everything comes back to customer benefits and total cost of ownership. That is really valuable.

David Nicholson: Yeah, it makes sense. Well, let me tell you, folks, I did my best to drive a wedge between these two gentlemen and their companies, to get them to talk about things differently, but it looks like absolutely everything they’re doing is completely integrated: lowering power consumption on one hand, dissipating the heat on the other, all of it with a customer-centric view. It makes a lot of sense. Lenovo and Micron, thank you, gentlemen, for being with us today. For Six Five On The Road, I’m David Nicholson. Stay tuned for more excitement from SC24.

Author Information

David Nicholson is Chief Research Officer at The Futurum Group, a host and contributor for Six Five Media, and an Instructor and Success Coach at Wharton’s CTO and Digital Transformation academies, out of the Aresty Institute for Executive Education at the University of Pennsylvania’s Wharton School.

David interprets the world of Information Technology from the perspective of a Chief Technology Officer mindset, answering the question, “How is the latest technology best leveraged in service of an organization’s mission?” This is the subject of much of his advisory work with clients, as well as his academic focus.

Prior to joining The Futurum Group, David held technical leadership positions at EMC, Oracle, and Dell. He is also the founder of DNA Consulting, providing actionable insights to a wide variety of clients seeking to better understand the intersection of technology and business.
