Scaling Up and Out for AI – Six Five at Marvell Industry Analyst Day

To scale up or to scale out? Discover why both are important for the future of AI. Host Patrick Moorhead is joined by Marvell Technology’s Achyut Shah, SVP and GM, Connectivity, and Nick Kucharewski, SVP and GM, Network Switching, at Marvell’s Industry Analyst Day for a conversation on the seismic changes in AI infrastructure. As the digital world embarks on a major shift, Marvell Technology is developing the crucial technologies to support this transformation.

Tune in for more on ⤵️

  • The concept of ‘scale up’ in technology development
  • An exploration of ‘scale out’ and the enabling technologies that support this approach
  • Insights into the driving forces behind the trends toward scale up and scale out, and their implications for the future of AI infrastructure
  • The potential of optical interconnect as a universal medium and Marvell’s roadmap for its adoption
  • Predictions on how these advancements could transform data center architectures and systems

Learn more at Marvell Technology.

Watch the video at Six Five Media at Marvell Industry Analyst Day, and be sure to subscribe to our YouTube channel, so you never miss an episode.


Disclaimer: Six Five On The Road is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.

Transcript:

Patrick Moorhead: Six Five is On The Road here at Marvell Technology Corporate Headquarters in Silicon Valley. And we are discussing, surprise, AI. On the show and in our analyst research, we’ve discussed a lot of the elements of what makes AI successful in the data center. There seems to be a lot of focus on the GPUs, which are super important, as are the memory and the storage. But one component, networking, doesn’t get enough discussion, even though it’s one of the biggest bottlenecks in AI training today, and even in inference when you’re looking at it from a latency standpoint. But the good news is we have both Nick and Achyut from Marvell to discuss not only scale-up, but scale-out approaches to improving performance, lowering power, and also improving latency. So guys, welcome to the show.

Nick Kucharewski: Thanks very much for having us.

Achyut Shah: Thanks for having us, Pat.

Patrick Moorhead: Definitely. So a lot of the ways people love to talk about the network, we can talk about the front end network, the back end network, we can talk fabric, but there’s a lot of discussion about the difference between scale-up and scale-out. So Nick, I’d like to start with you, talk about what you mean when you talk about a scale-up network.

Nick Kucharewski: So to put it plainly, scale-up is about adding additional capacity while making it appear as a single computer to the software. You can add more memory, you can add more compute, but the key point is that all of those resources are available to a single software application running. It’s fundamentally like a supercomputer application.

Patrick Moorhead: Do we call it a node or a cluster?

Nick Kucharewski: It’s a cluster, but the key difference is that the cluster is so tightly coupled that it can provide all of those resources simultaneously to one software application.

Patrick Moorhead: And what are some of the key technologies that you have that are driving scale-up networking and improving it?

Nick Kucharewski: So when we look at scale-up networking, the key elements are the processing element itself, the fabric that interconnects those processing elements, and the interconnect that goes between the processing elements and the fabric. And we see in the industry an evolution across all of those components. We’ve talked a lot about custom compute, but you very quickly move beyond the compute to look at the entire solution, and that means looking at the interconnect fabric. When you talk about different architectures for the fabric, you quickly see that the fabric is very tightly coupled to the architecture of the compute. So there’s an opportunity to do custom implementations for customers as they look at differentiated ways of building higher-performance scale-up.

Patrick Moorhead: I’m really glad you brought up custom. Tech is like an accordion: it compresses and gets custom, then it comes apart again. But at times when you’re trying to get the absolute highest performance at the lowest latency and the lowest power, custom has always been the way to go. So Achyut, thank you for being patient here.

Achyut Shah: Sure.

Patrick Moorhead: Let’s talk about scale-out. How does it differ from scale-up, and what are the technologies that you’re bringing to the table to improve this space?

Achyut Shah: Thanks, Pat. When you look at these large language models that we have today, whether it’s from a training perspective or an inference perspective, you need thousands, tens of thousands, even a growing number of XPUs and GPUs connected together in a super cluster. And when you have a large language model, you break it out into multiple tasks, with different clusters doing different parts of the task. So while scale-up is tens or a few hundred GPUs or XPUs together, scale-out is the network that allows you to connect all of these scale-up clusters, these AI servers, together. So now you have a super cluster of 10,000, 100,000, at some point a million XPUs or GPUs together.
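
To make the scale-up/scale-out distinction concrete, here is a minimal back-of-the-envelope sketch in Python. The domain size and cluster size are assumptions for illustration, not Marvell figures.

```python
# Toy model: a scale-up domain is a tightly coupled group of XPUs that
# software sees as one computer; scale-out is the network stitching those
# domains into one super cluster. All sizes below are illustrative.

def cluster_shape(total_xpus: int, xpus_per_domain: int) -> dict:
    """Split a super cluster into scale-up domains and count what the
    scale-out fabric has to stitch together."""
    domains = -(-total_xpus // xpus_per_domain)  # ceiling division
    return {
        "total_xpus": total_xpus,
        "xpus_per_scale_up_domain": xpus_per_domain,
        "scale_up_domains_to_interconnect": domains,
    }

# Hypothetical example: a 100,000-XPU super cluster built from 64-XPU
# scale-up domains leaves ~1,563 domains for the scale-out network.
print(cluster_shape(100_000, 64))
```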

Patrick Moorhead: It’s funny, I remember when it was cool to have 5,000-XPU clusters, and then 10,000, then 50,000, then 100,000. And I was at a conference last week where they were talking about 200,000-XPU clusters, and then you even have clusters connected between data centers. There’s a lot of action going on in China right now, but a lot of discussion here about how we do this in the United States. So Nick, this one’s actually for both of you. What are some of the trends driving toward scale-up and scale-out, and how are those shaping the standards for AI infrastructure?

Nick Kucharewski: So probably the most significant trend is the move to larger and larger scale-up clusters. Today the scale-up domain is predominantly either in a tray, where you might be talking about four, maybe eight XPUs, or in a rack, where you’re talking about less than 100, really limited by the physical constraints. If you want to go bigger than that, you’re talking about expanding beyond any one rack to a row of systems, and then the interconnect becomes a really critical factor, because the copper traces and copper cables that predominate on the board or in the rack don’t have the reach to span a complete row. And then you look at what’s the right next step as you move from copper, which is low power and low latency, to optics.

Patrick Moorhead: Makes sense. Achyut, I’ve got to ask you the same question: what trends are driving scale-out?

Achyut Shah: If you look at scale-out and just the size of the scale-out network that we want to build, you very quickly realize that the amount of power you can bring to any one data center building in any one location is limited. So if you want to get to multiple hundreds of thousands of XPUs in a cluster, one of the trends we see is distributed data center networks, where you have these clusters spread across multiple data centers across regions. Sometimes they’re on a given campus, 10 to 20 kilometers apart; sometimes they could be a few hundred or a thousand kilometers apart. And now you need to connect all of these distributed data centers with very, very high bandwidth interconnect, with its own switching fabric, to make sure that it looks like one seamless cluster when you run a large language model.

The other thing that you see within the data center is the bandwidth capacities of these XPUs growing very, very quickly. They double or triple every year. So the network speeds that you need to interconnect all of these XPUs in the scale-out need to double every year or every two years as well. So not only do you see the speeds interconnecting these XPUs double within the data center, you also now see these clusters spread across multiple data centers connected by high-bandwidth interconnects across tens or hundreds of kilometers.
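
Achyut’s doubling point compounds quickly. Here is a minimal sketch of that growth, assuming an illustrative 100 Gb/s per-lane starting point:

```python
# If XPU I/O bandwidth doubles every year (or two), the scale-out network
# must keep pace. The starting rate and the horizon are assumptions.

lane_gbps = 100            # illustrative per-lane rate today, in Gb/s
for doubling in range(1, 6):
    lane_gbps *= 2
    print(f"after doubling {doubling}: ~{lane_gbps} Gb/s per lane needed")

# Five doublings turn 100 Gb/s into 3,200 Gb/s, which is why both
# per-lane signaling rates and lane counts keep climbing.
```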

Patrick Moorhead: It’s interesting. I’ve been around for a long time. I think I’m in my 35th year of working in tech, and you always hear stuff like, “Oh, this is going to replace this. It’s going away. No more hard drives.” But guess what? Most storage is still on hard drives, even in the hyperscalers, today. And the same goes for copper: people say copper’s going away, that it’s going to be replaced by optical interconnects. I’ve got to tell you, though, if I look at why and how people are making decisions in the hyperscaler data centers, it seems like optical could be this universal interconnect. Am I making things up here, or am I onto something?

Achyut Shah: I started working in this field 25 years ago, at 100 megabits and one gig, and people said that was the last generation of copper. So the death of copper has been greatly exaggerated. Now, what you do see in AI is, yes, there is a use for copper, but with the rate at which speeds are going up, the distance that copper can traverse shortens and shortens as you double the speed.

Patrick Moorhead: Sure.

Achyut Shah: There will be a point in time where people will keep using copper as long as they can. It is lower power, it is a little more resilient. But once you build these large clusters, in both scale-up and scale-out networks, you do see that copper simply cannot traverse the physical distance required. At that point, whether that’s 3, 5, 7 years from now, copper runs out of steam and it goes to optical. But you do see copper technologies keep pushing out the transition: you had DAC cables, you went to accelerated ACC cables, you now have DSPs inside copper cables to extend the reach. So I think copper will be around for a bit, but at some point it is going to go all optical. The question is the timeframe: is it in the next five or seven years, or in the next decade?
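
The squeeze Achyut describes can be sketched with a rough rule of thumb: each doubling of the per-lane rate roughly halves how far passive copper can reach. The starting values below are illustrative, not vendor specifications.

```python
# Rule-of-thumb model: passive copper reach roughly halves each time the
# per-lane signaling rate doubles. Starting values are illustrative only.

reach_m, rate_gbps = 3.0, 50   # assume ~3 m of passive DAC at 50 Gb/s/lane
while rate_gbps <= 400:
    print(f"{rate_gbps:>4} Gb/s/lane: ~{reach_m:.2f} m passive copper reach")
    rate_gbps *= 2
    reach_m /= 2

# DSP-assisted copper (the ACC/AEC cables Achyut mentions) buys some of
# that reach back, which is why the copper-to-optics crossover keeps
# getting pushed out.
```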

Patrick Moorhead: So Nick, in scale-up, is this easier for you because you’ve got shorter throw and shorter distances, or are the bandwidth and latency requirements even higher, so you have to go optical as well? Where do you sit on this?

Nick Kucharewski: You make a great point there, because we’re really talking about two dueling constraints. One is the need for ever-higher bandwidth. When you’re talking inside scale-up, you’re inside the server itself, where very, very high speed is critical because it directly drives the performance of your compute. But on the other hand, as you go to longer distances and move into the optical domain, you’re adding latency to the transactions happening across the scale-up. So there’s really this dual trade-off of high bandwidth and low latency that you have to balance. And as a result, I think we’re going to continue to see an evolution of the architecture, and it’s going to come down to: what’s the problem the customer’s trying to solve?

For instance, if you’re working on an AI problem with a contained data set, maybe doing inference that’s been optimized for a single application, your scale-up might be a single rack or less, and in that case you might stay copper a little bit longer. On the other hand, you might be doing a very large training workload where you want the scale-up to be as large as possible. There you’re going to be motivated to move to optics to enable that very large scale, and you’re willing to make the investment and architect for the additional latency of that solution.
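
The latency side of Nick’s trade-off is straightforward to put numbers on: light in fiber travels at roughly two-thirds the speed of light in vacuum, about 5 microseconds of delay per kilometer. A quick sketch (the distances are illustrative):

```python
# One-way propagation delay over optical fiber. Light in glass travels at
# roughly 2/3 c, i.e. about 5 microseconds of delay per kilometer.

US_PER_KM = 5.0

for label, km in [("across a row", 0.05),
                  ("across a campus", 20),
                  ("between metro sites", 100),
                  ("between regions", 1000)]:
    print(f"{label:>19} ({km:g} km): ~{km * US_PER_KM:,.2f} us one-way")

# A 20 km campus hop adds ~100 us each way before any switching or
# serialization delay -- the latency architects must design around.
```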

Patrick Moorhead: I love that you make it conditional and not just a binary question. I was also struck by some of your slides on the different topologies: a campus, one building, versus maybe going kilometers between them. And I think it comes down to different strokes for different folks, which is good, because we have this covered. So I want to peek a little into the future here. We talked a lot about where we came from, where we are today, and a little bit about the future, but the reality is that ground-up data centers almost start with a clean slate: what do I need to be doing in three years when I get this thing online, maybe two, and then how can I get four or five years of longevity out of it? Nick, I want to start with you. How are these technologies transforming data center architectures, and what do people need to be thinking about for the future?

Nick Kucharewski: Absolutely. It’s interesting, because when you build on this concept that not all AI is created equal, you have drastically different compute problems being solved, whether you’re talking about large training or inference, whether you’re talking about a single application or a multi-tenant AI situation where you’re providing AI services to others. As you look to the future, you can see a continued evolution and a continued optimization for each of those different use cases. At the same time, hyperscalers are going to be looking at the available technologies and applying them to problems in very unique ways. So we see a future of multiple different architectures, different innovations that are specifically suited to the problems those hyperscalers are solving. And it’s really exciting, because from a Marvell standpoint, the focus is on building up the portfolio of technologies to enable all of those future scenarios.

Patrick Moorhead: Well, it seems like, with the advent of custom, the potential scenarios are almost endless. Again, am I making stuff up again or is that where this is headed?

Nick Kucharewski: Yeah, and I think it gives that opportunity. So we have a broad standard product line that customers can choose from in building these solutions, but if they want to go the extra step to do something based on things that don’t exist in the market today, we give them that option, so they can say, “You know what? I can’t find what I’m looking for, but I want it quickly.” Well, we’re here to provide that through our custom offering.

Patrick Moorhead: Achyut, from a scale-out perspective, how are these new technologies changing things? How do people need to re-look at the future architecture of the data center?

Achyut Shah: If you look back at how people used to build data centers a few years ago, you would look and see what standard off-the-shelf components you had.

Patrick Moorhead: Standard rack size.

Achyut Shah: Standard rack size.

Patrick Moorhead: Standard power size per rack.

Achyut Shah: Exactly. Even the modules, the optics that you use, the switches that you use were all standardized, off-the-shelf. You put it together and you get the best performance you can out of it. And the outcomes were very similar: take a bunch of standard components, and the results you’re going to get are going to be very similar. Now, with each of our customers wanting to optimize their networks so they can get a leg up on their competition, each of them has their own secret sauce, their own optimizations, customizations if you will, that they need to build. And so they use our products, and we help them customize their products and their networks in specific ways. So based on the kinds of workloads they want to run, based on their business models, based on what their investment models are for CapEx or OpEx, each of them picks a slightly different architecture, a slightly different optimization point.

And we are here to provide them with the family of products that helps them customize and optimize their networks in any fashion they see fit. You’re going to get to these million-XPU scale-outs, you’re going to get the distributed data centers, the speeds are going to grow from 50 or 100 gigabits per lane to 400 gigabits, primarily across optical networks, and each customer is going to optimize that network in their own way. And we provide the family of technologies to help enable that.

Patrick Moorhead: It’s exciting. I could talk about this stuff forever. It’s super exciting because the pace of change is so quick, and the numbers we’re throwing around in terms of gigawatts, low-precision TOPS, and all of the data that has to flow through there are just… 10 years ago, I don’t think we ever could have made this stuff up.

Achyut Shah: We could not have seen this coming.

Patrick Moorhead: But I’m glad we have people like you creating these technologies with your team and I guess as an analyst is to monitor and track and guide the industry as best we can. But I really appreciate you coming on the show and I hope we can do this again.

Achyut Shah: Absolutely. Thanks. Great talking to you.

Patrick Moorhead: Thank you guys.

Nick Kucharewski: Great. Thanks for your time. Good to be here.

Patrick Moorhead: So this is Six Five On The Road at Marvell Technology Headquarters. We are talking about our favorite topic over the last two years, and that is AI and data center AI, and hopefully you better understand where scale-up and scale-out networks are today and where they can go, both short term and long term, as they impact data center architectures. Check out all of our interviews with Marvell Technology executives and subject matter experts. Hit that subscribe button. Take care.

Author Information

Six Five Media

Six Five Media is a joint venture of two top-ranked analyst firms, The Futurum Group and Moor Insights & Strategy. Six Five provides high-quality, insightful, and credible analyses of the tech landscape in video format. Our team of analysts sits with the world’s most respected leaders and professionals to discuss all things technology, with a focus on digital transformation and innovation.
