
Scaling AI with Astera Labs’ Smart Fabric Switch Portfolio – Six Five On the Road

On this episode of Six Five On the Road, host Patrick Moorhead is joined by Astera Labs’ Jitendra Mohan and Thad Omura for a conversation on scaling AI through innovative solutions like the new Scorpio Smart Fabric Switch Portfolio, along with an overall look at the journey and vision of Astera Labs.

Their discussion covers:

  • Astera Labs’ journey to enabling AI infrastructure and reflections on their IPO
  • The challenges of keeping up with demand for AI models, next-gen GPUs, and AI accelerators as discussed by Thad Omura
  • Insights into Astera Labs’ Scorpio Smart Fabric Switch Portfolio and its alignment with industry trends and hyperscaler demands
  • How Astera Labs’ current developments tie into its future vision for advancing AI technology infrastructure

Learn more at Astera Labs.

Disclaimer: Six Five On the Road is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.

Transcript:

Patrick Moorhead: The Six Five is On the Road here at Astera Labs Headquarters in Silicon Valley. And we are here talking about my favorite subject, and that is AI. AI is everywhere. AI is pervasive, and AI starts in the hyperscaler data centers. I always like to say, “Hey, in every inflection point, all elements of the data center chain get challenged, whether it’s compute, storage, memory, and as importantly, networking.” And networking performance, reliability, and programmability have really stormed onto the stage, not just with training, but also inference with generative AI models. And I can’t think of two better guests to talk about this incredible announcement than Jitendra and Thad from Astera Labs. Gentlemen, welcome to the program. First-time Six Five guests. Hey, it’s an honor to be here on announcement day.

Jitendra Mohan: Thank you, Patrick.

Patrick Moorhead: Absolutely.

Thad Omura: Great to see you, Patrick.

Patrick Moorhead: What a great story. You started off with a 2,000 square foot facility, upgraded to 10,000, and here we are. I think you have multiple floors here in the building. And it’s not just about people, but it’s about growth. So you’re obviously doing a lot of things right here, and maybe I’ll start off with this. Congratulations on the IPO in March. I was doing a lot of research coming in here. I watched about three hours of Jitendra online doing interviews, and obviously I was familiar with the work the company has done, pretty deep into your ecosystem and with your customers. But tell me about the creation story here. I mean, was this the plan from the start in the vision of your company, or was it like most startups where they have a lot of options, they see a lane and then they go for it and they just dig in?

Jitendra Mohan: Fortunately, Patrick, this was the plan all along.

Patrick Moorhead: Okay.

Jitendra Mohan: We’d been waving the AI flag, I would say six, seven years ago when there were no self-driving cars.

Patrick Moorhead: Before it was cool.

Jitendra Mohan: Before it was cool. Now you got self-driving taxis in San Francisco. Model sizes have become 2 trillion parameters that we never imagined. But what we did imagine was that AI would be an integral part of our lives, and the AI models will have to become larger in order to deliver the performance that we need. And so we basically believe that AI models will be run in the cloud, and we wanted to be the company that would provide the connectivity infrastructure to enable running AI in the cloud.

So that was a very simple premise. We’ve stayed true to that over the last six years. We’ve delivered multiple product families in support of this connectivity for the cloud and AI infrastructure. We have our Aries product family, our PCI Express retimer family, that’s shipping in the millions in full production, powering a lot of the AI workloads that you see deployed in the cloud and even at the edge today. Then we have Taurus, our Ethernet retimer family, which is deployed in the form of smart cable modules and starting to ramp later this year. And then finally we have the Leo family that addresses the memory bottleneck that these AI systems see. And we are very excited to talk about our fourth product family, Scorpio, which we launched today.

Patrick Moorhead: No, it’s very exciting. And this is definitely not your first rodeo, so you’re fully deployed in a bunch of hyperscalers today.

Jitendra Mohan: Absolutely.

Patrick Moorhead: Excellent. So Thad, there’s a lot of things going on in the market. It seems like every three to six months things seem to change, but that’s a super challenge when you have to lay gates and you have to build products for the future. So what’s happening now would’ve had to have been planned a while ago. So that takes foresight. We’re seeing models doubling. We’re seeing small models. We’re seeing more heterogeneity across accelerators, GPUs, XPUs, ASICs, things like that. What’s your take on these trends and why do they matter?

Thad Omura: Well, what we’re seeing based upon the deployment, the mass deployment of our product lines, our Aries and other product lines that Jitendra mentioned, we actually have a front row seat at all of this, which is fantastic. And we are seeing elements of everything you mentioned, whether it’s diversity in the AI accelerators, GPUs, you name it, how they’re trying to connect into the rest of the infrastructure, whether it’s cloud AI factories or what have you, we’re seeing the diversity just continue to multiply out. Now, what you obviously hear about in the press a lot is the scale of the training workloads, right? Certainly that will continue as the workloads continue to grow.

Patrick Moorhead: There’s a million nodes.

Thad Omura: Cluster.

Patrick Moorhead: Millions of nodes.

Thad Omura: Billions and billions of parameters. We’re also seeing though on the inference side, the number of GPUs that have to work together in order to have real-time response multiply up very quickly as the models grow. So in all of that, what’s happening is that the connectivity to tie in these AI accelerators, GPUs, what have you, is becoming more and more important, not just between the GPU and the head node that’s controlling the GPUs, providing connectivity to the networking, doing all the data ingest to keep those GPUs fed and as efficient as possible. But we’re also seeing a tremendous amount of diversity and need for platform-specific solutions on what we call the backend or the GPU clustering.

Patrick Moorhead: Right.

Thad Omura: The fabric that keeps all the GPUs working together and working on the same problem as quickly as possible in unison. And what we’re really seeing is that customers who are trying to figure out what’s the right fabric solution, not just on the front end but also on the backend, they’re looking for optimal solutions from a power perspective. They need something that’s not complex that fits those applications like a glove, and they want to be able to do that in shorter and shorter timeframes as these platform cadence are getting released at a much faster pace.

Patrick Moorhead: Does reliability play into that? I mean, when it comes to a training run or inference run. At this point, a lot of chatter I’m hearing out there says it’s very difficult to create, sorry, to complete a training run. And even now in the inference when it comes to latency out there, that’s an issue today.

Thad Omura: There’s no doubt that in these training runs, you want to keep the GPUs fed because these runs can take days, weeks, even months to complete for the largest models. So any optimization and utilization improvements, including connectivity to all the devices is absolutely critical. But it’s becoming even more critical on the inference side because when you’re talking about an immediate response from the infrastructure for an inference request, you’re now talking about a direct user experience impact. And so the connectivity that is connecting all of the GPUs to respond instantaneously, if you will, has become critical. One of those GPUs goes down, the entire user experience is impacted. Reliability remains one of the most critical things for the infrastructure from a utilization and from a user experience perspective, absolutely critical.

Patrick Moorhead: Understood.

Jitendra Mohan: And then the same thing for the training site as well. If you look at even a very well-run cluster today, it is probably operating at a peak utilization of only 50%.

Patrick Moorhead: Yeah.

Jitendra Mohan: So most of the time these complex GPUs are basically waiting for either data or memory.

Patrick Moorhead: That’s expensive idle time.

Jitendra Mohan: That’s expensive idle time, and that’s a problem that we are trying to solve with our products.

Patrick Moorhead: I appreciate that. So, Thad, great job outlining the needs that are out there and magically Scorpio comes in to address these needs. So Jitendra, I’m going to start with you. How does the Scorpio smart switch family address all of these challenges and opportunities that Thad outlined for us?

Jitendra Mohan: Wonderful. Patrick, first and foremost, I do want to give a shout-out to the team that has been hard at work for their blood and sweat to make sure that we meet the expectations of our customers and deliver this product on the timelines that they set out for us. So thank you, team.

Patrick Moorhead: Excellent.

Jitendra Mohan: Having said that, I’m really excited to announce the Scorpio smart fabric switch family, purpose-built from the ground up for AI applications to support AI data flows and AI workloads. The Scorpio family comes in two series. First is the Scorpio P series, which is used for scale-out in front-end connectivity, to connect CPUs, GPUs, SSDs, NICs, you name it, together. So the architecture here supports mixed-mode traffic. It focuses on interoperation between different types of root complexes, different types of hosts, as well as many different types of endpoints. The second is the Scorpio X series, which addresses the back-end connectivity. It’s a nascent market, but growing very, very rapidly, where we connect GPUs together to form a GPU cluster, also known as back-end scale-up networking. And here the bandwidth between the GPUs is extremely important, and this family is architected to deliver maximum bandwidth in GPU-to-GPU traffic.

Patrick Moorhead: Excellent. And for those watching out here, you can interchange XPU for GPU and the ASICs. There’s a lot of custom hyperscaler ASICs out there as well.

Jitendra Mohan: Absolutely. I do use the word GPU somewhat interchangeably for any AI accelerator. Whether it’s a GPU, an XPU, or an ASIC, they all fall under the category of AI accelerators. But one thing that underpins the requirements of these GPUs, and what is the linchpin of our Scorpio family, first and foremost, is performance. We need to make sure that we deliver the performance that these incredible compute engines require to connect to each other. And we do that by delivering nearly theoretical maximum bandwidth to interconnect these GPUs together. We also deliver it in a modular fashion such that it’s easy for our hyperscalers to integrate the solution into their boards. We deliver this in a small package that does not take up a lot of real estate on the board, and more importantly, it can be placed where these high-speed signals are. As these data rates continue to double, you don’t want to route these signals all over the place on the board. So we came up with a modular architecture in a small package that can be placed close to these devices, close to where the signals are.

Second is reliability. You mentioned that earlier. Reliability and robustness of these devices is extremely important, and that’s where our Cosmos software comes in. We leverage the software-first architecture of these devices to deliver an incredible amount of diagnostics, observability, debug capability, and in general, just excellent fleet management for our hyperscaler customers as they deploy a lot of these devices. As I mentioned, for the back-end connectivity, it is important to deliver customization as well. The software-first architecture of our Scorpio family allows us to customize the family for the unique requirements of back-end connectivity, which vary from hyperscaler to hyperscaler. Third, I would say, is infrastructure reuse. We have to make sure that our customers can fit these devices into the existing infrastructure that they have in their data centers as well.

Patrick Moorhead: On the customization, are these different chips per hyperscaler or is it one chip and you’re customizing with firmware and software?

Jitendra Mohan: Yeah, that’s the beauty of our products. We don’t need to tape out a chip for every application for every hyperscaler. We can use our Cosmos software and the software architecture of our chips to customize Scorpio X family for the unique requirements that each hyperscaler has for their backend connectivity.

Patrick Moorhead: And it helps them with their supply chain too, which is a big deal. Good answer. I’m glad that was, yes. So, Thad, you are fully integrated into many hyperscalers out there, and here we have Scorpio come along. Does this mean they need to toss their old equipment that was in there before or do these somehow talk to each other, and for lack of a better term, interoperable or maybe even they have superpowers when they come together?

Thad Omura: It’s a great point. So first of all, Scorpio was designed for connectivity to the head node to be able to leverage all of the existing products, whether they’re CPUs, NICs, or storage devices. They can completely leverage all of those as is, and actually get a performance boost by using the Scorpio fabric switch. That’s on one hand. The next is that our hyperscaler customers who are using our Cosmos software infrastructure can fully utilize that as well as they now integrate Scorpio into their infrastructure. Cosmos is the infrastructure through which we provide this incredible amount of telemetry: is the infrastructure running optimally? Are the links all stable and running well? So when you actually pair up our Aries PCI Express smart retimers with Scorpio, we get an increased amount of telemetry from the two devices talking to each other, which can then really be expressed out to the hyperscalers.

So they actually get a tremendous benefit from that perspective. The other element that our customers are asking us for specifically is how to make the infrastructure more secure. Security is becoming a bigger and bigger issue in these AI platforms, and we help them address that with our Scorpio device. In addition, what do you do with all that telemetry information being provided by Cosmos? We make sure that the uptime for the entire infrastructure enables the best utilization. Ultimately, what people are looking for, as they spend billions of dollars on this infrastructure, is how they are going to get the best return on investment. You put in the infrastructure that provides you the best visibility to make sure you’re utilizing it to the fullest.

Patrick Moorhead: I’ll tell you what, I think I’ve been briefed 5,000 times on new technology and the ability for a technology to come in and connect with prior technology and actually add additional features is unique. You don’t see a lot of that. Usually it’s rip and replace type of stuff. So that must have been planned early in the process. Does this have to happen like that? I mean, you have to plan this up front architecturally.

Jitendra Mohan: Absolutely. So our devices are built from the ground up to be software-first. We do most of the functionality of the device, 50 to 60% of the functionality, in software, which allows us to bring about the customization that our cloud service providers and hyperscalers require, changes that they sometimes require in flight, as well as all of the debug features, observability features, and fleet management features that Thad just talked about. However, I will say that the sequence of events is a little bit different. We got our retimer-class products, the Aries products, designed in first because that was the solution that the hyperscalers needed. Once the product got deployed on the strength of its architecture, that’s when Cosmos came about to expose all of this functionality to the hyperscalers. And now that Cosmos is deployed into their software systems, the easiest thing for them is to simply upgrade from our current generation of devices to the next generation, or indeed to a new product family entirely, because these are all backwards compatible with Cosmos.

Patrick Moorhead: So interesting past, exciting day today. You’re amping up your offering in a lot of ways. How would you encapsulate that in the context of your future vision? What does this mean about your vision? And I want to hear from both of you here. Jitendra, we’ll start with you.

Jitendra Mohan: Absolutely. I think it’s probably worth revisiting what our vision is. Our vision, simply put, was to deliver connectivity solutions to truly unlock the potential of cloud and AI infrastructure. That is what we have done with all of our products so far, and that is exactly what we are doing with the Scorpio family as well. We are interconnecting these really complex, computationally intensive GPUs together. We are working very hard to make sure that they stay as fully utilized as possible. All of our previous products have done that, and Scorpio fits right in. And so from that standpoint, we are addressing the same end applications, AI training and AI inference. We would be selling this to the same set of customers. In fact, we are leveraging the technology that we have already built with our previous IO-class retimer devices, both at PCI Express Gen 5 and, more recently, at PCI Express Gen 6.

However, we are solving a more complex problem, and as a result, the ASP that we can command for these devices is much higher, and we end up addressing a much larger market than we have in the past. So it is a very exciting time for us. We are introducing the Scorpio P series that we are more familiar with, connecting your GPUs, CPUs, NICs, and storage together, but we are even more excited about the Scorpio X series, which now allows connection of these back-end networks, which by definition, because of the all-to-all connectivity, are more dense. So we have more connections. We have higher ASPs. It’s all looking good.

Patrick Moorhead: I love it. Thad, any comments on this?

Thad Omura: Yeah, I mean, at the end of the day for these AI platforms, what we’re really looking to impact with the connectivity is the user experience. Okay? And user experience is realized in the form of performance and responsiveness. It’s realized in the reliability, because now if one GPU goes down during an inference response, the whole user experience is impacted. So that’s where we’ve focused a lot of our efforts in deploying Scorpio and getting this product ready for deployment. And the other element, where we’re very much getting feedback directly from customers, is that some of the incumbent solutions that are designed into these products end up having to compromise.

They end up having to use a device that’s higher in power. They end up having to utilize a device that actually takes longer design cycles to design in, because of the complexity built into some of the incumbent solutions that were previously the only ones available. So now with Scorpio, we really alleviate a lot of the power concerns that Jitendra mentioned, and we simplify the design-in and the routing of these products, to keep pace with what we’re really seeing in this AI market: new platforms being released every year based upon different architectures. And that goes back to that diversity message that you brought up at the beginning.

Patrick Moorhead: A follow-up question on there. We’re talking a lot about the hyperscalers here, but our research suggests that the next wave, and not that hyperscalers are going to slow down, is the enterprise. Is there any reason that your technology can’t be transported to the enterprise, where 80% of that data exists today?

Jitendra Mohan: Yeah, absolutely no reason whatsoever. In fact, I will reiterate that the products that we have, Scorpio in particular are purpose-built ground up for AI applications. They’re applicable for running training workloads in the cloud. They’re also applicable for running inference workloads at the edge if the need arises. So absolutely purpose-built for AI. And the reason we can say this with confidence is because our product definition is really influenced by or driven by our customers.

The close collaboration that we have with them, the trusted relationship that we have with our customers really gives us a ringside seat to see what is going on. The visibility that we have into our customer’s roadmap is unprecedented, and that is what is driving our roadmap of devices. And truly, we are hard at work working on even newer product families that will really propel us from where we are today with IO class devices typically deployed at a rack level on copper to now fabric class devices that are going to be connecting AI accelerators across the data center over copper and over optical technologies.

Patrick Moorhead: I love that. So, Jitendra, this is a great conversation, but I want to give you the last word here. In the context of this announcement, what do you want to communicate to your customers, to your investors and your employees?

Jitendra Mohan: Thank you, Patrick. It’s a great question. So first of all, let me say that one of the core values at Astera Labs is our focus on customers. We obsess over our customers. To our customers, I will say we are there for you. We are going to match pace with you. We would like to be your extended engineering team to deliver the products that you need when you need them. To our investors, I will say that we are hard at work to increase the value of your investment, and we thank you for the trust that you’ve placed in Astera Labs through your investments. And finally, none of this would be possible without the team. So to the team, I thank them from the bottom of my heart. They put in a lot of blood and sweat into building these products. Keep doing what you’re doing. Everything else will fall in place.

Patrick Moorhead: Excellent. Guys, it’s been a great conversation. Appreciate your time. I mean, my company has been following you, but we’re going to follow you even closer, particularly with this last announcement. And we’re interacting with your customers and your customer, customer. And I just love this disruptive story in a very important market to pretty much everybody who’s out there. So hopefully we can do this again, really appreciate your time.

Jitendra Mohan: Thank you, Patrick.

Patrick Moorhead: This is Pat Moorhead signing off here at the Astera Labs Headquarters here in Silicon Valley. Disruption is amazing and a company like this, it’s just fun to see them delivering value literally in a sea of giants out there and solving real problems that are known, whether it’s the performance on nodes, I’m still waiting for this million node training run here, or whether it’s even making the head node and all the networking more efficient there. But also performance is one thing, but reliability is the other. And I can’t wait until these devices get installed to check in and see if it is in fact delivering on the promise. So check out more of our data center AI content at The Six Five and on my website too, at Moor Insights & Strategy. You take care. Thanks for tuning in.

Author Information

Six Five Media

Six Five Media is a joint venture of two top-ranked analyst firms, The Futurum Group and Moor Insights & Strategy. Six Five provides high-quality, insightful, and credible analyses of the tech landscape in video format. Our team of analysts sit with the world’s most respected leaders and professionals to discuss all things technology with a focus on digital transformation and innovation.
