On this episode of Six Five On The Road, hosts Dave Nicholson and Lisa Martin are joined by Dell Technologies’ Delmar Hernandez, Sr. Principal Engineer, Technical Product Marketing, and Steen Graham, Founder of Scalers.AI, for a conversation on the flexibility and advantages of GPU choice in the new Dell PowerEdge XE9680.
Their discussion covers:
- The design philosophy behind the Dell PowerEdge XE9680
- How Dell’s latest offering supports varied GPU ecosystems
- The role of the PowerEdge XE9680 in advancing AI and machine learning technologies
- Collaboration opportunities for businesses with Scalers.AI using Dell’s technology
- Future trends in hardware optimization for AI applications
Learn more at Dell Technologies.
Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we ask that you do not treat us as such.
Transcript:
Lisa Martin: Hey everyone. Welcome back to Six Five On the Road. We are in Las Vegas, covering Dell Technologies World 2024. Lisa Martin here with Dave Nicholson. Dell is in its AI era and as a Swiftie, I love that. I can’t say that enough. We’re going to have a great conversation next. We’ve talked a lot about the Dell ecosystem and its partners, and the work they’ve been doing together for a long time that’s really deep and collaborative. We’ve got two guests with us. Steen Graham joins us from Scalers AI, its founder, fast-tracking industry transformation. Delmar Hernandez is here as well; not only is he celebrating PowerEdge’s 30th birthday, he’s also the Senior Principal Engineer, Technical Product Marketing at Dell Technologies. Guys, it’s great to have you on the program.
Delmar Hernandez: Thanks for having us. Really appreciate it.
Steen Graham: Yeah, I’m excited to be here.
Lisa Martin: Yeah, so what’s new? There’s got to be burning news, Steen, I know.
Steen Graham: Yeah. Well, I mean-
Lisa Martin: What can you drop?
Steen Graham: … I don’t think we can drop news as compelling as a Swift-Kelce rumor or breakup, but what we can do is break some news about the semiconductor industry.
Lisa Martin: Let’s do it. That’s even sexier.
Steen Graham: I think a lot of us have been working on generative AI workloads, and we’re really excited about what we have in the market today, but it’s not enough for the world’s demand for the innovation that gen AI will provide. And so we want to enable choice, and Delmar and I have been working on a ton of newsworthy stuff, including some first-in-the-world live demos that we’re showing here at Dell Tech World on the MI300X processor from AMD. Really excited to work on that. And Delmar, do you want to share a little bit more about what we’ve been up to?
Delmar Hernandez: Yeah, so I’m the lucky guy that gets to play with the XE9680. I think if you walk around the show floor, everybody’s looking for that server, trying to put hands on it, figure out which GPUs are supported in it. So that is definitely, I think from my perspective, the star of the show. And we’ve been spending a lot of time with Scalers AI recently, seeing what it can do, benchmarking it, developing POCs. Basically we just hand the server over to Scalers and say, “Hey, make this thing shine.”
Steen Graham: Yeah, yeah. The XE9680 is packed with four of those MI300X GPUs and-
Delmar Hernandez: Eight.
Steen Graham: Eight, sorry.
Delmar Hernandez: Sorry, sorry, I have to … yeah.
Steen Graham: Sorry, sorry. Sorry, it’s that-
Delmar Hernandez: I don’t want to get yelled at afterwards, I had to.
Steen Graham: … 5:30-in-the-afternoon thing. So, eight-way GPU systems that we’re running, and we’ve been able to do a lot of really cool stuff with that. And one thing you’re not able to do with other GPUs in the market is run a 70 billion parameter model on a single card. And we’ve done that. And then the other thing you’re not able to do is run eight 70 billion parameter concurrent instances with Kubernetes on a single node on the XE9680. And we’ve done that. And another thing you’re not able to do is fine-tune a 70 billion parameter model on a single node.
And we’ve been able to do that, but we didn’t stop there. We also, thanks to Delmar, got a multi-node implementation running, connected via Broadcom Ethernet. And so we’ve done multi-node fine-tuning on that system as well to build a custom model. In this case, we trained on the PubMed dataset to make a medical-specific model on Llama 3, and we compared its performance side by side with the off-the-shelf Llama 3, and we get great results on questions like MCAT questions as well.
So, showing the promise of fine-tuning on that. And then, really to take advantage of the MI300X memory footprint, we have a first-in-the-world RAG demo, retrieval augmented generation, as you guys all know, and it’s multi-modal as well. We’ve got voice, video and language inputs, and we’ve got that live on the show floor right now. What do you think?
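The single-card claim above can be sanity-checked with quick arithmetic: a 70 billion parameter model stored in 16-bit precision needs roughly 140 GB for its weights alone, which fits within the 192 GB of HBM3 that AMD’s published MI300X spec lists, leaving headroom for the KV cache and activations. This is a rough back-of-envelope sketch, not a measurement from the demo:

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes).
    Default 2 bytes/param corresponds to fp16/bf16 storage."""
    return params_billions * 1e9 * bytes_per_param / 1e9

hbm_per_gpu_gb = 192                 # MI300X HBM3 capacity (AMD published spec)
weights_gb = weight_memory_gb(70)    # 70B params in fp16 -> 140 GB

print(f"fp16 weights: {weights_gb:.0f} GB; "
      f"fits in {hbm_per_gpu_gb} GB of HBM: {weights_gb < hbm_per_gpu_gb}")
```

The same arithmetic shows why an 80 GB-class GPU cannot hold the model on a single card without quantizing below 16 bits or sharding across devices.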
Delmar Hernandez: I think we should take a victory lap. We took a chance. Yeah, we took a chance doing a live demo, right?
Steen Graham: Yeah.
Delmar Hernandez: I think everybody knows when you do a live demo-
Lisa Martin: You should take the victory lap.
Delmar Hernandez: … you’re taking a chance.
Steen Graham: Without a net.
Delmar Hernandez: Yes, yes. I mean, there’s so many things that can go wrong, and we’re day two and it’s working. It is drawing crowds.
Lisa Martin: That’s awesome.
Delmar Hernandez: It’s the first live demo I’ve seen on MI300X. We’ve been to a few shows together. We haven’t seen a live demo on this GPU yet.
Lisa Martin: Well Steen, talk about how you’ve been able to do that. Because you talked about a number of things that hadn’t been accomplished before, and you said check, check, check.
Steen Graham: Yeah. Well, I mean first, it starts out with AMD and Dell’s great relationship and getting this platform to market quickly. I think Dell’s foresight on their flagship XE9680 platform as well, and future-proofing it for additional GPUs. And then Dell’s got great infrastructure and people like Delmar who can provision these servers quickly and give access to a team of AI engineers.
And we can be first in the world to do a lot of great things based on that. So, shout out to the Dell team for putting us in a position to be first, to AMD for making a great chip, and for all the other support we get from the ecosystem, like Broadcom. I mean, Ethernet’s a great industry standard that we all have to get behind, and it works off the shelf. So, we’re able to do fast things with trusted technologies like Ethernet as well. So, really exciting stuff.
Delmar Hernandez: Dave, I’m realizing that we didn’t take you to the lab when you were in town. Next time, remind me.
Dave Nicholson: Next time, next time. We spend all of our time-
Delmar Hernandez: We’re going to give you the-
Dave Nicholson: … we spend all of our time in the Experience lounge.
Delmar Hernandez: Yeah, yeah. But we’re going to give you the grand tour next time.
Dave Nicholson: I’m looking forward to the grand tour.
Delmar Hernandez: Yeah, yeah.
Dave Nicholson: Well, you just must be absolutely thrilled to be at a place where you have access to all the toys. You’ve been playing with the AMD toys lately, but with Dell offering up choice, whether it’s CPU, GPU, or XPU choice. We actually talked to the CEO of Kalray earlier today, and they’ve got a DPU and Dell embraces it, absolutely fit for function. So not being hemmed in by someone saying there’s only one way to do something, that must be fun. Now, do you specialize in AMD specifically, or do you also work with other CPU and GPU stuff?
Delmar Hernandez: So it’s definitely a team. It’s not just me, but yeah, our team supports all of our partners: NVIDIA, Intel, and AMD. This server supports all three. To use Steen’s words, we’re entering an era of choice. AMD’s coming online, Intel’s coming online later this year with Gaudi 3.
Dave Nicholson: Gaudi, yeah.
Delmar Hernandez: So I mean, we’re lucky in that we are at the center of all this action. We’re able to deploy these workloads on all three GPUs in our lab. That’s another … not to take over. I don’t know where you’re going with this, but another thing-
Dave Nicholson: It doesn’t matter. It doesn’t matter. I want to know where you’re going with it.
Delmar Hernandez: So one of the really cool things that we’ve done with Scalers AI is we’ve developed a software stack that works on NVIDIA, and then we’ve upgraded that stack to also work on AMD, and we’ll probably do that going forward on Intel. And that’s the distributed AI stack that Steen mentioned.
Steen Graham: Yeah, it’s really exciting. I think for the past year or two at least, innovation has been GPU constrained or AI accelerator constrained, depending on the nomenclature you want to use. But now we are entering this era of choice, and you can see it. Like today, Scalers AI has live demos on Intel, AMD and NVIDIA in modern enterprise grade RAG architectures, and they’re all running great, and they all have incredible attributes associated with each one of them. So it’s great to get choice in the market as well. And as Delmar said, we even have a fine-tuning stack that’s heterogeneous insofar as we support multiple GPUs or AI accelerators on that front as well.
Lisa Martin: What’s been the customer feedback? Sorry, Dave.
Dave Nicholson: No, no, yeah, yeah.
Lisa Martin: In terms of you’re in your choice era. I’m going to use another Swifty there because I just couldn’t … it’s right there on the table. But talk about the customer feedback because they want choice, they want to be met where they are. What’s been some of the feedback so far in terms of what you guys have been able to do together with the power of the ecosystem?
Delmar Hernandez: So from my perspective, a common question that we receive at Dell is: Is AMD ready for these AI workloads? Does this model run with AMD ROCm? So what we’re able to do in collaboration with Scalers is answer that question. I mean, every time, it’s yes. If you go to Hugging Face for those models, we download them, deploy them on the GPU in our lab, and they work. So I think one of the most common questions we’re receiving is, before I buy this expensive server, is the software going to work? So that’s a big part of the collaboration with Scalers AI, answering that question.
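Part of why the answer is usually yes: ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` API that NVIDIA hardware uses, so device-agnostic code runs unmodified on either vendor. A minimal sketch of that portability pattern (the matrix multiply is a toy stand-in for a real model, and the snippet falls back to CPU when no accelerator is present):

```python
import torch

# On ROCm builds of PyTorch, AMD GPUs report as "cuda" devices,
# so this one line covers NVIDIA, AMD, and CPU-only machines alike.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(4, 8, device=device)   # toy activations
w = torch.randn(8, 2, device=device)   # toy weights
y = x @ w                              # identical code path on every vendor

print(device, tuple(y.shape))
```

Stacks that stick to this framework-level API (rather than vendor-specific kernels) are the ones that port in hours rather than months.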
Dave Nicholson: Let me play Davey Downer for a moment. Does choice bring with it confusion? Is it enough to say, “Yeah, we have choice in silicon.” Okay, but what about the stack that surrounds it? You may take a software stack, adapt it to run with other silicon and prove out that it’s easy. Is the market going to converge in the direction of less variation over time, do you think? Or do you think that this will go on into the future, with folks like Intel and AMD sharing the market with the NVIDIAs of the world. Or the NVIDIA of the world, I should say? What do you think?
Delmar Hernandez: I mean, it is a little confusing today, right? I mean, the AMD Instinct is not really out in the market yet. It’s coming. Intel Gaudi 3 is coming. So you’ve got a lot of software developers that have been using CUDA for many years. There are millions of lines of code written against CUDA, so it’s going to take a little bit of time. But what we’ve found in the software stacks that we’ve built with Scalers is there are, what, six, seven key components? And you swap out one or two and then it works on the new GPU. So, I’m simplifying it. Steen can probably deep dive but it’s-
Dave Nicholson: But it’s not an insurmountable problem?
Delmar Hernandez: No, no.
Steen Graham: No, and I think, I mean the industry has done a great job of enabling application-level enterprise AI developers at the framework level, the PyTorch level, and all three leading companies are upstreaming optimizations and working collaboratively with the key frameworks. And they all have great collaborations with Hugging Face as well.
There is so much innovation right now where there are some modern tools that we all love to use that the incubant leader is definitely ahead of right now, but everybody’s catching up fast. And the thing is, some of those tools we weren’t using three months ago, some of those tools were created in the last nine months. And so, everybody in the industry is moving fast and upstreaming their capabilities and optimizations and supporting those latest leading software tools that we’re using as well. So, a lot of momentum on that front.
Dave Nicholson: Did he say incubant?
Delmar Hernandez: I think he did.
Steen Graham: Yeah, yeah.
Dave Nicholson: Is that a real word?
Steen Graham: Yeah, I think so.
Dave Nicholson: Is it really?
Steen Graham: I don’t know, I don’t know.
Dave Nicholson: It’s like the one who incubated, not incumbent, but … Or did you just make it up?
Steen Graham: I probably should have said incumbent, yeah. I was thinking …
Dave Nicholson: Yeah, because if it’s not-
Steen Graham: They did incubate it.
Dave Nicholson: … if it’s not a real word, Steen, it is now. It is now. What are you going to say, Delmar?
Delmar Hernandez: I need to brag a little bit for Steen. He probably won’t do it, but so you’re asking is it confusing and is AMD ready for these popular AI models? We gave Scalers AI access to a server. I think it was a Friday. We give them an IP address, that’s how it works. They remote into our lab. Saturday, I was getting text messages from Steen like, “Hey, we’ve got Llama running.” This is less than 24 hours later, random GPU.
Lisa Martin: Kid in a candy store.
Delmar Hernandez: Yeah, so these guys are super familiar with AI and AI software stacks. So, they were able to quickly deploy, and I just wanted to throw that out there. It was less than 24 hours, and they had the latest model running on a brand new GPU.
Dave Nicholson: So that, I mean, if what you’re … I’m not saying if it’s true, I believe that it’s true. That bodes well for choice moving forward. And it’s interesting as the world looks in at AI and asks the question, is there a single dominant player that will drive everyone out of the industry? It seems like Dell is not going to let that happen. Yeah?
Delmar Hernandez: Yeah, I mean, no, our portfolio, it’s AMD, Intel and NVIDIA. We’re supporting all three equally.
Lisa Martin: When you look at the roadmap, considering you were talking about some of the things that have been developed so quickly, tools that weren’t there a few months ago, how do you plan the next six to nine months for problems that are out there that we might not know about until three months from now? I’m sure that’s a challenge for a leader, but I’m sure that you do it well. I’m just curious.
Steen Graham: Well, I mean, the thing about software programming in the modern world is everybody’s building modular, containerized software, and layers of abstraction in microservice frameworks are becoming more and more prevalent today. But if you don’t have discipline around microservices, if you don’t have discipline around APIs, you’re really going to have a challenge keeping up. Because one day, one vector DB is perfect for a customer, or one embeddings model is number one in the world, and then you’re swapping it out seamlessly. And so, creating a highly modular, microservices-based architecture, which we’ve all practiced for the last half a decade, is the reality today that you’ve got to really live off of.
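The swap-it-out-seamlessly point comes down to keeping components behind small interfaces. A stdlib-only sketch of that pattern, using toy stand-ins (the character-frequency embedder and in-memory store below are illustrative, not any specific product’s API):

```python
from abc import ABC, abstractmethod

class Embedder(ABC):
    @abstractmethod
    def embed(self, text: str) -> list[float]: ...

class VectorStore(ABC):
    @abstractmethod
    def add(self, text: str, vec: list[float]) -> None: ...
    @abstractmethod
    def query(self, vec: list[float]) -> str: ...

class CharFreqEmbedder(Embedder):
    """Toy embedder: 26-dim character-frequency vector."""
    def embed(self, text: str) -> list[float]:
        vec = [0.0] * 26
        for ch in text.lower():
            if "a" <= ch <= "z":
                vec[ord(ch) - ord("a")] += 1.0
        return vec

class InMemoryStore(VectorStore):
    """Toy in-memory store; nearest neighbor by dot product."""
    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vec: list[float]) -> None:
        self.items.append((text, vec))

    def query(self, vec: list[float]) -> str:
        score = lambda item: sum(a * b for a, b in zip(item[1], vec))
        return max(self.items, key=score)[0]

# The pipeline only touches the interfaces, so replacing either
# component with a different vendor's implementation is a one-line change.
embedder, store = CharFreqEmbedder(), InMemoryStore()
for doc in ["GPU servers", "network switches"]:
    store.add(doc, embedder.embed(doc))
print(store.query(embedder.embed("gpu")))  # prints "GPU servers"
```

When next month’s number-one embeddings model arrives, only the `Embedder` subclass changes; the retrieval pipeline itself stays untouched.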
Then the other thing is just every day. I mean, there’s something new every day. So you have to love learning. It’s really an incredible time to be in the tech industry because there’s something incredibly new every day. And if you take a day off, like a Saturday off, you might miss something like getting multi-node fine-tuning up and running on MI300X with Broadcom-based Ethernet, which I don’t think we’ve announced publicly yet. Have we?
Delmar Hernandez: We published a blog.
Lisa Martin: We’re breaking news on Six Five On The Road.
Steen Graham: Yeah, yeah, and I think we’ll announce publicly in text format later a multi-modal RAG that we’re demoing live today on MI300X as well. So, exciting times for innovation, and it’s a great time because I think everybody’s imagination has been unlocked by these large language models. But ultimately, enterprises need high-fidelity solutions based on their proprietary business workflows, and they can’t take any risk with that. So the industry’s really stepping up its game with high-fidelity RAG and even fortifying that with some more agentic workflows to ensure these enterprise-grade GenAI platforms can transform businesses, give you a high-quality outcome, and keep your proprietary secrets proprietary as well.
Lisa Martin: What’s the booth number for anyone watching who wants to go and check out what you guys are demoing? You talked about some of these amazing things. Do you have the number you can share with the audience?
Steen Graham: Yeah, we’ve got our first-in-the-world MI300X multimodal RAG demo in booth 933, I think, which is the AMD booth.
Lisa Martin: 933.
Steen Graham: And yeah, and we’re showcasing some other RAG demos on NVIDIA, of course. In the Dell booth, we’ve got an amazing solution to help IT professionals, so many great IT professionals here, and their workloads are going up because of gen AI, so we want to save them a little bit of time. So we created a RAG environment that helps them analyze log issues on PowerEdge, network issues and app issues. And when they find errors, they can upload those logs and then triage those logs, and then auto-generate a report for an incident analysis and a root cause analysis. So I want to save those IT managers some time, as their workloads are just dramatically increasing with GenAI as well.
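The triage flow described here (upload a log, find and bucket the errors, auto-generate a report skeleton) can be sketched in a few lines. In the actual demo an LLM drafts the incident and root-cause analysis; a plain template stands in below so the flow is visible end to end, and the sample log lines are invented for illustration:

```python
import re
from collections import Counter

SAMPLE_LOG = """\
2024-05-21 10:01:03 INFO  nic0 link up
2024-05-21 10:02:17 ERROR nic1 link flap detected
2024-05-21 10:02:19 ERROR nic1 link flap detected
2024-05-21 10:03:44 ERROR app  health check timeout
"""

def triage(log: str) -> str:
    """Extract ERROR lines, bucket by source and message,
    and emit a skeleton incident report."""
    errors = re.findall(r"ERROR\s+(\S+)\s+(.*)", log)
    counts = Counter(f"{src}: {msg}" for src, msg in errors)
    lines = [f"Incident report: {len(errors)} error event(s)"]
    lines += [f"- {sig} (x{n})" for sig, n in counts.most_common()]
    return "\n".join(lines)

print(triage(SAMPLE_LOG))
```

An LLM-backed version would hand `counts` plus the raw error lines to the model as retrieval context and ask it to draft the root-cause narrative instead of this fixed template.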
Lisa Martin: You can make a lot more fans.
Delmar Hernandez: That’s another live demo, by the way.
Steen Graham: That’s another live demo.
Lisa Martin: Fantastic.
Steen Graham: That we’re running. And yeah, also, it wouldn’t be fun if we didn’t try our hands at a little robot engineering. So we do have an autonomous mobile robot running out there as well, trained in the NVIDIA Omniverse, so pretty exciting stuff with the Isaac Nova Carter simulation platform. We’re deploying Nova Carter live, which is a partnership between Segway and NVIDIA, on that autonomous mobile robot. So, a ton of cool stuff out there on the show floor and a ton of innovation in the market.
Lisa Martin: If you’re here, guys, check it out. Steen dropped some of the booth numbers, the partners, it’s definitely worth checking out. Because this is pretty groundbreaking stuff that you’re doing on a very accelerated pace. We appreciate both of you sharing the partnerships, what’s going on, the choice you’re enabling for customers to have and how you answered Downer Dave’s question with how it’s …
Dave Nicholson: Davy Downer.
Lisa Martin: Davy Downer. What did I say? Downer Dave?
Dave Nicholson: Yeah, it’s all right. Either way.
Lisa Martin: Tomato, tomato.
Dave Nicholson: Either way.
Lisa Martin: Anyway guys, we appreciate you sharing your insights. We’re going to have to have you back. I feel like we’re just peeling one layer of the onion and there’s more there. Awesome, guys. We thank you for your insights.
Delmar Hernandez: Thank you.
Lisa Martin: All right, for our guests and for Dave Nicholson, I am Lisa Martin. You’re watching Six Five On the Road from Las Vegas, covering Dell Technologies World 2024. Thanks for watching and bye for now.
Author Information
Lisa Martin is a technology correspondent and former NASA scientist who has made a significant impact in the tech industry. After earning a master's in cell and molecular biology, she worked on high-profile NASA projects that flew in space before further exploring her artistic side as a tech storyteller. As a respected marketer and broadcaster, she's interviewed industry giants and thought leaders like Michael Dell, Pat Gelsinger, Suze Orman and Deepak Chopra, as she has a talent for making complex technical concepts accessible to both insiders and laypeople. With her unique blend of science, marketing, and broadcasting experience, Lisa provides insightful analysis on the latest tech trends and innovations. Today, she's a prominent figure in the tech media landscape, appearing on platforms like "The Watch List" and iHeartRadio, sharing her expertise and passion for science and technology with a wide audience.
David Nicholson is Chief Research Officer at The Futurum Group, a host and contributor for Six Five Media, and an Instructor and Success Coach at Wharton’s CTO and Digital Transformation academies, out of the University of Pennsylvania’s Wharton School of Business’s Aresty Institute for Executive Education.
David interprets the world of Information Technology from the perspective of a Chief Technology Officer mindset, answering the question, “How is the latest technology best leveraged in service of an organization’s mission?” This is the subject of much of his advisory work with clients, as well as his academic focus.
Prior to joining The Futurum Group, David held technical leadership positions at EMC, Oracle, and Dell. He is also the founder of DNA Consulting, providing actionable insights to a wide variety of clients seeking to better understand the intersection of technology and business.