Dell AI Data Management Services – Six Five On The Road at SC24

Dell AI Data Management Services - Six Five On The Road at SC24

There is no AI without data. Host David Nicholson is joined by Dell Technologies‘ Global Portfolio Lead, AI, Apps & Data Services, Beth Williams on this episode of the Six Five On The Road at SC24. Beth and David discuss the importance of data in virtually any AI use case and how Dell is addressing the challenges of AI Data Management.

Tune in for more on ⤵️

  • Data-related challenges organizations face when adopting AI, including:
    ✅ Managing massive data volumes
    ✅ Ensuring data quality
    ✅ Meeting AI-specific regulatory standards
    ✅ Adapting to the demands of AI development and deployment.
  • Techniques and technologies organizations can adopt to address AI Data Management challenges
  • Dell’s Data Management Services, including optimization and implementation services for data cataloging and pipelines, and Dell’s collaboration with leading technology providers to offer advanced data management for AI

Learn more at Dell Technologies.

Watch the video below at Six Five Media at SC24 and be sure to subscribe to our YouTube channel, so you never miss an episode.

Or listen to the audio here:

Disclaimer: Six Five On The Road at SC24 is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.

Transcript:

David Nicholson: Welcome back to Six Five On The Road’s Continuing coverage of SuperComputing ’24. I’m Dave Nicholson, and I’ve got a very, very special guest from Dell Technologies today. Beth Williams, welcome to the program. How are you?

Beth Williams: I’m doing great, thank you. Thanks for having me.

David Nicholson: So data is at the center of all things AI. In fact, without data, I would argue that AI stands for absence of intelligence. Would you agree that data plays a critical role in successfully deploying AI? Are you seeing that with your customers?

Beth Williams: Absolutely. I think it’s fundamental actually, in terms of implementing any kind of AI use case, making sure that you’re getting your data right, getting the quality right, getting the right sort of data to the AI model, all of that is imperative to get a use case implemented correctly.

David Nicholson: And what are you seeing in terms of challenges that are out there? First of all, what is your role at Dell Technologies? What’s the perspective that you bring to this?

Beth Williams: Yeah, so I’m in the consulting team, so I’m the global portfolio lead for all of the consulting offers around AI and applications and data. And they’re obviously all very linked.

David Nicholson: Okay. So you’re engaging with folks outside of what we would think of as the ivory tower in tech actually, where the ROI rubber meets the road, I guess, but so what are some of the specific challenges that you are seeing?

Beth Williams: So there’s lots of different things. I’ve mentioned quality already. I mean, I think that’s probably one of the biggest challenges, especially when we’re talking about things like generative AI, whether or not you’re training a model or you’re trying to feed a vector database via RAG, making sure that the quality of the data that you’re using in the models through those mechanisms is right, is fundamental, and it’s a big challenge. I think part of the reason for that is now we’re opening up different data sources that we’ve not really used before. So unstructured data has suddenly become really important to generative AI models. In fact, it’s probably the richest source of data that we’re seeing for gen AI. So things like SharePoint sites are a big example of that. In fact, we’ve got probably about 500,000 if not more SharePoint sites in Dell, and we are using that as a mechanism to feed a lot of the models that we’re using.

But obviously not all SharePoint sites are correct. Some of them might have a bit of stale data. And so quality is really, really important. Making sure that you’re cleaning that data, making sure that whatever you are feeding the model is right, but also the amount. I mentioned, we’ve got over 500,000 SharePoint sites, and that’s just SharePoint. So there’s so much data to deal with, that whole volume of data is a big challenge. And the different types of data, I’ve mentioned unstructured data, but you can break that down even further. If you think of, say, PDFs, within a PDF, you’ve obviously got texts where you’ve also got things like images and graphs, and you’re trying to take all of that different type of data, that kind of multimodal data and feed it into your model. And that’s really, really tricky. In fact, we as Dell have found that quite challenging in the past, especially around things like images and graphs.

David Nicholson: One of the concerns that we hear about constantly is concerns about governance, privacy, things like that. Are those concerns completely warranted or do we have adequate solutions for those issues at this point? What do you say?

Beth Williams: I think the concerns are warranted for sure, and I think part of the reason why it’s certainly around security, it’s getting quite interesting, is because we now have a lot of different attack vectors that we didn’t have before. So people have been doing AI for quite a while, but they had data security in place. But now when we’re starting to talk about using models, there’s all these new opportunities to get to it in a nefarious way. So you could get to the data sources that you’re pulling in and start to poison those. You could even get to the point where you’re talking to the model and poison the prompts. So there’s loads of different new attack vectors that have suddenly opened up all of this opportunity for nefarious characters to start accessing and polluting your data and your models.

And then governance itself has always been a concern around data, PII data and so on. But now that’s even more important as well, because we’re starting to see a whole shift in the types of data being consumed as well. I mentioned things like SharePoint sites. If you think about that, if you think about the way that we currently use SharePoint, for example, you have this inbuilt role-based access. So I can say, you can see my site and they can’t see my site. But the minute you start to suck all that in to say, a vector database, that’s gone. So all of a sudden you’ve just got all of this lovely data and you have no idea who’s meant to see what. And so governance becomes really important at that point to make sure that that role-based access still prevails after you’ve flattened all the data.

David Nicholson: So that’s interesting because the traditional thought process behind data hygiene is the idea that you want to avoid as much of the garbage in, garbage out outcome as possible, and then you have to balance between how much time and money do we spend making it perfect before it’s good enough. What you just described is actually inducing additional error, if you will, because you’re stripping away the permissions that have been managed in a traditional way. So when you go in and you engage with a client, how do you sort through that? How do you figure out how much more time and money should be thrown at making the data perfect? And then how would you address something like that where you’ve stripped away the traditional permissions and now you’ve made it this, the data lake house next to the lake full of data sounds great, but maybe you’re not supposed to see my data. How do you manage all of that?

Beth Williams: So not surprisingly, there isn’t a silver bullet. There’s lots of different mechanisms that you need to put in place. We often talk about things as becoming a data swamp these days, because of all of that data going in without the right kind of lineage being attached to it. So one of the things that we always suggest, well, first of all, don’t try and boil the ocean. I mentioned we’ve got hundreds of thousands worth of SharePoint sites. We don’t need to necessarily clean up all of those sites. So first of all, focus on the data sources that are important to you. And quite often the best way of doing that is looking at the use cases that you’re going to implement first. So prioritizing what you’re going to use the data for first, that’ll give you a subset of the data sources to look at.

And then once you’ve decided on that subset, it’s a case of going through some of the processes that we use. For example, we use data catalogs. And so what we can do there is look at the data sources and tag the sources with metadata. So we can say, for example, this is the lineage of this data source. We know exactly where it’s come from and this is the role-based access we would like. These are the people that are allowed to see it, these are the people that aren’t, this has got PII in it. So having that kind of metadata tagging to the data sources in a catalog is a really good way of getting started. Because what you can do with that then is even though you might not be using that straight away, when you start to consume the data with say, data pipelines, you can actually access that metadata and using policies, you can say, well, actually, that data’s allowed, that data isn’t. And you can do that repeatedly, automatically on the fly if you like.

David Nicholson: I’m curious to hear what your experience has been working with folks who are thinking about fine-tuning a model with their own bespoke, proprietary, crown jewels of data. Do you think this is pushing folks more in the direction of hybrid cloud? The hyperscale cloud providers would’ve said at some point that, nah, hybrid cloud, it’s just a bridge until everything can be in our clouds. I’m hearing a lot of folks say, “Well, hold on a minute. No, no, no, no, no. We want a core of what we’re doing to be somewhere where we feel like we have more control.” Are you seeing the same thing? What are your thoughts? I know, look, Dell does hybrid cloud better than anyone, but what’s your perspective?

Beth Williams: It’s always going to be a hybrid cloud world, and it comes to the data, because as we say in Dell, there’s certain amounts of data that will never ever leave Dell. Under no circumstances will we let that data go. It’s really important to our business. And so it will always stay on-prem, and there’s other data that we already put out to say to public cloud, which is okay, it’s customer-facing. So there’s lots of different ways to achieve the same goal, but at the same time, you’ve got to stick to your core principles of this is my IP, this is my data, and this is really important for my customers that I keep this data safe. So we see that across many, many different customers. Obviously, from a Dell perspective, our customer base is predominantly those people that are already thinking in that space anyway. They’re already looking at hybrid cloud and have been for a very long time because there are certain workloads that will never ever leave their premise. But as we say, there’s always good use cases for public cloud as well.

David Nicholson: Yeah, in the classes that I teach in AI, the big question from CIOs and CTOs is always, how do I get to the nirvana of a positive ROI from AI? Specifically from generative AI, but from all things AI, Dell has been in this business of helping people “manage their data”. I use the big air quotes to all encompassing managing data for a long time. So this isn’t all net new for Dell. Can you walk us through from the perspective of what Dell has been doing in the past, what you’re doing in the present, and what you’ll be doing in the future, and how that’s changed? Because some of the stuff you’re talking about is stuff you’ve been doing. 10 years ago it would’ve been the same thing, but how have things really, really changed recently? And then what can we expect on the horizon?

Beth Williams: Yep, you’re absolutely right. So Dell and obviously Heritage EMC, we’ve been focused on storing customers data for many, many years. So our data storage and data management solutions have been front and center for a very long time. What we’ve started to move into now are things, as you mentioned before, the Lake House concept, the Dell Data Lakehouse, using the sort of Starburst partnership. So we started to move up on top of that kind of storage and management layer and start looking at, well, actually, what are you going to do with this data? How are you going to manage it effectively? How are you going to be able to extract metadata? So we have products that are now evolving so that metadata can be automatically extracted from the storage products that we’ve got as well. So that evolution is happening. So we’re slowly moving up the stack of, I would say, data products into that kind of data management space.

But we’re not trying to be all things to all people. So very clearly, we are focusing on data management that in this context, that will accelerate AI and gen AI. We’re not going to everything in data management that the world could need, it’s going to be very much focused on storing your data and then managing it in the focus of accelerating AI. And what you’re going to see, I think coming down the line is more of that. More solutions using our ecosystem of partners, using the best of breed technologies where we can go and actually help customers achieve those goals, look more at the kind of AI use case world, the AI solution world, and help the data feed those solutions. And again, using kind of the ecosystem of partners that we’ve got as well as our fundamental data products.

David Nicholson: I love the fact that you referred to Heritage EMC. Thanks for making me feel very, very old.

Beth Williams: I’m Heritage EMC, so there you go.

David Nicholson: I actually, yeah, full disclosure, I was at EMC for 16 years prior to Dell acquiring EMC. So Heritage, wow. I think I get to get those special license plates now for my cars, classic.

Beth Williams: Absolutely.

David Nicholson: So what are some of the things that maybe people would be surprised by when they’re initially coming at this? If the board, I’m a CIO, I’m pretending to be a CIO, and my CEO calls me and says, “Dave, the board just asked me what our AI strategy is, what do I tell them?” And so I’ve got to come up with an answer, and I reach out to you and I say, “Hey, I want to kick off an AI pilot program.” And that’s literally all I tell you. What am I going to be surprised by when you say, “Okay, okay, Dave, first, let’s start with step zero or step one.” Anything that people are shocked by that you can’t just flip a light switch on? What does that look like?

Beth Williams: Yeah, no, I think there’s obviously people always wanting the panacea. They’re always wanting the guaranteed solution, the killer use case that’s going to make their ROI. And like anything, it takes work and it takes some dissemination to work out exactly what it is that we need to be doing. And we had to do that as Dell. So we had hundreds of use cases, over 800 use cases that we were booting around in terms of thinking about what to do with AI, some of which were actually in flight. And we realized really quickly that actually, yeah, we can’t have 800 in-flight AI use cases. We need to come down to a small set, focus on those things that are important to us. And so we went through that process of looking at what people were suggesting and starting to cluster these use cases together to see where the biggest bang for the buck would be, what was most technically feasible, where the data was.

So as we said before, data readiness for use cases is really important. If this use case relies on data that we know is really bad at the moment, it’s going to take a long time to clean up. Maybe that’s a lower priority than say, other use cases that we know the data’s pretty good and it’s very fast in terms of getting it to the model. So what we try and do with our customers is help them understand that you do have to go through that process. It doesn’t have to be a long process though. I think the surprising thing for a lot of people is they think that’s like a six month strategy exercise and it’s not. We can do something very quickly in a couple of weeks to come in and help you disseminate what those high priority use cases are in the same way that we have, and then focus on the data sources that are relevant to those use cases and then start to incrementally implement them.

And again, we are not saying go in and build a massive data as a service data product data mesh out of the gate. We’re saying go and get those good use cases to start with, go and implement those. You can do them tactically in terms of data. There can be some manual steps in there as well, as long as you’re safe, start to see that getting some ROI. And then once you’ve got a few use cases under your belt and you’re starting to get into some scale, that’s when you start putting the automation in. That’s when you start putting the Dell Data Lakehouse in. That’s when you start to look for scale, and you are starting to look at things like data as a service because you’re going to need that with these more use cases coming down the line. So I guess the big surprise is, don’t buy everything that we sell out of the gate. Just do things to start with that are small, incrementally add to those, make sure you’re getting ROI, and then when you get to a point where you really need to scale, then we can help you with our own experience.

David Nicholson: No, it makes a lot of sense, and I think it’s always interesting talking to folks who are involved in the real services side of the tech business because you have a lot of effort that gets put into productizing things, but at a certain point, every single one of your engagements is bespoke. I guess the best you can hope is sort of 80 to 90%, yeah, we’ve done this before. Don’t worry, we’ve got you. But there’s always going to be that leading edge where you’re working in collaboration with a client. Is that a fair statement? And are you sometimes jealous of your product friends who have three SKUs and that’s it?

Beth Williams: So I’ve been in consulting for a very long time, so if I didn’t like variance, I’m in the wrong job, quite frankly. So I think that’s what makes it fun. But you’re right, I think the pretrial is a fairly good one. It is pretty much an 80-20. As things start to evolve, we can start to see patterns emerging. For example, we know what the kind of top solutions are that people are going after. We know that things like RAG are really important. So we can boilerplate a lot of this stuff to a point where we can take it to the environment, take it to the customer, get it deployed, and then after that, that extra configuration layer that’s just integration and configuration with the customer and what the customer wants to do. And that is normally about 20%. So yeah, that’s the bit.

Every customer’s different. Everybody wants something different. Everybody’s got a slightly different goal, but we try and take a bit of weight off. Not everything’s a snowflake. We’ve got good patterns that we use. We’ve got our validated designs that we follow. That gives a really good start, and we know those things work. So it’s not like the old days where we used to go in and say, well, what do you want? Now we go in and say, look, this is how we would do it. How much would you like of this? And where would you like the variance? So it’s a lot better than it used to be.

David Nicholson: Well, AI is certainly filled with excitement and drama, but to the extent that Beth Williams and her teams can go in and eliminate that drama, go get your adrenaline somewhere else if you’re an adrenaline junkie. What you don’t want is to be terrified about what the next day brings in your AI deployment. Beth Williams from Dell Technologies, thanks so much for joining us here on Six Five On The Road’s continuing coverage of SuperComputing 2024.

Beth Williams: Thanks very much.

Author Information

David Nicholson is Chief Research Officer at The Futurum Group, a host and contributor for Six Five Media, and an Instructor and Success Coach at Wharton’s CTO and Digital Transformation academies, out of the University of Pennsylvania’s Wharton School of Business’s Arresty Institute for Executive Education.

David interprets the world of Information Technology from the perspective of a Chief Technology Officer mindset, answering the question, “How is the latest technology best leveraged in service of an organization’s mission?” This is the subject of much of his advisory work with clients, as well as his academic focus.

Prior to joining The Futurum Group, David held technical leadership positions at EMC, Oracle, and Dell. He is also the founder of DNA Consulting, providing actionable insights to a wide variety of clients seeking to better understand the intersection of technology and business.

SHARE:

Latest Insights:

Novin Kaihani from Intel joins Six Five hosts to discuss the transformative impact of Intel vPro on IT strategies, backed by real-world examples and comprehensive research from Forrester Consulting.
Messaging Growth and Cost Discipline Drive Twilio’s Q4 FY 2024 Profitability Gains
Keith Kirkpatrick highlights Twilio’s Q4 FY 2024 performance driven by messaging growth, AI innovation, and strong profitability gains.
Strong Demand From Webscale and Enterprise Segments Positions Cisco for Continued AI-Driven Growth
Ron Westfall, Research Director at The Futurum Group, shares insights on Cisco’s Q2 FY 2025 results, focusing on AI infrastructure growth, Splunk’s impact on security, and innovations like AI PODs and HyperFabric driving future opportunities.
Major Partnership Sees Databricks Offered as a First-Party Data Service; Aims to Modernize SAP Data Access and Accelerate AI Adoption Through Business Data Cloud
Nick Patience, AI Practice Lead at The Futurum Group, examines the strategic partnership between SAP and Databricks that combines SAP's enterprise data assets with Databricks' data platform capabilities through SAP Business Data Cloud, marking a significant shift in enterprise data accessibility and AI innovation.

Thank you, we received your request, a member of our team will be in contact with you.