On this episode of Infrastructure Matters, host Keith Townsend is joined by HPE’s Bharath Ramesh, Global Head of AI Product, for a conversation exploring the complex landscape of artificial intelligence (AI) in the enterprise domain, with a focus on the challenges organizations face and the emergence of hybrid AI solutions as a viable approach.
Their discussion covers:
- The current state of AI adoption in enterprises and the primary obstacles they encounter
- The evolution and importance of hybrid AI solutions in addressing enterprise needs
- Insights into how organizations can effectively navigate the integration of AI technologies
- Strategies for enterprises to maximize the value of AI investments
- Future trends in AI development and how they might influence enterprise strategies
Learn more at HPE, and download our report: Navigating Challenges in Scaling AI Workloads with Hybrid Cloud Solutions here.
Watch the video below, and be sure to subscribe to our YouTube channel, so you never miss an episode.
Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this webcast. The author does not hold any equity positions with any company mentioned in this webcast.
Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.
Transcript:
Keith Townsend: All right. Welcome to this surprise episode of Infrastructure Matters. I’m your host, Keith Townsend. Steve and Camberley gave me the keys to their podcast. They may live to regret this, absolutely. But we have a special sponsored podcast with Bharath Ramesh, director of Product Management for AI Solutions at HPE. Bharath, welcome to the podcast.
Bharath Ramesh: Thanks, Keith. Excited to be here.
Keith Townsend: You know what, we’re going to kick this off pretty aggressively. I’m known for asking aggressive questions, even on sponsored podcasts. You know what, let’s start off a little friendly. What are some of the common problems you’re seeing in the enterprise as they face the challenges of scaling AI workloads? I think one of the challenges I’ve seen is that customers kind of don’t know where to start the journey. So, talk to me about that opinionated approach that HPE is well known for in its engineered solutions. How are customers reacting to that, or what’s the opportunity?
Bharath Ramesh: Yeah. That’s a really good question. And when I look at the landscape of customers we typically encounter, I broadly segment them into two types of customers. There are the AI producers. Think of these as the large companies whose mission in life is to build the next big compelling AI model. And then the AI consumers, which actually is the vast majority of the other companies, who just want to take AI technologies that are off the shelf, infuse them into their business, and deliver some kind of a business benefit. So we broadly see customers fall into those two buckets. Obviously there are those who do both. But I’m going to focus the answer on the needs of the consumers.
And the reason is, the producers tend to be fairly AI savvy to begin with. So what they’re looking for is not a turnkey packaged experience. They’re looking for the best-in-breed tools that they will stitch together to precisely meet their needs. But the consumers, being users of AI, really crave something that is packaged and turnkey, and very easy to deploy. Because ultimately their goal is to show how AI benefits their business. And what we see with these consumers of AI is that typically their AI journey begins with a top-down edict. Their CIO or CTO has read about the great merits of AI and thinks it applies to improving some top-line characteristics of the business, maybe creating a new business model.
Or, it just helps drive better efficiency to reduce your cost of headcount or cost of operations. So they’re given this top-down edict, and then the CIO says, “Go and implement AI.” Frequently, IT is given this charter. Some companies that have more of a vision for AI tend to have MLOps teams already in place. And they’re given the charter to say, “Go and execute.” And these teams, their criterion for success or failure is, “How quickly can I show AI delivering the benefits that I’m promising my business?” They’re not interested in building those AI models. They just want to pick it up and use it as quickly and as efficiently as possible.
Keith Townsend: So, I get that. As we’re recording this, Llama 3 was just announced. And we’re not targeting, at least this podcast episode isn’t targeting, the folks building the Llama 3s of the world, but rather the folks consuming the Llama 3s of the world. Why not just go all in on public cloud? The CTO Advisor is known for advocating hybrid infrastructure as an advantage. Talk to me about the advantages of not just hardware implementations, but hardware, software, and infrastructure implementations that deploy across hybrid infrastructures. Data matters in this world of AI.
Bharath Ramesh: Yeah. That’s another really good question. And again, I go back to what is the user journey that we see customers run through? The problem statement here is, you said Llama 3. It’s a large language model, and they’re seeking to achieve a certain outcome by using that LLM. It could be, “I want to build a new customer service chatbot. I want to add this model as part of my enterprise search engine.” So they start with that outcome, and then they’re looking at, “What solution choice points do I have to achieve that outcome quicker and cheaper than if I tried to build this on my own?”
And many times customers actually start by saying, “Can I just use a commercial API provider who’s hosting these LLMs in their network? And all I do is build an application that talks to that API and I’m done.” They start there. They very quickly realize that that deployment scenario is constrained, because you don’t control the model. You don’t control when the model revision changes. You don’t necessarily want to put your private data into the model, because that has a way of leaking into the model training over time. So there are many constraints they run into when they start with the API providers. So the next logical step is, “Can I run this in the public cloud?”
“Can I spin up an instance in one of the public clouds and use the tooling that the cloud vendors provide to host and run these models?” And that’s fine if you’re still kind of in the POC stage and you’re tinkering. But once your deployment starts to grow to what we consider enterprise scale, which means a multitude of these models running and operating with very tight latency and bandwidth constraints, the cost of operation in the public cloud becomes very, very high. And then you still haven’t completely solved the privacy question, because you’re not just sending the data that you may have used to fine-tune the model or build a RAG application for that model, you’re actually now sending your client data and responses into and out of the public cloud.
And so, customers are like, “Okay, then what’s my third alternative?” And that’s when they start thinking, “Maybe AI deployment is a hybrid world, where workloads that I consider mission critical, performance critical, data critical, I want to host those in my network.” Kind of use the same tooling and have the same user experience, but have the flexibility to burst those non-critical workloads back into the public cloud. And that’s the hybrid cloud strategy and offering from HPE, which differentiates us in the market.
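To make that “same tooling, different location” idea concrete, here is a minimal sketch of an application that keeps identical client code whether the model sits behind a commercial API, a public cloud instance, or an on-premises endpoint; only the URL changes. It assumes the serving layer exposes an OpenAI-compatible API, and the endpoint URLs, key, and model name are hypothetical placeholders rather than anything HPE-specific.

```python
# Minimal sketch: the application code is identical; only the endpoint changes.
# Assumes an OpenAI-compatible serving API; URLs, key, and model name are
# hypothetical placeholders.
from openai import OpenAI

ENDPOINTS = {
    "commercial_api": "https://api.example-llm-provider.com/v1",
    "public_cloud":   "https://my-cloud-instance.example.com/v1",
    "on_premises":    "http://llm-gateway.internal.example:8000/v1",
}

def ask(location: str, prompt: str) -> str:
    client = OpenAI(base_url=ENDPOINTS[location], api_key="placeholder-key")
    response = client.chat.completions.create(
        model="llama-3-8b-instruct",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Mission-critical, data-sensitive traffic stays on premises; less critical
# workloads can burst elsewhere without changing the application code.
print(ask("on_premises", "Summarize this quarter's support tickets."))
```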
Keith Townsend: So, this sounds like a data governance problem. What are some of the first steps that customers should take when they’re developing a strategy around their data for AI?
Bharath Ramesh: Yeah. Frankly, AI would not exist without access to massive amounts of data. And when I look at what the pivotal points in the development of AI were, particularly with generative AI, which is all the rage these days, it’s really three things. First is the development of new foundational model architectures, the arrival of transformers, which Google pioneered many years ago. That was one key pivotal point. The other one was the availability of compute, because you need lots of GPUs in a power-efficient way to actually train these models. The third pivotal point was access to data.
So if you look at Llama 3, which you just highlighted, it was trained on 15 trillion tokens of publicly available data, seven times more than the previous generation, Llama 2. That’s a lot of data. So even the model builders are really struggling with, “How do I collect this massive amount of publicly available data? And then, how do I make sure that data is cleansed and scrubbed of things like bias and toxic statements?” Because the model’s going to get trained on that if it’s part of the data set. So even the model producers are very cognizant that data is sort of the center of gravity of AI. I’d argue that as you start deploying the models as consumers, data management and governance becomes even more critical.
Because now it’s not a static dataset that you’re building the models off of. It’s actually client interactions you’re dealing with. People might be providing personal data, for example, if it’s, say, a financial services chatbot, and these are regulated industries. All those kinds of interactions, the data needs to be in compliance with the right data protection laws. And that needs to be engineered into the way you do your deployment from the ground up. It’s not something you add in after the fact. And then if you’re capturing those client interactions and using them to further iterate on the model, how do you make sure that things like personally identifiable information are scrubbed out of that?
Because you don’t want the next iteration of the model to then leak somebody’s private data because it was accidentally trained on it. So data management has always been very critical for the AI producers. It is getting more and more critical for the AI consumers, because they’re dealing with way more data, I would argue, than the producers ever had to deal with. Because a lot of these are live interactions, and the sensitivity of the data is a lot higher, because it’s live data from your clients.
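As a rough illustration of the kind of scrubbing Bharath describes, the sketch below redacts obvious personally identifiable information from a captured interaction before it is reused for model iteration. It is a toy regex pass for illustration only, not HPE’s governance tooling; production pipelines need far more robust detection (names, addresses, account numbers, and so on).

```python
# Toy sketch: redact obvious PII from captured client interactions before they
# are reused to iterate on a model. Illustrative only; real pipelines need much
# stronger detection than a few regular expressions.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

interaction = "Sure, my email is jane.doe@example.com and my cell is 312-555-0199."
print(scrub(interaction))
# -> Sure, my email is [EMAIL REDACTED] and my cell is [PHONE REDACTED].
```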
Keith Townsend: Yeah. So, one of my theories is that the majority of AI is going to be done at the enterprise level. Whether we’re talking about retraining existing models, which can be anywhere from an 8 billion parameter model to one of these trillion-parameter models, or using RAG to take existing models and augment them, these come in different shapes and sizes, as does the data, the amount of tokens that we’re going to need to process. And to give the audience a point of reference, a token is roughly three quarters of your average word. So when you’re talking about how fast a human reads, a human probably reads at about eight tokens a second, give or take.
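As a back-of-the-envelope illustration of that arithmetic, using the approximate figures cited above (three quarters of a word per token, roughly eight tokens a second):

```python
# Back-of-the-envelope arithmetic using the approximate figures cited above.
# Purely illustrative; real tokenizers and readers vary.
WORDS_PER_TOKEN = 0.75
HUMAN_TOKENS_PER_SEC = 8

def words_to_tokens(words: int) -> int:
    return round(words / WORDS_PER_TOKEN)

# A 100,000-word document set is on the order of ~133,000 tokens.
print(words_to_tokens(100_000))

# Reading speed implied by ~8 tokens/sec: about 6 words/sec, i.e. ~360 words/min.
print(HUMAN_TOKENS_PER_SEC * WORDS_PER_TOKEN * 60)
```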
So, scaling the environment is going to come in all different shapes and sizes. Whether you’re talking about batch processing or chatbots, the needs are going to vary significantly. So, we’ve talked about the opinionated approach that HPE takes to helping customers get there. We’ve talked about the data strategy. Now, what we can’t avoid is the 800-pound gorilla in the room. That’s NVIDIA. Today NVIDIA has a moat around AI, AI models, AI processing with CUDA, et cetera. GPUs right now are the king of the hill. Talk to me about your collaboration with NVIDIA. How are you helping customers take this abstract notion of what they think they need and package an opinionated HPE stack around that, including NVIDIA GPUs? Talk to me about that relationship.
Bharath Ramesh: Yeah. So, to answer that question, let’s take a trip down memory lane. The genesis of modern AI and neural network architectures really dates back to 2012, which was the Google Brain project. That was the first real attempt to use AI to, in that case, recognize images of cats on YouTube. And when I look at what HPE and NVIDIA have been partnering on, it actually predates that. We’ve been selling NVIDIA GPUs in our servers for years. Increasingly, we have upleveled that and said those same configurations are very valid for high-performance computing, for these demanding AI use cases. So I’d argue that we’ve actually been partners in crime here, serving these customer needs, for a lot longer than AI has been around.
And then when I look at how synergistic our portfolio is, I’m actually very excited. Because NVIDIA, while they’re very well known for GPUs, and for network chips and switches through the acquisition of Mellanox, actually has a really good software portfolio as well. So they obviously have the SDKs and the compilers that target their architecture and help you optimize models. But they’re increasingly providing things like the NeMo Inference Microservices you heard about at GTC. That’s a great example of providing the right bite-sized pieces that you need to construct an enterprise-class inference stack.
And then when I map that to what HPE has been doing, and you look at the acquisitions we have made over the years, we have a portfolio that spans from mainstream servers all the way up to supercomputers. We bought SGI and Cray not too long ago. We’ve also bolstered our software side that addresses AI and analytics. So we bought BlueData and MapR, and those products now sit under the Ezmeral brand, namely Unified Analytics and Data Fabric. But we also bought a number of companies focusing on AI model creation and data management for the purposes of AI.
A company called Determined AI, which we call the Machine Learning Development Environment. And a company called Pachyderm, which we call Machine Learning Data Management. And we also have an inference product coming out this summer called Machine Learning Inference Software. So NVIDIA has their hardware and software story. We have a very compelling and complementary hardware and software story. And the challenge we have given the teams over the last year is, how do we showcase that the whole is greater than the sum of the parts?
And we demonstrated some of this integration at GTC, for example. We showed our inference software actually working together with the NeMo Inference Microservices, and providing an aggregate capability that is greater than any one company could have achieved on its own. We’ve done the same thing with our data management product: we integrated MLDM with RAPIDS, and that gives you a 200x speedup on your data pre-processing compared to if you just did one or the other. So this matters to the customer. Because ultimately why are customers buying the turnkey solutions? They’re buying turnkey solutions because they want the easy button.
They want to know that we have squeezed every little bit of performance, cost efficiency, and risk out of that solution. So the consumers of AI don’t need to pretend to be producers of AI and upskill their teams. And through this partnership with NVIDIA, I think we have achieved that. A great example of this is the enterprise computing solution for generative AI that we just showcased at GTC. It’s a scalable inferencing and fine-tuning solution built and co-engineered by the two companies. And we believe it is amazing, and one of a kind.
Keith Townsend: So, we’ve talked again about data, which I think is the most important topic, because infrastructure matters. This is the Infrastructure Matters podcast. We’ve talked software integration. We’ve talked platform. But these things have to do something from a business value perspective. It’s the job of IT to return value back to the business. And part of returning value is to reduce risk, and to do that in an ethical way. So, HPE is building these platforms, opinionated platforms.
You can’t get out of the business of talking about AI ethics and the risks associated with AI in your frameworks. So, talk to me about how you’re helping customers basically stay out of the news, to be direct. I mean, with some of the AI models getting released, the news is just waiting, back in my day it was the newspapers, for the headline-level mistakes. What is HPE doing to help customers avoid these missteps?
Bharath Ramesh: Yeah, that’s the million dollar question. Because we and our customers tend to focus a lot on the technical challenges of deploying AI. But safety of AI is arguably, and could potentially be, the biggest inhibitor to deploying AI in a lot of industries. And interestingly, more than two years ago, our labs organization actually pioneered this concept called Trustworthy AI. I’m not sure if you’ve read about it. But it covered a few tenets which we consider important to any AI deployment. Things like, is it private? Is it focused on protecting human rights? Is it inclusive? Is it responsible? Is it robust?
Tenets that we considered foundational to putting AI into production and exposing it to the world. And interestingly, just last week, MLCommons, if you’re familiar with the consortium, it’s an industry consortium very well known for benchmarks like MLPerf inference, training, and storage, actually announced a new AI safety benchmark proof of concept. And if you haven’t read it, go look up the website. It’s very interesting, because what they’re doing is they have come up with a series of tests across a number of categories they consider hazardous. Things like hate, exploitation, violence, unrestricted use of weapons, et cetera. Categories for which, if you’re training or deploying an LLM, you really don’t want it to be providing those kinds of responses.
And what they’re doing is, through this new AI safety benchmark, they’re able to test these candidate models against the hazard categories, and assign them risk scores. So as a model creator, first you know how your model’s performing and you can go back and work on your data or your training paradigms. But also as a deployer of AI, you know whether that’s a safe model to deploy, mapped to your business risk threshold. So, we’re seeing industry consortiums taking concepts that we’ve thought about for many years, and actually creating tools and techniques that we can offer as part of our AI solutions for customers to make sure AI is safe and ready to deploy.
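To illustrate the shape of that kind of evaluation, here is a simplified sketch that runs a candidate model against prompts grouped by hazard category and tallies a per-category risk score. The real MLCommons AI Safety benchmark defines its own prompts, graders, and scoring; the prompts and the generate and flag_unsafe hooks below are hypothetical stand-ins you would wire to your model endpoint and to a safety classifier.

```python
# Simplified sketch of a hazard-category safety evaluation. Everything here is
# a stand-in for the real benchmark's prompts, graders, and scoring scheme.
HAZARD_PROMPTS = {
    "hate":         ["<prompt probing hateful content>"],
    "exploitation": ["<prompt probing exploitation>"],
    "weapons":      ["<prompt probing weapons misuse>"],
}

def generate(prompt: str) -> str:
    raise NotImplementedError("call your candidate model here")

def flag_unsafe(category: str, response: str) -> bool:
    raise NotImplementedError("call a safety classifier / grader here")

def risk_report(prompts_by_category: dict[str, list[str]]) -> dict[str, float]:
    report = {}
    for category, prompts in prompts_by_category.items():
        violations = sum(flag_unsafe(category, generate(p)) for p in prompts)
        report[category] = violations / len(prompts)  # per-category risk score
    return report

# A deployer would compare these per-category scores against its own business
# risk thresholds before putting a model in front of clients.
```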
There are also other frameworks. There’s this whole notion of, how do you make sure your model’s not biased? How is it providing fair predictions? Particularly if it’s something impacting things like insurance pricing for people. And there are frameworks out there; AI Fairness 360 is one that comes to mind. A suite of tests for the producers and deployers of AI: how do you make sure that you scrub your data and your training process? And when you’re in deployment, how do you make sure that the outcomes are fair? We’re seeing that starting to take off as well. And coming back to Llama 3, how are those models tuned for human interactions?
They use humans in the loop. So I don’t think we’re avoiding humans completely. This is not a self-supervised process in its entirety. There’s a big portion of it that is self-supervised. But at some point you have techniques like reinforcement learning from human feedback, where you have humans actually prompting and assessing the quality of the responses from the model, as a final check to make sure this model is actually ready and fit for use. And our goal is to offer all of these techniques as part of our end-to-end portfolio and AI stack, so customers don’t run into these pitfalls and end up in the news.
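Coming back to the fairness frameworks mentioned a moment ago, here is a minimal sketch of one common check, the disparate impact ratio (often judged against the “four-fifths” rule). AI Fairness 360 packages this and many richer metrics; the data below is made up purely for illustration.

```python
# Minimal sketch of a disparate impact check on a model's favorable-outcome
# rates across groups. Data is hypothetical and for illustration only.
def favorable_rate(outcomes: list[int]) -> float:
    return sum(outcomes) / len(outcomes)  # share of favorable (1) outcomes

# Hypothetical model decisions (1 = approved) split by a protected attribute.
group_a = [1, 1, 0, 1, 1, 0, 1, 1]  # privileged group
group_b = [1, 0, 0, 1, 0, 0, 1, 0]  # unprivileged group

ratio = favorable_rate(group_b) / favorable_rate(group_a)
print(f"disparate impact ratio: {ratio:.2f}")  # ~0.50 with this toy data
if ratio < 0.8:
    print("potential adverse impact: review data, features, and thresholds")
```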
Keith Townsend: So, let’s talk about the C-suite, let’s uplevel the conversation. One of the difficult parts of infrastructure is the reason why this podcast, Infrastructure Matters, exists: we are here to educate not just infrastructure leaders, but the C-suites that they serve. Talk to me about how the C-suite should recognize and understand the journey to AI, the value they’re trying to extract from it, and how they can better enable their infrastructure leaders to achieve the outcomes that they desire.
Bharath Ramesh: Yeah. I think the first thing we should acknowledge is AI is a tool. It’s a very, very powerful tool, but it is a tool nonetheless. And what you’re trying to achieve is a business outcome. So, it’s important that as a customer, you’re really clear about what it is that AI is going to do for your business, and have that in a very quantified and clear way, so your project knows what it’s seeking to achieve. While things like generative AI are very interesting and snazzy, maybe your problem can be solved by the more prosaic, mundane AI. So, it’s important to understand what business outcome you’re seeking to achieve. Have very clear success criteria. Have a team that’s adequately funded and resourced to go work on that project.
So that’s step number one, which we’d recommend the CIO go through. Step number two is, look at the state of the art that’s already out there. Because so much of AI is open source, not just the models but the frameworks and the tools, there’s always a temptation to do it yourself. And we’ve seen a lot of customers go down that path and very quickly realize there are many rough edges when you try to integrate a complex suite, such as the suite needed to build and apply generative AI models. There are hundreds of different options at each layer.
And trying to identify the precise option that interoperates with the other one, and making sure that that entire stack continues to work and doesn’t fall apart when you change something, is very, very difficult to do. And we ourselves, as leaders in the industry, sometimes struggle to get the right recipe identified in a reliable and stable way. So, our guidance is: once you have identified what your project is, and you have the criteria and the team in place, talk to companies like us. Understand what those prepackaged solutions are that we’ve already ironed the risk out of. And then come to us and tell us, “What is your scale challenge?”
Are you trying to apply this in a scale-up fashion, where you aspire to become a deployer of bigger and bigger models? Are you trying to deploy this in a more scale-out fashion? What are the enterprise challenges you have around things like security, logging, auditability, all the things you don’t get through just the AI model? Come and talk to us. Help us build you that big picture of how AI addresses those business goals, and provide you the equipment and the tooling around the AI model to help you get to those goals. So it’s important that we think through this project end to end, and we’ve seen successful customers do just that.
Keith Townsend: So Bharath, talk to me about what you’re excited about, and some practical next steps that enterprises should consider as they get ready for these future-facing AI innovations.
Bharath Ramesh: There’s no shortage of innovation in AI. I have to say that the community has taken these open source models and done miracles with them, optimized and tweaked them in ways that even the creators of these models never thought about. The one shift I am seeing is that traditionally AI has been a problem of data management and model building; very quickly it’s becoming a problem of inference. Because the inference side is arguably when the value of AI is really apparent to the business, because that’s when it starts to see the light of day, interacting with your clients, and benefiting your top line or bottom line.
And we’re really seeing with the advent of Gen AI, the interest in inference has gone up tremendously. But the big challenge is, how do you achieve that inference in a cost-effective way with all the enterprise-ready things I just mentioned around security, auditability, et cetera? So you’ve got to think with a full-stack mindset. You’ve got to think all the way from, which accelerator do I pick? And NVIDIA is, I would say, the de facto leader there; they have everything from Jetsons to the gigantic Blackwell architecture that they announced at GTC. But we’re going to see more heterogeneity in the inference space than we ever saw in the training space.
Because the span of use cases for inference is vastly larger. Think anything from embedded devices, consumer devices, and vehicles, to big inference clusters sitting in a data center somewhere. And that heterogeneity, and the need to hit different performance and price points and capabilities, is going to create innovation. And you’re going to see a lot more inference accelerators come out there than we saw with training accelerators. In fact, we are already seeing other vendors come up with some pretty intriguing accelerators. Maybe not at the same level of maturity as we’ve seen from NVIDIA, but pretty interesting nonetheless.
We’re also seeing heterogeneity in the CPUs. x86 plus GPUs remains the gold standard. But you’re starting to see ARM-based systems, because there’s a really good memory bandwidth and power efficiency story when you go to ARM. And NVIDIA, with their Grace architecture, is a great example of a non-x86 version of an AI stack. And then the other one is just the location of AI compute. We tend to think of AI compute as these big rack-scale systems sitting in a data center. But really what customers want is to locate compute close to where the data resides. And that’s where a hybrid cloud strategy resonates.
Because our hybrid cloud strategy says, “We’ll allow you to run this stack with the same customer experience no matter where the workload is located.” Whether it’s sitting in the cloud, whether it’s sitting on your premises, whether it’s sitting at the edge, our value is that we provide a consistent user experience. So you locate the compute where it makes sense for your business and where the data is located. Don’t worry about whether this can run here or cannot run there. So increasingly, we’re going to see these AI workloads, particularly inference workloads, be way more hybrid in nature than training workloads, which tended to be more data center-centric.
Moving up the stack, the other thing I’m seeing is there’s been a Cambrian explosion of software tools to do every part of the AI lifecycle. I have this chart, which I should show you sometime, that has, I think, 500 different choice points on that one slide. And it’s across the whole spectrum of things you need to do to actually productionize AI in your enterprise. That Cambrian explosion has happened over the last few years, and we’re starting to see that winnow down. Because ultimately what the industry wants to get to is a few industry-standard options and runtimes that can handle everything from small models to large models.
That’s the Holy Grail that every customer wants to get to. And since we have a software stack, that’s an area where we’re actively driving the industry. So we’re part of an AI infrastructure alliance consortium that is aiming, again, to simplify and decomplexify the choice points customers have in software. What is all this culminating in? It’s ultimately building turnkey full-stack solutions. Our mission in life is to allow AI novices to easily adopt AI, the best-in-breed AI, and infuse that into their business without having to become AI experts. That’s our mission in life.
Although there is all this innovation and evolution happening in the layers of the stack, because we’re packaging it into a turnkey solution, we’re helping you pick the best choice points so you don’t have to think through all of these. And we can still get you to a better state than if you tried to do it all on your own.
Keith Townsend: So, Bharath, I really appreciate you spending the time to sit down with me and explain HPE’s approach to an opinionated AI platform stack across hybrid. I’m a big fan of hybrid, and of the idea that there’s not a single solution or approach for where you put your workloads and how you manage that data.
If you want to find out more, or discover better insights around the research that we’re doing at The Futurum Group around AI, we have an outstanding platform, an intelligence platform where you can see some of these insights and self-serve for yourself. You’re going to see a lot more content coming from The Futurum Group and HPE around the enterprise AI journey. Follow me on the web. You can do that at ctoadvisor on x.com. Talk to you on the next Infrastructure Matters podcast, if I can convince Steven and Camberley to give me the keys again. Thanks a lot.
Author Information
Keith Townsend is a technology management consultant with more than 20 years of related experience in designing, implementing, and managing data center technologies. His areas of expertise include virtualization, networking, and storage solutions for Fortune 500 organizations. He holds a BA in computing and an MS in information technology from DePaul University. He is the President of the CTO Advisor, part of The Futurum Group.