AWS ML Capacity Blocks

The Six Five team discusses AWS ML Capacity Blocks.

If you are interested in watching the full episode, you can check it out here.

Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we ask that you do not treat us as such.

Transcript:

Patrick Moorhead: Yeah, it’s interesting in all this world of generative AI and all this stuff, it’s like gosh-

Daniel Newman: Are you still ML?

Patrick Moorhead: Are we still doing ML? Yes, we absolutely are doing ML and quite frankly, for narrower data sets than generative AI, which can be up to 100 petabytes at this point just for the training model, it’s more efficient and less expensive. So I saw that, Dan. I saw that.

Daniel Newman: I bit it.

Patrick Moorhead: Okay. No, this is good. But one of the challenges for, let’s say, smaller businesses and even startups is, how do I come in there and reserve enough GPUs to do what I need to do? It’s not just one GPU. When you’re doing machine learning training, let’s say you’re trying to train a vision model or something like that, you need hundreds of GPUs that are interconnected in a logical way, that have a singular memory plane to be able to do all that work on. So what Amazon did is they brought out what’s called Capacity Blocks for ML, for machine learning workloads. What that does is let you schedule capacity like a hotel reservation, which says, “Hey, on January 5th, I need this much capacity for my GPUs, maybe for this long.”
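[For readers who want to see the mechanics behind that hotel-reservation analogy, here is a minimal sketch using the boto3 EC2 client’s Capacity Blocks calls (describe_capacity_block_offerings and purchase_capacity_block). The region, dates, instance type, and counts are illustrative placeholders, not a recommendation.]

```python
import boto3
from datetime import datetime

# Sketch: find and purchase an EC2 Capacity Block for ML.
# Region, dates, instance type, and counts are placeholders.
ec2 = boto3.client("ec2", region_name="us-east-1")

# Search offerings: e.g. 4 p5.48xlarge instances for a 7-day
# (168-hour) block starting on or after January 5.
offerings = ec2.describe_capacity_block_offerings(
    InstanceType="p5.48xlarge",
    InstanceCount=4,
    StartDateRange=datetime(2024, 1, 5),
    EndDateRange=datetime(2024, 1, 14),
    CapacityDurationHours=168,
)

# "Book the room": purchase the first matching offering.
offering = offerings["CapacityBlockOfferings"][0]
purchase = ec2.purchase_capacity_block(
    CapacityBlockOfferingId=offering["CapacityBlockOfferingId"],
    InstancePlatform="Linux/UNIX",
)
print(purchase["CapacityReservation"]["CapacityReservationId"])
```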

What they do is they reserve that, and these are EC2 UltraClusters, which means they’re of the highest performance, optimized for ML. Then they’re connected through what’s called EFA, the Elastic Fabric Adapter, because we all know that it’s not just about what you can do on that rack, but what you can do across multiple racks. EFA gives you what they’re calling a petabit-scale, non-blocking network, which essentially is there to get your workload done a lot quicker and, again, to have that planar memory. So that’s it, baby, ML Capacity Blocks. Watch this space. I see no reason why they wouldn’t also do 1,000-node systems for foundational models, but that’s just a prediction.
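[Once the reserved window opens, launching into the block looks roughly like the sketch below. The AMI, subnet, security group, and reservation IDs are placeholders, and a real p5 cluster would typically attach many EFA interfaces per instance rather than the single one shown here.]

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Sketch: launch instances into the purchased Capacity Block once
# its window opens. All resource IDs below are placeholders.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",      # e.g. a Deep Learning AMI
    InstanceType="p5.48xlarge",
    MinCount=4,
    MaxCount=4,
    # Capacity Block instances use the capacity-block market type,
    # targeted at the reservation purchased earlier.
    InstanceMarketOptions={"MarketType": "capacity-block"},
    CapacityReservationSpecification={
        "CapacityReservationTarget": {
            "CapacityReservationId": "cr-0123456789abcdef0",
        }
    },
    # Attach an Elastic Fabric Adapter for low-latency, OS-bypass
    # networking across nodes.
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "InterfaceType": "efa",
        "SubnetId": "subnet-0123456789abcdef0",
        "Groups": ["sg-0123456789abcdef0"],
    }],
)
```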

Daniel Newman: Read my lips, this will get bigger. Pat, why I think this is also important is, I don’t know about you, but I’m really getting tired of having to always say, “I know everyone thinks AWS is behind on generative AI.” You know what I’m saying? I feel like if I don’t start…

Patrick Moorhead: I know, it came up even in our interviews yesterday.

Daniel Newman: It came up, and I’m going cross-eyed. It’s like, look, AWS has more workloads on more compute across more geographies than, I think, pretty much all the other hyperscalers combined. Now, again, that’s funny math, so don’t hold me to it. But the reason I’m pointing that out is compute and data and workloads are the impetus of AI. So AWS had a different strategy, and it wasn’t based upon necessarily completely locking it up with a single large language model in a closed architecture, which some others did early on, and they were able to get good out-of-the-gate marketing and narrative leads. But AWS has taken the approach that we’re going to be open source. We’re going to be open in our approach, and we’ll offer some FMs, foundational models. We’ll offer Titan. We’ll offer Bedrock. We’ll offer API connectors. You can run multiple models at the same time, all that kind of stuff.
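[As a rough illustration of that model-agnostic approach, the sketch below calls a Titan text model through Bedrock’s invoke_model API with boto3; swapping the modelId is what lets you move between models. The model ID and payload shape follow Bedrock’s published conventions, but treat the specifics as illustrative.]

```python
import json
import boto3

# Sketch: Bedrock exposes different foundation models behind one
# invoke_model call; changing modelId switches models. The Titan
# request/response shape below is illustrative.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model(
    modelId="amazon.titan-text-express-v1",
    contentType="application/json",
    accept="application/json",
    body=json.dumps({
        "inputText": "Explain EC2 Capacity Blocks for ML in one sentence.",
        "textGenerationConfig": {"maxTokenCount": 256, "temperature": 0.5},
    }),
)
print(json.loads(response["body"].read())["results"][0]["outputText"])
```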

So Pat, I just chalk this up to AWS being AWS: offering lots of services, looking at how to take all of its capacity, add value, and make it easier for companies to do the things that are going to need to be done. As we know, there are going to be more and more small models, mid-sized models, and foundational models, and the large language models are increasingly commoditized. We’re seeing a world where Hugging Face is the GitHub for AI, and that’s where this is going. There’s competition for that, but I think it’s a good use of resources.

