
AWS Serves Up NVIDIA GPUs for Short-Duration AI/ML Workloads


The News: Amazon Web Services (AWS) launched Amazon Elastic Compute Cloud (EC2) Capacity Blocks for ML, a consumption model that lets customers reserve NVIDIA graphics processing units (GPUs) co-located in EC2 UltraClusters for short-duration machine learning (ML) workloads. You can read the press release on the AWS website.


Analyst Take: NVIDIA has cemented its position as a leading GPU provider, with its high-performance computing (HPC) and deep learning capabilities capturing significant market share among gamers, data scientists, and AI researchers. Hyperscale cloud providers are capitalizing on this demand by offering GPU-accelerated cloud instances built on NVIDIA hardware. These instances serve workloads ranging from complex AI modeling to graphics-intensive applications and give customers access to high-end computing resources without an upfront investment in physical hardware.

Against this backdrop, AWS has devised a way to work around tight NVIDIA GPU supply while letting customers avoid long-term commitments to expensive GPUs for short-term jobs. In a blog post, Channy Yun, AWS principal developer advocate, compared the approach to booking a hotel room. The customer reserves a block of time with fixed start and end dates, but instead of picking a room type, selects the number of instances required. When the start date arrives, the customer can access the reserved EC2 Capacity Block and launch P5 instances. When the EC2 Capacity Block ends, any instances still running are terminated.
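To make the hotel analogy concrete, the sketch below models a reservation as a fixed window plus an instance count and reports whether the block is scheduled, active, or expired. The `CapacityBlock` class, its methods, and the example dates are hypothetical illustrations of the lifecycle described above, not AWS's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CapacityBlock:
    """Hypothetical model of a reservation: a fixed time window plus an instance count."""
    start: datetime
    end: datetime
    instance_count: int

    def state(self, now: datetime) -> str:
        # Before the start date the reservation exists but cannot be used yet.
        if now < self.start:
            return "scheduled"
        # Inside the window, P5 instances can be launched up to the reserved count.
        if now < self.end:
            return "active"
        # Once the window closes, any instances still running are terminated.
        return "expired"

    def can_launch(self, now: datetime, running: int, requested: int) -> bool:
        # Launches are allowed only while the block is active and within the reserved count.
        return self.state(now) == "active" and running + requested <= self.instance_count


start = datetime(2024, 1, 8, 12, 0)
block = CapacityBlock(start=start, end=start + timedelta(days=2), instance_count=4)
print(block.state(start - timedelta(hours=1)))          # scheduled
print(block.can_launch(start, running=2, requested=2))  # True
print(block.state(start + timedelta(days=2)))           # expired
```

The end of the window is treated as exclusive, so a check made exactly at the end time already reports the block as expired.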

The usage model provides GPU instances for training and deploying generative AI and ML models. EC2 Capacity Blocks are available for Amazon EC2 P5 instances powered by NVIDIA H100 Tensor Core GPUs in the AWS US East (Ohio) Region. The EC2 UltraClusters, which are designed for high-performance ML workloads, are interconnected with Elastic Fabric Adapter (EFA) networking for the best network performance available in EC2.

Capacity options include 1, 2, 4, 8, 16, 32, or 64 instances. Each P5 instance has eight H100 GPUs, so the largest block provides 512 GPUs. Blocks can be reserved for 1 to 14 days and purchased up to 8 weeks in advance. Keep in mind that EC2 Capacity Blocks cannot be modified or canceled after purchase.
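The limits above lend themselves to a simple pre-check. The sketch below encodes them as constants and also computes how many GPU-hours a block provides. `validate_request`, `gpu_count`, and `gpu_hours` are hypothetical helpers written for illustration. The "start date in the past" check is an added sanity assumption. AWS performs its own validation when a block is searched for and purchased.

```python
from datetime import date, timedelta

# Limits as described in the announcement for P5 Capacity Blocks.
ALLOWED_INSTANCE_COUNTS = (1, 2, 4, 8, 16, 32, 64)
GPUS_PER_P5_INSTANCE = 8  # each P5 instance has eight NVIDIA H100 GPUs
MIN_DAYS, MAX_DAYS = 1, 14
MAX_LEAD_TIME = timedelta(weeks=8)


def validate_request(instances: int, days: int, start: date, today: date) -> list:
    """Hypothetical pre-check: return a list of problems (empty if the request fits the limits)."""
    problems = []
    if instances not in ALLOWED_INSTANCE_COUNTS:
        problems.append(f"instance count {instances} must be one of {ALLOWED_INSTANCE_COUNTS}")
    if not MIN_DAYS <= days <= MAX_DAYS:
        problems.append(f"duration of {days} days is outside {MIN_DAYS}-{MAX_DAYS} days")
    if start < today:
        problems.append("start date is in the past")
    elif start - today > MAX_LEAD_TIME:
        problems.append("start date is more than 8 weeks away")
    return problems


def gpu_count(instances: int) -> int:
    return instances * GPUS_PER_P5_INSTANCE


def gpu_hours(instances: int, days: int) -> int:
    # Upper bound: every GPU busy for the entire reserved window.
    return gpu_count(instances) * days * 24


today = date(2024, 1, 1)
print(validate_request(64, 14, today + timedelta(weeks=2), today))        # []
print(len(validate_request(3, 20, today + timedelta(weeks=10), today)))   # 3
print(gpu_count(64), gpu_hours(64, 14))                                   # 512 172032
```

The GPU-hour figure is a ceiling. Because the block is paid for in full whether or not it is used, this upper bound is the denominator for judging utilization of a short job.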

EC2 Capacity Block pricing depends on supply and demand at the time of purchase (again, like a hotel). When a customer searches for Capacity Blocks, AWS shows the lowest-priced option that meets the specifications in the selected date range. The EC2 Capacity Block price is charged up front and does not change after purchase.
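The search behavior can be pictured as a filter followed by a minimum over price. In the sketch below, the `Offering` record, the `cheapest_offering` function, and all IDs and prices are invented for illustration. The matching rule, which requires the whole block to fit inside the selected date range, is an assumption and not a description of how AWS ranks offerings internally.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Offering:
    """Hypothetical offering record; IDs and prices below are invented for illustration."""
    offering_id: str
    start: date
    days: int
    instances: int
    upfront_price_usd: float

    @property
    def end(self) -> date:
        return self.start + timedelta(days=self.days)


def cheapest_offering(offerings: Iterable[Offering], instances: int, days: int,
                      earliest: date, latest: date) -> Optional[Offering]:
    # Assumption: an offering matches if its size and duration are exact and the
    # whole block fits inside the customer's selected date range.
    matches = [
        o for o in offerings
        if o.instances == instances and o.days == days
        and earliest <= o.start and o.end <= latest
    ]
    # The price is charged once, up front, so the choice reduces to a single minimum.
    return min(matches, key=lambda o: o.upfront_price_usd, default=None)


offers = [
    Offering("a", date(2024, 2, 1), 7, 8, 90_000.0),
    Offering("b", date(2024, 2, 5), 7, 8, 75_000.0),
    Offering("c", date(2024, 3, 1), 7, 8, 60_000.0),  # ends after the range below
]
best = cheapest_offering(offers, instances=8, days=7,
                         earliest=date(2024, 2, 1), latest=date(2024, 2, 29))
print(best.offering_id if best else None)  # b
```

Offering "c" is the cheapest overall but is excluded because it does not fit in the requested window. This mirrors the way a date range constrains which price a customer actually sees.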

We see this usage model as a particularly good fit for organizations that need GPUs for a single large language model (LLM) job and do not want to commit to long-term instances. This setup is especially valuable now, with interest in generative AI peaking and GPU resources in short supply and priced at a premium.

Looking Ahead

Looking ahead, the GPU provisioning market is poised for further innovation. Hyperscale cloud providers such as AWS are leading the charge with flexible, cost-effective GPU access models like EC2 Capacity Blocks. This approach not only works around the scarcity and high upfront cost of NVIDIA GPUs but also aligns with growing enterprise demand for scalability and agility. AWS's model, which supports short-term, high-intensity compute jobs without a long-term commitment, is likely to become a blueprint for cloud services. It offers a strategic advantage to organizations that run sporadic, resource-intensive tasks such as training LLMs.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other Insights from The Futurum Group:

AWS Storage Day 2023: AWS Tackles AI/ML, Cyber-Resiliency in the Cloud

AWS Announces New Offerings to Accelerate Gen AI Innovation

Google Cloud Set to Launch NVIDIA-Powered A3 GPU Virtual Machines

Author Information

Steven engages with the world’s largest technology brands to explore new operating models and how they drive innovation and competitive edge.

Dave focuses on the rapidly evolving integrated infrastructure and cloud storage markets.
