Hammerspace Shows Storage Acceleration for AI Training at AI Field Day

Introduction

Training large language models (LLMs) is big business and requires feeding a lot of data to many servers. It resembles conventional high-performance computing (HPC), where effort goes into tuning data transfer to match the computing capabilities. The tuning ensures that no bottleneck leaves compute idle and, in turn, lengthens the time to complete training. Training a new LLM occupies thousands of GPU-equipped servers for weeks or months. Sam Altman stated that training GPT-4 cost over $100 million. It is worth making sure that money is well spent.

Against this background, Hammerspace presented storage acceleration for AI training at AI Field Day 4, showing Hyperscale NAS accelerating underperforming scale-out NAS. Hammerspace has some impressive NAS virtualization technologies, but it was the separation of metadata and data access that mattered for its customers' massive AI training projects. These customers found that their existing scale-out NAS solutions could not deliver the file access rate that their training required. It wasn’t that the storage was too slow, but that it took too long to find the correct file on the storage.

Hammerspace Hyperscale NAS gathers the file metadata into a dedicated high-availability server cluster running its Anvil server and leverages standard NFSv4 features. Hyperscale NAS handles the metadata operation of identifying where a file resides, which allows the original NAS to serve the file directly to the compute servers. Hyperscale NAS does not require the existing NAS to speak NFSv4; the older NFSv3 is just fine for accessing the NAS data.

This kind of storage acceleration is conventionally achieved with a proprietary client on the servers, which complicates deployment and may require moving data onto a newer NAS. Hammerspace is a significant contributor to the Linux NFS software and uses the features it contributed to achieve faster file access without needing a custom client. All the features Hyperscale NAS requires are already in standard Linux distributions.
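The split between locating a file and reading it can be illustrated with a toy model. The sketch below is only a conceptual illustration in Python: the class names (`MetadataService`, `DataServer`, `Client`), the paths, and the counters are all hypothetical, and nothing here represents Hammerspace's software or the NFS wire protocol. It simply shows a client asking a metadata service where a file lives and then fetching the bytes directly from the storage node that holds them.

```python
from dataclasses import dataclass, field


@dataclass
class DataServer:
    """Hypothetical stand-in for an existing NAS node that holds file contents."""
    name: str
    files: dict = field(default_factory=dict)  # path -> bytes
    reads_served: int = 0

    def read(self, path: str) -> bytes:
        self.reads_served += 1
        return self.files[path]


@dataclass
class MetadataService:
    """Hypothetical stand-in for a dedicated metadata cluster.

    It records where each file lives but never stores or relays file data.
    """
    locations: dict = field(default_factory=dict)  # path -> data server name
    lookups: int = 0

    def register(self, path: str, server: DataServer) -> None:
        self.locations[path] = server.name

    def locate(self, path: str) -> str:
        self.lookups += 1
        return self.locations[path]


class Client:
    """Hypothetical compute node: one metadata lookup, then a direct data read."""

    def __init__(self, metadata: MetadataService, servers: list):
        self.metadata = metadata
        self.servers = {s.name: s for s in servers}

    def read(self, path: str) -> bytes:
        server_name = self.metadata.locate(path)     # metadata path
        return self.servers[server_name].read(path)  # data path, direct to the NAS node


# Two illustrative storage nodes and one file placed on the first.
nas_a = DataServer("nas-a", {"/train/shard-0": b"token shard 0"})
nas_b = DataServer("nas-b")
metadata = MetadataService()
metadata.register("/train/shard-0", nas_a)

client = Client(metadata, [nas_a, nas_b])
payload = client.read("/train/shard-0")
```

In this model the metadata service answers the "where is it" question and the storage node answers the "give me the bytes" request, so neither sits in the other's path. Real deployments add caching, locking, and failure handling that this sketch omits.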

Accelerating NFSv3 file servers for AI training is not the core function of Hammerspace; it is a beneficial side effect for a specific use case. Hammerspace is far more widely applicable as a NAS virtualization solution, allowing unified, multi-protocol access to multiple NAS clusters across multiple physical locations, both on-premises and in the public cloud. These storage virtualization features require Hyperscale NAS to sit between the NAS clients and the existing NAS servers; the Data Services (DSX) component of Hyperscale NAS does this in a highly available, scale-out fashion. These capabilities were not the focus of the AI Field Day presentation but are undoubtedly valuable for complex enterprise NAS deployments.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually, based on data and other information that might have been provided for validation, and are not those of The Futurum Group as a whole.

Other Insights from The Futurum Group:

Hammerspace Adds AWS SVP and LLM Training Architecture

Hammerspace Unveils Hyperscale NAS Addressing the AI/HPC Workloads

Hammerspace Global Data Environment Product Review

Author Information

Alastair has made a twenty-year career out of helping people understand complex IT infrastructure and how to build solutions that fulfil business needs. Much of his career has included teaching official training courses for vendors, including HPE, VMware, and AWS. Alastair has written hundreds of analyst articles and papers exploring products and topics around on-premises infrastructure and virtualization and getting the most out of public cloud and hybrid infrastructure. Alastair has also been involved in community-driven, practitioner-led education through the vBrownBag podcast and the vBrownBag TechTalks.
