Hammerspace Shows Storage Acceleration for AI Training at AI Field Day

Introduction

Training large language models (LLMs) is big business, requiring a lot of data to be fed to many servers. It resembles conventional high-performance computing (HPC), where effort goes into tuning data transfer to match the computing capabilities. The tuning ensures that no bottleneck leaves computing idle and, in turn, lengthens the time to complete training. Training a new LLM keeps thousands of GPU-equipped servers busy for weeks or months. Sam Altman stated that training GPT-4 cost over $100 million. It is worth making sure that money is well spent.

Against this background, Hammerspace presented at AI Field Day 4, showing Hyperscale NAS accelerating underperforming scale-out NAS to provide storage acceleration for AI training. Hammerspace has some impressive NAS virtualization technologies, but it was its separation of metadata from data access that made the difference for customers running massive AI training projects. These customers found that their existing scale-out NAS solutions could not deliver the file access rate that their training required. The storage itself was not too slow; it simply took too long to find the correct file on the storage.

Hammerspace Hyperscale NAS gathers the file metadata into a dedicated high-availability server cluster running its Anvil server and leverages standard NFSv4 features. Hyperscale NAS performs the metadata operation of identifying where a file exists, which allows the original NAS to serve the file directly to the compute servers. Hyperscale NAS does not require the existing NAS to speak NFSv4; the older NFSv3 is just fine for accessing the NAS data.

This kind of storage acceleration for AI training is conventionally achieved with a proprietary client on the servers, which complicates deployment and may require moving data onto a newer NAS. Hammerspace is a significant contributor to the Linux NFS software and uses the features it has contributed to achieve faster file access without needing a custom client. All the features Hyperscale NAS requires are already in standard Linux distributions.
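The split between finding a file and reading it can be illustrated with a toy model. The sketch below is not Hammerspace code or the NFS protocol; the class names `MetadataServer`, `DataServer`, and `Client`, and the helper `build_demo`, are hypothetical and exist only to show the idea that the metadata tier answers "where is this file?" while the existing NAS serves the bytes directly.

```python
from dataclasses import dataclass, field


@dataclass
class DataServer:
    """Hypothetical stand-in for an existing NAS node that holds file contents."""
    name: str
    files: dict = field(default_factory=dict)
    reads_served: int = 0

    def read(self, path: str) -> bytes:
        # File bytes are returned straight to the caller.
        self.reads_served += 1
        return self.files[path]


@dataclass
class MetadataServer:
    """Hypothetical stand-in for the metadata tier: knows where files live, never serves contents."""
    locations: dict = field(default_factory=dict)
    lookups_served: int = 0

    def register(self, path: str, server: DataServer, content: bytes) -> None:
        # Store the content on the data server and record only its location here.
        server.files[path] = content
        self.locations[path] = server

    def locate(self, path: str) -> DataServer:
        self.lookups_served += 1
        return self.locations[path]


class Client:
    """Hypothetical compute server: one metadata lookup, then a direct read."""

    def __init__(self, metadata: MetadataServer):
        self.metadata = metadata

    def read(self, path: str) -> bytes:
        server = self.metadata.locate(path)  # metadata operation
        return server.read(path)             # data comes from the NAS itself


def build_demo():
    """Build a two-NAS toy environment with made-up paths and contents."""
    metadata = MetadataServer()
    nas_a, nas_b = DataServer("nas-a"), DataServer("nas-b")
    metadata.register("/train/shard-0001", nas_a, b"tokens-a")
    metadata.register("/train/shard-0002", nas_b, b"tokens-b")
    return Client(metadata), metadata, nas_a, nas_b
```

In this model the metadata server only ever handles small lookups, so its load is independent of file size, while read throughput scales with the number of data servers. That is the general shape of the benefit described above, under the simplifying assumptions of the sketch.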

This storage acceleration of NFSv3 file servers for AI training is not the core function of Hammerspace; it is a beneficial side effect for a specific use case. Hammerspace is far more widely applicable as a NAS virtualization solution, allowing unified, multi-protocol access to multiple NAS clusters across multiple physical locations, both on-premises and in the public cloud. These storage virtualization features require Hyperscale NAS to sit between the NAS clients and the existing NAS servers; the Data Services (DSX) component of Hyperscale NAS does this in a highly available, scale-out fashion. These capabilities were not the focus of the AI Field Day presentation but are undoubtedly valuable for complex enterprise NAS deployments.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

The analysis and opinions expressed herein are those of the individual analyst, informed by data and other information that might have been provided for validation; they are not those of The Futurum Group as a whole.

Other Insights from The Futurum Group:

Hammerspace Adds AWS SVP and LLM Training Architecture

Hammerspace Unveils Hyperscale NAS Addressing the AI/HPC Workloads

Hammerspace Global Data Environment Product Review

Author Information

Alastair has made a twenty-year career out of helping people understand complex IT infrastructure and how to build solutions that fulfil business needs. Much of his career has included teaching official training courses for vendors, including HPE, VMware, and AWS. Alastair has written hundreds of analyst articles and papers exploring products and topics around on-premises infrastructure and virtualization and getting the most out of public cloud and hybrid infrastructure. Alastair has also been involved in community-driven, practitioner-led education through the vBrownBag podcast and the vBrownBag TechTalks.
