The News: Google has released a series of storage enhancements targeted at supporting AI and machine learning (ML) workloads, announced at Google Cloud Next ’24 as part of a broader set of updates to Google’s AI Hypercomputer. The storage updates include enhancements to Cloud Storage FUSE, Parallelstore, and Hyperdisk ML. More information about the AI Hypercomputer announcements can be found here.
Google Enhances Storage for AI
Analyst Take: As part of a series of AI Hypercomputer enhancements, Google announced multiple storage updates aimed at AI and ML workloads. The new storage updates focus on maximizing GPU and TPU utilization to accelerate model training. The announcement includes the following product updates:
- Cloud Storage FUSE: Google announced new caching capabilities for Cloud Storage FUSE. Cloud Storage FUSE lets you mount and access Cloud Storage buckets as local file systems so that you can read and write objects using standard file system protocols (see the sketch after this list). Caching will improve access time to the buckets, though customers may look to other file systems to provide the very high speeds needed for training. Google claims that the new caching functionality improves training performance by 2.9x and improves serving performance of Google foundation models by 2.2x.
- Parallelstore: Google added caching to its parallel file system, which is targeted at scratch storage use. Parallelstore is based on DAOS (Distributed Asynchronous Object Storage), a key-value store architecture written for NVMe technology and originally designed for storage class memory (SCM, i.e., Intel Optane); the new caching will provide faster access. Parallelstore, including the caching capability, is still in preview. Google claims it can provide up to 3.9x faster training times and up to 3.7x higher training throughput compared with native ML framework data loaders.
- Hyperdisk ML: Google is introducing Hyperdisk ML, a block storage solution targeted at supporting AI inference workloads. This would be the fourth offering in the Hyperdisk family, though details on what is in the ML version are not yet available. Still, Google claims Hyperdisk ML can provide up to 12x faster model load times and expects it to exceed the performance and throughput of Microsoft Azure Ultra Disk and Amazon EBS io2 Block Express.
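To make the Cloud Storage FUSE model concrete, the sketch below shows a training-style read loop that touches objects through an assumed FUSE mount using only standard file APIs; the bucket name, mount path, and file layout are illustrative assumptions, not details from Google’s announcement.

```python
# Minimal sketch: once a bucket is mounted with Cloud Storage FUSE
# (e.g., `gcsfuse my-training-bucket /mnt/gcs`), objects can be read
# with standard file APIs; no cloud SDK calls appear in the loop.
# The mount path and directory layout below are hypothetical.
from pathlib import Path

MOUNT_POINT = Path("/mnt/gcs")       # assumed FUSE mount of the bucket
SAMPLE_DIR = MOUNT_POINT / "train"   # assumed prefix holding samples

def iter_samples():
    """Yield raw sample bytes using plain file-system reads."""
    for sample_path in sorted(SAMPLE_DIR.glob("*.bin")):
        with open(sample_path, "rb") as f:  # repeated reads of hot files
            yield f.read()                  # are what caching speeds up

if __name__ == "__main__":
    for i, blob in enumerate(iter_samples()):
        print(f"sample {i}: {len(blob)} bytes")
```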
Google’s overall announcement optimizes for AI and ML across several layers of hardware and software. Within the storage announcements specifically, the emphasis is on caching and keeping data close to compute resources to maximize training performance. Caching is certainly not a new concept, but it holds extra significance for AI training: training models is a time-consuming process that relies on expensive, compute-intensive resources, so keeping data near those resources and maximizing their utilization becomes a key priority when setting storage requirements for AI. A generic version of this pattern is sketched below.
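The sketch below is a minimal read-through cache: the first access copies a file from a slow tier to fast local storage, and subsequent epochs read the local copy. It is a generic illustration of the concept, not Google’s implementation, and both tier paths are assumptions.

```python
# Generic read-through cache sketch: a cache miss copies the file from
# a slow tier (e.g., an object storage mount) to fast local storage
# (e.g., local SSD); a cache hit reads the local copy directly.
# Both paths below are hypothetical placeholders.
import shutil
from pathlib import Path

SLOW_TIER = Path("/mnt/object-store")     # assumed remote/slow mount
FAST_TIER = Path("/mnt/local-ssd/cache")  # assumed node-local cache dir

def cached_read(relative_path: str) -> bytes:
    """Return file bytes, populating the local cache on first access."""
    local = FAST_TIER / relative_path
    if not local.exists():                 # cache miss: pull data close
        local.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(SLOW_TIER / relative_path, local)
    return local.read_bytes()              # cache hit: fast local read
```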
Feeding data to GPUs fast enough during training has long been a bottleneck for traditional HPC/AI practitioners. Many workarounds have been tried to address the I/O problem, including process changes, code adjustments, and offloading work to xPUs; a common software-side mitigation, overlapping I/O with compute, is sketched below. We expect the problem to intensify as organizations take on more data and train models for their specific use cases. Google is actively addressing these needs and looks to help customers further optimize their AI training.
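In this sketch, a background thread prefetches batches into a bounded queue so the accelerator is not starved while a storage read is in flight. It is a generic pattern, not a technique Google described, and the training step is simulated.

```python
# Generic prefetch sketch: a reader thread stages batches ahead of the
# training loop so storage latency overlaps with (simulated) compute.
import queue
import threading
import time

def reader(paths, q):
    """Producer: read raw bytes and stage them in a bounded queue."""
    for p in paths:
        with open(p, "rb") as f:
            q.put(f.read())    # blocks when the queue is full
    q.put(None)                # sentinel: no more batches

def train(paths):
    q = queue.Queue(maxsize=4)  # small buffer of prefetched batches
    threading.Thread(target=reader, args=(paths, q), daemon=True).start()
    while (batch := q.get()) is not None:
        time.sleep(0.01)        # stand-in for a GPU training step
        print(f"trained on {len(batch)} bytes")
```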
Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.
Other Insights from The Futurum Group:
2023 Cloud Downtime Incident Report
Google Cloud Launches Axion and Enhances AI Hypercomputer
Public Cloud Storage Catered to AI Data in 2023
Author Information
Mitch comes to The Futurum Group through the acquisition of the Evaluator Group and is focused on the fast-paced and rapidly evolving areas of cloud computing and data storage. Mitch joined Evaluator Group in 2019 as a Research Associate covering numerous storage technologies and emerging IT trends.
With a passion for all things tech, Mitch brings deep technical knowledge and insight to The Futurum Group’s research by highlighting the latest in data center and information management solutions. Mitch’s coverage has spanned topics including primary and secondary storage, private and public clouds, networking fabrics, and more. With ever-changing data technologies and rapidly emerging trends in today’s digital world, Mitch provides valuable insights into the IT landscape for enterprises, IT professionals, and technology enthusiasts alike.
Camberley brings over 25 years of executive experience leading sales and marketing teams at Fortune 500 firms. Before joining The Futurum Group, she led the Evaluator Group, an information technology analyst firm as Managing Director.
Her career has spanned all elements of sales and marketing, and her 360-degree view of addressing challenges and delivering solutions comes from crossing the boundary of sales and channel engagement with large enterprise vendors as well as running her own 100-person IT services firm.
Camberley has provided Global 250 startups with go-to-market strategies, created the new market category “MAID” as Vice President of Marketing at COPAN, and led a worldwide marketing team, including channels, as a VP at VERITAS. At GE Access, a $2B distribution company, she served as VP of a new division and grew it from $14 million to $500 million, and she built a successful 100-person IT services firm. Camberley began her career at IBM in sales and management.
She holds a Bachelor of Science in International Business from California State University – Long Beach and executive certificates from Wellesley and Wharton School of Business.