Analyst(s): Jonathan Fellows
Publication Date: November 7, 2024
YanRong Tech’s F9000X storage system achieves high performance scores in the MLPerf Storage v1.0 Benchmark, demonstrating the system’s ability to sustain high data throughput and maximize GPU utilization for efficient AI model training and deployment.
What is Covered in this Article:
- Key performance metrics of YanRong’s F9000X storage system in MLPerf.
- The importance of high-performance storage in AI training.
- Details of MLPerf Storage Benchmark.
- Implications of YanRong’s success for the future of AI storage.
The News: YanRong Tech recently secured high rankings in the MLPerf Storage v1.0 Benchmark, establishing its F9000X all-flash storage system as a performance leader for AI workloads. The F9000X, combined with the YRCloudFile distributed file system, achieved leading performance across all three MLPerf Storage workloads: 3D-Unet, ResNet50, and CosmoFlow. By delivering high bandwidth and high I/O rates, the solution performed well on bandwidth-intensive workloads such as 3D-Unet and CosmoFlow as well as the small-file-intensive ResNet50 workload.
The MLPerf Storage benchmark, an industry-standard measure, evaluates a storage system’s ability to support various AI workloads. YanRong’s results are significant: they highlight the F9000X’s capability to provide consistent data access to GPUs, reduce idle time, and improve operational efficiency.
YanRong Tech Achieves Strong MLPerf Storage Results for AI workloads
Analyst Take: YanRong’s MLPerf benchmark performance reflects the growing importance of specialized storage systems in AI. As large-scale AI model training and real-time data processing become more central to enterprise operations, traditional storage solutions may fail to meet these performance needs. YanRong’s F9000X offers advantages through its ability to linearly scale bandwidth and I/O rates, both essential for handling the large datasets and processing demands of AI workloads.
YRCloudFile, YanRong’s distributed file system, also plays a crucial role by enabling data management across AI environments. YanRong’s F9000X achieved exceptional results in single-host and three-host configurations, securing the highest reported throughput and accelerators per compute node for the ResNet50 and CosmoFlow workloads. The F9000X was one of the best performers in those categories for the 3D-Unet workload as well.
YanRong Tech’s MLPerf Benchmark Results:
Single-Host Configuration Results:
- CosmoFlow: Supported 60 H100 accelerators, delivering a bandwidth of 34 GB/s.
- ResNet50: Reached 188 H100 accelerators, achieving a bandwidth of 37 GB/s.
- 3D-Unet: Attained 20 H100 accelerators, reaching 58 GB/s.
Three-Host Configuration Results:
- CosmoFlow: Supported 120 H100 accelerators with a bandwidth of 72 GB/s.
- ResNet50: Attained 540 H100 accelerators, achieving 103 GB/s.
- 3D-Unet: Reached 60 H100 accelerators, achieving a bandwidth of 169 GB/s.
These results demonstrate the F9000X storage system’s capacity to support demanding data workloads at scale, keeping GPUs fully utilized and reducing the bottlenecks that typically slow model training. That capability benefits industries that rely on high-speed, large-scale data processing to support real-time analysis, innovation, and precision.
Understanding the MLPerf Storage Benchmark
The MLPerf Storage v1.0 Benchmark is a performance standard that assesses storage systems’ ability to handle AI training workloads. It provides a consistent, transparent method of comparing storage solutions by simulating the environments where AI models are typically trained. By focusing on real-world AI tasks, MLPerf enables vendors to assess how well their storage systems handle diverse AI applications.
MLPerf Storage v1.0 uses NVIDIA A100 and H100 accelerators to emulate computing resources and create a standardized testing environment. Three different AI models with varying data and access requirements are used:
- 3D-Unet: This model, commonly used in medical imaging, processes large 3D data files and requires high bandwidth to handle substantial data flows.
- ResNet50: Often applied in image recognition, ResNet50 requires fast data access to handle numerous, varied samples.
- CosmoFlow: A model used in scientific fields like astrophysics, CosmoFlow requires efficient data handling to accommodate different sample sizes.
The benchmark evaluates storage performance under conditions that reflect typical AI workloads, allowing end-users to understand a storage solution’s scalability and efficiency in high-demand scenarios. By excelling in all three models, YanRong’s F9000X demonstrates the ability to handle the complexities of AI-driven data workloads, positioning it as a versatile solution for businesses with varied AI applications.
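The benchmark’s central pass/fail metric is accelerator utilization (AU): the fraction of wall-clock time the emulated accelerators spend computing rather than waiting on storage. As a simplified sketch (the exact DLIO-based formula and the precise thresholds differ in detail), AU can be approximated as:

```python
def accelerator_utilization(compute_time_s, total_time_s):
    """Fraction of wall-clock time emulated accelerators spend computing
    rather than waiting on I/O. A simplification of the DLIO-based
    MLPerf Storage definition, not the exact formula."""
    return compute_time_s / total_time_s

# Example: 95 s of emulated compute within a 100 s run
au = accelerator_utilization(95.0, 100.0)
print(f"AU = {au:.0%}")  # AU = 95%
```

A run only counts if the storage system keeps AU above the workload’s threshold, which is why adding accelerators without adding storage bandwidth eventually produces a failing result.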
Why High-Performance Storage is Essential in AI Model Training
In AI model training, particularly with large models like 3D-Unet and ResNet50, data must be fed to GPUs at high speeds to prevent idle time and maintain optimal performance. When storage systems cannot keep up with data demands, the GPUs – critical assets in AI infrastructure – sit idle, creating inefficiencies and raising costs.
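A back-of-the-envelope calculation illustrates the pressure training places on storage. The per-sample size and consumption rate below are hypothetical illustrative figures, not published MLPerf parameters:

```python
def required_bandwidth_gb_s(num_gpus, samples_per_sec_per_gpu, sample_size_mb):
    """Bandwidth a storage system must sustain so GPUs never wait on data."""
    return num_gpus * samples_per_sec_per_gpu * sample_size_mb / 1000.0

# Hypothetical: 20 GPUs, each consuming 20 samples/s of ~145 MB samples
print(f"{required_bandwidth_gb_s(20, 20, 145):.0f} GB/s")  # 58 GB/s
```

Even at this modest hypothetical GPU count, the storage system must sustain tens of GB/s; anything less leaves expensive accelerators idle.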
Recent Futurum Group research indicates that data management now represents a more significant challenge than computing power when scaling AI. This is mainly due to limitations in traditional storage architectures, which were not designed to meet the high-throughput and low-latency requirements of modern AI applications.
Detailed Analysis of YanRong’s Performance Metrics
YanRong’s F9000X displayed superior bandwidth and scalability capabilities, two critical factors in AI storage. These achievements are worth examining in detail.
Bandwidth Efficiency and High Data Throughput – High throughput is essential in AI storage because it ensures that data can be processed and delivered to GPUs without delays. YanRong’s F9000X achieved high throughput across all three MLPerf benchmark models, from CosmoFlow to 3D-Unet.
For example, in a three-host configuration running the 3D-Unet model, the system delivered 169 GB/s of throughput with 60 H100 accelerators. This level of performance keeps data continuously available to GPUs, enabling uninterrupted training and reducing the costs associated with delayed processing.
Linear Scalability for Distributed AI Environments – Linear scalability refers to a system’s ability to scale its performance in direct proportion to added resources, an essential attribute in distributed AI environments. YanRong’s F9000X demonstrated linear scalability in the MLPerf benchmark, particularly in multi-node configurations.
For instance, during training with the 3D-Unet model in a three-node setup, the F9000X processed over 1,100 samples per second with throughput exceeding 160 GB/s, almost 3X the performance of a single node, which delivered 58 GB/s. This scaling ability ensures that as computational demands grow, storage performance remains consistent, eliminating data bottlenecks.
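The scaling claim can be checked directly from the published figures: ideal linear scaling from one node to three would triple the 58 GB/s single-node result to 174 GB/s.

```python
single_node_gb_s = 58.0   # 3D-Unet, single-host result
three_node_gb_s = 169.0   # 3D-Unet, three-host result

speedup = three_node_gb_s / single_node_gb_s
efficiency = three_node_gb_s / (3 * single_node_gb_s)  # vs. ideal 174 GB/s
print(f"{speedup:.1f}x speedup, {efficiency:.0%} of ideal linear scaling")
# 2.9x speedup, 97% of ideal linear scaling
```

Roughly 97% scaling efficiency at three nodes is consistent with the article’s characterization of near-linear scalability.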
YanRong’s storage system allows organizations to expand their AI capabilities without re-engineering their data infrastructure by maintaining consistent, scalable performance. This feature is particularly advantageous in high-performance computing (HPC) environments, where incremental growth often accompanies increased compute requirements.
IOPS – High throughput is important for performance, but the ability to support applications that require high I/O rates is also an important metric for some AI workloads. While IOPS is not explicitly measured by the MLPerf workloads, it can be estimated by dividing throughput by file size. The F9000X provided high I/O rates for the small file sizes used in the ResNet50 workload. With one exception, competitors’ solutions deliver high performance in either small-file workloads (ResNet50) or large-file workloads (3D-Unet), but not both. Additionally, the F9000X performed well in all three benchmark tests, which is uncommon among the currently published results.
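The throughput-divided-by-file-size estimate can be sketched as follows; the ~130 KB average file size is a hypothetical figure for a small-file ResNet50-style workload, not a published benchmark parameter:

```python
def estimate_iops(throughput_gb_s, avg_file_size_kb):
    """Approximate I/O rate implied by a throughput figure, assuming
    one I/O operation per file (a simplification of the method above)."""
    return throughput_gb_s * 1e9 / (avg_file_size_kb * 1e3)

# Hypothetical: 103 GB/s sustained over ~130 KB files
print(f"{estimate_iops(103, 130):,.0f} IOPS")
```

Under these assumed numbers the implied I/O rate lands in the high hundreds of thousands of operations per second, which is why small-file workloads stress a storage system very differently than bandwidth-bound workloads like 3D-Unet.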
Looking Forward – A New Standard in AI Storage Performance
With an excellent MLPerf benchmark ranking, YanRong Tech is positioned to appeal to high-demand sectors like healthcare, automotive, and research. The scalability and high throughput of the F9000X allow organizations to expand AI capacity without major infrastructure changes, strengthening YanRong’s role in AI-optimized storage.
Overall, as AI data demands continue to rise, YanRong’s success in the benchmark exemplifies how specialized storage solutions can address the growing complexity of AI workloads. This achievement provides organizations a path toward efficient, scalable data management, enabling faster, cost-effective AI deployments. By eliminating storage bottlenecks and supporting high GPU utilization, YanRong’s technology enables companies to focus on AI innovation rather than infrastructure limitations.
What to Watch:
- Competitors are developing AI-optimized storage solutions, potentially challenging YanRong’s market position with innovations like NVMe over Fabrics (NVMe-oF) and hybrid cloud storage.
- Emerging technologies, such as high-bandwidth memory (HBM) and advanced GPU architectures, may reduce dependency on traditional storage upgrades, influencing enterprise preferences toward integrated solutions.
- As AI applications expand to edge environments, there’s a growing demand for decentralized storage with low latency. YanRong’s adaptability to support both centralized and edge use cases will be essential.
Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.
Other insights from The Futurum Group:
Agentic AI Frameworks at IBM TechXchange Conference – Six Five in the Booth
IBM Q3 FY 2024 Earnings Deliver Strong Software Growth
ServiceNow’s Q3 FY2024 Results Highlight AI-Driven Growth & Expansion
Author Information
Jon first joined Evaluator Group in 2018 as a summer intern and was then brought on full time as a Jr. Lab Associate. Now a part of the Validation and Benchmarking Lab group at The Futurum Group, he brings over 5 years of experience working with a variety of benchmarking tools on enterprise platforms. Jon has tested systems ranging from on-prem storage to cloud compute and everything in between. Additionally, his skills include website development and data analysis, along with system administration.
As a fun fact, Jon loves weather and holds a BS in Meteorology from Metro State University in Denver.