The VAST Data Platform Delivers for AI Pipelines at AI Field Day

The News: VAST Data announced partnerships with both NVIDIA and Supermicro to deliver the VAST Data Platform for AI pipelines and hyperscale storage.

Analyst Take: VAST Data continued to tell the story of its data platform at AI Field Day, highlighting how the VAST Data Platform enables AI pipelines. Central to the story are cost-effective flash storage and high-performance non-volatile memory, which together provide a single location to store all enterprise data. VAST Data integrates data ingest tools such as Apache Spark with event-driven computing through Kafka and containers. Together, these allow the VAST platform to bring data in and transform (ETL) it into a format suitable for AI training. VAST optimizes data flow by completing the ETL functions in place rather than copying data to a separate ETL tool.

AI training is usually an iterative process: multiple training runs are required to build and identify a useful model, and each model is checkpointed to storage along the way. The VAST DataBase, another feature of the platform, is a SQL database that could store a catalogue of training data and track the progress of model development.

The architecture of VAST Data has always separated the data persistence layer (SSD and NVM) from the stateless data access layer (NFS, SMB, Object, etc.), allowing each to scale separately and optimizing data flow. This segregation also allows additional functionality, such as the DataBase, to be added without changing the persistence layer. Traditionally, both layers have been implemented on x86 servers.

VAST Data announced that the data access software has been implemented on NVIDIA BlueField data processing units (DPUs). Previously referred to as smart NICs, these DPUs are add-in cards installed in a server, combining a fast network interface with their own CPUs. By implementing data access on a DPU, VAST delivers a dedicated storage controller that optimizes data flow inside a DPU-equipped server. For example, a GPU-equipped server training an AI model can access VAST Data storage directly through its DPU for faster reading of training data and faster saving of checkpoints.

VAST Data also partnered with Supermicro to provide a hyperscale architecture for the VAST Data Platform on Supermicro servers. Supermicro’s modular approach to hardware design allows a solution optimized for the VAST Data architecture. The design uses InfiniBand for connectivity between the data access servers and the persistence servers, minimizing latency and maximizing throughput, while data clients use standard multigigabit Ethernet to connect to the data access servers. At this stage, the Supermicro solution does not include the BlueField DPUs for AI pipelines; it is intended more for massive-scale data centralization and public-cloud infrastructure.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other Insights from The Futurum Group:

VAST Data Unveils New Data Center Architecture to Accelerate AI

VAST Data Announces New Partnership with Genesis Cloud

Demystifying AI, ML, and Machine Learning

Author Information

Alastair has made a twenty-year career out of helping people understand complex IT infrastructure and how to build solutions that fulfil business needs. Much of his career has included teaching official training courses for vendors, including HPE, VMware, and AWS. Alastair has written hundreds of analyst articles and papers exploring products and topics around on-premises infrastructure and virtualization and getting the most out of public cloud and hybrid infrastructure. Alastair has also been involved in community-driven, practitioner-led education through the vBrownBag podcast and the vBrownBag TechTalks.
