The News: At Cloud Field Day, Juniper presented its products, technologies, and architecture for using 800Gb Ethernet for AI training. The presentation covered 800Gb Ethernet products, the Apstra intent-based networking platform, the Juniper AI innovation lab, and Juniper Validated Designs. Watch the presentations here.

Juniper 800Gb Ethernet for AI Training

Analyst Take: One of the underlying themes of the presentations at Cloud Field Day 20 was using 800Gb Ethernet for high-performance, special-purpose networks, including Ethernet for AI training. Juniper Networks opened that discussion with its 64x800Gb switch, which is already shipping. The switch is based on the Broadcom Tomahawk 5 ASIC, providing 51.2 Tbps of total bandwidth. With this much bandwidth, Ethernet can keep expensive GPUs fed with data. Juniper also had much to say about validated designs, which help clients avoid pitfalls and painful design lessons. Juniper Validated Designs (JVDs) are available on GitHub and use the Apstra platform to apply multi-vendor, intent-based design principles.
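The 51.2 Tbps figure follows directly from the port count and port speed quoted above. As a quick back-of-envelope check (a minimal sketch; the function name is illustrative and not taken from any Juniper or Broadcom tooling):

```python
# Back-of-envelope check: port count x per-port speed = aggregate bandwidth.
PORTS = 64              # from the article: 64 ports
PORT_SPEED_GBPS = 800   # from the article: 800 Gb/s per port


def aggregate_tbps(ports: int, port_speed_gbps: int) -> float:
    """Return the total switching bandwidth in Tb/s (hypothetical helper)."""
    return ports * port_speed_gbps / 1000  # 1 Tb/s = 1000 Gb/s


print(aggregate_tbps(PORTS, PORT_SPEED_GBPS))  # 51.2
```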

Ethernet for Generative AI Training

Network infrastructure is vital to large-scale training of Large Language Models (LLMs). Hundreds of GPUs must coordinate and collaborate across vast training and model datasets. The GPUs are expensive to buy and run, so keeping them supplied with data is essential for achieving cost-effective results. The Juniper design of Ethernet for AI training provides up to 800Gbps of lossless Ethernet directly to each GPU. It also delivers 100Gb and 200Gb Ethernet to the CPUs in each server for data ingestion, model checkpoints, and overall orchestration. Juniper positions Ethernet as a single standard for all workloads, with the benefit of industry-wide development and skill sets. This is a counterpoint to InfiniBand-based designs, where InfiniBand carries GPU traffic while Ethernet handles all other networking.

Juniper AI Labs

Implementing new infrastructure to train or fine-tune a generative AI model is expensive, and the field is moving fast, possibly making that investment redundant. Companies want to be sure that the investment will provide a return to the business. One way to reduce the risk and prove or disprove the returns is to rent or borrow some AI infrastructure for a proof of concept (PoC). The Juniper AI innovation lab is one such infrastructure, which clients can use to dip a toe in the generative AI waters quickly and with minimal risk, using Ethernet for AI training. Unlike public-cloud AI services, corporate data can remain in a trusted private location for the PoC. This is vital if the implementation must remain on-premises due to data governance. The lab infrastructure has multiple H100 and A100 GPUs in servers and scale-out object storage, all connected with 200Gb and 400Gb Ethernet. Million-dollar investments in on-premises generative AI are easier to justify if you know what they will return.

Juniper Validated Designs

The Juniper AI Labs infrastructure implements the Juniper Validated Design for AI Data Center Network. This JVD is a set of Terraform templates that configure the three separate networks that make up the design. The templates live on GitHub and are publicly available to customers. The JVD also includes architecture documentation that describes the networks and their requirements. I like well-documented reference architectures, particularly where the design decisions are documented. Customers seldom deploy these reference architectures without customizing them for their specific needs. Knowing why the vendor included each element enables better customization decisions and avoids misconfigurations along the way, allowing customers to deploy Ethernet for AI training more easily.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other Insights from The Futurum Group:

MFD 11: Juniper Choreographs AI-Native Networking Breakthroughs

Juniper AI-Native Networking – Making Every Connection Count

Juniper AI-Native Networking Platform: Ready to Transform AI

Author Information

Alastair Cooke

Alastair has made a twenty-year career out of helping people understand complex IT infrastructure and how to build solutions that fulfil business needs. Much of his career has included teaching official training courses for vendors, including HPE, VMware, and AWS. Alastair has written hundreds of analyst articles and papers exploring products and topics around on-premises infrastructure and virtualization and getting the most out of public cloud and hybrid infrastructure. Alastair has also been involved in community-driven, practitioner-led education through the vBrownBag podcast and the vBrownBag TechTalks.
