OpenAI, in partnership with AMD, Broadcom, Intel, Microsoft, and NVIDIA, has launched the Multipath Reliable Connection (MRC) protocol to address supercomputer networking bottlenecks in large-scale AI training. MRC’s efficiency and reliability could set a new standard for AI infrastructure, but scaling these gains across the industry will test both operational models and supply chains.
What is Covered in this Article
- OpenAI’s MRC protocol and its impact on training cluster network design
- Operational and architectural implications for hyperscalers and networking vendors
- Competitive dynamics among NVIDIA, Microsoft, and cloud providers
The News: OpenAI, in collaboration with AMD, Broadcom, Intel, Microsoft, and NVIDIA, has released the Multipath Reliable Connection (MRC) protocol to meet the networking demands of large-scale AI training. MRC uses a multi-plane network architecture and adaptive packet spraying to distribute data across hundreds of network paths, reducing congestion and enabling rapid recovery from failures. The design connects over 100,000 GPUs using only two tiers of switches, lowering cost and power consumption while improving resilience. Early deployments on Oracle Cloud Infrastructure and Microsoft supercomputers have shown tangible operational benefits, minimizing disruptions from frequent link or switch failures. OpenAI has published the MRC specification through the Open Compute Project to drive industry-wide adoption. As AI model training becomes more resource-intensive, efficient networking is increasingly critical for both cost control and reliability.
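To see why port count matters to the two-tier claim, the arithmetic below is a rough sketch rather than a figure taken from the MRC specification. It assumes a non-blocking leaf-spine fabric in which every leaf splits its ports evenly between endpoints and uplinks, and a 51.2 Tb/s switch ASIC carved into ports of different speeds. The function names and the port speeds are illustrative assumptions.

```python
def two_tier_endpoints(radix: int) -> int:
    """Maximum endpoints in a non-blocking two-tier leaf-spine fabric.

    Assumption: each leaf uses half its ports for endpoints and half for
    uplinks, and each spine port connects to exactly one leaf, so a spine
    with `radix` ports limits the fabric to `radix` leaves.
    """
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even integer >= 2")
    leaves = radix
    endpoints_per_leaf = radix // 2
    return leaves * endpoints_per_leaf


def ports_per_switch(switch_gbps: int, port_gbps: int) -> int:
    """Number of ports when a fixed switch capacity is split into equal ports."""
    return switch_gbps // port_gbps


if __name__ == "__main__":
    SWITCH_GBPS = 51_200  # assumed switch capacity, for illustration only
    for port_gbps in (800, 400, 200, 100):
        radix = ports_per_switch(SWITCH_GBPS, port_gbps)
        print(f"{port_gbps}G ports: radix {radix}, "
              f"up to {two_tier_endpoints(radix):,} endpoints per plane")
```

Under these assumptions, 800G ports cap a two-tier fabric at 2,048 endpoints, while 100G ports reach 131,072 endpoints per plane. Splitting each NIC's bandwidth across several lower-speed planes is one way a design could pass 100,000 GPUs without adding a third switch tier.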
Can OpenAI’s MRC Networking Protocol Redefine the Economics of AI Training?
Analyst Take: OpenAI’s MRC protocol directly targets one of the biggest structural bottlenecks in AI infrastructure: the cost and fragility of scaling synchronous training jobs across massive GPU clusters. As hyperscalers race to meet surging AI demand, the ability to keep 100,000+ GPUs synchronized without crippling network complexity or downtime could act as an ecosystem catalyst, shifting where value is captured in the networking market.
MRC’s Shift Toward Software-Defined and Disaggregated Networking
MRC’s use of SRv6 static routing fundamentally changes the operational model for AI networking. Traditional dynamic routing relies on protocols such as BGP, paired with ECMP load balancing, and requires complex, hardware-accelerated control planes. MRC instead precomputes forwarding tables at initialization and rarely changes them, so switches only need to perform fast, simple SRv6 uSID forwarding at line rate. This reduces the need for premium, vertically integrated switch platforms and opens the field for software-defined and white-box switch vendors. Arista, named as a collaborator implementing SRv6 in EOS on Broadcom Tomahawk 5, exemplifies the advantage of software flexibility on top of commodity silicon. With the control-plane logic moving to the endpoint NIC and centralized management, the value shifts away from switch software complexity.
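The static forwarding model described above can be sketched in a few lines. This is a deliberately simplified illustration, not the MRC or SRv6 wire format: real uSID processing packs identifiers into the IPv6 destination address, whereas here the packet simply carries a list of hop identifiers chosen by the sender. The class names, identifier values, and topology are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    usids: list[int]        # remaining hop identifiers, chosen by the sender
    payload: bytes = b""


class StaticSwitch:
    """A switch whose table is installed once and never recomputed."""

    def __init__(self, name: str, table: dict[int, str]):
        self.name = name
        self.table = dict(table)   # hop identifier -> egress port

    def forward(self, pkt: Packet) -> str:
        if not pkt.usids:
            raise ValueError(f"{self.name}: packet has no remaining hops")
        active = pkt.usids.pop(0)  # consume the active identifier
        return self.table[active]  # plain lookup, no routing protocol involved


def trace(pkt: Packet, start: str,
          switches: dict[str, StaticSwitch],
          wiring: dict[tuple[str, str], str]) -> tuple[list[tuple[str, str]], str]:
    """Follow a packet hop by hop until it leaves the switch fabric."""
    hops, node = [], start
    while node in switches:
        port = switches[node].forward(pkt)
        hops.append((node, port))
        node = wiring[(node, port)]
    return hops, node


if __name__ == "__main__":
    switches = {
        "leaf1": StaticSwitch("leaf1", {0x11: "up1", 0x12: "up2"}),
        "spine1": StaticSwitch("spine1", {0x21: "down2"}),
        "spine2": StaticSwitch("spine2", {0x21: "down2"}),
        "leaf2": StaticSwitch("leaf2", {0x31: "host7"}),
    }
    wiring = {
        ("leaf1", "up1"): "spine1", ("leaf1", "up2"): "spine2",
        ("spine1", "down2"): "leaf2", ("spine2", "down2"): "leaf2",
        ("leaf2", "host7"): "gpu-b",
    }
    # The sender, not the switch, decides to go via spine2.
    print(trace(Packet([0x12, 0x21, 0x31]), "leaf1", switches, wiring))
```

The point of the sketch is that path selection lives entirely in the identifier list the sender builds. Each switch does a single table lookup, which is why the model places so little demand on switch software.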
The NIC as the Center of Gravity
Software-defined platforms, such as NVIDIA Spectrum-4 and Spectrum-5 switches running Cumulus and SONiC, benefit directly, aligning with Microsoft’s push for open network operating systems in production AI infrastructure. Pure software-defined vendors and intent-based networking platforms also stand to gain, as MRC’s centralized path computation and static forwarding model are closer to SDN than to legacy distributed routing. Broadcom, as a key silicon supplier and MRC co-author, benefits from adoption, but its advanced switching features become less of a differentiator. The competitive battleground shifts to raw silicon performance, port density, and power efficiency, where white-box and software-defined alternatives are most competitive. In MRC’s architecture, performance is driven by the NIC and the network management plane rather than by switch features.
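To make the NIC-side role concrete, here is a minimal sketch of how an endpoint might spray packets across many paths and steer around a failed one. It is an assumption-laden illustration, not MRC's algorithm: it uses plain round-robin and a simple healthy/failed flag, whereas an adaptive scheme would presumably also weigh congestion signals. The `PathSprayer` class and its methods are hypothetical names.

```python
class PathSprayer:
    """Round-robin packet spraying over a fixed set of paths, skipping failed ones."""

    def __init__(self, path_ids):
        self._paths = list(path_ids)
        if not self._paths:
            raise ValueError("at least one path is required")
        self._healthy = set(self._paths)
        self._next = 0  # index of the next path to try

    def mark_failed(self, path_id) -> None:
        """Stop using a path, e.g. after a loss or timeout is attributed to it."""
        self._healthy.discard(path_id)

    def mark_recovered(self, path_id) -> None:
        """Return a known path to service."""
        if path_id in self._paths:
            self._healthy.add(path_id)

    def pick(self):
        """Return the path for the next packet."""
        if not self._healthy:
            raise RuntimeError("no healthy paths available")
        for _ in range(len(self._paths)):
            candidate = self._paths[self._next]
            self._next = (self._next + 1) % len(self._paths)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy paths available")


if __name__ == "__main__":
    sprayer = PathSprayer(range(4))
    print([sprayer.pick() for _ in range(4)])   # every path used once
    sprayer.mark_failed(2)                      # simulate a link failure
    print([sprayer.pick() for _ in range(6)])   # path 2 is skipped
```

Because the decision is made per packet at the sender, a failure costs only the traffic in flight on that one path. The remaining paths absorb the load immediately, without waiting for the switches to reconverge.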
OpenAI’s Ecosystem Play
By open-sourcing MRC through the Open Compute Project, OpenAI is betting that industry-wide adoption will create a de facto standard, driving down costs and giving the company and its partners a first-mover advantage [1]. This move pressures rivals such as Google and AWS to match or exceed MRC’s operational benefits or risk being perceived as laggards in AI infrastructure. It also puts switch and network hardware vendors on notice: proprietary approaches that add complexity or cost will struggle to gain traction in a market prioritizing simplicity and reliability.
The relationship between Microsoft and OpenAI has also been visibly evolving. Microsoft’s $13 billion investment gave it preferential access to infrastructure and deep technical integration, but OpenAI has been diversifying its compute relationships. The Oracle Abilene data center running MRC is a direct expression of that, and OpenAI’s recent moves toward building its own infrastructure more independently signal that the original Microsoft exclusivity arrangement is loosening. The MRC OCP publication should be read partly in that context. Opening the standard is consistent with OpenAI’s efforts to reduce its dependence on any single infrastructure partner while preserving the technical advantages the joint work has produced.
What to Watch
- Will cloud and enterprise data centers outside OpenAI’s orbit standardize on MRC by 2027?
- How quickly will AWS, Google, and hardware vendors deliver competing or complementary solutions?
- Will MRC’s real-world reliability and cost claims hold up as deployments scale beyond hyperscalers?
Sources
1. Supercomputer networking to accelerate large scale AI training
Declaration of generative AI and AI-assisted technologies in the writing process: This content has been generated with the support of artificial intelligence technologies. Due to the fast pace of content creation and the continuous evolution of data and information, The Futurum Group and its analysts strive to ensure the accuracy and factual integrity of the information presented. However, the opinions and interpretations expressed in this content reflect those of the individual author/analyst. The Futurum Group makes no guarantees regarding the completeness, accuracy, or reliability of any information contained herein. Readers are encouraged to verify facts independently and consult relevant sources for further clarification.
Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
The analysis and opinions expressed herein, along with any data or other information provided for validation, are specific to the individual analyst and are not those of Futurum as a whole.
Author Information
Brendan is Research Director, Semiconductors, Supply Chain, and Emerging Tech. He advises clients on strategic initiatives and leads the Futurum Semiconductors Practice. He is an experienced tech industry analyst who has guided tech leaders in identifying market opportunities spanning edge processors, generative AI applications, and hyperscale data centers.
Before joining Futurum, Brendan consulted with global AI leaders and served as a Senior Analyst in Emerging Technology Research at PitchBook. At PitchBook, he developed market intelligence tools for AI, highlighted by one of the industry’s most comprehensive AI semiconductor market landscapes encompassing both public and private companies. He has advised Fortune 100 tech giants, growth-stage innovators, global investors, and leading market research firms. Before PitchBook, he led research teams in tech investment banking and market research.
Brendan is based in Seattle, Washington. He has a Bachelor of Arts Degree from Amherst College.
