NVIDIA’s New Rubin CPX Targets Future of Large-Scale Inference

Analyst(s): Ray Wang
Publication Date: September 18, 2025

NVIDIA has introduced Rubin CPX, a GPU purpose-built for massive-context inference. In the new Vera Rubin NVL144 CPX racks, the company highlights configurations of 8 EF NVFP4 compute, 100 TB of high-speed memory, and 1.7 PB/s bandwidth. We believe CPX represents a game-changing architecture that pressures competitors to re-architect around prefill-optimized inference.

What is Covered in this Article:

  • NVIDIA Rubin CPX announcement and Vera Rubin NVL144 CPX specs and claims.
  • CPX’s role in disaggregated serving (prefill vs. decode) and attention acceleration.
  • Analysis on competitive impact (AMD and custom silicon) and rack-scale options.
  • Practical design changes: GDDR7, PCIe Gen6, cableless modules, liquid cooling.
  • NVIDIA’s claim of $5 billion in token revenue per $100 million invested, and million-token use cases.

The News: NVIDIA has introduced Rubin CPX, a new GPU class built for massive-context processing, powering million-token coding and generative video. The Vera Rubin NVL144 CPX platform delivers eight exaflops of NVFP4 AI compute, 100 TB of high-speed memory, and 1.7 PB/s of bandwidth per rack – 7.5x more AI performance than GB300 NVL72 systems. A dedicated CPX compute tray is also available for existing setups.

NVIDIA positions Rubin CPX as the top option for long-context processing, combining video encode/decode with large-context inference on a single chip. It integrates into the full NVIDIA AI stack (Dynamo, Nemotron, NIM, CUDA-X), with interest from Cursor, Runway, and Magic. General availability is expected at the end of 2026, with NVIDIA projecting $5 billion in token revenue per $100 million invested.

Analyst Take: Rubin CPX is designed for the compute-heavy prefill phase, using a monolithic die optimized for NVFP4 and GDDR7 memory, while standard Rubin GPUs focus on the bandwidth-heavy decode phase. The Vera Rubin NVL144 CPX rack combines 144 Rubin GPUs, 144 Rubin CPX GPUs, and 36 Vera CPUs to hit 8 EF of NVFP4 compute, 100 TB of fast memory, and 1.7 PB/s of bandwidth – a 7.5x jump over GB300 NVL72. Each Rubin CPX GPU delivers up to 30 PFLOPS (NVFP4), 3x faster attention, and 128 GB of GDDR7 for context-driven workloads.
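A quick back-of-the-envelope check on the stated rack figures: 144 Rubin CPX GPUs at up to 30 PFLOPS each account for roughly 4.3 EF, implying the 144 standard Rubin GPUs contribute the remainder of the 8 EF rack total. The split below is inferred arithmetic, not an NVIDIA-stated breakdown:

```python
# Back-of-the-envelope check of the Vera Rubin NVL144 CPX rack figures.
# Stated: 144 Rubin CPX GPUs at up to 30 PFLOPS NVFP4 each; 8 EF rack total.
cpx_gpus = 144
cpx_pflops_each = 30            # NVFP4, per Rubin CPX GPU (stated peak)
rack_total_ef = 8               # exaflops NVFP4 per rack (stated)

cpx_ef = cpx_gpus * cpx_pflops_each / 1000   # PFLOPS -> EF
rubin_ef = rack_total_ef - cpx_ef            # implied share of the 144 standard Rubin GPUs

print(f"CPX share: {cpx_ef:.2f} EF; implied standard-Rubin share: {rubin_ef:.2f} EF")
```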

We think CPX could be a game changer, widening NVIDIA’s lead in rack-scale design and pressuring rivals to rework their silicon strategies. In effect, Rubin CPX shifts inference economics toward disaggregated serving, a concept investors and industry players should watch closely. Zooming out, we believe CPX positions NVIDIA even more strongly against rivals targeting the inference market, a compute segment expanding faster than training. We expect this segment to accelerate as generative AI video and coding applications mature, driving increasingly complex inference workloads in the coming years.

Prefill Specialization and System Design

Rubin CPX is built for massive-context inference, handling million-token coding and long-form video within a single chip that merges video codecs with context processing. Its design emphasizes compute power (NVFP4) over bandwidth, aligning with the FLOPS-heavy prefill phase while avoiding costly HBM. Attention throughput is central, with CPX delivering 3x faster attention than GB300 NVL72 to sustain long sequences at high speed. In the Vera Rubin NVL144 CPX rack, Rubin and Rubin CPX GPUs complement each other – one for decode, the other for prefill – enabling efficient disaggregated serving. This pairing makes large-scale inference faster, cheaper, and better balanced across phases.
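The disaggregated-serving pattern described above can be sketched as a toy request router: the compute-bound prefill phase goes to prefill-optimized workers (Rubin CPX in NVIDIA's design) and the bandwidth-bound decode phase to decode-optimized workers (standard Rubin). All class and method names here are illustrative, not NVIDIA software APIs:

```python
from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int   # processed in one pass during prefill (FLOPS-heavy)
    output_tokens: int   # generated one at a time during decode (bandwidth-heavy)

class PrefillPool:
    """Stand-in for compute-optimized (CPX-class) workers."""
    def prefill(self, n_tokens: int) -> dict:
        # Real systems build and hand off the KV cache here.
        return {"kv_len": n_tokens}

class DecodePool:
    """Stand-in for bandwidth-optimized (standard-Rubin-class) workers."""
    def decode(self, kv_cache: dict, n_tokens: int) -> str:
        return f"decoded {n_tokens} tokens against a {kv_cache['kv_len']}-token context"

def serve(req: Request, prefill_pool: PrefillPool, decode_pool: DecodePool) -> str:
    # Phase 1: prefill on compute-heavy workers, producing the KV cache.
    kv_cache = prefill_pool.prefill(req.prompt_tokens)
    # Phase 2: decode on bandwidth-heavy workers, consuming that cache.
    return decode_pool.decode(kv_cache, req.output_tokens)

print(serve(Request(1_000_000, 256), PrefillPool(), DecodePool()))
```

The design point is that each pool can be sized and provisioned independently, which is why prefill-heavy, million-token workloads benefit from a dedicated chip class.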

Rack-Scale Options and Engineering Choices

The lineup includes three rack designs: VR200 NVL144, VR200 NVL144 CPX, and a Vera Rubin CPX Dual Rack pairing a VR NVL144 with a CPX rack. Engineering updates feature cableless, modular trays, liquid cooling on the CPX modules (~370 kW per NVL144 CPX rack), and daughter cards integrating CX-9 NICs, OSFP cages, NVMe, and Rubin CPX. Signal routing now uses Paladin board-to-board connectors and a PCB midplane, with NIC placement optimized for shorter high-speed paths, enabling PCIe Gen6 over PCB. These design moves focus on density, serviceability, and scale-out networking.

Competitive Implications for Rivals

We believe the emergence of Rubin CPX could compel AMD and custom silicon vendors to reassess their roadmaps, as the growing relevance of prefill-optimized hardware intersects with NVIDIA’s widening rack-scale advantage. By leveraging GDDR7, PCIe Gen6, and a lower-cost profile relative to HBM-centric designs, CPX delivers strong efficiency for prefill workloads. System-level gains beyond the chip level, such as higher aggregate memory bandwidth and an integrated rack architecture, further reinforce NVIDIA’s lead.

Monetization Claims and Early Ecosystem Signals

NVIDIA projects Vera Rubin NVL144 CPX to deliver $5 billion in token revenue per $100 million invested, tying its pitch directly to ROI in long-context inference. Early partners such as Cursor, Runway, and Magic are testing CPX for coding assistants, cinematic content, and agent-driven software engineering with massive context windows. Full-stack support (Dynamo, Nemotron, NIM, CUDA-X, AI Enterprise) ensures smooth deployment across cloud and data center environments.
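Restated as a simple multiple, NVIDIA's claim works out to 50x token revenue on capital invested. This is arithmetic on the company's own figures, not a validation of the claim:

```python
# NVIDIA's monetization claim, restated as a revenue multiple.
capex = 100e6          # $100 million invested (stated)
token_revenue = 5e9    # $5 billion in token revenue (claimed)

multiple = token_revenue / capex
print(f"Claimed revenue multiple: {multiple:.0f}x")
```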

With launch slated for late 2026, we see the new product as a potential incremental driver of NVIDIA’s quarterly performance from late 2026 through 2027. In light of ongoing advances in inference, we believe CPX strengthens NVIDIA’s competitive edge by providing a more cost-efficient yet high-performance solution, positioning the company to capture greater market share in inference-driven workloads.

What to Watch:

  • Practical benefits of 3x faster attention on real-world long-context workloads.
  • Adoption of Vera Rubin NVL144 CPX vs. Dual Rack, where power/cooling differ (~370 kW vs. ~190 kW for NVL144).
  • How Cursor, Runway, and Magic productize million-token contexts and generative video.
  • Competitive responses: prefill-specialized chips from AMD and custom silicon providers.
  • Deployment timelines toward end-2026 availability and integration with NVIDIA’s AI stack.

See the complete press announcement and event details for the NVIDIA Rubin CPX introduction on the NVIDIA website.

Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of Futurum as a whole.

Other insights from Futurum:

Could NVIDIA’s Collaboration with MediaTek Trigger a $73 Billion Acquisition Bid?

NVIDIA Q2 FY 2026 Earnings: Networking Steals the Spotlight and Q3 Ramp Will Be Key To Watch

Is NVIDIA’s Jetson Thor the New Brain for General Robotics?

Image Credit: NVIDIA

Author Information

Ray Wang is the Research Director for Semiconductors, Supply Chain, and Emerging Technology at Futurum. His coverage focuses on the global semiconductor industry and frontier technologies. He also advises clients on global compute distribution, deployment, and supply chain. In addition to his main coverage and expertise, Wang also specializes in global technology policy, supply chain dynamics, and U.S.-China relations.

He has been quoted or interviewed regularly by leading media outlets across the globe, including CNBC, CNN, MarketWatch, Nikkei Asia, South China Morning Post, Business Insider, Science, Al Jazeera, Fast Company, and TaiwanPlus.

Prior to joining Futurum, Wang worked as an independent semiconductor and technology analyst, advising technology firms and institutional investors on industry development, regulations, and geopolitics. He also held positions at leading consulting firms and think tanks in Washington, D.C., including DGA–Albright Stonebridge Group, the Center for Strategic and International Studies (CSIS), and the Carnegie Endowment for International Peace.
