Analyst(s): Nick Patience
Publication Date: June 2, 2026
At GTC Taipei, NVIDIA launched Cosmos 3, an open physical AI foundation model using a mixture-of-transformers architecture, and released a major open source collection of physical AI agent skills and tools. Together, the announcements represent NVIDIA’s effort to close the gap between AI research and production deployment in robotics, autonomous vehicles, and industrial AI, with real-world adoption data already suggesting the approach is gaining traction.
What is Covered in This Article:
- NVIDIA launched Cosmos 3, an open world foundation model for physical AI built on a mixture-of-transformers architecture that natively combines vision reasoning, world generation, and action prediction.
- Cosmos 3 is available in two variants – Super and Nano – with a third Edge variant, aimed at real-time inference, still in development. It ranks first across multiple physical AI benchmarks among open models.
- NVIDIA launched the Cosmos Coalition, a collaboration with Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI to advance open-world model development.
- NVIDIA released a major open source collection of physical AI agent skills and tools spanning Omniverse, Cosmos, Alpamayo, Isaac, Metropolis, and Jetson for robotics, AVs, vision AI, and industrial digital twins.
- Industry deployments, including reported efficiency gains at Pegatron, Delta Electronics, Foxconn, Inventec, and Li Auto, provide early evidence of production-scale adoption, though results vary and are largely vendor-reported.
The News: At GTC Taipei, NVIDIA made two significant announcements targeting the physical AI market. First, it launched Cosmos 3, described as the world’s first fully open omnimodel for physical AI, built on a mixture-of-transformers architecture that pairs a reasoning transformer with an expert generation transformer. Cosmos 3 natively understands and generates text, images, video, ambient sound, and action trajectories and is positioned as a foundation for training robots, autonomous vehicles, and vision AI agents. The model is available in two variants: Cosmos 3 Super for maximum physics accuracy, and Cosmos 3 Nano for speed. A Cosmos 3 Edge variant for real-time inference at the edge is forthcoming.
Second, NVIDIA announced a major open source release of physical AI agent skills and tools, available via GitHub and skills.sh, covering its Omniverse, Cosmos, Alpamayo, Isaac, Metropolis, and Jetson platforms. The skills are designed to make NVIDIA’s physical AI stack accessible to coding agents, turning complex training, simulation, evaluation, and deployment workflows into repeatable, agent-executable instructions.
NVIDIA Cosmos 3 and Open Agent Tools: Is Physical AI About to Leave the Lab?
Analyst Take: These two announcements from NVIDIA at GTC Taipei continue a consistent thread we’ve seen from the company over the last year or so: using openness as a mechanism for ecosystem capture while locking developers into its broader physical AI stack. Cosmos 3 and the open agent skills release are components of a single platform play. The question worth asking is whether the underlying architecture is as differentiated as NVIDIA claims, and whether the enterprise adoption data reflects durable production use or early-stage pilots that have not yet encountered real-world complexity.
Cosmos 3: A Meaningful Architectural Step, With Important Caveats
The mixture-of-transformers architecture in Cosmos 3 is a genuine technical development. Combining a reasoning transformer with a generation transformer in a single system that handles text, image, video, sound, and action trajectories together addresses a real limitation in current physical AI development pipelines, where perception, simulation, and policy models are typically trained and evaluated in separate stages. If Cosmos 3 can deliver the claimed reduction in physical AI training cycles from months to days at meaningful scale, that would be a significant workflow change for robotics and AV developers.
The tiered model lineup – Super, Nano, and the forthcoming Edge – follows a now-standard NVIDIA pattern of segmenting capability by use case and deployment context. We believe Cosmos 3 Edge, targeting real-time inference, is the variant likely to be most relevant for production robotics deployments. The ability to deploy at the edge will determine whether Cosmos 3 remains primarily a training and simulation tool or becomes a viable component of deployed physical AI systems.
The benchmark results cited by NVIDIA – first-ranked open model across Physics-IQ, PAI-Bench, R-Bench, RoboLab, and RoboArena, among others – are notable, though benchmark performance and production reliability in messy real-world environments are different things. For physical AI workloads in particular, where proprietary sensor data, safety validation, and operational continuity are all in play, the open model framing may appeal less to established industrial manufacturers than to the startup and research community.
Agent Skills: Lowering the Floor for Physical AI Development
The open source agent skills release is arguably the more immediately practical announcement. NVIDIA is effectively turning its existing stack – Omniverse, Isaac, Metropolis, Cosmos, Alpamayo, Jetson – into callable tools for coding agents. This matters because the bottleneck in physical AI development is rarely the availability of foundation models; it is the cost and complexity of integrating simulation, data generation, training, and evaluation pipelines. By wrapping these into agent-executable skills, NVIDIA is addressing a real friction point.
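To make the idea of "agent-executable skills" concrete, the following is a minimal sketch of how pipeline steps might be wrapped as named, callable tools that a coding agent can chain. Everything in it is an assumption for illustration: the `Skill` and `SkillRegistry` classes, the step names, and the placeholder functions are hypothetical and do not reflect the actual format or API of NVIDIA's released skills.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Skill:
    """A named, agent-callable step. Hypothetical structure, for illustration only."""
    name: str
    description: str
    run: Callable[[dict], dict]


class SkillRegistry:
    """Maps skill names to callables so an agent can invoke them by name."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def run_pipeline(self, names: List[str], state: dict) -> dict:
        # Run each skill in order, threading a shared state dict through.
        for name in names:
            state = self._skills[name].run(state)
        return state


# Placeholder steps standing in for simulation, data generation, and evaluation.
def simulate(state: dict) -> dict:
    return {**state, "scenes": state.get("scenes", 0) + 10}


def generate_data(state: dict) -> dict:
    return {**state, "samples": state["scenes"] * 100}


def evaluate(state: dict) -> dict:
    return {**state, "evaluated": True}


registry = SkillRegistry()
registry.register(Skill("simulate", "Build synthetic scenes", simulate))
registry.register(Skill("generate_data", "Render training samples", generate_data))
registry.register(Skill("evaluate", "Score the resulting dataset", evaluate))

result = registry.run_pipeline(["simulate", "generate_data", "evaluate"], {})
print(result)
```

The point of the pattern is that the integration work lives inside each skill, so the agent only has to decide which named steps to run and in what order.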
The early adoption figures provided by NVIDIA, such as Pegatron reducing training and deployment time by 67%, Delta Electronics improving defect detection rates by 17%, Foxconn improving first-pass yield by roughly 3%, and Inventec reducing data collection effort by 30%, are worth noting, even though they are vendor-reported and cover specific use cases rather than general deployments. The Li Auto figures are more substantial in volume: 1,000-plus neural scene reconstructions and more than 300,000 renders and simulations per day using Omniverse NuRec models. These numbers indicate production-scale workloads, rather than mere pilots.
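A quick back-of-the-envelope calculation helps put these figures in context. The sketch below converts the reported Li Auto daily volume into an average per-second rate and translates the reported Pegatron time reduction into an implied speedup. It assumes, purely for illustration, that the daily volume is spread evenly over 24 hours; the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(events_per_day: float) -> float:
    """Average rate, assuming the daily volume is spread evenly over 24 hours."""
    return events_per_day / SECONDS_PER_DAY


def implied_speedup(reduction_pct: float) -> float:
    """A time reduction of X% means the task takes (100 - X)% of the original time."""
    return 100.0 / (100.0 - reduction_pct)


li_auto_rate = per_second(300_000)       # reported renders and simulations per day
pegatron_speedup = implied_speedup(67)   # reported 67% reduction in time

print(f"Li Auto: about {li_auto_rate:.1f} renders and simulations per second on average")
print(f"Pegatron: about {pegatron_speedup:.1f}x faster training and deployment")
```

Under those assumptions, the Li Auto figure works out to roughly 3.5 renders and simulations per second sustained around the clock, and the Pegatron figure to roughly a threefold speedup.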
The pairing of NemoClaw and OpenShell for policy-based security and privacy governance over agent execution addresses a concern that will be increasingly relevant as physical AI agents take on consequential tasks in manufacturing, healthcare, and autonomous driving. This is an area where enterprise buyers will require clarity before deploying agentic systems in regulated or safety-critical environments.
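As a rough illustration of what policy-based governance over agent execution can mean in practice, the sketch below gates each requested action against an explicit policy before it runs. It is a generic pattern under stated assumptions, not a representation of how NemoClaw or OpenShell work; the `Policy` class, the `decide` function, and the action names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: which agent actions may run, and which need a human."""
    allowed_actions: FrozenSet[str]
    needs_approval: FrozenSet[str] = field(default_factory=frozenset)


def decide(policy: Policy, action: str) -> str:
    """Return 'needs_approval', 'allow', or 'deny' for a requested action."""
    # Approval requirements take precedence over the allowlist.
    if action in policy.needs_approval:
        return "needs_approval"
    if action in policy.allowed_actions:
        return "allow"
    # Anything not explicitly listed is denied by default.
    return "deny"


policy = Policy(
    allowed_actions=frozenset({"run_simulation", "generate_data"}),
    needs_approval=frozenset({"deploy_to_robot"}),
)

for action in ["run_simulation", "deploy_to_robot", "delete_dataset"]:
    print(action, "->", decide(policy, action))
```

The deny-by-default choice reflects the kind of clarity buyers in regulated or safety-critical settings are likely to ask for: an agent can only do what the policy explicitly names.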
The Cosmos Coalition: Ecosystem Governance as Competitive Strategy
The Cosmos Coalition follows the pattern NVIDIA has used effectively in other domains. By convening a coalition around an open model, NVIDIA defines the technical center of gravity while distributing the cost of model development across ecosystem participants. Members contribute research, evaluation techniques, and training data while building on Cosmos 3 technologies and NVIDIA DGX Cloud infrastructure.
The Cosmos Coalition currently favors early-stage firms, with Runway being the most commercially scaled. While credible robotics players like Agile Robots and Skild AI are involved, the lack of major industrial automation vendors is notable, given their importance to the physical AI market. However, NVIDIA’s history of turning developer ecosystems into enterprise-level relationships suggests the coalition will eventually expand beyond startups.
What to Watch:
- Cosmos 3 Edge Availability: The real-time inference variant is the most consequential for production deployments. Its technical specification and timeline will indicate how seriously NVIDIA is pursuing embedded and operational physical AI, versus training and simulation use cases.
- Cosmos Coalition Expansion: Watch for the addition of industrial OEMs, automotive Tier 1 suppliers, or major manufacturing technology vendors, which would indicate the platform is gaining traction beyond the AI developer community.
- Competitive Responses: Google DeepMind’s robotics foundation model work, along with efforts from other well-resourced physical AI labs, represents a meaningful alternative trajectory that could limit Cosmos’s reach in higher-end robotics applications.
- Inference Infrastructure Dynamics: NVIDIA’s strategy depends on retaining commercial value in deployment infrastructure even as model weights are open-sourced. Pricing and availability shifts at CoreWeave, Nebius, and Microsoft Azure for Cosmos-based workloads will signal how that value chain is evolving.
For more on these announcements and everything else NVIDIA has launched at GTC Taipei, see the NVIDIA Newsroom.
Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually, based on data and other information that might have been provided for validation, and are not those of Futurum as a whole.
Author Information
Nick Patience is VP and Practice Lead for AI Platforms at The Futurum Group. Nick is a thought leader on AI development, deployment, and adoption, an area he has researched for 25 years. Before Futurum, Nick was a Managing Analyst with S&P Global Market Intelligence, responsible for 451 Research’s coverage of Data, AI, Analytics, Information Security, and Risk. Nick became part of S&P Global through its 2019 acquisition of 451 Research, a pioneering analyst firm that Nick co-founded in 1999. He is a sought-after speaker and advisor, known for his expertise in the drivers of AI adoption, industry use cases, and the infrastructure behind its development and deployment. Nick also spent three years as a product marketing lead at Recommind (now part of OpenText), a machine learning-driven eDiscovery software company. Nick is based in London.