NVIDIA Cosmos 3: The First Fully Open Omni-Model for Physical AI

NVIDIA's Cosmos 3, unveiled on May 31, 2026, is billed as the first fully open omni-model for physical AI: it reasons first, then generates across modalities.

Dr. Nova Chen | Jun 3, 2026 | 4 min read

Cosmos 3 has arrived, and for anyone tracking the convergence of robotics and generative AI, it is a genuinely thrilling moment. NVIDIA unveiled the model on May 31, 2026 at GTC Taipei alongside COMPUTEX (with broader coverage landing June 1), describing it as the first fully open omni-model for physical AI. The framing matters: rather than generating pixels or actions blindly, Cosmos 3 reasons first, then generates. That ordering is the conceptual heart of the release, and it signals a maturing of how we think about world models for machines that have to act in the real world.

What Makes Cosmos 3 a True Physical AI Foundation Model

Physical AI refers to systems that perceive, understand, and act within the physical world — robots, autonomous vehicles, and vision systems that must respect gravity, contact, and cause-and-effect. The challenge has always been data: real-world interaction is expensive, slow, and risky to collect at scale. Cosmos 3 is positioned as a foundation model that addresses this by generating high-fidelity synthetic experience grounded in reasoning about physics, not just visual plausibility.

What I find most compelling is its breadth. Cosmos 3 is multimodal across text, images, video, ambient sound, and action. That last modality — action — is what separates a physical AI model from a conventional video generator. A system that can condition on and produce actions can serve directly as a training and evaluation engine for embodied agents.
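To make the "training and evaluation engine" idea concrete, here is a minimal sketch of scoring a policy inside an action-conditioned world model. Everything in it is an assumption for illustration: the `WorldModel` interface, the `ToyGravityModel` stand-in (a hand-written physics rule, not a learned model and not Cosmos 3), and the two policies are hypothetical names that do not come from NVIDIA's release.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class State:
    """Toy observation: height and vertical velocity of a point mass."""
    height: float
    velocity: float


class WorldModel(Protocol):
    """Hypothetical interface: predict the next state given an action."""

    def step(self, state: State, action: float) -> State: ...


class ToyGravityModel:
    """Hand-written stand-in for a learned world model (not Cosmos 3)."""

    def __init__(self, dt: float = 0.1, gravity: float = -9.81) -> None:
        self.dt = dt
        self.gravity = gravity

    def step(self, state: State, action: float) -> State:
        # The action is an upward thrust acceleration; gravity always applies.
        velocity = state.velocity + (self.gravity + action) * self.dt
        height = max(0.0, state.height + velocity * self.dt)
        if height == 0.0:
            velocity = 0.0  # the mass is resting on the ground
        return State(height, velocity)


def evaluate_policy(
    model: WorldModel,
    policy: Callable[[State], float],
    start: State,
    target_height: float,
    steps: int = 50,
) -> float:
    """Roll the policy out inside the model; return mean absolute height error."""
    state = start
    total_error = 0.0
    for _ in range(steps):
        state = model.step(state, policy(state))
        total_error += abs(state.height - target_height)
    return total_error / steps


def hover_policy(state: State) -> float:
    return 9.81  # thrust that exactly cancels gravity


def idle_policy(state: State) -> float:
    return 0.0  # no thrust, so the mass falls


model = ToyGravityModel()
start = State(height=1.0, velocity=0.0)
hover_error = evaluate_policy(model, hover_policy, start, target_height=1.0)
idle_error = evaluate_policy(model, idle_policy, start, target_height=1.0)
print(f"hover error: {hover_error:.3f}, idle error: {idle_error:.3f}")
```

The point of the sketch is the loop shape: because the model accepts an action and returns the next state, a candidate policy can be compared against alternatives without touching real hardware.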

The Unified Mixture-of-Transformers Architecture

Architecturally, Cosmos 3 uses a unified mixture-of-transformers (MoT) design that pairs two specialists under one roof. An autoregressive reasoning transformer handles understanding and planning — the "think it through" stage — while a diffusion generation transformer synthesizes the rich multimodal output. This division is elegant: autoregressive models excel at discrete, sequential reasoning, while diffusion models are state-of-the-art for high-fidelity continuous generation like video. By coupling them in one model, NVIDIA gets reasoning that informs generation, so the output is consistent with a plan rather than merely statistically likely. As an analytical aside, this reason-then-generate structure mirrors a broader trend across the field toward models that deliberate before they produce.
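The reason-then-generate ordering can be illustrated with a deliberately tiny sketch. It is not the Cosmos 3 architecture: `reason` is a hypothetical stand-in that emits a plan one token at a time, each conditioned on the previous token, and `generate` is a toy iterative-refinement loop standing in for diffusion sampling, starting from noise and moving toward the plan it is conditioned on. All function names and constants here are assumptions made for illustration.

```python
import random
from typing import List


def reason(goal: int, max_steps: int = 10) -> List[int]:
    """Toy autoregressive stage: emit one plan token (waypoint) at a time.

    Each new waypoint depends on the previous one and moves at most
    2 units toward the goal.
    """
    plan = [0]
    while plan[-1] != goal and len(plan) < max_steps:
        remaining = goal - plan[-1]
        step = max(-2, min(2, remaining))
        plan.append(plan[-1] + step)
    return plan


def generate(plan: List[int], denoise_steps: int = 20, seed: int = 0) -> List[float]:
    """Toy diffusion-style stage: refine random noise toward the plan.

    Starts from Gaussian noise and, at every step, removes half of the
    remaining gap between the sample and the conditioning plan.
    """
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in plan]
    for _ in range(denoise_steps):
        sample = [x + 0.5 * (target - x) for x, target in zip(sample, plan)]
    return sample


def reason_then_generate(goal: int) -> List[float]:
    """Plan first, then synthesize an output conditioned on that plan."""
    plan = reason(goal)
    return generate(plan)


plan = reason(5)
output = reason_then_generate(5)
print("plan:  ", plan)
print("output:", [round(x, 3) for x in output])
```

Even at this scale the design choice is visible: the generator never sees the goal directly, only the plan, so whatever it produces is constrained by the reasoning stage rather than sampled freely.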

Three Tiers for Different Hardware Realities

NVIDIA is shipping Cosmos 3 in tiers that map capability to deployment context. Cosmos 3 Nano totals 16 billion parameters (an 8B reasoner paired with an 8B generator) and is sized for workstation GPUs such as the RTX PRO 6000, putting capable physical-AI tooling within reach of individual labs and developers. Cosmos 3 Super scales up to 64 billion parameters (a 32B reasoner plus a 32B generator) for large-scale synthetic data generation on Hopper and Blackwell systems. A Cosmos 3 Edge tier is also on the way, targeting real-time inference where latency is the binding constraint.
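As a back-of-the-envelope check on why the tiers land on different hardware, the sketch below adds up the reasoner and generator parameter counts quoted above and estimates the memory needed for the weights alone. The bytes-per-parameter figures are standard for each numeric format, but which precisions Cosmos 3 actually ships in is an assumption here, and the estimate ignores activations, caches, and runtime overhead.

```python
from dataclasses import dataclass

# Storage cost per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


@dataclass(frozen=True)
class Tier:
    name: str
    reasoner_params_b: float   # billions of parameters
    generator_params_b: float  # billions of parameters

    @property
    def total_params_b(self) -> float:
        return self.reasoner_params_b + self.generator_params_b

    def weight_gb(self, dtype: str) -> float:
        # Billions of params * bytes per param = gigabytes (10^9 bytes).
        return self.total_params_b * BYTES_PER_PARAM[dtype]


# Parameter counts as quoted in the article; precision choices are assumptions.
TIERS = [
    Tier("Cosmos 3 Nano", 8.0, 8.0),
    Tier("Cosmos 3 Super", 32.0, 32.0),
]

for tier in TIERS:
    sizes = ", ".join(
        f"{dtype}: {tier.weight_gb(dtype):g} GB" for dtype in BYTES_PER_PARAM
    )
    print(f"{tier.name}: {tier.total_params_b:g}B params -> {sizes}")
```

Under these assumptions the fourfold jump in parameters from Nano to Super translates directly into a fourfold jump in weight memory, which is consistent with the article's split between a single workstation GPU and data-center systems.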

Fully Open Under OpenMDW 1.1

The word "open" here is substantive. Cosmos 3 is released under the Linux Foundation's permissive OpenMDW 1.1 license, and the openness extends well beyond weights. NVIDIA is making the model, weights, datasets, benchmarks, and post-training scripts available through Hugging Face, GitHub, and build.nvidia.com. For researchers, having the datasets and evaluation benchmarks in hand, not just a checkpoint, is what makes work reproducible and extensible. NVIDIA reports that Cosmos 3 can compress physical-AI training and evaluation cycles from months to days. If that claim holds across diverse tasks, it would reshape the iteration loop for embodied systems.

A Growing Coalition of Early Adopters

The early-adopter list spans the physical AI landscape. In robotics, Agile Robots, Doosan Robotics, LG Electronics, Samsung Electronics, and Skild AI are on board. Li Auto brings the autonomous-vehicle perspective, while Linker Vision, Milestone Systems, and Centific represent vision AI. NVIDIA also launched a Cosmos Coalition — including Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI — signaling an intent to build a shared ecosystem rather than a walled garden.

My read: by open-sourcing a reasoning-driven, action-aware world model with its training infrastructure attached, NVIDIA is lowering the barrier to serious physical AI research. That is exactly the kind of move that tends to accelerate an entire field.

Sources: NVIDIA Newsroom, May 31, 2026; NVIDIA Blog, June 1, 2026; Hugging Face Blog, June 1, 2026; WinBuzzer, June 1, 2026