NVIDIA Nemotron 3 Ultra: A 550B Open Model Built for Long-Running AI Agents

NVIDIA released Nemotron 3 Ultra on June 4, 2026 — a fully open 550B-parameter reasoning model topping US open-model benchmarks and tuned for long-running AI agents.

Dr. Nova Chen★Jun 9, 2026★5 min read

Every so often a model release does more than add another row to a leaderboard — it resets expectations for what an open model can be. NVIDIA's Nemotron 3 Ultra, released on June 4, 2026, is that kind of release. It is a fully open 550-billion-parameter reasoning model, and it arrives as the most capable open model yet to come out of a U.S. lab. For developers building agentic systems, it is one of the most consequential open-weight launches of the year.

What Makes Nemotron 3 Ultra Different

Nemotron 3 Ultra is the flagship of NVIDIA's new Nemotron 3 family, which spans Nano, Super, and Ultra sizes. First teased at Computex on June 1, the Ultra model uses a Mixture-of-Experts hybrid Mamba-Attention architecture: 550 billion total parameters, but only about 55 billion active on any given forward pass. That design is the heart of its appeal — you get frontier-class reasoning without paying frontier-class compute on every token.

The headline result is a score of 48 on the Artificial Analysis Intelligence Index, the highest of any open model from a U.S. lab to date. Just as important for production use, an early endpoint served the model at more than 300 tokens per second, which matters enormously for the workloads NVIDIA is targeting.

Why an Open Reasoning Model for Agents Matters

The phrase NVIDIA keeps returning to is long-running agents — systems that reason across many steps, call tools, and stay coherent over extended tasks. Those workloads punish two things: slow inference and high per-token cost. The hybrid Mamba-Attention approach addresses both, keeping memory and throughput manageable as context grows. For anyone who has watched an agent stall partway through a multi-step job, a fast, efficient, open reasoning model is exactly the missing piece.

Open Weights Change the Calculus

Because the weights are open, teams can run Nemotron 3 Ultra on their own infrastructure, fine-tune it for a domain, and audit its behavior — the kind of control that closed APIs cannot offer. That is a meaningful win for the open-source AI community and for organizations that need their reasoning model to live inside their own walls.

Where to Get It

Nemotron 3 Ultra is already available on Hugging Face, ModelScope, and OpenRouter, and it ships as a NIM microservice on NVIDIA's own build.nvidia.com. The smaller Nano and Super variants round out the family for teams that want the same architecture at the edge or on a single GPU. Taken together, the Nemotron 3 lineup is a clear, optimistic signal: open models are not trailing the frontier anymore — they are helping define it.

Sources: NVIDIA Newsroom, "NVIDIA Debuts Nemotron 3 Family of Open Models" (June 2026); Artificial Analysis, "NVIDIA Nemotron 3 Ultra Launch" (June 4, 2026); NVIDIA Technical Blog (June 2026).