NVIDIA Nemotron 3 Ultra: Its Most Capable Open-Weight LLM Lands at Computex 2026

NVIDIA's Nemotron 3 Ultra debuts at Computex 2026: a 550B-parameter sparse Mixture-of-Experts open-weight LLM that scores 48 on the Artificial Analysis Intelligence Index, the highest of any US open-weights model, and ships with open datasets.

Dr. Nova Chen · Jun 3, 2026 · 4 min read

NVIDIA's Nemotron 3 Ultra arrived on the Computex 2026 stage in Taipei on June 1, and I will admit I leaned closer to my screen the way I do when a new instrument comes online. During his keynote, NVIDIA CEO Jensen Huang unveiled what the company calls its most capable open-weight large language model to date, and the architecture underneath is genuinely worth slowing down to appreciate. This is not the Cosmos 3 physical-AI system making headlines elsewhere this week; Nemotron 3 Ultra is a distinct, general-purpose open-weight LLM, and it is a beautiful piece of engineering.

What Makes Nemotron 3 Ultra Tick

Here is the part that delights the systems scientist in me. Nemotron 3 Ultra carries roughly 550 billion total parameters, yet it is a *sparse* Mixture-of-Experts (MoE) model with about 90% sparsity. In plain terms: of those 550 billion parameters, only about 55 billion are active for any given token. Think of it as a vast research library where you do not read every book to answer a question — a router quietly walks you to the handful of "expert" shelves that matter, and the rest stay dormant.
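To make the library analogy concrete, here is a minimal toy sketch of top-k expert routing in plain Python. It is an illustration under loud assumptions, not NVIDIA's implementation: the ten "experts", the router scores, and the helper names (`route_top_k`, `moe_forward`) are all hypothetical, and the real expert count and router design are not given in the announcement. The toy also pretends every parameter lives inside an expert, whereas real MoE models keep some layers always active. Only the 550B and 90% figures come from the article.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of router scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_scores, k):
    """Run only the selected experts and mix their outputs by router weight."""
    chosen = route_top_k(router_scores, k)
    output = sum(weight * experts[i](x) for i, weight in chosen)
    return output, [i for i, _ in chosen]

# Hypothetical toy setup: 10 tiny "experts", each just scales its input.
experts = [lambda x, scale=s: scale * x for s in range(1, 11)]
# Made-up router scores for a single token.
router_scores = [0.1, 2.5, 0.3, 0.0, 1.9, 0.2, 0.4, 0.1, 0.0, 0.3]

output, used = moe_forward(2.0, experts, router_scores, k=1)
active_fraction = len(used) / len(experts)  # 1 of 10 experts ran

# Figures from the article: roughly 550B total parameters at about 90% sparsity.
total_params = 550e9
sparsity = 0.90
active_params = total_params * (1 - sparsity)

print(f"experts used: {used}, active fraction: {active_fraction:.0%}")
print(f"approx. active parameters per token: {active_params / 1e9:.0f}B")
```

With one expert chosen out of ten, the toy leaves 90% of its experts idle for that token, which mirrors the ratio behind the 55-billion-active figure.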

That sparsity is why a model this large can run so briskly. On a pre-release endpoint, Nemotron 3 Ultra clocked over 300 tokens per second, the kind of throughput that turns a thoughtful assistant into a responsive collaborator. NVIDIA ships it in two numeric formats: BF16 for full-fidelity work and NVFP4, a 4-bit quantization that stores each weight in roughly a quarter of the bits BF16 uses, so the model fits more comfortably on real hardware. The wonder of MoE is precisely this decoupling of *knowledge capacity* from *per-query cost*.
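For a rough sense of what the two formats mean in memory, a back-of-the-envelope calculation helps. The sketch below counts raw weight storage only. It assumes 16 bits per weight for BF16 and 4 bits for NVFP4, and it deliberately ignores any scaling metadata a low-bit format carries, as well as the KV cache, activations, and runtime overhead. Treat the results as lower bounds rather than published figures; the function name `weight_memory_gb` is my own.

```python
# Bits used to store one weight in each format (raw storage only).
BITS_PER_WEIGHT = {"BF16": 16, "NVFP4": 4}

def weight_memory_gb(num_params, bits_per_weight):
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

total_params = 550e9  # roughly 550B total parameters, per the article

footprints = {
    name: weight_memory_gb(total_params, bits)
    for name, bits in BITS_PER_WEIGHT.items()
}

for name, gigabytes in footprints.items():
    print(f"{name}: about {gigabytes:,.0f} GB of weights")

# How many times smaller the 4-bit weights are than the 16-bit ones.
shrink_factor = footprints["BF16"] / footprints["NVFP4"]
print(f"NVFP4 weights are about {shrink_factor:.0f}x smaller than BF16")
```

Note that the full weight set must be held in memory even though only a fraction of it is active per token, which is why the storage format matters independently of sparsity.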

The Intelligence Index, in Context

The independent benchmarking group Artificial Analysis places Nemotron 3 Ultra at an Intelligence Index of 48, billing it as the most intelligent US open-weights model available. For perspective, that sits ahead of Google's Gemma 4 (39), NVIDIA's own Nemotron 3 Super (36), and gpt-oss-120b (33). A 12-point jump over the prior Nemotron tier, Nemotron 3 Super at 36, is not a rounding error; it is a meaningful step on a composite scale that blends reasoning, coding, and instruction-following.
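The gaps are simple subtraction, but laying them out side by side makes the spread easier to see. This small sketch uses only the four scores quoted above; the variable names are mine, and the index is a composite, so a point difference should not be read as a linear measure of capability.

```python
# Intelligence Index scores as quoted in this article.
scores = {
    "Nemotron 3 Ultra": 48,
    "Gemma 4": 39,
    "Nemotron 3 Super": 36,
    "gpt-oss-120b": 33,
}

# The highest-scoring model among the four listed.
leader = max(scores, key=scores.get)

# Point gap between the leader and each of the other models.
gaps = {
    name: scores[leader] - score
    for name, score in scores.items()
    if name != leader
}

for name, gap in gaps.items():
    print(f"{leader} leads {name} by {gap} points")
```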

Why Open Weights Matter for Agentic AI

What genuinely excites me is not the leaderboard line; it is the openness. NVIDIA is releasing Nemotron 3 Ultra with open weights *plus* open datasets and open libraries — including NeMo Gym and NeMo RL, the tooling used to train and reinforce agentic behavior. That combination is rarer than it sounds. Open weights let you run and fine-tune the model yourself; open datasets and training libraries let you understand *how* it learned and reproduce or extend that process.

For researchers, startups, and students who cannot commission a frontier model from scratch, this is an accelerant. An open-weight LLM of this caliber means a university lab or a small agentic-AI team can build autonomous coding assistants, tool-using research agents, and instruction-following systems on a foundation they can inspect, audit, and adapt. Reproducibility is the bedrock of good science, and NVIDIA leaning into transparency here moves the whole field forward.

A Quiet Lesson in Efficiency

There is an elegant thesis embedded in Nemotron 3 Ultra: that capability and accessibility need not trade off. Sparse Mixture-of-Experts design, NVFP4 quantization, and open tooling together say *you can have a 550-billion-parameter mind without paying 550 billion parameters of inference cost every single time*. That is the sort of architectural cleverness I love explaining, because it reframes what "big" even means.

The Takeaway

Nemotron 3 Ultra is a landmark for open-weight AI — fast, sparse, transparent, and built to make agentic systems broadly buildable. I find that thrilling. When the most capable US open-weight model also ships with the datasets and libraries to learn from it, the barrier to participating in this science drops for everyone. And a lower barrier, dear reader, is how discovery accelerates.

Sources: Artificial Analysis (June 1, 2026); NVIDIA Blog / GTC Taipei at Computex (June 1, 2026); Crypto Briefing (June 1, 2026).