
Cohere's North Mini Code Runs an Open Coding Agent on One GPU
Cohere's North Mini Code is a 30B open-weight coding model under Apache 2.0 that runs on a single H100 in FP8, scoring 83.2% pass@10 on SWE-Bench Verified with a 256K context window.
A Frontier-Class Coding Model You Can Actually Host Yourself
Every few weeks the open-weight ecosystem produces a release that quietly resets expectations, and on June 9, 2026, Cohere delivered one. North Mini Code is the company's first developer-focused model and the opening entry in a new family — and most importantly, it is small enough to run a capable software-engineering agent on a single GPU. For anyone who has watched powerful coding assistants stay locked behind metered cloud APIs, this is a genuinely encouraging shift.
North Mini Code ships under a permissive Apache 2.0 license, with weights in both BF16 and FP8 on Hugging Face and access through the Cohere API, OpenRouter, and OpenCode. That licensing choice matters as much as the benchmarks: teams can fine-tune, self-host, and deploy the model commercially, subject only to Apache 2.0's attribution and notice requirements.
What's Inside the North Mini Code Architecture
Under the hood, North Mini Code is a 30-billion-parameter Mixture-of-Experts model that activates only about 3 billion parameters per token. It uses 128 experts with 8 routed per token, a design that keeps the compute footprint low while preserving the breadth of a much larger network. Purpose-built for code generation, agentic software engineering, and terminal tasks, it supports a 256K-token context window with up to 64K tokens of generation — enough headroom to reason across a sizeable codebase in a single pass.
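To make the "128 experts with 8 routed per token" idea concrete, here is a minimal sketch of top-k expert routing in plain Python. Only the expert count and the top-k value come from the figures above. The function names, the random router scores, and the choice to softmax over just the selected experts are illustrative assumptions, not a description of how North Mini Code's gate is actually implemented.

```python
import math
import random

NUM_EXPERTS = 128  # expert count quoted in the article
TOP_K = 8          # experts routed per token, quoted in the article


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only. That is one common
    convention and an assumption here; the model's real gating is unspecified.
    """
    if len(router_logits) < top_k:
        raise ValueError("need at least top_k experts")
    # Indices of the top_k largest router scores, best first.
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Numerically stable softmax restricted to the chosen experts.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def expert_fraction(num_experts=NUM_EXPERTS, top_k=TOP_K):
    """Share of experts that run for a single token."""
    return top_k / num_experts


if __name__ == "__main__":
    rng = random.Random(0)  # hypothetical router scores for one token
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    picks = route_token(logits)
    print(len(picks), "experts active for this token")
    print(f"fraction of experts used: {expert_fraction():.4f}")
```

Eight of 128 experts is 6.25% of the expert pool, while about 3 billion of 30 billion parameters is roughly 10% of the model. A plausible reason for the gap is that components such as attention layers and embeddings run for every token regardless of routing, though the article does not give that breakdown.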
The efficiency story is the headline for our local AI audience. Quantized to FP8, the model runs on a single NVIDIA H100, and Cohere reports up to 2.8x higher output throughput than comparable small coding models. Native tool use and interleaved reasoning are built in, so the model can plan, call tools, and refine its work the way a real engineering agent needs to.
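A rough back-of-the-envelope calculation shows why the FP8 weights matter for a single-GPU deployment. The sketch below counts weight memory only and assumes the 80 GB H100 variant; the 30-billion parameter count is from the article, while the helper name and the headroom framing are illustrative. Real usage also needs room for the KV cache, activations, and runtime overhead, none of which is modeled here.

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}

TOTAL_PARAMS = 30e9   # parameter count quoted in the article
H100_MEMORY_GB = 80   # assumption: the 80 GB H100 variant


def weight_memory_gb(num_params, dtype):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("bf16", "fp8"):
        weights = weight_memory_gb(TOTAL_PARAMS, dtype)
        headroom = H100_MEMORY_GB - weights
        print(f"{dtype}: weights ~{weights:.0f} GB, headroom ~{headroom:.0f} GB")
```

Under these assumptions the BF16 weights take about 60 GB and leave roughly 20 GB free, while FP8 takes about 30 GB and leaves roughly 50 GB. That extra headroom is what a long-context agent needs for its KV cache, which grows with the number of tokens held in the window.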
Benchmarks That Back Up the Pitch
Numbers ground the enthusiasm. On the independent Artificial Analysis Coding Index, North Mini Code scores 33.4. More tellingly for practitioners, it posts 83.2% pass@10 on SWE-Bench Verified — the benchmark that measures whether a model can resolve real GitHub issues — and 62.9% on Terminal-Bench v2 after a round of reinforcement learning. Those are strong results for any model, and remarkable for one this size.
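The pass@10 qualifier is worth unpacking, because it means a task counts as solved if any of ten attempts succeeds, which is a more generous bar than a single-attempt score. The sketch below implements the widely used unbiased pass@k estimator introduced with the HumanEval benchmark. Whether Cohere computed its SWE-Bench Verified figure this way is not stated in the sources, so treat this as an illustration of the metric rather than the exact evaluation procedure; the sample counts in the example are made up.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one task.

    n: total attempts sampled, c: attempts that passed, k: attempt budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Chance that a random subset of k attempts contains no passing attempt.
    all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - all_fail


if __name__ == "__main__":
    # Hypothetical task: 20 attempts sampled, 2 of them passed.
    print(f"pass@1  = {pass_at_k(20, 2, 1):.4f}")
    print(f"pass@10 = {pass_at_k(20, 2, 10):.4f}")
```

For this hypothetical task, pass@1 is 0.10 while pass@10 is about 0.76, which illustrates how much the attempt budget can move the number. A benchmark score is then the average of this value across all tasks, so pass@10 and pass@1 results should not be compared directly.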
What makes the benchmarks meaningful is that anyone can try to reproduce them. A score is only useful if you can recreate the conditions that produced it, and here you can: download the open weights, load them onto one accelerator, and run the same agentic workflows locally.
Why an Open Coding Model on a Single GPU Matters
The deeper significance is about access and sovereignty. A truly open, frontier-adjacent coding model that fits on one GPU puts capable software-engineering agents within reach of individual developers, small studios, and enterprises with strict data-residency requirements. Code never has to leave the building, requests skip the round trip to a remote API, and there are no recurring per-token fees to budget around, only hardware and power. It is the same self-hosting appeal that drives so much of the interest in the compact hardware we cover in our mini computers section.
It also broadens the research surface. Open weights with a permissive license invite the community to probe, adapt, and improve the model, which historically has moved the field forward faster than any single closed release can.
What to Watch Next
The open question, as always, is how North Mini Code holds up across messy, real-world repositories beyond curated benchmarks — and the community will spend the coming weeks finding out, precisely because they can. But as a proof point that high-quality agentic coding no longer requires a data center, this release lands squarely on the right side of the trend. The future of developer tooling looks increasingly local, open, and fast.
Sources: Cohere, "Introducing North Mini Code" (June 9, 2026); MarkTechPost, "Cohere Releases North Mini Code, a 30B Open-Weight Coding Model" (June 11, 2026); VentureBeat, "Cohere open-sources a coding agent that runs on a single H100" (June 2026); Artificial Analysis Coding Index benchmark write-up (June 2026).
