
Qwen3.6 Arrives on Ollama: Run a 35B Agentic Coding AI Locally With 256K Context
Alibaba's Qwen3.6 is now on Ollama — a 35B open-weight model with 256K context, vision support, and thinking preservation built for agentic coding workflows you can run on your own hardware.
Qwen3.6 Is on Ollama — and It Is Built for Agentic Coding
Alibaba's Qwen team has been on a steady upward trajectory, and Qwen3.6 is their most ambitious open-weight release yet. The model landed on Ollama this week, immediately available as a 35B-parameter download — meaning anyone with sufficient local hardware can pull it and run it without an API key, a cloud subscription, or a waitlist. The headline capabilities are agentic coding workflows, thinking preservation, and a 256K-token context window that covers the scope of real software projects.
The Qwen team's stated direction for this release is captured in three words: "Towards Real World Agents." That framing signals a deliberate shift from benchmark-optimized performance to the messier, more demanding requirements of actual developer workflows.
What "Thinking Preservation" Actually Means
The standout architectural feature in Qwen3.6 is thinking preservation — the model's ability to retain and carry forward reasoning context from prior messages across a multi-turn session. This is a different capability from a large context window, and the distinction matters for agentic use cases.
A large context window lets the model see more tokens at once. Thinking preservation means the model's intermediate reasoning — the problem decomposition, the plan it formed in earlier turns, the conclusions it drew from prior tool calls — stays coherent and accessible as the session extends. For multi-step coding workflows where the model is coordinating across files, managing a debugging loop, or executing a sequence of tool calls, that preserved reasoning continuity is what separates useful autonomous behavior from a model that loses the thread after a few turns.
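The distinction can be made concrete with a small sketch. This is a conceptual illustration in Python, not Qwen3.6's actual internal mechanism — the transcript format and function names here are hypothetical. The point it shows: when the assistant's intermediate reasoning is retained in the running transcript, later turns can build on the plan instead of re-deriving it.

```python
# Hypothetical transcript format illustrating "thinking preservation":
# the assistant's reasoning is kept alongside its reply, so it travels
# with the context on every subsequent model call.

def add_turn(transcript, role, content, thinking=None, preserve_thinking=True):
    """Append a turn; optionally retain the reasoning alongside the reply."""
    turn = {"role": role, "content": content}
    if thinking is not None and preserve_thinking:
        turn["thinking"] = thinking  # reasoning stays visible to later turns
    transcript.append(turn)
    return transcript

session = []
add_turn(session, "user", "Refactor the config loader to support YAML.")
add_turn(session, "assistant",
         "Plan: 1) add a YAML parser, 2) keep the JSON fallback, 3) update tests.",
         thinking="JSON loader lives in config.py; tests assume .json fixtures, "
                  "so fixtures must be duplicated for YAML.")
add_turn(session, "user", "Now update the tests.")

# The next model call receives the preserved plan and reasoning, so the
# model does not have to reconstruct them from scratch.
print(len(session))            # 3 turns in the transcript
print("thinking" in session[1])  # True: the plan's reasoning survived
```

A session that strips reasoning after each reply (`preserve_thinking=False`) would still fit in a large context window — the transcript is short — but the model would face every new turn without the plan it had already formed.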
What Qwen3.6 Handles in Practice
The capability set maps directly to the workflows that matter for developers running local AI:
- **Repository-level reasoning**: Comprehending and navigating multi-file codebases rather than treating each file as an isolated context
- **Frontend workflow handling**: Reading UI layouts, reasoning about component structure, and generating frontend code from designs with coherent understanding of how the pieces connect
- **Terminal-based execution**: Iterating through build outputs, error traces, and test results as part of a coordinated debugging workflow
- **Long-horizon planning**: Maintaining a coherent task plan across multiple tool calls and intermediate results without losing the original goal
These are not toy benchmarks. They are the capabilities that make a local coding assistant genuinely useful for extended development sessions.
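The terminal-based execution and long-horizon planning items above share a common shape: a loop in which the model proposes a tool call, the runtime executes it, and the result feeds back in until the task is done. The sketch below is a minimal, self-contained version of that loop — the stub model, tool names, and results are all hypothetical stand-ins, not Qwen3.6's actual interface.

```python
# Minimal agent-loop sketch (all names hypothetical; stub_model stands in
# for a real model call). Models a debugging loop: run tests, patch the
# failing file, re-run tests to verify, then declare the task complete.

def stub_model(history):
    """Decide the next action from the tool-call history so far."""
    if not history:
        return {"tool": "run_tests", "args": {}}
    last = history[-1]
    if last["result"] == "1 failed":
        return {"tool": "patch_file", "args": {"file": "parser.py"}}
    if last["result"].startswith("patched"):
        return {"tool": "run_tests", "args": {}}  # re-verify after the fix
    return {"done": True, "summary": "tests pass after patching parser.py"}

TOOLS = {
    "run_tests": lambda args, state: "1 failed" if not state["patched"] else "all passed",
    "patch_file": lambda args, state: state.update(patched=True) or "patched " + args["file"],
}

def agent_loop(model, max_steps=8):
    history, state = [], {"patched": False}
    for _ in range(max_steps):
        action = model(history)
        if action.get("done"):
            return action["summary"], history
        result = TOOLS[action["tool"]](action["args"], state)
        history.append({"tool": action["tool"], "result": result})
    return "step budget exhausted", history

summary, history = agent_loop(stub_model)
print(summary)  # tests pass after patching parser.py
```

The step budget and the feedback of each tool result into the next decision are what "long-horizon planning" has to survive: the model must still be pursuing the original goal at step eight, not just reacting to the last output.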
The 35B Model on Ollama: What You Need to Run It
The Ollama-available variant is the 35B-parameter model, weighing in at approximately 24GB. The 256K context window is the full open-weight specification — not a trimmed-down version. Input modalities include both text and image, making it useful for workflows that involve UI screenshots, diagrams, or visual assets alongside code.
To pull and run it:
```shell
ollama run qwen3.6
```
Twelve model variants are available in the Ollama library, covering different quantization levels and size configurations for hardware ranging from workstations to high-memory consumer GPUs.
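Beyond the CLI, Ollama exposes a local REST API, which is how agent frameworks and editors typically drive the model. The sketch below builds a request against Ollama's `/api/chat` endpoint on the default port 11434; the `num_ctx` option is how a caller asks for the full 256K context rather than Ollama's smaller default. The actual send is commented out since it requires a running server, and whether the `qwen3.6` tag accepts the full 262144 value depends on the variant pulled.

```python
# Constructing a request to Ollama's local /api/chat endpoint (assumes a
# running Ollama server on the default port; only the payload shape is
# exercised here, the send itself is optional).
import json
import urllib.request

payload = {
    "model": "qwen3.6",
    "messages": [
        {"role": "user", "content": "Summarize what main() does in this repo."}
    ],
    "options": {"num_ctx": 256 * 1024},  # request the full 256K-token context
    "stream": False,
}

request = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment when an Ollama server is running locally:
# with urllib.request.urlopen(request) as resp:
#     print(json.loads(resp.read())["message"]["content"])
print(payload["options"]["num_ctx"])  # 262144
```

Setting `num_ctx` explicitly matters in practice: a repository-level task that silently runs at a small default context will truncate exactly the files the model needs.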
Open-Weight Significance
Qwen3.6's open-weight availability is worth emphasizing specifically because of what it enables for developers who care about privacy, latency, and control. Running a 35B agentic coding model locally means:
- **No API costs** for high-volume agentic workflows that make hundreds of tool calls per session
- **Complete data privacy** — code and proprietary logic never leaves the local machine
- **Zero latency overhead** from cloud round-trips in tight tool-call loops
- **Full customization** — the model weights are available for fine-tuning on domain-specific codebases
For the self-hosted AI community and developers working in environments where sending code to external APIs is not permitted, Qwen3.6 on Ollama is the most capable open-weight option currently available for agentic coding workflows.
A Strong Open-Weight Moment
The open-weight frontier has moved decisively upward in 2026. Qwen3.6 joining Ollama's library is one of the clearest signs yet that capable agentic AI is no longer exclusively available through proprietary APIs — and that the gap between what you can run locally and what requires a cloud subscription continues to narrow.
Sources: Ollama Library — qwen3.6 (April 2026), Qwen AI Blog — "Qwen3.6-Plus: Towards Real World Agents" (April 2026), Alibaba Cloud Community (April 2026), BuildFastWithAI (April 2026), MindStudio Qwen3.6-Plus Review (April 2026)
