Qwen3.6-Max-Preview Tops Six Coding Benchmarks Including SWE-Bench Pro

Alibaba's Qwen3.6-Max-Preview launched April 20 and immediately claimed #1 on SWE-bench Pro, Terminal-Bench 2.0, SkillsBench, and three more — with a 256K context window and agentic preserve_thinking mode.

Dr. Nova Chen★Apr 22, 2026★4 min read

Alibaba's Qwen3.6-Max-Preview Just Set a New Bar for AI Coding

Alibaba released Qwen3.6-Max-Preview on April 20, 2026 — the flagship tier of the Qwen3.6 family and the company's most capable model to date. Within hours of release, independent benchmark runs confirmed what Alibaba's internal numbers suggested: Qwen3.6-Max-Preview claims the top position on six major coding benchmarks, making it one of the strongest AI coding tools publicly available.

The model is available via Qwen Studio and Alibaba Cloud Model Studio. Unlike the open-weight Qwen3.6 series that runs locally on Ollama, Max-Preview is a hosted proprietary offering — Alibaba's direct answer to the frontier closed-model services from Anthropic, OpenAI, and Google in the professional coding segment.

Six Benchmark Wins — What They Cover

Qwen3.6-Max-Preview claimed the top score on:

- SWE-bench Pro — real-world software engineering task resolution on production codebases

- Terminal-Bench 2.0 — autonomous terminal-based agent programming under real execution conditions

- SkillsBench — multi-skill software engineering across diverse programming disciplines

- QwenClawBench — complex, multi-step coding agent tasks requiring planning and tool use

- QwenWebBench — web development and browser automation agent evaluation

- SciCode — scientific computing and research code generation

The combination covers the full spectrum of what AI coding tools are actually used for: resolving GitHub issues in production repos, running autonomous terminal agents, building web applications, and writing research-grade scientific code. Topping all six in a single release is a meaningful result.

Gains Over Qwen3.6-Plus

The improvement over Qwen3.6-Plus — the prior top of the Qwen family — is quantified:

- +9.9 points on SkillsBench

- +10.8 points on SciCode

- +5.0 points on NL2Repo (natural language to repository-level code)

- +3.8 points on Terminal-Bench 2.0

- +2.3 points on SuperGPQA (world knowledge)

- +5.3 points on QwenChineseBench

The SciCode and SkillsBench jumps are the most notable — double-digit gains in a single model generation suggest a real architectural or training advancement rather than marginal optimization.

256K Context and the preserve_thinking Feature

Qwen3.6-Max-Preview ships with a 256K token context window — large enough to hold substantial codebases, long conversation histories, or extended research documents in a single inference call without chunking.

The preserve_thinking feature is specifically designed for agentic coding workflows. When enabled, the model retains its intermediate reasoning steps across tool calls and multi-step executions rather than restarting its reasoning context on each action. This produces more coherent decision-making across long autonomous coding sessions — a direct response to one of the core failure modes that makes frontier AI coding agents less reliable on complex, multi-step projects.

Drop-In API Compatibility

One practical detail that matters for developers: Qwen3.6-Max-Preview's API is compatible with both the OpenAI and Anthropic API specifications. Teams running coding agents or LLM pipelines built against either of those API formats can test Max-Preview with minimal integration work — it is a configuration change rather than a rewrite.

Where This Sits in the Coding AI Landscape

The Qwen team has been explicit that Qwen3.6-Max-Preview is a "preview" — the model is still being refined and the full release will follow. Shipping a preview that claims six benchmark tops is a confident public statement about the direction of travel.

For developers running AI-assisted coding workflows who are looking for the highest benchmark performance available via API today, Qwen3.6-Max-Preview is worth evaluating. The API compatibility with existing OpenAI and Anthropic tooling means the evaluation cost is low.

Sources: Qwen Blog (April 20, 2026), Decrypt (April 20, 2026), BuildFastWithAI (April 2026), DigitalApplied (April 2026), Qubrid AI (April 2026)