Alibaba's Qwen3.7-Plus Pairs Vision With Autonomous Agent Skills at $0.40 per Million Tokens

Alibaba's Qwen team launched Qwen3.7-Plus on June 2, 2026 — a multimodal model combining vision and video understanding with deep reasoning, tool use, and autonomous iteration.

Dr. Nova Chen, June 4, 2026

Qwen3.7-Plus Brings Vision and Agency Together at a Friendly Price

Alibaba's Qwen team moved Qwen3.7-Plus into general availability on the Bailian (Model Studio) platform on June 2, 2026, a notable step for accessible agentic AI. The model pairs vision and video understanding with a suite of autonomous agent capabilities, and it is priced at $0.40 per million input tokens, an aggressive rate for a multimodal model that can both see and act.

Qwen3.7-Plus understands images and video alongside text, and it layers five agentic skills on top: deep step-by-step reasoning, self-programming (writing and revising its own code), external tool and API invocation, output verification and testing, and autonomous iteration that keeps looping until a task is complete. That last capability (keep working, check the result, try again) is what separates a true agentic model from a one-shot assistant.
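The "keep working, check the result, try again" pattern can be illustrated with a generic loop. This is a minimal sketch under stated assumptions, not Qwen's implementation: `run_agent_loop`, `attempt`, `verify`, and the toy functions are hypothetical names, where `toy_attempt` stands in for a model call (reasoning, code generation, tool use) and `toy_verify` stands in for a real test harness.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    output: str
    attempts: int
    passed: bool
    history: list[str] = field(default_factory=list)


def run_agent_loop(
    task: str,
    attempt: Callable[[str, list[str]], str],
    verify: Callable[[str], tuple[bool, str]],
    max_iterations: int = 5,
) -> LoopResult:
    """Hypothetical act-verify-retry loop: produce an output, check it, feed failures back."""
    history: list[str] = []
    output = ""
    for i in range(1, max_iterations + 1):
        output = attempt(task, history)  # stand-in for reasoning, writing code, or calling tools
        ok, feedback = verify(output)    # stand-in for running tests or other checks
        if ok:
            return LoopResult(output, i, True, history)
        history.append(feedback)         # the failure becomes context for the next attempt
    return LoopResult(output, max_iterations, False, history)


# Toy stand-ins (hypothetical): the "model" guess improves with each piece of feedback.
def toy_attempt(task: str, history: list[str]) -> str:
    return str(5 + len(history))


def toy_verify(output: str) -> tuple[bool, str]:
    value = int(output)
    return value * value == 49, f"{value}^2 != 49"


result = run_agent_loop("find x with x*x == 49", toy_attempt, toy_verify)
print(result.attempts, result.output, result.passed)  # prints: 3 7 True
```

The `max_iterations` cap is a design choice worth keeping in any real agent: without it, a task the verifier can never accept would loop (and consume tokens) indefinitely.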

Benchmark Results Point to Real Agentic Competence

The general-availability release builds on Qwen3.7-Plus-Preview, which ranked #16 on Vision Arena in May and placed Alibaba fifth among labs in vision. On agentic and GUI-automation benchmarks, the model scores 79.0 on GUI automation and is reported to perform strongly on ScreenSpot Pro, OSWorld-Verified, AndroidWorld, Terminal Bench, and SWE-Bench.

Why GUI Automation Scores Matter

GUI-automation benchmarks measure whether a model can actually operate software the way a person does — locating buttons, reading screens, and completing multi-step workflows. A high score there suggests Qwen3.7-Plus can reliably drive applications, which is the foundation for practical computer-use agents. Combined with its self-programming and verification skills, the model can write code, run it, check whether it worked, and iterate without a human babysitting every step.

Accessible Pricing Expands the Developer Toolkit

The $0.40-per-million-input-token price is what makes this release broadly interesting. Agentic workloads tend to be token-hungry because the model reasons, acts, checks, and retries, and each step typically re-sends the growing conversation history as input. Low input pricing keeps those loops affordable, so more developers can experiment with autonomous agents that operate GUIs, write and test code, and handle multi-step tasks end to end.
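To make the pricing concrete, here is a back-of-the-envelope sketch of input-token cost for one agent run. Only the $0.40-per-million-input-token rate comes from the article; the loop shape (a fixed starting context plus a fixed number of tokens added per step, with the full history re-sent every step) and the token counts are hypothetical assumptions. Output-token pricing is not stated here, so it is left out, as are any caching discounts.

```python
INPUT_PRICE_PER_MILLION_USD = 0.40  # input price reported in the article


def input_cost_usd(input_tokens: int, price_per_million: float = INPUT_PRICE_PER_MILLION_USD) -> float:
    """Cost of a given number of input tokens at a per-million-token rate."""
    return input_tokens * price_per_million / 1_000_000


def agent_loop_input_tokens(base_context: int, added_per_step: int, steps: int) -> int:
    """Hypothetical model: step k re-sends base_context + k * added_per_step tokens as input."""
    return sum(base_context + added_per_step * step for step in range(steps))


# Assumed run: 8,000-token starting context, 2,000 tokens of new history per step, 10 steps.
tokens = agent_loop_input_tokens(base_context=8_000, added_per_step=2_000, steps=10)
cost = input_cost_usd(tokens)
print(tokens, f"${cost:.4f}")  # prints: 170000 $0.0680
```

Under these assumptions a single 10-step run consumes 170,000 input tokens, and 1,000 such runs would cost about $68 in input tokens. The quadratic growth from re-sending history is why the input rate, not just the output rate, dominates agent budgets.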

The wider picture is a healthy one for the AI ecosystem. A competitively priced multimodal agent model gives builders another strong option, and the steady benchmark progress from the Qwen team shows how quickly accessible agentic AI is maturing. For anyone prototyping computer-use agents or multimodal assistants, Qwen3.7-Plus is a welcome addition to the toolbox.

Sources: MarkTechPost (June 2, 2026); AI Weekly (June 2, 2026); apidog (June 2026).