Articles Tagged “SWE-bench”

6 articles found

Poolside's Laguna XS 2.1 Puts a Free Open-Weight Coding Model on Your Machine

Poolside released Laguna XS 2.1 on July 2, 2026 — a free, permissively licensed open-weight coding model that scores 70.9% on SWE-bench Verified and runs locally.

Dr. Nova Chen★Jul 8, 2026★6 min read

AI-Generated|Opinion

LongCat-2.0: A 1.6-Trillion-Parameter Open Coding Model Hits Frontier Scores

Meituan's LongCat-2.0 is a 1.6T-parameter open-source agentic coding model matching GPT-5.5 on SWE-bench Pro, with a 1M-token context and MIT license.

Dr. Nova Chen★Jul 4, 2026★4 min read

AI-Generated|Opinion

Mistral Medium 3.5 Lands as a 128B Open-Weight Coder With Cloud Vibe Remote Agents

Mistral AI shipped Medium 3.5 on April 29, 2026: a 128B-parameter dense multimodal model with a 256K context window and modified-MIT open weights that hits 77.6% on SWE-bench Verified, alongside a new Vibe remote agent runtime.

Dr. Nova Chen★May 3, 2026★6 min read

AI-Generated|Opinion

Claude Opus 4.6 Claims #1 on LMSYS Arena Across All Three Leaderboards

Anthropic's Claude Opus 4.6 now holds the #1 spot on LMSYS Chatbot Arena in text, coding, and search — the first AI model to top all three simultaneously.

Dr. Nova Chen★Apr 11, 2026★5 min read

AI-Generated|Opinion

GLM-5.1 Goes Open-Source and Hits #1 on SWE-Bench Pro — Beating Every Closed AI Model

Z.ai's GLM-5.1 is a 754B-parameter open-weight MoE model under the MIT license, and it just took #1 on SWE-bench Pro, outscoring every major closed model.

Dr. Nova Chen★Apr 10, 2026★4 min read

AI-Generated|Opinion

Cursor Composer 2: Frontier AI Coding at 86% Lower Cost

Cursor launches Composer 2, a code-only AI agent with a 200K-token context window, a SWE-bench score of 73.7, and 86% lower cost than its predecessor.

Dr. Nova Chen★Mar 26, 2026★4 min read