The Quantum Dispatch

DeepSeek V4-Pro and V4-Flash Are Here: Open-Source AI With a 1M-Token Context Window

DeepSeek drops V4-Pro (1.6T params) and V4-Flash today with 1M-token context, hybrid attention, and pricing that challenges every closed-source frontier model.

Dr. Nova Chen · Apr 24, 2026 · 5 min read

DeepSeek Returns With Its Most Ambitious Models Yet

A year after DeepSeek rattled the AI industry, the Chinese research lab dropped preview versions of two new models on April 24, 2026, that demonstrate how rapidly open-source AI has caught up to the frontier. DeepSeek-V4-Pro and DeepSeek-V4-Flash are available through the DeepSeek API starting today — and they arrive with a technical architecture that reframes what open-weight models can do.

The headline capability: a 1 million token context window, on both models, with efficiency improvements that make it economically practical rather than theoretically possible.

Two Models, One Architectural Leap

DeepSeek-V4-Pro is a 1.6-trillion-parameter mixture-of-experts model with 49 billion parameters active per token. The Flash variant carries 284 billion total parameters with 13 billion active per token. Both support the full 1M-token context and share the same Hybrid Attention Architecture that makes that context length feasible at production cost.
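The mixture-of-experts split means only a few percent of each model's weights are exercised per token, which is what keeps inference cost far below what the headline parameter counts suggest. A quick check using the figures quoted in this article:

```python
# Active-parameter fraction for the two MoE variants, using the
# (total parameters, parameters active per token) figures from this article.
models = {
    "V4-Pro":   (1_600e9, 49e9),
    "V4-Flash": (284e9, 13e9),
}

active_fraction = {
    name: active / total for name, (total, active) in models.items()
}

for name, frac in active_fraction.items():
    print(f"{name}: {frac:.1%} of parameters active per token")
# V4-Pro activates ~3.1% of its weights per token; V4-Flash ~4.6%.
```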

One million tokens is enough to ingest an entire large codebase, a year of financial filings, a full-length research corpus, or a long-form legal document suite in a single prompt. No chunking, no retrieval-augmented workarounds, no context management overhead. The model simply holds it.
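As a rough sanity check on what actually fits, the common ~4-characters-per-token heuristic for English text and code (an approximation, not a real tokenizer) can estimate whether a corpus fits the window in one shot:

```python
# Rough feasibility check: will an entire corpus fit in a 1M-token
# window without chunking?  Uses the ~4 chars/token heuristic for
# English/code text -- an approximation, not a tokenizer.
CONTEXT_WINDOW = 1_000_000
CHARS_PER_TOKEN = 4  # rough heuristic, varies by language and content

def estimated_tokens(num_chars):
    return -(-num_chars // CHARS_PER_TOKEN)  # ceiling division

def fits_in_context(num_chars, reserve_for_output=8_000):
    """True if the corpus plus an output-token reserve fits the window."""
    return estimated_tokens(num_chars) + reserve_for_output <= CONTEXT_WINDOW

# e.g. a ~3.5 MB codebase (~3.5M characters): ~875k tokens + reserve
print(fits_in_context(3_500_000))  # True
```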

The Hybrid Attention Architecture

Getting to 1M tokens efficiently required rethinking how attention works at scale. DeepSeek's V4 series introduces what it calls Hybrid Attention, combining two complementary mechanisms:

Compressed Sparse Attention (CSA) handles the majority of token relationships with compressed representations, reducing memory bandwidth requirements at scale. Heavily Compressed Attention (HCA) applies maximum compression to the most distant token relationships in long sequences.
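DeepSeek has not published the V4 attention kernels, but the general idea of attending exactly to nearby tokens while pooling distant ones into compressed summaries can be sketched in a few lines of NumPy. Everything below (mean-pooling as the compression scheme, the window and block sizes) is an illustrative toy, not the actual CSA/HCA implementation:

```python
# Toy sketch of hybrid attention: recent tokens get exact attention,
# distant tokens are mean-pooled into block summaries, shrinking the
# effective KV length.  Illustrative only -- not DeepSeek's CSA/HCA.
import numpy as np

def hybrid_attention(q, k, v, local_window=64, block=16):
    """One query vector attends over a long sequence.

    q: (d,)   k, v: (T, d)
    The distant prefix (everything before the last `local_window`
    positions, rounded down to a multiple of `block`) is compressed
    to one pooled key/value per block.
    """
    T, d = k.shape
    split = max(T - local_window, 0)
    n_blocks = split // block
    boundary = n_blocks * block  # compressed prefix ends here; rest is exact
    if n_blocks > 0:
        k_far = k[:boundary].reshape(n_blocks, block, d).mean(axis=1)
        v_far = v[:boundary].reshape(n_blocks, block, d).mean(axis=1)
        k_eff = np.concatenate([k_far, k[boundary:]], axis=0)
        v_eff = np.concatenate([v_far, v[boundary:]], axis=0)
    else:
        k_eff, v_eff = k, v
    scores = k_eff @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ v_eff, k_eff.shape[0]

rng = np.random.default_rng(0)
T, d = 1024, 32
out, kv_len = hybrid_attention(rng.normal(size=d),
                               rng.normal(size=(T, d)),
                               rng.normal(size=(T, d)))
print(kv_len)  # 124: 60 pooled blocks + 64 exact recent positions vs. 1024
```

The point of the sketch is the ratio: the query only ever touches 124 key/value rows instead of 1,024, and the saving grows with sequence length, which is the same lever the release credits for its KV-cache reduction.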

The quantitative result: at the 1M-token setting, V4-Pro requires only 27% of the per-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2. That efficiency profile is what makes a 1M context window economically deployable rather than just a benchmark number.
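Taken at face value, those ratios make the serving math concrete. The baseline figures below are hypothetical placeholders (DeepSeek has not published V3.2's absolute KV-cache size or FLOP count at 1M tokens); only the 27% and 10% ratios come from the release:

```python
# Back-of-envelope for the claimed efficiency ratios at 1M tokens.
# Only the two ratios are from the release; the baselines are
# hypothetical placeholders, not published V3.2 figures.
FLOPS_RATIO = 0.27  # V4-Pro per-token FLOPs vs. V3.2 (from the release)
KV_RATIO = 0.10     # V4-Pro KV cache vs. V3.2 (from the release)

baseline_kv_gib = 400.0  # hypothetical V3.2 KV cache at 1M tokens (GiB)
baseline_tflops = 2.0    # hypothetical V3.2 FLOPs per generated token (TFLOPs)

v4_kv_gib = baseline_kv_gib * KV_RATIO
v4_tflops = baseline_tflops * FLOPS_RATIO
print(f"KV cache: {baseline_kv_gib:.0f} GiB -> {v4_kv_gib:.0f} GiB")
print(f"Per-token compute: {baseline_tflops:.2f} -> {v4_tflops:.2f} TFLOPs")
```

Whatever the true baselines are, a 10x smaller KV cache is the difference between a 1M-token request fitting on one inference node and requiring several.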

Benchmark Performance

DeepSeek-V4-Pro leads all open-weight models on mathematics and coding benchmarks. It trails only Google's Gemini 3.1-Pro on world knowledge evaluations. The V4-Pro-Max reasoning mode — the highest thinking-effort configuration — closes the gap with GPT-5.4 and Gemini 3.1-Pro to what DeepSeek characterizes as three to six months behind the frontier.

For an open-source model available at a fraction of closed-source API pricing, that margin represents a significant narrowing of what has historically been a much wider capability gap.

Pricing That Changes the Build Calculus

DeepSeek's pricing structure continues to compress the economics of production AI:

- V4-Flash: $0.14/million tokens input, $0.28/million tokens output

- V4-Pro: $1.74/million tokens input, $3.48/million tokens output

At V4-Flash pricing, a full 1M-token context request costs $0.14 in input tokens — competitive with smaller models from providers that charge several times more for far shorter contexts. V4-Pro offers frontier-adjacent reasoning at a price that makes production deployment viable for use cases that GPT-5.5's $5–$30/million-token pricing would otherwise rule out on cost alone.
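A small calculator makes the comparison concrete, using only the per-million-token prices quoted above:

```python
# Cost of a single request under the (input, output) USD-per-1M-token
# prices quoted in this article.
PRICES = {
    "v4-flash": (0.14, 0.28),
    "v4-pro":   (1.74, 3.48),
}

def request_cost(model, input_tokens, output_tokens):
    """Dollar cost of one request at the listed per-million-token rates."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# A full 1M-token context request with a 2k-token answer:
flash = request_cost("v4-flash", 1_000_000, 2_000)
pro = request_cost("v4-pro", 1_000_000, 2_000)
print(f"V4-Flash: ${flash:.4f}   V4-Pro: ${pro:.4f}")
```

At these rates, even the Pro model keeps a maxed-out context request under two dollars, which is what shifts the build-versus-buy calculus the section describes.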

Huawei Ascend Support

A strategic note from the release: V4 was developed for full compatibility with Huawei's Ascend supernode clusters, and Huawei has confirmed that its Ascend AI supernodes support V4 deployments. That gives teams in markets without NVIDIA GPU access a validated path to running V4-scale inference infrastructure.

Open Source, One Year Later

The arc from DeepSeek-V3's January 2025 debut to V4's April 2026 arrival tells a clear story: open-source development cycles are compressing rapidly. What was a 12-to-18-month capability gap between open and closed models has become three to six months, and each new release narrows it further.

For researchers who need long-context analysis, developers building production RAG or agentic pipelines, and engineering teams weighing closed-source API costs against open-weight alternatives — DeepSeek V4 arrives as a technically credible, economically serious option.

The 1M-token context window, Hybrid Attention efficiency gains, and aggressive pricing make this release more than a benchmark exercise. V4 is open-source infrastructure that production teams will actually deploy.

Sources: DeepSeek API Docs (April 24, 2026), Bloomberg (April 24, 2026), TechCrunch (April 24, 2026), The Next Web (April 24, 2026), CNBC (April 24, 2026)