The Quantum Dispatch

Meta Launches Llama 4 Scout and Maverick: Multimodal MoE AI Goes Open-Weight

Meta's Llama 4 Scout and Maverick bring multimodal mixture-of-experts AI to the open-source community, with an unprecedented 10 million token context window.

Dr. Nova Chen · Apr 9, 2025 · 5 min read

The Open-Weight AI Landscape Just Changed Dramatically

On April 5, 2025, Meta released Llama 4 Scout and Llama 4 Maverick to the public, and the AI research community took notice immediately. These are not incremental updates to the Llama series. They represent a fundamental architectural shift: the first open-weight, natively multimodal models built around a mixture-of-experts (MoE) design, with context window support unlike anything previously available in open-source AI.

Llama 4 Scout is a 17-billion-active-parameter model with 16 experts (109B total parameters) and a context window of 10 million tokens. Llama 4 Maverick, the larger sibling, pairs the same 17B active parameters with 128 experts (roughly 400B total parameters) and targets more compute-rich deployments while sharing the same MoE multimodal architecture. Both models are available for download on llama.com and Hugging Face under Meta's community license.

What Makes Mixture-of-Experts Different

In a standard dense model, every parameter activates for every input token. In a mixture-of-experts architecture, only a subset of the network's expert modules activate for any given token — in Scout's case, approximately 17 billion out of a total 109 billion parameters. The result is that MoE models achieve the quality of much larger dense models while using a fraction of the compute during inference.
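The routing idea can be sketched in a few lines. Below is a minimal, illustrative top-k gate in plain NumPy; the expert count, dimensions, and top_k values here are toy numbers for demonstration, not Scout's actual configuration or Meta's implementation.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route each token to its top_k experts and mix their outputs.

    Illustrative sketch only: real MoE layers add load balancing,
    capacity limits, and batched expert execution.
    """
    logits = x @ gate_w                              # (tokens, n_experts) router scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]    # indices of chosen experts per token
    sel = np.take_along_axis(logits, top, axis=-1)   # selected logits
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax mixing weights
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                      # only top_k experts run per token
        for j, e in enumerate(top[t]):
            out[t] += w[t, j] * experts[e](x[t])
    return out

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 4, 5
gate_w = rng.normal(size=(d, n_experts))
# One toy "expert" per slot; a distinct weight matrix is bound to each lambda.
experts = [lambda v, W=rng.normal(size=(d, d)): v @ W for _ in range(n_experts)]
y = moe_forward(rng.normal(size=(tokens, d)), gate_w, experts, top_k=2)
print(y.shape)  # (5, 8)
```

The key point is in the inner loop: only the selected experts compute anything for a given token, so inference cost tracks the active parameter count (17B for Scout), not the total (109B).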

For Scout specifically, this design means the model fits on a single NVIDIA H100 GPU (with Int4 quantization) while delivering benchmark performance competitive with models requiring significantly more hardware. That is a meaningful efficiency milestone for researchers, developers, and organizations that want powerful open-weight AI without cloud-scale infrastructure.
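A quick back-of-envelope check makes the single-GPU claim concrete. The sketch below counts weight memory only (ignoring KV cache and activations, which add real overhead) and assumes an 80 GB H100:

```python
# Weight-memory estimate for a 109B-parameter model at common precisions.
total_params = 109e9
h100_mem_gb = 80

bytes_per_param = {"fp16": 2, "int8": 1, "int4": 0.5}
sizes_gb = {fmt: total_params * b / 1e9 for fmt, b in bytes_per_param.items()}

for fmt, gb in sizes_gb.items():
    print(f"{fmt:>5}: {gb:6.1f} GB -> fits one 80 GB H100: {gb <= h100_mem_gb}")
```

At fp16 the weights alone (218 GB) overflow a single card, while Int4 (about 54.5 GB) leaves headroom, which is consistent with Meta's quantized single-H100 claim for Scout.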

True Multimodal Architecture from the Ground Up

Crucially, Llama 4 Scout and Maverick are natively multimodal — not text models with a vision adapter added afterward. They were trained end-to-end on images and text simultaneously, which means the model genuinely understands relationships between visual and textual information rather than treating image understanding as a secondary task.

This architectural choice positions Llama 4 as a serious foundation for vision-language applications: document understanding, image analysis pipelines, scientific figure interpretation, and visual question answering can all leverage the model's unified representations.

10 Million Tokens: The Context Revolution

The 10-million token context window in Scout deserves special attention. At 10M tokens, a researcher could feed an entire book collection, a year of email archives, or a comprehensive codebase — and query the model with full recall across the entire corpus. For agentic workflows requiring long-running context across complex multi-step tasks, this capability is genuinely transformative.
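To put 10 million tokens in perspective, a rough conversion helps. The 0.75 words-per-token ratio below is an approximate heuristic for English text, not a measurement of the Llama tokenizer, and the page and book sizes are round illustrative figures:

```python
# Rough scale of a 10M-token context window.
ctx_tokens = 10_000_000
words = ctx_tokens * 0.75   # ~0.75 words per token (approximate English heuristic)
pages = words / 500         # ~500 words per printed page
books = pages / 300         # ~300 pages per book

print(f"~{words / 1e6:.1f}M words, ~{pages:,.0f} pages, ~{books:.0f} books")
```

Under these assumptions, one context window holds on the order of fifty full-length books, which is why the capability matters for corpus-wide question answering and long-horizon agents.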

Consumer-accessible deployments may not hit the full 10M limit in practice, but even at 500K to 1M tokens, Scout dramatically exceeds most available open-weight models in context capacity.

Availability and Community Impact

Both models are downloadable from llama.com and Hugging Face immediately, with access expanding across cloud providers, edge silicon partners, and major AI platforms. Meta has positioned them as the start of a new era of natively multimodal AI. With Llama 4, the open-source ecosystem gains its most capable open-weight multimodal models to date: researchers can fine-tune them, developers can build on them, and organizations can deploy them without cloud lock-in.

Sources: Meta AI Blog (April 5, 2025), TechCrunch (April 5, 2025), VentureBeat (April 5, 2025)