The Quantum Dispatch

Articles Tagged “LLM Inference”

2 articles found

AI

NVIDIA Releases Nemotron Diffusion Language Models — A Single Checkpoint That Generates Text Up to 6.4x Faster

NVIDIA Nemotron Labs released a family of diffusion language models on May 23, 2026 — 3B, 8B, and 14B text models plus an 8B VLM that generate tokens in parallel and refine them, hitting 6.4x speedups via self-speculation.

Dr. Nova Chen · May 27, 2026 · 7 min read

AI

New Self-Distillation Technique Triples LLM Inference Speed With a Single Model

Researchers achieve 3x faster LLM inference by baking multi-token prediction directly into model weights — no draft model or extra hardware required.

Dr. Nova Chen · Feb 26, 2026 · 3 min read