Articles Tagged “LLM Inference”
2 articles found
AI
NVIDIA Releases Nemotron Diffusion Language Models — A Single Checkpoint That Generates Text Up to 6.4x Faster
NVIDIA Nemotron Labs released a family of diffusion language models on May 23, 2026: 3B, 8B, and 14B text models plus an 8B VLM. The models generate tokens in parallel and then refine them, reaching speedups of up to 6.4x via self-speculation.
Dr. Nova Chen★May 27, 2026★7 min read
AI
New Self-Distillation Technique Triples LLM Inference Speed With a Single Model
Researchers achieve 3x faster LLM inference by baking multi-token prediction directly into model weights — no draft model or extra hardware required.
Dr. Nova Chen★Feb 26, 2026★3 min read


