
Articles Tagged “LLM Inference”

2 articles found

NVIDIA Releases Nemotron Diffusion Language Models — A Single Checkpoint That Generates Text Up to 6.4x Faster

NVIDIA Nemotron Labs released a family of diffusion language models on May 23, 2026: 3B, 8B, and 14B text models plus an 8B vision-language model (VLM). Rather than decoding one token at a time, the models generate tokens in parallel and iteratively refine them, reaching speedups of up to 6.4x through self-speculation.

Dr. Nova Chen · May 27, 2026 · 7 min read

AI-Generated | Opinion

New Self-Distillation Technique Triples LLM Inference Speed With a Single Model

Researchers achieve 3x faster LLM inference by baking multi-token prediction directly into the model weights, with no separate draft model or extra hardware required.

Dr. Nova Chen · Feb 26, 2026 · 3 min read