NVIDIA publishes Nemotron‑Labs Diffusion models for faster, revision‑capable text generation

5/23/2026, 12:35:34 AM

NVIDIA has published the Nemotron‑Labs Diffusion family: diffusion‑capable language models at 3B, 8B and 14B parameters plus an 8B vision‑language model, released with code, a training recipe and a technical report.

On May 23, 2026, NVIDIA released Nemotron‑Labs Diffusion, a family of diffusion‑style language models, together with a technical report. The release credits Mehran Maghoumi, Yonggan Fu, Pavlo Molchanov and colleagues as authors. The package includes model weights, training recipes and code, so practitioners can inspect the architecture and experiment with diffusion‑style generation alongside standard autoregressive workflows.

The family consists of text models at roughly 3B, 8B and 14B parameters and an 8B vision‑language model; NVIDIA is providing both base checkpoints and instruction‑tuned chat variants. Distribution terms differ by model: the text models are released under the NVIDIA Nemotron Open Model License (a commercially oriented license), while the 8B vision‑language model is provided under the NVIDIA Source Code License. Training code and recipes are published through NVIDIA’s Megatron Bridge framework to support reproduction and adaptation of training runs.

Nemotron‑Labs Diffusion is not limited to producing one token per model pass. It can draft multiple tokens in parallel and then refine them iteratively. The architecture exposes three inference modes. Autoregressive mode uses standard left‑to‑right decoding. Diffusion mode generates block by block through iterative refinement. Self‑speculation drafts multiple candidate tokens in parallel and then verifies them autoregressively, selecting or correcting each one. Because drafted tokens can be revised during inference, the number of refinement steps acts as a direct knob on the inference budget: fewer steps mean fewer model passes, and more steps allow more refinement.
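The diffusion and self‑speculation modes can be illustrated with a toy sketch. This is not NVIDIA's implementation or API. The `denoise`, `draft` and `ar_argmax` callables, the `MASK` id and the toy "counting" model are hypothetical stand‑ins. The unmasking schedule follows generic confidence‑based masked‑diffusion decoding, committing an even share of positions per step. The verifier uses standard greedy speculative‑decoding acceptance. The sketch also omits re‑masking of already committed tokens, which a revision‑capable decoder would add. Autoregressive mode is simply one token per pass and is not shown.

```python
import math
from typing import Callable, List, Sequence, Tuple

MASK = -1  # hypothetical id for a position that has not been decided yet

# Hypothetical model interfaces (assumptions, not a real library's API):
#   denoise(prefix, block) -> [(token, confidence)] for every block position
#   draft(prefix, k)       -> k tokens proposed in one parallel pass
#   ar_argmax(seq)         -> greedy next token after seq[:j + 1], for every j
Denoiser = Callable[[Sequence[int], Sequence[int]], List[Tuple[int, float]]]
Drafter = Callable[[Sequence[int], int], List[int]]
Verifier = Callable[[Sequence[int]], List[int]]


def diffusion_block_decode(denoise: Denoiser, prefix: Sequence[int],
                           block_len: int, steps: int) -> Tuple[List[int], int]:
    """Fill one block by confidence-based unmasking; return (block, passes)."""
    block = [MASK] * block_len
    passes = 0
    for step in range(steps):
        masked = [i for i, tok in enumerate(block) if tok == MASK]
        if not masked:
            break
        preds = denoise(prefix, block)
        passes += 1
        # Commit an even share of the remaining positions this step, picking
        # the ones the denoiser is most confident about.
        k = math.ceil(len(masked) / (steps - step))
        for i in sorted(masked, key=lambda j: preds[j][1], reverse=True)[:k]:
            block[i] = preds[i][0]
    return block, passes


def self_speculate_step(draft: Drafter, ar_argmax: Verifier,
                        prefix: List[int], k: int) -> List[int]:
    """One greedy draft-then-verify step (prefix must be non-empty)."""
    proposal = draft(prefix, k)
    # A single verification pass over prefix + proposal yields the greedy
    # next token at the end of the prefix and after each drafted token.
    checks = ar_argmax(list(prefix) + proposal)[len(prefix) - 1:]
    accepted: List[int] = []
    for drafted, verified in zip(proposal, checks):
        if drafted != verified:
            break
        accepted.append(drafted)
    # The verifier's token at the first mismatch (or after a fully accepted
    # draft) matches greedy decoding, so it is kept as well.
    accepted.append(checks[len(accepted)])
    return accepted


# Toy "counting" model: the correct continuation of any sequence is last + 1.
def toy_denoise(prefix, block):
    # Confidence decays with distance from the prefix; a real denoiser would
    # also condition on tokens already committed inside the block.
    return [(prefix[-1] + i + 1, 1.0 / (i + 1)) for i in range(len(block))]


def toy_draft(prefix, k):
    # Deliberately wrong at the third drafted position.
    return [prefix[-1] + i + 1 if i != 2 else 0 for i in range(k)]


def toy_ar_argmax(seq):
    return [tok + 1 for tok in seq]


print(diffusion_block_decode(toy_denoise, [5], block_len=4, steps=2))  # → ([6, 7, 8, 9], 2)
print(diffusion_block_decode(toy_denoise, [5], block_len=4, steps=4))  # → ([6, 7, 8, 9], 4)
print(self_speculate_step(toy_draft, toy_ar_argmax, [5], k=4))         # → [6, 7, 8]
```

With `steps=2`, the four‑token block costs two denoiser passes instead of four, which is the budget knob described above. In the self‑speculation step, one draft pass plus one verification pass yields three correct tokens even though the draft was wrong at its third position.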

For developers and engineers, the release offers concrete tradeoffs between speed and output quality. At small batch sizes, autoregressive decoding is typically memory‑bandwidth bound, because every generated token requires a full forward pass that streams the model weights from GPU memory. Drafting several tokens per pass amortizes that cost and makes better use of modern GPUs. This matters most in latency‑sensitive, single‑query (batch size = 1) workloads. Because tokens can be revised during generation, these models are also a natural fit for fill‑in‑the‑middle and post‑editing tasks, where a strictly left‑to‑right decoder cannot revisit what it has already emitted. Self‑speculation is presented as a compromise that aims to combine the speed potential of parallel drafting with the reliability of autoregressive verification. Teams can therefore trade throughput for accuracy at deployment time with minimal application changes.
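A back‑of‑the‑envelope estimate shows why drafting helps at batch size 1. This is a minimal sketch that assumes the decoder is purely memory‑bandwidth bound. It assumes each forward pass streams all weights once and that pass time does not grow with the number of tokens produced per pass. The 8B model size, 16‑bit weights, the 2 TB/s bandwidth of a hypothetical GPU and the four tokens accepted per pass are illustrative assumptions, not measured Nemotron‑Labs Diffusion figures. The function name `batch1_tokens_per_second` is also hypothetical.

```python
def batch1_tokens_per_second(params_billion: float, bytes_per_param: float,
                             bandwidth_tb_per_s: float,
                             tokens_per_pass: float = 1.0) -> float:
    """Upper bound on batch-size-1 decode speed for a memory-bound model.

    Assumes every forward pass streams all weights from GPU memory once and
    that pass time does not depend on how many tokens it produces; KV-cache
    traffic, activations and kernel overheads are ignored.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    seconds_per_pass = weight_bytes / (bandwidth_tb_per_s * 1e12)
    return tokens_per_pass / seconds_per_pass


# 8B parameters in 16-bit precision on a hypothetical 2 TB/s GPU.
one_per_pass = batch1_tokens_per_second(8, 2, 2.0)
four_per_pass = batch1_tokens_per_second(8, 2, 2.0, tokens_per_pass=4)
print(f"{one_per_pass:.0f} vs {four_per_pass:.0f} tokens/s")  # → 125 vs 500 tokens/s
```

Real gains are smaller. Not every drafted token is accepted, and extra refinement or verification passes take time. The estimate only bounds the memory‑bound regime described above.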

NVIDIA has packaged the research artifacts (the models, the training recipe, the Megatron Bridge codebase and the technical report) so that engineering teams can benchmark the inference modes and integrate the models into existing stacks. The per‑model licenses set out the terms for commercial and research use of each checkpoint. For implementation guidance, inference‑mode behavior and reproduction steps, consult the published technical report and the repositories linked from the release.

Sources

Hugging Face Blog · 5/23/2026
