NVIDIA Releases Cosmos 3 Omni‑Model for Physical AI, Combining World Generation and Robot Policy Capabilities

News
Caspian Vale

6/1/2026, 5:26:28 AM

NVIDIA launched Cosmos 3 on June 1, 2026. It is an open omni‑model that consolidates world generation, physical reasoning and action generation into one foundation model, removing the need for separate specialized models and the glue pipelines that connect them at inference time. This matters for developers and robotics teams because a single model can both simulate physically plausible environments and generate or predict action sequences, which simplifies end‑to‑end physical‑AI systems.

Architecturally, Cosmos 3 is built on a Mixture‑of‑Transformers (MoT) backbone that accepts text, image, video, audio and action modalities. Each modality is encoded by a dedicated component: a Vision Transformer (ViT) for visual understanding, a VAE for visual and audio generation, and domain‑aware vectors for action. The encoded features are then projected into a shared representation space to enable cross‑modal reasoning and generation.
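To make the "shared representation space" idea concrete, here is a minimal sketch in plain Python. It is not NVIDIA's implementation: the feature widths, the shared width of 8, the random matrices standing in for learned projections, and all names (`ENCODER_DIMS`, `to_shared_sequence`, and so on) are assumptions made for illustration. It only shows that features of different sizes can be mapped to tokens of one common width so that a single backbone can process them together.

```python
import random

SHARED_DIM = 8  # hypothetical width of the shared representation space


def make_projection(in_dim, out_dim, rng):
    """Random (in_dim x out_dim) matrix standing in for a learned projection."""
    return [[rng.gauss(0.0, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]


def project(features, weight):
    """Multiply one feature vector by a projection matrix."""
    out_dim = len(weight[0])
    return [sum(f * row[j] for f, row in zip(features, weight)) for j in range(out_dim)]


rng = random.Random(0)

# Hypothetical output widths of the per-modality encoders (illustrative only).
ENCODER_DIMS = {"text": 6, "vision": 12, "audio": 10, "action": 4}

# One projection per modality, all targeting the same shared width.
PROJECTIONS = {
    name: make_projection(dim, SHARED_DIM, rng) for name, dim in ENCODER_DIMS.items()
}


def to_shared_sequence(encoded):
    """Map a list of (modality, feature_vector) pairs to shared-space tokens."""
    return [project(vec, PROJECTIONS[modality]) for modality, vec in encoded]


# Fake encoder outputs: one feature vector each for text, vision and action.
encoded_inputs = [
    ("text", [rng.random() for _ in range(ENCODER_DIMS["text"])]),
    ("vision", [rng.random() for _ in range(ENCODER_DIMS["vision"])]),
    ("action", [rng.random() for _ in range(ENCODER_DIMS["action"])]),
]

shared = to_shared_sequence(encoded_inputs)
print(len(shared), [len(token) for token in shared])
```

After projection every token has the same width regardless of which modality it came from, which is the precondition for attending across modalities in one sequence.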

A distinctive design choice splits inputs into two subsequences: an autoregressive (AR) subsequence for next‑token prediction (reasoning) and a diffusion (DM) subsequence for iterative denoising (generation). AR and DM tokens use separate parameter sets within each transformer layer but interact through joint attention, allowing the model to switch between reasoning and generation roles without structural changes to the network.

Cosmos 3 can generate physically plausible video worlds from text, images, video or action inputs, reason about motion, causality and spatial relationships, and predict future video and action sequences. As example use cases that combine simulation and action prediction, NVIDIA highlights robotics pick‑and‑place, long‑tail driving scenarios and warehouse safety video generation.
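The AR/DM split described above, with separate parameters per role and attention shared across both, can be sketched as a toy single-head attention layer. This is an illustrative assumption, not the Cosmos 3 code: the width of 4, the random weights, the absence of any attention mask, and the names `PARAMS` and `joint_attention` are all hypothetical. Each token is projected to queries, keys and values with the weights of its own role, and the attention weights are then computed over the whole mixed sequence.

```python
import math
import random

DIM = 4  # hypothetical token width


def rand_matrix(rows, cols, rng):
    """Random matrix standing in for learned weights."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(vec, mat):
    """Row vector times matrix."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)

# Separate q/k/v parameter sets for each token role.
PARAMS = {
    role: {name: rand_matrix(DIM, DIM, rng) for name in ("q", "k", "v")}
    for role in ("AR", "DM")
}


def joint_attention(tokens):
    """tokens: list of (role, vector). Returns (outputs, attention_rows).

    Projections are role-specific; attention spans every token in the
    sequence. Masking is omitted to keep the sketch short.
    """
    q = [matvec(x, PARAMS[role]["q"]) for role, x in tokens]
    k = [matvec(x, PARAMS[role]["k"]) for role, x in tokens]
    v = [matvec(x, PARAMS[role]["v"]) for role, x in tokens]
    outputs, attention_rows = [], []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(DIM) for kj in k]
        weights = softmax(scores)
        attention_rows.append(weights)
        outputs.append([sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(DIM)])
    return outputs, attention_rows


# Two reasoning (AR) tokens followed by two generation (DM) tokens.
sequence = [(role, [rng.random() for _ in range(DIM)]) for role in ("AR", "AR", "DM", "DM")]
outputs, attention_rows = joint_attention(sequence)
print(len(outputs), len(attention_rows[0]))
```

Because each attention row covers all four positions, a DM token can read from AR tokens and vice versa, even though the two roles never share projection weights.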

Two model sizes target different deployment points. Cosmos 3 Nano is described as an 8B‑parameter model (noted as an 8B reasoner and an 8B generator) optimized for workstation‑grade GPUs such as the RTX PRO 6000. Cosmos 3 Super is described as a 32B‑parameter model (a 32B reasoner and a 32B generator) intended for large‑scale synthetic data generation and heavier production workloads.

The release's Diffusers integration and post‑training scripts are intended to help teams incorporate Cosmos 3 into existing generation stacks and fine‑tune on domain data. The open SDG datasets and the model cards, which include licensing information, support evaluation, compliance checks and deployment planning for robotics, autonomous driving and smart‑space simulations.
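For deployment planning, a rough lower bound on memory is parameter count times bytes per parameter. The sketch below applies that arithmetic to the two sizes quoted above. It covers weights only (no activations, caches or framework overhead), the precision table and function name are illustrative assumptions, and because the source does not say whether the reasoner and generator counts overlap, it prints both readings.

```python
# Bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_memory_gb(num_params, dtype):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Parameter counts as quoted in the announcement.
MODEL_SIZES = {"Cosmos 3 Nano": 8e9, "Cosmos 3 Super": 32e9}

for name, params in MODEL_SIZES.items():
    single = weight_memory_gb(params, "bf16")
    # If reasoner and generator are separate parameter sets, the total doubles.
    print(f"{name}: about {single:.0f} GB in bf16, "
          f"or {2 * single:.0f} GB if reasoner and generator are counted separately")
```

Actual requirements will be higher than these figures once activations and intermediate video latents are included, so treat them as a floor when choosing hardware.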

Sources

  1. Hugging Face Blog · 6/1/2026