Stability AI launches Stable Audio 3.0 with six-minute generation and three open models


5/21/2026, 1:08:00 AM


Stability AI has released Stable Audio 3.0, a family of text-to-audio models that can generate music up to about six minutes long. Three smaller variants are available with open weights, while the largest model is restricted to API and enterprise channels.

Stability AI has launched Stable Audio 3.0, a suite of text-to-audio models that can produce music tracks up to about six minutes long, a major step in the company’s push into audio generation. The release combines variable-length generation, on-device composition, and a mix of open weights and enterprise-only access, giving builders, platforms, and businesses distinct paths to adopt the technology.

The Stable Audio 3.0 family includes four variants. Two Small models, Stable Audio 3.0 Small SFX and Stable Audio 3.0 Small, each have 459 million parameters and generate tracks up to two minutes long; the SFX variant targets sound effects and lightweight devices. Stable Audio 3.0 Medium has 1.4 billion parameters and can generate audio up to 6 minutes 20 seconds long. The Medium model and both Small models are released with open weights on Hugging Face.
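To make the lineup concrete, here is a minimal Python sketch that encodes the reported specs, picks the smallest open-weight music variant able to cover a requested track length, and gives a rough weight-only size estimate. The `Variant` class, the helper names, the "music"/"sfx" labels, and the 16-bit (2 bytes per parameter) assumption are illustrative only and are not part of any Stability AI tooling. The announcement does not state the Large model's maximum length or the precision the weights ship in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str
    params: int                  # reported parameter count
    max_seconds: Optional[int]   # None where the announcement gives no figure
    open_weights: bool
    kind: str                    # "music" or "sfx" (illustrative label, assumed)


# Specs as reported for the launch; the Large model's max length is not stated.
VARIANTS = [
    Variant("Stable Audio 3.0 Small SFX", 459_000_000, 120, True, "sfx"),
    Variant("Stable Audio 3.0 Small", 459_000_000, 120, True, "music"),
    Variant("Stable Audio 3.0 Medium", 1_400_000_000, 6 * 60 + 20, True, "music"),
    Variant("Stable Audio 3.0 Large", 2_700_000_000, None, False, "music"),
]


def smallest_open_music_variant(duration_s: int) -> Optional[Variant]:
    """Smallest open-weight music variant whose reported max length covers duration_s."""
    fits = [
        v for v in VARIANTS
        if v.open_weights and v.kind == "music"
        and v.max_seconds is not None and v.max_seconds >= duration_s
    ]
    return min(fits, key=lambda v: v.params, default=None)


def weight_gb(v: Variant, bytes_per_param: int = 2) -> float:
    """Weight-only size in GB (1e9 bytes); the default of 2 bytes/param assumes 16-bit weights."""
    return v.params * bytes_per_param / 1e9


if __name__ == "__main__":
    for seconds in (90, 300, 400):
        pick = smallest_open_music_variant(seconds)
        print(seconds, pick.name if pick else "no open-weight variant reported to cover this length")
    for v in VARIANTS:
        print(f"{v.name}: ~{weight_gb(v):.2f} GB of weights at 16-bit")
```

Under the 16-bit assumption, the Small models' weights come to roughly 0.9 GB and Medium's to about 2.8 GB, which fits Small being the variant pitched for on-device use. Actual footprint depends on precision, quantization, and runtime overhead.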

The largest variant, Stable Audio 3.0 Large, has 2.7 billion parameters and is available only through the Stability AI API, partner fal.ai, or enterprise licensing. Stability AI describes the release’s core architecture as a semantic-acoustic autoencoder that supports variable-length generation and second-level control over the output. The Small model is positioned as the only variant that enables full music composition on-device and offline, without the short-sample limits of earlier releases. Stability AI is also publishing LoRA training documentation alongside the Small and Medium weights so that users can fine-tune the models on their own audio libraries.

Editing and extension tools are built into the 3.0 models: inpainting lets users edit individual segments or multiple sections of a track, and causal continuation can extend an existing piece beyond its original endpoint. Stability AI offers guided fine-tuning support for enterprise customers, and it frames the Large model as delivering the highest musicality for music platforms that need large-volume generation and prefer API or on-premise enterprise hosting.

On licensing and commercial terms, Stable Audio 3.0 is covered by the Stability AI Community License: users own the audio they generate and may use it commercially for free up to $1 million in annual revenue. Organizations exceeding that threshold must obtain enterprise licensing, which provides expanded commercial coverage and legal indemnification. Stability AI says the models were trained entirely on licensed data and cites partnerships with Universal Music Group and Warner Music Group in support of that claim.
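The revenue rule reduces to a single threshold check. Below is a minimal Python sketch; the function name is hypothetical, and reading "up to $1 million" as inclusive of exactly $1 million is an assumption, since the license text itself governs edge cases.

```python
COMMUNITY_REVENUE_CAP_USD = 1_000_000  # annual revenue cap reported for the Community License


def required_license(annual_revenue_usd: float) -> str:
    """Return the tier the announced threshold points to.

    Assumes "up to $1 million" is inclusive (hypothetical reading of the terms).
    """
    if annual_revenue_usd < 0:
        raise ValueError("revenue cannot be negative")
    return "community" if annual_revenue_usd <= COMMUNITY_REVENUE_CAP_USD else "enterprise"


if __name__ == "__main__":
    for revenue in (250_000, 1_000_000, 3_500_000):
        print(revenue, required_license(revenue))
```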

The release further underscores Stability AI’s shift from image generation toward audio. The company’s earlier Stable Audio milestones include a September 2023 debut supported by AudioSparx’s roughly 800,000-song library, Stable Audio 2.0 in April 2024 (up to three minutes at 44.1 kHz), an open-source Stable Audio Open in summer 2024, and a compact Open Small with Arm in May 2025.

Sources

The Decoder AI · 5/20/2026
