Thinking Machines Lab previews Interaction Models and a 276B‑parameter MoE that runs with ~12B active parameters

News

5/13/2026, 3:49:55 PM

Thinking Machines Lab has published a research preview of a new class of systems it calls “interaction models” and released technical details for TML‑Interaction‑Small, a 276B‑parameter mixture‑of‑experts (MoE) model that runs with roughly 12B active parameters. The lab frames the work as a move away from turn‑based conversational interfaces toward models designed for continuous, multimodal collaboration, a shift that could change how real‑time audio, video and text are processed in interactive applications.
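As a rough illustration of what the headline figures imply, the arithmetic below computes the share of the model's parameters that is active at once. Both counts are the rounded values quoted in the preview, so the result is only a ballpark.

```python
# Approximate figures quoted in the preview (both are rounded).
TOTAL_PARAMS = 276e9   # total parameters in the MoE
ACTIVE_PARAMS = 12e9   # parameters active at a time

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active share: {active_fraction:.1%}")  # prints "Active share: 4.3%"
```

In other words, only about one parameter in 23 participates in any given step, which is the usual way an MoE keeps compute low relative to its total size.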

The preview describes an architecture with two parallel components: a real‑time interaction model that continuously ingests audio, video and text and maintains a full‑duplex exchange with the user, and an asynchronous background model that handles sustained reasoning, tool use and long‑horizon planning. The interaction model delegates work by sending the background model a rich context package containing the full conversation. It then interleaves the streamed results back into the live exchange, so it can respond immediately while longer tasks run in the background.
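The division of labor described here can be sketched with Python's asyncio. This is a toy illustration under stated assumptions, not the lab's implementation: `background_model`, `interaction_loop`, the "plan" trigger word and the sleep durations are all hypothetical stand-ins.

```python
import asyncio
from typing import List, Optional

async def background_model(context: List[str]) -> str:
    """Hypothetical stand-in for the asynchronous background model."""
    await asyncio.sleep(0.05)  # simulate slow reasoning or tool use
    return f"background result based on {len(context)} turns"

async def interaction_loop(chunks: List[str]) -> List[str]:
    """Hypothetical stand-in for the real-time interaction model."""
    conversation: List[str] = []
    transcript: List[str] = []
    pending: Optional[asyncio.Task] = None
    for chunk in chunks:
        conversation.append(chunk)
        # Delegate by handing over a copy of the full conversation so far.
        if pending is None and "plan" in chunk:
            pending = asyncio.create_task(background_model(list(conversation)))
        # Keep responding live while the background task runs.
        transcript.append(f"live: ack {chunk!r}")
        await asyncio.sleep(0.02)  # simulated per-chunk processing time
        # Interleave the background result once it is ready.
        if pending is not None and pending.done():
            transcript.append(f"interleaved: {pending.result()}")
            pending = None
    if pending is not None:  # flush a result that finished after the input
        transcript.append(f"interleaved: {await pending}")
    return transcript

if __name__ == "__main__":
    demo = ["hello", "plan a trip", "ok", "sure", "go on"]
    for line in asyncio.run(interaction_loop(demo)):
        print(line)
```

The point of the sketch is that the live loop never blocks on the delegated task: it keeps acknowledging input and only folds in the background result when that result happens to be ready.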

A central technical choice is time‑aligned micro‑turns: inputs and outputs are sliced into 200 ms chunks that the system processes and generates continuously rather than waiting for complete user turns. That 200 ms cadence lets the model speak while listening, respond to visual cues in real time, and integrate partial background outputs into the ongoing conversation, avoiding abrupt context switches when longer computations complete.
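To make the cadence concrete, here is a minimal sketch of slicing an audio buffer into 200 ms micro-turns. The 16 kHz sample rate and the function name `slice_micro_turns` are assumptions for illustration; the preview does not specify either.

```python
from typing import Iterator, Sequence

CHUNK_MS = 200  # micro-turn length described in the preview

def slice_micro_turns(samples: Sequence[float], sample_rate_hz: int,
                      chunk_ms: int = CHUNK_MS) -> Iterator[Sequence[float]]:
    """Yield consecutive chunk_ms-long windows; the last one may be shorter."""
    chunk_len = sample_rate_hz * chunk_ms // 1000
    if chunk_len <= 0:
        raise ValueError("sample rate too low for the requested chunk length")
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]

# Assumed 16 kHz mono audio: one second of silence becomes five micro-turns.
one_second = [0.0] * 16_000
chunks = list(slice_micro_turns(one_second, sample_rate_hz=16_000))
print(len(chunks), len(chunks[0]))  # 5 3200
```

In a full-duplex system each such window would be consumed while the model emits its own output for the same 200 ms slot, rather than after the user finishes speaking.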

The lab contrasts this native interactivity with conventional turn‑based systems that freeze perception during generation and typically rely on external mechanisms such as voice‑activity detection (VAD) to simulate responsiveness. Thinking Machines Lab argues those stitched‑together components limit proactive behaviors — for example, reacting to unspoken visual context or handling overlapping speech — and that interactivity must be built into the model to scale with intelligence.

On the multimodal side, the preview outlines an encoder‑free early fusion approach: audio is represented as dMel and transformed by a lightweight embedding layer rather than routed through large separate pretrained encoders. The design intentionally avoids passing modalities through distinct heavyweight components (for example, Whisper‑style ASR or separate TTS decoders). The MoE topology, shared conversation context and streaming interleaving are presented as the mechanisms that produce the low‑latency behavior demonstrated in the release.
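The idea of feeding discretized mel features through a lightweight embedding layer can be sketched in a few lines of plain Python. This is an illustration only: the binning range, the bin count, the per-channel lookup tables and the choice to sum the channel embeddings are all assumptions, not details taken from the preview.

```python
from typing import List

def quantize_frame(log_mel_frame: List[float], lo: float, hi: float,
                   n_bins: int) -> List[int]:
    """Map each log-mel channel value to an integer bin in [0, n_bins - 1]."""
    step = (hi - lo) / n_bins
    return [min(n_bins - 1, max(0, int((v - lo) / step))) for v in log_mel_frame]

def embed_frame(bins: List[int], tables: List[List[List[float]]]) -> List[float]:
    """Sum one embedding vector per channel, looked up by that channel's bin."""
    dim = len(tables[0][0])
    out = [0.0] * dim
    for channel, bin_index in enumerate(bins):
        vector = tables[channel][bin_index]
        for d in range(dim):
            out[d] += vector[d]
    return out

# Toy setup (assumed values): 4 mel channels, 16 bins, 2-dimensional embeddings.
N_CHANNELS, N_BINS, DIM = 4, 16, 2
tables = [[[float(ch + b)] * DIM for b in range(N_BINS)] for ch in range(N_CHANNELS)]

bins = quantize_frame([-10.0, 0.0, 5.0, 100.0], lo=-8.0, hi=8.0, n_bins=N_BINS)
frame_embedding = embed_frame(bins, tables)
print(bins, frame_embedding)
```

The contrast with an encoder-based pipeline is that nothing here is a large pretrained network: the audio frame reaches the main model after only a quantization step and a table lookup.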

The release is explicitly a research preview rather than a production API. The lab warns builders to expect new integration patterns — continuous context sharing, streaming background results and tighter clocking of audio/video chunks — and to watch API details, latency budgets, tool‑call semantics and safety/UX tradeoffs as more engineering artifacts and examples are published.

Sources

MarkTechPost AI · 5/13/2026

Thalia Mercer
