MiniMax Unveils M3 with Sparse Attention, Claims 1,000,000‑Token Context and Native Multimodality

6/2/2026, 5:55:05 AM

MiniMax launched M3 on June 1, 2026, introducing MiniMax Sparse Attention (MSA) to support a claimed 1,000,000‑token context while natively accepting image and video inputs and operating desktop computer‑use workflows.

MiniMax announced M3 on June 1, 2026, a next‑generation M‑series model that pairs a new sparse‑attention architecture with native multimodal input handling and a claimed 1,000,000‑token context window. The company made its API, the MiniMax Code product and the MiniMax Token Plan available immediately, and said model weights plus a technical report will be published within 10 days of launch. Developers can begin using the platform tools now while awaiting the released weights and documentation.

The core architectural change is MiniMax Sparse Attention (MSA). Full attention scales quadratically with sequence length; MSA instead partitions the key/value (KV) cache into blocks and applies a KV‑outer, gather‑Q approach, in which the outer loop runs over KV blocks and the queries that attend to each block are gathered to it. Each KV block is therefore read once with contiguous memory access, which MiniMax says reduces per‑token compute at extreme context lengths.
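The KV‑outer, gather‑Q loop order described above can be illustrated with a minimal NumPy sketch. It is not MiniMax's kernel: the block‑selection rule (mean‑pooled keys, top_k blocks per query), the single non‑causal head, and every function and parameter name are assumptions made for illustration. Only the loop structure, one pass over each KV block with the relevant queries gathered to it, follows the description.

```python
import numpy as np

def block_sparse_attention_kv_outer(q, k, v, block_size, top_k):
    """Illustrative block-sparse attention with a KV-outer, gather-Q loop.

    q: (n_q, d); k, v: (n_kv, d). Single head, non-causal, for brevity.
    The block-selection rule below is a hypothetical stand-in.
    """
    n_q, d = q.shape
    n_kv = k.shape[0]
    assert n_kv % block_size == 0, "sketch assumes whole blocks"
    n_blocks = n_kv // block_size
    scale = 1.0 / np.sqrt(d)

    # Assumed selection: score each block by its mean key, keep top_k per query.
    block_keys = k.reshape(n_blocks, block_size, d).mean(axis=1)
    selected = np.argsort(-(q @ block_keys.T), axis=1)[:, :top_k]
    chosen = np.zeros((n_q, n_blocks), dtype=bool)
    chosen[np.arange(n_q)[:, None], selected] = True

    # Online-softmax state per query, so blocks can be merged in any order.
    running_max = np.full(n_q, -np.inf)
    running_sum = np.zeros(n_q)
    acc = np.zeros((n_q, v.shape[1]))

    for b in range(n_blocks):                      # outer loop over KV blocks
        q_idx = np.nonzero(chosen[:, b])[0]        # gather queries using block b
        if q_idx.size == 0:
            continue
        k_blk = k[b * block_size:(b + 1) * block_size]   # contiguous slice, read once
        v_blk = v[b * block_size:(b + 1) * block_size]
        s = (q[q_idx] @ k_blk.T) * scale
        new_max = np.maximum(running_max[q_idx], s.max(axis=1))
        correction = np.exp(running_max[q_idx] - new_max)  # rescale earlier partial sums
        p = np.exp(s - new_max[:, None])
        running_sum[q_idx] = running_sum[q_idx] * correction + p.sum(axis=1)
        acc[q_idx] = acc[q_idx] * correction[:, None] + p @ v_blk
        running_max[q_idx] = new_max

    return acc / running_sum[:, None], chosen

def dense_reference(q, k, v, chosen, block_size):
    """Dense attention restricted to the same blocks, for checking the sketch."""
    mask = np.repeat(chosen, block_size, axis=1)
    s = np.where(mask, (q @ k.T) / np.sqrt(q.shape[1]), -np.inf)
    p = np.exp(s - s.max(axis=1, keepdims=True))
    return (p / p.sum(axis=1, keepdims=True)) @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal(shape) for shape in [(16, 8), (64, 8), (64, 8)])
out, chosen = block_sparse_attention_kv_outer(q, k, v, block_size=8, top_k=3)
print(np.abs(out - dense_reference(q, k, v, chosen, 8)).max())
```

The running max and sum are what let each KV block be visited once: a query's partial results from different blocks are merged incrementally instead of revisiting earlier blocks.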

Under the reported head configuration, MiniMax claims MSA is more than 4× faster than open‑source sparse‑attention implementations such as Flash-Sparse-Attention and flash-moba. At a 1,000,000‑token context, MiniMax states that M3’s per‑token compute is about 1/20th that of the prior M2 generation, with reported speedups over M2 of more than 9× for prefill and more than 15× for decoding at that context.
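To see why block sparsity matters most at long contexts, a rough back‑of‑the‑envelope sketch helps. The block size and top_k below are hypothetical values, not MiniMax's configuration, and the function counts only the keys each new token is scored against. It ignores feed‑forward layers and block‑selection overhead, which is one reason an attention‑only ratio like this will not match an end‑to‑end figure such as the reported 1/20th.

```python
def attended_keys_per_token(context_len, block_size=None, top_k=None):
    """Keys one new token scores against.

    Full attention: every cached key. Block-sparse: at most top_k * block_size.
    block_size and top_k are hypothetical illustration values.
    """
    if block_size is None or top_k is None:
        return context_len
    return min(context_len, top_k * block_size)

for context_len in (8_000, 128_000, 1_000_000):
    full = attended_keys_per_token(context_len)
    sparse = attended_keys_per_token(context_len, block_size=128, top_k=64)
    print(context_len, full, sparse, round(full / sparse, 1))
```

Under these assumed settings the sparse cost per token stops growing once the context exceeds top_k * block_size keys, while the full‑attention cost keeps growing linearly with context length.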

MiniMax says M3 was trained on mixed modalities from step zero, with text, images and video interleaved in sequences, after the company rebuilt its data pipeline. Training data scaled to roughly 100 trillion tokens, and the company describes interleaved multimodal sequences as critical to the model’s performance. M3 natively accepts image and video inputs and can operate desktop computer‑use workflows.

Coding and agentic capabilities are a major focus of MiniMax’s published evaluations. Reported results include SWE‑Bench Pro at 59.0% (surpassing GPT‑5.5 and Gemini 3.1 Pro and approaching Opus 4.7), Terminal‑Bench 2.1 at 66.0%, SWE‑efficiency at 34.8%, KernelBench Hard at 28.8% (evaluated on NVIDIA Blackwell GPUs, CUDA sm_120), MCP Atlas at 74.2% and a top Claw‑Eval result across 161 general tasks. On multimodal tests, M3 scores above Gemini 3.1 Pro on OmniDocBench and reaches a 70.06% task completion rate on OSWorld-Verified (Max Steps = 200).

For builders, MiniMax highlights two practical implications: a single open‑weight architecture that combines long context, native multimodality and coding performance, and new evaluation tooling. The company also released an interactive user simulator framework for multi‑turn developer workflows (covering requirement elaboration, iterative feedback and task switching), intended to narrow the gap between single‑turn benchmarks and real‑world, multi‑turn coding and agentic tasks.

Sources

MarkTechPost AI · 6/1/2026
