Alibaba's Qwen3.7 — Max ran 35 hours autonomously to optimize a kernel for its custom AI chip

News

5/23/2026, 10:53:28 AM

Alibaba's Qwen3.7 — Max ran 35 hours autonomously to optimize a kernel for its custom AI chip

On May 23, 2026 Alibaba released Qwen3.7 — Max, an API-only model designed for long-running autonomous agent workflows; the team demonstrated a 35 — hour autonomous kernel‑optimization run that produced up to a 10× speedup on their reference implementation.

Alibaba’s Qwen team on May 23, 2026 published Qwen3.7 — Max, presenting it as a model built specifically for prolonged autonomous agent workflows and offering it exclusively through the Alibaba Cloud Model Studio API. In a headline demonstration, the model ran autonomously for roughly 35 hours to optimize a hardware attention kernel for Alibaba’s own accelerators — an experiment the team says shows the model’s ability to sustain complex, long‑running engineering tasks within the cloud provider’s ecosystem.

Qwen3.7 — Max is positioned around four primary use cases: acting as a coding agent for projects from front‑end prototypes to multi‑file systems, automating office workflows via external tools, sustaining extended autonomous runs, and operating consistently across different agent frameworks. The model exposes OpenAI‑ and Anthropic‑compatible interfaces and integrates with tooling such as Claude Code, OpenClaw and Qwen Code to fit into existing agent stacks.

The team’s most detailed stress test tasked Qwen3.7 — Max with optimizing a hardware‑based attention kernel for the open‑source inference runtime SGLang on cloud instances running T‑Head‑ZW‑M890 accelerators. Starting without chip documentation, measurement data or sample code and working from a Triton reference implementation, the model executed 432 distinct kernel tests and issued 1,158 tool calls over the roughly 35‑hour run, iterating autonomous compile/measure/revise loops and autonomously handling compilation errors.

According to the Qwen team, that experiment produced an average 10× speedup over the reference implementation. On the standardized KernelBench L3 benchmark, Qwen3.7 — Max produced accelerated kernels 96% of the time; by comparison the team reports Anthropic’s Opus 4.6 at 98% on the same benchmark. In raw speedup comparisons under the same setup, GLM 5.1 reached 7.3×, Kimi K2.6 5×, DeepSeek V4 Pro 3.3×, and Qwen3.6 — Plus 1.1×.

The team credits part of the model’s robustness to a training split that isolates the task, the tool environment and the validator, allowing components to be mixed during training so strategies generalize across setups. Cross‑harness tests reportedly show near‑identical scores for Qwen3.7 — Max across OpenClaw, Claude Code and Hermes, and stable results on QwenClawBench and CoWorkBench rather than framework‑specific shortcuts.

Beyond kernel optimization, the team says Qwen3.7 — Max was used to autonomously detect undesirable behavior and cheating attempts during training and was demonstrated steering a four‑legged robot. The move to API‑only distribution — the last openly released flagship listed as Qwen3.5 — 397B‑A17B in February 2026 — means builders gain a model tailored for extended autonomous engineering tasks but must work inside Alibaba Cloud’s API and proprietary hardware environment.

Sources

The Decoder AI · 5/23/2026

Replies (0)

No replies in this topic yet.

Back