Anthropic's Claude Mythos Preview Outpaces AISI Forecasts, Clears Multiple Cyber Simulations

News

5/14/2026, 11:51:42 AM

Anthropic's Claude Mythos Preview Outpaces AISI Forecasts, Clears Multiple Cyber Simulations

Britain’s AI Security Institute (AISI) says frontier models are advancing faster than its latest forecasts: After halving its estimated AI cyber capability doubling time from eight months to about 4.

Anthropic’s Claude Mythos Preview and OpenAI’s GPT-5.5 have outpaced the UK AI Security Institute’s (AISI) revised forecasts for cyber skill growth, producing a sudden performance leap that shortens the time defenders have to respond. AISI had cut its estimated capability — doubling interval from eight months to roughly 4.7 months; the agency says the new Mythos checkpoint and GPT-5.5 have substantially exceeded even that accelerated pace in its testbeds, clearing late-stage escalation scenarios the institute had expected would take longer to reach.

AISI runs cyber ranges that include a 32 — step simulated corporate network attack that would take a human expert about 20 hours to complete. The new Claude Mythos Preview checkpoint finished the full 32 — stage attack in six out of ten attempts, against three out of ten for an earlier Mythos checkpoint. That same Preview checkpoint solved AISI’s industrial control system “Cooling Tower” simulation in three out of ten runs, a scenario no previously tested model had passed, demonstrating both broader and deeper practical exploitation ability in these controlled exercises.

The agency warned that the saturation of its existing ranges undermines their informativeness unless evaluations evolve. AISI noted both Mythos Preview and GPT-5.5 “substantially exceeded” prior trends, but said it is still uncertain whether the jump represents a persistent acceleration or a one-off improvement. Because models can now reach late-stage escalation within the token budgets AISI uses, the institute is drafting harder evaluations with active defenses and has distributed the new Mythos checkpoint to partners to aid faster, broader testing and mitigation planning.

Independent offensive security firm XBOW conducted a ten-expert assessment of Mythos Preview and reported marked gains in source — code analysis. XBOW described a near token — for-token precision in vulnerability detection, finding a 42% reduction in false negatives compared with Anthropic’s Opus 4.6; when testers were given additional source — code access, the false — negative rate fell 55%. XBOW also said Mythos Preview identified vulnerabilities in Chromium’s V8 sandbox where prior models mostly produced false positives, a sign that static analysis capability has advanced materially.

XBOW and AISI both flagged practical limits: while increased access to source code amplified Mythos’s strengths, access to a running system can be more important for real-world exploitation. The firms’ findings imply that improvements in static analysis alone will not fully predict a model’s operational hazard, and that red-teamers and defenders must consider a range of access scenarios when assessing risk.

The immediate consequence for defenders and platform operators is compressed time for mitigation and less reliable forecasting of model progress. Anthropic’s head of red teaming, Logan Graham, underscored the rapid pace of change, warning, “Within a year, Mythos will probably look quite dumb,” a reminder that advances both raise near-term risk and can be quickly eclipsed by further progress. Security teams should assume models will keep reaching and surpassing current benchmarks as evaluations are updated.

Sources

The Decoder AI · 5/14/2026

Replies (0)

No replies in this topic yet.

Back