Mozilla’s agentic testing pipeline, running Anthropic’s Claude Mythos Preview, produced high-confidence findings that led to fixes for 271 previously unknown vulnerabilities in Firefox 150. Those fixes helped drive a total of 423 resolved security issues in April, up sharply from the prior monthly record of 76 in March. The results mark a material increase in the cadence of internally discovered fixes and signal a new approach to finding and validating bugs.
At the center of the effort was a pipeline that lets the model construct and execute its own test cases to confirm suspected bugs, rather than only scanning code. Mozilla began with supervised runs on Claude Opus 4.6, then scaled the process across many virtual machines so that runs, each checking a single file, could proceed in parallel. The team then added deduplication, prioritization and end-to-end tracking to move findings into fixes and releases.
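The fan-out, deduplication and prioritization steps described above can be sketched roughly as follows. This is a minimal illustration, not Mozilla's implementation: the `Finding` fields, the `analyze_file` stand-in, the canned results and the severity scores are all hypothetical, and a thread pool stands in for the fleet of virtual machines.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    file: str
    signature: str  # hypothetical stable identifier for the root cause
    severity: int   # hypothetical score, higher means more urgent


def analyze_file(path: str) -> list[Finding]:
    """Stand-in for one isolated run that checks a single file."""
    # Hypothetical canned results; a real run would drive a model in a VM.
    canned = {
        "a.cpp": [Finding("a.cpp", "bug-1", 8)],
        "b.cpp": [Finding("b.cpp", "bug-1", 8), Finding("b.cpp", "bug-2", 9)],
        "c.cpp": [],
    }
    return canned.get(path, [])


def run_pipeline(paths: list[str]) -> list[Finding]:
    # Fan out: one run per file, executed in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        per_file = list(pool.map(analyze_file, paths))

    # Deduplicate: keep the first finding seen for each signature.
    unique: dict[str, Finding] = {}
    for findings in per_file:
        for finding in findings:
            unique.setdefault(finding.signature, finding)

    # Prioritize: highest severity first, signature as a tie-breaker.
    return sorted(unique.values(), key=lambda f: (-f.severity, f.signature))


queue = run_pipeline(["a.cpp", "b.cpp", "c.cpp"])
for finding in queue:
    print(finding.severity, finding.signature, finding.file)
```

Keying deduplication on a root-cause signature rather than on the file name is one way to avoid filing the same bug twice when two per-file runs reach it from different entry points.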
Of the 423 April fixes, 271 were traced directly to Mythos findings in Firefox 150. The remaining 152 break down into 111 additional internally discovered issues, roughly a third of which also stemmed from Mythos, and 41 that were reported externally. The rest of the internal tally came from the same pipeline running different models and from traditional fuzzing.
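The figures in this breakdown can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are the ones reported above, and the internally discovered share is derived from them rather than stated by Mozilla.

```python
total_fixes = 423
mythos_firefox_150 = 271
other_internal = 111
external = 41

# 423 - 271 leaves 152 fixes outside the Firefox 150 Mythos set.
remaining = total_fixes - mythos_firefox_150
assert remaining == 152

# Those 152 are the 111 other internal issues plus the 41 external reports.
assert other_internal + external == remaining

# Derived figure: share of all April fixes that were found internally.
internal_share = (mythos_firefox_150 + other_internal) / total_fixes
print(f"{internal_share:.1%}")  # prints 90.3%
```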
Mozilla published concrete examples to substantiate the results: a 15‑year‑old bug in the HTML label element, which captions form controls; a 20‑year‑old bug in the implementation of XSLT, the XML transformation language; multiple sandbox‑escape vectors; and an overflow triggered when an HTML table exceeds 65,535 rows. The team also reported at least one case where RLBox, Mozilla’s sandbox for third‑party libraries, was bypassed. Many of the individual flaws would need to be chained together to produce a full remote exploit.
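The 65,535 threshold equals 2^16 - 1, the largest value an unsigned 16-bit integer can hold. The article does not say how Firefox stores row counts, so the following is purely an illustration of how a counter wraps at that boundary, assuming a hypothetical 16-bit row index emulated with a bit mask.

```python
MAX_U16 = 2**16 - 1  # 65,535


def next_row_index_u16(current: int) -> int:
    """Increment a hypothetical row index stored in 16 unsigned bits."""
    # Masking to 16 bits emulates fixed-width unsigned arithmetic.
    return (current + 1) & 0xFFFF


last_valid = next_row_index_u16(MAX_U16 - 1)  # still fits in 16 bits
wrapped = next_row_index_u16(MAX_U16)         # wraps around to zero
print(last_valid, wrapped)  # prints 65535 0
```

Code that later trusts such a wrapped index as a row position or buffer size would be working with a value far smaller than the real table.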
The work reflects a technical shift from read‑only code scans with large models, which produced many false positives, to agentic workflows that self‑verify findings by constructing and running tests. Mozilla cited earlier failures with GPT‑4 and Claude 3.5 Sonnet in read‑only modes; the pipeline’s self‑execution step is intended to filter out speculation and reduce spurious reports. Anthropic’s Frontier Red Team also delivered an initial batch of reports in February that fed into the pipeline.
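The difference between a read-only scan and a self-verifying run can be sketched as a filter that keeps a report only when its accompanying test case actually reproduces a failure. Everything here is a toy assumption: `toy_target`, the `Candidate` type and the two claims are hypothetical, and a raised exception stands in for whatever crash signal a real harness would observe.

```python
from dataclasses import dataclass
from typing import Callable


def toy_target(values: list[int]) -> float:
    """Toy code under test: fails on an empty list (ZeroDivisionError)."""
    return sum(values) / len(values)


@dataclass
class Candidate:
    claim: str                       # what the report alleges
    test_case: Callable[[], object]  # test constructed to demonstrate it


def reproduces(candidate: Candidate) -> bool:
    """Run the test case; a raised exception counts as a reproduction."""
    try:
        candidate.test_case()
    except Exception:
        return True
    return False


candidates = [
    Candidate("crash on empty input", lambda: toy_target([])),
    Candidate("crash on negative numbers", lambda: toy_target([-1, -2])),
]

# Only claims backed by a failing test survive; speculation is dropped.
confirmed = [c.claim for c in candidates if reproduces(c)]
print(confirmed)  # prints ['crash on empty input']
```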
Mozilla plans to make the agentic pipeline part of the pre‑commit process so every new piece of code is automatically checked before it lands. The results underscore two practical implications for builders and security teams. First, agentic AI can scale high‑precision vulnerability discovery. Second, it can provide empirical validation that longstanding architectural defenses, such as protections against Prototype Pollution, still hold up. Human review and chaining analysis remain necessary to assess exploitability.