
On May 12, 2026 Microsoft disclosed that MDASH, its new Multi‑Model Agentic Scanning Harness, automatically discovered 16 distinct Windows CVEs, four of which the company classified as critical remote‑code‑execution vulnerabilities in tcpip.sys (kernel), the IKEv2 service (ikeext.dll), netlogon.dll and dnsapi.dll. Microsoft said ten of the 16 affect kernel mode and that most of the issues are reachable from the network without authentication, heightening the urgency for patches and mitigations.
MDASH coordinates more than 100 specialized agents across an ensemble of frontier and distilled models. Microsoft describes the system as a four‑stage pipeline: a static analysis stage that maps the attack surface, auditor agents that flag suspicious code paths, “debaters” that argue for and against exploitability, and Evidence Leader agents that attempt to trigger exploits with concrete inputs.
The harness is model‑agnostic and designed so new models can be swapped into the pipeline by configuration. Plugins let experts inject domain‑specific rules — such as kernel calling conventions or IPC trust boundaries — that foundation models typically lack. Microsoft frames some agents as heavy reasoners ("SOTA models") and others as lower‑cost debaters ("distilled models") and also references a separate independent SOTA model without naming providers.
In public evaluation, MDASH scored 88.45% on the CyberGym benchmark of 1,507 real vulnerabilities, topping the leaderboard and finishing roughly five percentage points ahead of the next result. Microsoft cautioned that the comparison is not apples‑to‑apples, since it evaluates a multi‑agent framework against individual models and noted that wrapping single models in a similar harness could also raise their scores.
The project is backed by Microsoft’s Autonomous Code Security Team and includes staff who were part of Team Atlanta, the DARPA AI Cyber Challenge winner that built autonomous bug‑finding and repair systems. MDASH is currently available in a limited private preview for external customers, and Microsoft published a detailed technical report on its blog describing the architecture and evaluation results.
For builders and security teams the immediate implication is practical: a sizable share of MDASH’s findings are remotely reachable without authentication and a majority affect kernel mode, reinforcing the need to prioritize relevant patches and runtime mitigations. The harness’s plugin and model‑agnostic design means organizations can tune checks with domain knowledge and evaluate new models by changing configuration rather than rewriting the pipeline, though Microsoft’s nondisclosure of the specific models used limits external reproducibility for now.
Sources
Replies (0)
No replies in this topic yet.