Autonomous LLM Experiments Boost DBM Query-Optimization Agent Precision: what developers gain

5/24/2026, 5:16:17 AM

A Database Monitoring (DBM) engineering team augmented its query-optimization recommender with an LLM-backed agent. After running 23 autonomous experiments with Karpathy’s autoresearch, it raised the agent’s precision from P=0.54 to P=0.86 overnight. The team paired those trials with LLM Observability Experiments, which captured every hypothesis, input, output and failure for reproducibility and analysis. This sped up iteration while preserving auditability. The improvement matters because a more precise agent can surface complex optimization opportunities that standard pattern-based rules struggle to detect.
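The article does not say how precision was graded. Assuming the standard definition P = TP / (TP + FP) over the recommendations the agent emits, a minimal Go sketch makes the jump concrete. The counts in `main` are illustrative per-100 figures chosen to match the reported ratios, not the team's data. At P=0.54 roughly 46 of every 100 recommendations are false positives; at P=0.86 roughly 14 are.

```go
package main

import "fmt"

// Precision returns TP / (TP + FP): the share of emitted recommendations
// that graders judged correct. It returns 0 when nothing was emitted.
// (Standard definition; the team's grading procedure is not described.)
func Precision(truePositives, falsePositives int) float64 {
	emitted := truePositives + falsePositives
	if emitted == 0 {
		return 0
	}
	return float64(truePositives) / float64(emitted)
}

func main() {
	// Illustrative counts per 100 emitted recommendations, not real data.
	before := Precision(54, 46)
	after := Precision(86, 14)
	fmt.Printf("before: P=%.2f, after: P=%.2f\n", before, after)
	fmt.Printf("false positives per 100: %.0f -> %.0f\n", (1-before)*100, (1-after)*100)
}
```

Precision alone says nothing about how many real opportunities the agent misses (recall), and the article reports only precision.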

The existing DBM recommender remains a multi-source heuristic engine, implemented in Go. It combines SQL parse-tree analysis, real EXPLAIN plans, schema metadata and runtime metrics, and it targets six pattern families:

- missing-index detection with plan-flip analysis
- SELECT * expansion
- ORDER BY without LIMIT
- OFFSET without ORDER BY
- idle-in-transaction detection
- comprehensive SQL rewrite rules

The engine scored P=0.903 on the evaluation dataset. In other words, the heuristics are precise on the patterns they were built to catch, but they do not cover cases that depend on cross-referencing several signals or on semantic judgment.
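The article does not publish the engine's code. As a toy illustration of what a single-pattern rule looks like, the Go sketch below implements two of the six families (ORDER BY without LIMIT, and OFFSET without ORDER BY) as keyword checks. `CheckQuery` and `Finding` are hypothetical names. The plain-text matching is a deliberate simplification: the production engine is described as working from parse trees, EXPLAIN plans, schema metadata and runtime metrics.

```go
package main

import (
	"fmt"
	"strings"
)

// Finding is a hypothetical record for one heuristic hit.
type Finding struct {
	Rule    string
	Message string
}

// CheckQuery applies two toy rules modeled on the "ORDER BY without LIMIT"
// and "OFFSET without ORDER BY" families. It does a case-insensitive keyword
// search on whitespace-normalized text, so it ignores subqueries, comments
// and string literals; a real engine would walk a parse tree instead.
func CheckQuery(sql string) []Finding {
	norm := " " + strings.Join(strings.Fields(strings.ToUpper(sql)), " ") + " "
	hasOrderBy := strings.Contains(norm, " ORDER BY ")
	hasLimit := strings.Contains(norm, " LIMIT ")
	hasOffset := strings.Contains(norm, " OFFSET ")

	var out []Finding
	if hasOrderBy && !hasLimit {
		out = append(out, Finding{
			Rule:    "order-by-without-limit",
			Message: "sorts the full result set; consider adding LIMIT",
		})
	}
	if hasOffset && !hasOrderBy {
		out = append(out, Finding{
			Rule:    "offset-without-order-by",
			Message: "OFFSET without ORDER BY skips rows in an unspecified order",
		})
	}
	return out
}

func main() {
	for _, q := range []string{
		"SELECT id FROM orders ORDER BY created_at",
		"SELECT id FROM orders LIMIT 10 OFFSET 20",
		"SELECT id FROM orders ORDER BY created_at LIMIT 10",
	} {
		fmt.Println(q, "=>", CheckQuery(q))
	}
}
```

Rules like these are precise because each one encodes a single, narrow pattern. That narrowness is also why they cannot weigh several signals at once.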

To complement the heuristics, the team deliberately placed the LLM agent after the heuristic engine, so that the agent surfaces optimization opportunities that are hard to encode as single rules. The agent looks for cases that require cross-referencing signals or weighing semantic tradeoffs rather than simple pattern matching. One example given is expensive aggregation workloads that could benefit from a materialized view. Scenarios like this show how agent reasoning across multiple signals can complement, not replace, precise heuristic rules.
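The article states only that the agent runs after the heuristic engine. One plausible way to wire that ordering is a staged pipeline in which earlier stages win on duplicates, so the agent contributes only what the rules did not already flag. The Go sketch below assumes that design. `Stage`, `RunPipeline`, the stub stages and the (query, kind) dedup key are all hypothetical. The stubs return fixed values in place of the real rule engine and LLM call.

```go
package main

import "fmt"

// Recommendation is a hypothetical output record for one suggestion.
type Recommendation struct {
	QueryID string
	Kind    string // e.g. "missing-index", "materialized-view"
	Source  string // "heuristic" or "agent"
}

// Stage is a hypothetical interface shared by the rule engine and the agent.
type Stage interface {
	Recommend(queryID string) []Recommendation
}

// heuristicStage stands in for the Go rule engine.
type heuristicStage struct{}

func (heuristicStage) Recommend(queryID string) []Recommendation {
	return []Recommendation{{QueryID: queryID, Kind: "missing-index", Source: "heuristic"}}
}

// agentStage stands in for the LLM call. A real agent would reason over
// plans, schema and metrics; this stub returns fixed candidates, one of
// which duplicates the heuristic finding.
type agentStage struct{}

func (agentStage) Recommend(queryID string) []Recommendation {
	return []Recommendation{
		{QueryID: queryID, Kind: "missing-index", Source: "agent"},
		{QueryID: queryID, Kind: "materialized-view", Source: "agent"},
	}
}

// RunPipeline runs stages in order and keeps the first recommendation for
// each (query, kind) pair, so rule-engine findings take precedence over
// agent duplicates and the agent only adds what the rules missed.
func RunPipeline(queryID string, stages ...Stage) []Recommendation {
	var out []Recommendation
	seen := make(map[string]bool)
	for _, s := range stages {
		for _, r := range s.Recommend(queryID) {
			key := r.QueryID + "|" + r.Kind
			if seen[key] {
				continue
			}
			seen[key] = true
			out = append(out, r)
		}
	}
	return out
}

func main() {
	for _, r := range RunPipeline("q-42", heuristicStage{}, agentStage{}) {
		fmt.Printf("%s %s (%s)\n", r.QueryID, r.Kind, r.Source)
	}
}
```

Letting the more precise rule engine (P=0.903) take precedence means a duplicate agent suggestion never displaces a rule-based one. The article does not describe how the production system actually reconciles overlapping findings.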

The team documented a three-phase path to the improvement:

1. Optimize the prompts and the surrounding toolchain.
2. Rightsize the model to balance cost and performance.
3. Split the LLM call into two passes to overcome a final performance barrier.

Combining autoresearch-driven autonomous trials with LLM Observability Experiments yielded faster iteration cycles and clearer handoffs between sessions. It also gave the team a structured way to analyze failure modes and tuning decisions, which amounts to a reproducible workflow builders can use to tune LLM-assisted database recommendations.
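The article does not show what an experiment record looks like in LLM Observability Experiments. As a hypothetical stand-in, the Go sketch below keeps one record per autonomous trial: phase, hypothesis, measured precision and failed eval cases. It answers two questions such a log is for: which run is best, and which failures persist across runs. `Experiment`, `Best` and `TopFailures` are invented names, not Datadog or autoresearch APIs. The sample values are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// Experiment is a hypothetical log record for one autonomous trial.
type Experiment struct {
	ID         int
	Phase      string // e.g. "prompt+toolchain", "model-rightsizing", "two-pass"
	Hypothesis string
	Precision  float64
	Failures   []string // IDs of eval cases the run got wrong
}

// Best returns the highest-precision run; on ties the earliest ID wins.
// ok is false when runs is empty.
func Best(runs []Experiment) (best Experiment, ok bool) {
	for _, r := range runs {
		if !ok || r.Precision > best.Precision ||
			(r.Precision == best.Precision && r.ID < best.ID) {
			best, ok = r, true
		}
	}
	return best, ok
}

// TopFailures returns up to n eval-case IDs ordered by how many runs failed
// them (most first, ties broken alphabetically), exposing persistent failure modes.
func TopFailures(runs []Experiment, n int) []string {
	counts := make(map[string]int)
	for _, r := range runs {
		for _, f := range r.Failures {
			counts[f]++
		}
	}
	ids := make([]string, 0, len(counts))
	for id := range counts {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if counts[ids[i]] != counts[ids[j]] {
			return counts[ids[i]] > counts[ids[j]]
		}
		return ids[i] < ids[j]
	})
	if n < 0 {
		n = 0
	}
	if len(ids) > n {
		ids = ids[:n]
	}
	return ids
}

func main() {
	// Illustrative runs only; the values are not the team's results.
	runs := []Experiment{
		{ID: 1, Phase: "prompt+toolchain", Hypothesis: "tighten prompt and tool descriptions", Precision: 0.62, Failures: []string{"case-07", "case-12"}},
		{ID: 2, Phase: "model-rightsizing", Hypothesis: "try a smaller model", Precision: 0.60, Failures: []string{"case-07", "case-19"}},
		{ID: 3, Phase: "two-pass", Hypothesis: "split the LLM call into two passes", Precision: 0.81, Failures: []string{"case-07"}},
	}
	if best, ok := Best(runs); ok {
		fmt.Printf("best run: #%d (%s) P=%.2f\n", best.ID, best.Phase, best.Precision)
	}
	fmt.Println("most persistent failures:", TopFailures(runs, 2))
}
```

Keeping failures as eval-case IDs, rather than only an aggregate score, is what lets a later session see that a case such as `case-07` survived every phase and deserves targeted attention.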

Sources

Datadog AI · 5/20/2026
