AgentCore Optimization enters preview with production-driven recommendations and live A/B testing

5/4/2026, 6:39:23 PM

AgentCore Optimization, now in preview, closes the observe-evaluate-improve loop for AI agents by generating recommended fixes from production traces and validating them with batch evaluation or controlled A/B tests before deployment.

AgentCore Optimization has entered preview, offering an automated observe-evaluate-improve loop for deployed AI agents. The new capability analyzes production traces to propose system prompt or tool description changes, then validates those proposals with either batch evaluation against test data or live A/B experiments, so teams can verify impact before rolling changes into production.

AgentCore Observability captures end-to-end OpenTelemetry traces, logging every model call, tool invocation, and reasoning step. Built-in evaluators and user-supplied scorers run automatically over those traces to measure dimensions such as goal success rate, tool selection accuracy, helpfulness, and safety. Evaluations can rely on ground-truth comparisons or use an LLM-as-judge to produce the reward signals that steer improvements.
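To make the idea of a user-supplied scorer concrete, here is a minimal sketch that computes two of the dimensions named above, goal success rate and tool selection accuracy, from ground-truth labels. The `Trace` and `ToolCall` shapes are hypothetical simplifications invented for illustration; they are not AgentCore's trace schema or evaluator interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToolCall:
    """One tool invocation in a trace (hypothetical shape, for illustration)."""
    chosen_tool: str
    expected_tool: Optional[str] = None  # ground-truth label, if one exists


@dataclass
class Trace:
    """Simplified stand-in for one agent session's trace (an assumption)."""
    trace_id: str
    tool_calls: List[ToolCall]
    goal_achieved: bool


def tool_selection_accuracy(traces: List[Trace]) -> Optional[float]:
    """Fraction of labeled tool calls where the agent chose the expected tool."""
    labeled = [
        call
        for trace in traces
        for call in trace.tool_calls
        if call.expected_tool is not None
    ]
    if not labeled:
        return None  # no ground truth available to compare against
    correct = sum(1 for call in labeled if call.chosen_tool == call.expected_tool)
    return correct / len(labeled)


def goal_success_rate(traces: List[Trace]) -> Optional[float]:
    """Fraction of traces whose session reached its goal."""
    if not traces:
        return None
    return sum(1 for trace in traces if trace.goal_achieved) / len(traces)


def score_traces(traces: List[Trace]) -> Dict[str, Optional[float]]:
    """Bundle the per-dimension scores into one report."""
    return {
        "goal_success_rate": goal_success_rate(traces),
        "tool_selection_accuracy": tool_selection_accuracy(traces),
    }
```

Unlabeled tool calls are skipped rather than counted as wrong, so the accuracy reflects only cases where a ground-truth comparison is possible; an LLM-as-judge scorer would replace the equality check with a model call.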

The Recommendations feature analyzes production traces together with evaluation outputs to generate concrete change suggestions for prompts or tool metadata. To run recommendations, teams point the Recommendations API at the CloudWatch Log group where their agent writes traces and choose the reward signal: either a built-in evaluator or a custom evaluator. That workflow centralizes what many teams currently do through manual trace inspection and ad hoc prompt edits.
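The inputs described above (a trace log group, a reward signal, and the thing to be changed) can be pictured as a small configuration object. The sketch below is purely illustrative: `RecommendationJobConfig`, its field names, and the example values are all hypothetical, and none of them come from the actual Recommendations API, whose request shape the article does not describe.

```python
from dataclasses import dataclass

# All names below are hypothetical; this is not the real Recommendations API.
EVALUATOR_KINDS = ("built_in", "custom")
TARGETS = ("system_prompt", "tool_description")


@dataclass(frozen=True)
class RecommendationJobConfig:
    """Illustrative bundle of the inputs a recommendation run needs."""
    log_group_name: str    # CloudWatch Log group where the agent writes traces
    reward_evaluator: str  # name of the evaluator used as the reward signal
    evaluator_kind: str    # "built_in" or "custom"
    target: str            # "system_prompt" or "tool_description"

    def validate(self) -> None:
        """Raise ValueError if any field is missing or unrecognized."""
        if not self.log_group_name.strip():
            raise ValueError("log_group_name must not be empty")
        if not self.reward_evaluator.strip():
            raise ValueError("reward_evaluator must not be empty")
        if self.evaluator_kind not in EVALUATOR_KINDS:
            raise ValueError(f"evaluator_kind must be one of {EVALUATOR_KINDS}")
        if self.target not in TARGETS:
            raise ValueError(f"target must be one of {TARGETS}")


# Example values are placeholders, not real resource names.
example = RecommendationJobConfig(
    log_group_name="/example/agent-traces",
    reward_evaluator="goal_success_rate",
    evaluator_kind="built_in",
    target="system_prompt",
)
example.validate()
```

Validating the configuration before submitting it keeps a mistyped evaluator kind or target from surfacing only after a run has started.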

Proposed changes can be vetted in batch: batch evaluation runs candidate modifications against a predefined test dataset and reports aggregate scores so engineers can compare outcomes. If hand-authored test cases are sparse, an LLM-backed actor can simulate additional cases to broaden coverage and stress-test recommendations before any live exposure. For live validation, AgentCore Gateway enables controlled A/B testing by splitting production traffic at a configurable percentage between the control and candidate paths. The A/B testing path returns results with confidence intervals and statistical significance so teams can detect regressions or improvements prior to full rollout, replacing the risky cycle of deploying fixes based on a few manual checks.
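The A/B mechanics described above can be sketched in a few lines. The example below assumes a hash-based split on a session identifier and a two-proportion z-test on a binary success metric; both are common textbook choices used here for illustration, not a description of how AgentCore Gateway routes traffic or computes its statistics. The function names and parameters are hypothetical.

```python
import hashlib
import math
from typing import Tuple


def assign_variant(session_id: str, candidate_percent: float) -> str:
    """Deterministically route a session to 'candidate' or 'control'.

    Hashing the session ID keeps each session on the same path for the
    whole experiment (an assumed design, not AgentCore's documented one).
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "candidate" if bucket < candidate_percent / 100.0 else "control"


def two_proportion_test(
    control_successes: int,
    control_total: int,
    candidate_successes: int,
    candidate_total: int,
    z_crit: float = 1.96,  # critical value for a 95% confidence interval
) -> Tuple[float, Tuple[float, float], float]:
    """Return (rate difference, confidence interval, two-sided p-value)."""
    p_control = control_successes / control_total
    p_candidate = candidate_successes / candidate_total
    diff = p_candidate - p_control

    # Pooled standard error for the test of "no difference between paths".
    pooled = (control_successes + candidate_successes) / (
        control_total + candidate_total
    )
    se_pooled = math.sqrt(
        pooled * (1 - pooled) * (1 / control_total + 1 / candidate_total)
    )
    z = diff / se_pooled if se_pooled > 0 else 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area

    # Unpooled standard error for the interval around the observed difference.
    se_diff = math.sqrt(
        p_control * (1 - p_control) / control_total
        + p_candidate * (1 - p_candidate) / candidate_total
    )
    interval = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    return diff, interval, p_value
```

With this sketch, a candidate would be considered an improvement only if the interval for the difference lies entirely above zero; an interval that straddles zero means the experiment has not yet separated the two paths.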

The update targets teams that lack large central research squads and cannot wait on slow benchmark pipelines: thousands of developers already use AgentCore to build agents that plan and act across complex workflows, and Optimization aims to shorten tuning cycles from weekly or monthly to far more frequent, data-backed iterations. As Yoshiharu Okuda, Head of Generative AI Business Strategy Department at NTT DATA, put it: “Continuously evaluating and improving agents is essential for driving data‑driven value creation.”

By deriving recommendations from production trace data and validating their impact through batch and controlled A/B testing, organizations can convert manual prompt tuning processes into rapid, repeatable cycles. The approach is designed to optimize agent performance at scale while preserving accuracy and reducing the risk of blind deployments.

Sources

AWS Machine Learning Blog · 5/4/2026
