
Elastic’s Field Technology team published an inaugural AI Year in Review on May 14, 2026, after analyzing more than one million messages gathered since 2024. Authored by Chris Blaisure and Riya Juneja, the report summarizes operational data and quality metadata from production agent deployments on a shared Elasticsearch platform. Its central finding matters for teams running support and sales automation: improving retrieval and observability delivered bigger quality gains than changing the underlying models. That finding points teams toward the engineering and data practices that directly affect end‑user outcomes.
The team deployed five agentic tools scoped to different workflows and user populations but running on common infrastructure: a customer‑facing Support Assistant, an internal Support Assistant with an expanded knowledge base, a Case Summarizer for engineers, a Knowledge Drafter that turns cases into KB entries, and a Sales Assistant embedded in Salesforce. Across 2025 those agents processed 209,220 conversation threads spanning multiple user groups and dozens of use cases; overall usage logs since 2024 exceed one million messages.
Roughly 8% of users, the power users, generated about 80% of sessions, while the remaining 92% accounted for about 20%, underscoring heavy usage concentrated in a small cohort. From their logs, the team distilled practical levers for production quality. The single biggest improvements came from closing feedback loops that prioritize retrieval relevance and observability rather than from swapping LLMs. In particular, the report flags retrieval thresholds and token strategy as operational settings that materially shape response quality in the field.
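To make the idea of a retrieval threshold and a token strategy concrete, here is a minimal Python sketch. It is not taken from the report: the `RetrievedChunk` type, the `select_context` function, the 0.6 score cutoff, and the 1,500-token budget are all hypothetical values chosen for illustration, and the scores are assumed to be normalized to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    doc_id: str
    score: float  # relevance score from the retriever, assumed normalized to [0, 1]
    tokens: int   # approximate token count of the chunk text


def select_context(chunks, min_score=0.6, token_budget=1500):
    """Keep chunks scoring at least min_score, best first, within token_budget.

    Both parameters are hypothetical tuning knobs, not values from the report.
    """
    selected = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if chunk.score < min_score:
            break  # every remaining chunk scores lower still
        if used + chunk.tokens > token_budget:
            continue  # skip a chunk that would overflow the budget
        selected.append(chunk)
        used += chunk.tokens
    return selected


candidates = [
    RetrievedChunk("kb-101", 0.91, 700),
    RetrievedChunk("kb-202", 0.74, 900),
    RetrievedChunk("kb-303", 0.66, 600),
    RetrievedChunk("kb-404", 0.41, 300),
]
picked = [c.doc_id for c in select_context(candidates)]
print(picked)
```

In this sketch, raising `min_score` trades recall for precision, while `token_budget` caps how much retrieved text reaches the prompt. Both are the kind of operational setting that can be tuned against logged quality metrics without touching the model.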
For roughly every simple query, the assistants received three or more complex technical questions, driving higher token counts and distinct retrieval needs that standard prompt adjustments alone did not fix. To operationalize observability, the team built a centralized AI gateway that evaluates all LLM traffic and converts raw conversations into structured metadata. An LLM redacts PII and maps logs into metrics under a process the authors call Context Performance Monitoring, or Context Observability. That pipeline produces sentiment, response quality, and accuracy metrics that the team uses to iterate on retrieval, prompt design, and relevance thresholds.
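The shape of such a pipeline can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the team's implementation: the report says an LLM performs both the redaction and the scoring, whereas this sketch substitutes a simple email regex for redaction and takes the scores as a ready-made dictionary. The `ConversationRecord` fields and the `redact_pii` and `to_record` helpers are hypothetical names.

```python
import re
from dataclasses import dataclass, asdict

# Stand-in for LLM-based redaction: masks email addresses only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_pii(text):
    """Replace email addresses with a placeholder token."""
    return EMAIL_RE.sub("[EMAIL]", text)


@dataclass
class ConversationRecord:
    # Hypothetical schema mirroring the metrics the article names.
    thread_id: str
    agent: str
    redacted_text: str
    sentiment: str
    response_quality: float
    accurate: bool


def to_record(thread_id, agent, raw_text, scores):
    """Combine redacted text with evaluation scores into one structured record.

    `scores` is assumed to come from an upstream evaluator (an LLM in the
    report); here it is passed in directly.
    """
    return ConversationRecord(
        thread_id=thread_id,
        agent=agent,
        redacted_text=redact_pii(raw_text),
        sentiment=scores["sentiment"],
        response_quality=scores["response_quality"],
        accurate=scores["accurate"],
    )


record = to_record(
    "t-001",
    "support-assistant",
    "Contact me at jane.doe@example.com about the shard error.",
    {"sentiment": "neutral", "response_quality": 0.8, "accurate": True},
)
doc = asdict(record)  # plain dict, ready to index as a JSON document
print(doc)
```

Keeping the record flat and typed means each conversation becomes one document that can be aggregated by agent, sentiment, or quality score.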