OpenAI speeds up agentic workflows in the Responses API with WebSockets

4/23/2026, 12:37:52 AM

What OpenAI changed

OpenAI detailed how it made agentic workflows in the Responses API faster by using WebSockets, connection-scoped caching, and fewer unnecessary network hops. Fixed per-request overhead became more visible as inference got faster: when token generation rises from dozens of tokens per second to nearly a thousand, API overhead becomes a larger share of the total time users spend waiting.
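A back-of-the-envelope calculation shows why. The inputs below are assumptions chosen for illustration, not figures from OpenAI: a fixed 0.3 s of overhead per request and a 500-token response, compared at 50 and 1,000 tokens per second.

```python
def overhead_share(tokens: int, tokens_per_second: float, overhead_s: float) -> float:
    """Fraction of one request's wall-clock time spent outside token generation."""
    generation_s = tokens / tokens_per_second
    return overhead_s / (overhead_s + generation_s)


# Hypothetical inputs: 500-token response, 0.3 s of fixed API overhead per request.
for tps in (50, 1000):
    share = overhead_share(tokens=500, tokens_per_second=tps, overhead_s=0.3)
    print(f"{tps:>5} tok/s: {share:.1%} of the wait is overhead")
# prints:
#    50 tok/s: 2.9% of the wait is overhead
#  1000 tok/s: 37.5% of the wait is overhead
```

The overhead itself never changes in this sketch; only generation gets faster, yet overhead goes from a rounding error to more than a third of the wait.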

A typical Codex loop repeats the same cycle many times: the model decides the next action, a local tool runs, the tool output is sent back to the API, and the cycle starts again. Complex tasks may require dozens of these exchanges. If every step is a separate synchronous request, each one pays its own network and request-handling overhead, and that latency accumulates outside the model itself.
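The cycle can be sketched as a small Python simulation that tallies where time goes. Everything here is hypothetical: the scripted `PLAN` stands in for the model's decisions, the tool names and latencies are made up, and `call_model` models one synchronous request rather than any real OpenAI client call.

```python
from dataclasses import dataclass, field

# Hypothetical latencies in seconds; real values depend on the network, model, and tools.
REQUEST_OVERHEAD_S = 0.15  # one synchronous round trip, excluding model time
MODEL_S = 0.40             # time for the model to pick the next action
TOOL_S = {"read_file": 0.02, "edit_file": 0.03, "run_tests": 1.50}

# Scripted stand-in for the model's decisions; "done" ends the loop.
PLAN = ["read_file", "edit_file", "run_tests", "edit_file", "run_tests", "done"]


@dataclass
class Ledger:
    requests: int = 0
    model_s: float = 0.0
    tool_s: float = 0.0
    overhead_s: float = 0.0
    tool_outputs: list[str] = field(default_factory=list)


def call_model(ledger: Ledger) -> str:
    """One synchronous request: pay the overhead, then let the 'model' choose an action."""
    ledger.requests += 1
    ledger.overhead_s += REQUEST_OVERHEAD_S
    ledger.model_s += MODEL_S
    return PLAN[len(ledger.tool_outputs)]


def run_tool(action: str, ledger: Ledger) -> str:
    """Run a local tool and record how long it took."""
    ledger.tool_s += TOOL_S[action]
    return f"output of {action}"


def agent_loop() -> Ledger:
    ledger = Ledger()
    # Decide, run the tool, send its output back on the next request, repeat.
    while (action := call_model(ledger)) != "done":
        ledger.tool_outputs.append(run_tool(action, ledger))
    return ledger


ledger = agent_loop()
print(ledger.requests, round(ledger.overhead_s, 2), round(ledger.model_s, 2), round(ledger.tool_s, 2))
# prints: 6 0.9 2.4 3.08
```

In this model, overhead scales with the number of requests rather than with how hard each step is, so a task with dozens of exchanges pays it dozens of times.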

Why WebSockets matter

A persistent connection lets the API keep context for the lifetime of the connection and process the agent loop more incrementally. Instead of a strictly sequential chain of requests, OpenAI can overlap validation, pre-inference, sampling, and post-inference work while reusing cached connection state. That reduces repeated work and shortens the delay between tool actions.
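One way to reason about the overlap is as a critical-path calculation over the stages of a single turn. This Python sketch is an illustration under assumptions, not OpenAI's scheduler: the stage durations are made up, and the only overlap modeled is validation running alongside pre-inference (assumed to be independent once cached connection state is available). The source does not say exactly which stages overlap or by how much.

```python
def critical_path_ms(durations: dict[str, float], deps: dict[str, list[str]]) -> float:
    """Latency of one turn when each stage starts as soon as its dependencies finish."""
    finish: dict[str, float] = {}

    def finish_time(stage: str) -> float:
        if stage not in finish:
            start = max((finish_time(d) for d in deps[stage]), default=0.0)
            finish[stage] = start + durations[stage]
        return finish[stage]

    return max(finish_time(stage) for stage in durations)


# Hypothetical per-turn stage durations in milliseconds.
DURATIONS = {"validate": 20.0, "pre_inference": 50.0, "sample": 200.0, "post_inference": 30.0}

# Strictly sequential: each stage waits for the previous one.
SEQUENTIAL = {
    "validate": [],
    "pre_inference": ["validate"],
    "sample": ["pre_inference"],
    "post_inference": ["sample"],
}

# Assumed overlap: validation and pre-inference run side by side.
OVERLAPPED = {
    "validate": [],
    "pre_inference": [],
    "sample": ["validate", "pre_inference"],
    "post_inference": ["sample"],
}

print(critical_path_ms(DURATIONS, SEQUENTIAL))  # prints 300.0
print(critical_path_ms(DURATIONS, OVERLAPPED))  # prints 280.0
```

The saving within one turn is small here, but it recurs on every turn of the loop.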

OpenAI says the result is a 40% end-to-end speedup for agentic workflows using the Responses API. Notably, the figure measures the full workflow rather than a single isolated generation. That matters for Codex because real engineering tasks involve reading files, editing code, running tests, and iterating until the result is correct.
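Amdahl's law makes the distinction concrete: speeding up one part of a workflow helps overall only in proportion to that part's share of total time. This Python sketch uses hypothetical shares and does not attempt to reproduce OpenAI's 40% measurement.

```python
def end_to_end_speedup(improved_fraction: float, local_speedup: float) -> float:
    """Amdahl's law: overall speedup when only `improved_fraction` of the time gets faster."""
    if not 0.0 <= improved_fraction <= 1.0:
        raise ValueError("improved_fraction must be between 0 and 1")
    return 1.0 / ((1.0 - improved_fraction) + improved_fraction / local_speedup)


# Hypothetical: API overhead becomes 2x faster in two differently shaped workflows.
for label, overhead_fraction in [("tool-heavy", 0.10), ("overhead-heavy", 0.50)]:
    print(f"{label}: {end_to_end_speedup(overhead_fraction, 2.0):.2f}x overall")
# prints:
# tool-heavy: 1.05x overall
# overhead-heavy: 1.33x overall
```

The same transport improvement yields very different end-to-end numbers depending on how much of the workflow was overhead to begin with, which is why a full-workflow measurement says more than timing one generation.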

Why developers should care

The update makes the Responses API better suited for long-running interactive agents that use tools, browsers, files, and internal systems. The more steps an agent needs to take, the more valuable persistent transport and caching become.
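A rough way to see why caching pays off more as steps accumulate is to count the bytes a client sends. This Python sketch assumes a stateless client that resends the whole conversation on every request, and a hypothetical connection-scoped cache that already holds earlier turns, so only the new tool output travels. The post does not say exactly what OpenAI caches per connection; both classes are illustrative, not an OpenAI API.

```python
class StatelessClient:
    """Each request carries the full conversation so far."""

    def __init__(self) -> None:
        self.history: list[str] = []
        self.bytes_sent = 0

    def send(self, item: str) -> None:
        self.history.append(item)
        self.bytes_sent += sum(len(x) for x in self.history)


class ConnectionScopedClient:
    """Hypothetical: the server keeps earlier turns for this connection, so only the delta is sent."""

    def __init__(self) -> None:
        self.server_cache: list[str] = []  # stands in for state held server-side
        self.bytes_sent = 0

    def send(self, item: str) -> None:
        self.bytes_sent += len(item)
        self.server_cache.append(item)


def compare(steps: int, item_size: int = 1_000) -> tuple[int, int]:
    """Bytes sent by each client after `steps` tool outputs of `item_size` bytes."""
    stateless, cached = StatelessClient(), ConnectionScopedClient()
    for _ in range(steps):
        tool_output = "x" * item_size  # placeholder tool output
        stateless.send(tool_output)
        cached.send(tool_output)
    return stateless.bytes_sent, cached.bytes_sent


for steps in (5, 50):
    print(steps, compare(steps))
# prints:
# 5 (15000, 5000)
# 50 (1275000, 50000)
```

Under these assumptions, resent bytes grow quadratically with the number of steps for the stateless client and linearly with the cache, so the gap widens the longer the agent runs.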

The broader lesson is architectural: as models get faster, the bottleneck moves into the surrounding system. Teams building their own agents will need to optimize transport, caching, safety checks, tool execution, and retry strategy, not just the prompt and model choice.
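Retry strategy matters more once an agent depends on a long-lived connection, because a dropped socket stalls the whole loop. A common pattern is capped exponential backoff with "full jitter"; this Python sketch applies it to a hypothetical `connect` operation. Retrying only on `ConnectionError`, and the default delays, are assumptions to tune for a real system.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    *,
    attempts: int = 5,
    base_s: float = 0.1,
    cap_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `op`, retrying on ConnectionError with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(attempts):
        try:
            return op()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time up to an exponentially growing ceiling.
            sleep(rng.uniform(0.0, min(cap_s, base_s * 2 ** attempt)))
    raise AssertionError("unreachable")


# Usage: a hypothetical connect() that fails twice before succeeding.
calls = 0


def connect() -> str:
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("socket closed")
    return "connected"


delays: list[float] = []
print(retry_with_backoff(connect, sleep=delays.append, rng=random.Random(0)))  # prints connected
print(len(delays))  # prints 2
```

Injecting `sleep` and `rng` keeps the sketch testable without real waiting; the jitter spreads reconnect attempts so many clients do not retry in lockstep.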

Sources

OpenAI News · 4/22/2026
