Meryem Arik urges open-source model gateways to tame 'inference chaos' at QCon AI

5/20/2026, 1:25:44 PM

Meryem Arik told a QCon AI audience that organizations face mounting “inference chaos” as engineering teams adopt multiple model providers and self-hosted models simultaneously, and she urged adoption of open-source AI model gateways to restore order. The proposal matters because gateways promise a single point of control for observability, policy enforcement and cost tracking while still allowing teams to choose the best models for each task.

Arik described the fragmentation as an operational problem that complicates three core functions: seeing what models are running and why, enforcing organizational policies across diverse endpoints, and understanding or allocating inference spend. She said these issues arise when product teams independently integrate providers, spin up fine-tuned self‑hosted models, or experiment with niche stacks, creating inconsistent telemetry and uneven security postures.

To address that fragmentation, Arik laid out concrete gateway functionality: a central HTTP/gRPC routing layer that normalizes access to OpenAI, Mistral, self‑hosted fine‑tuned models and other endpoints. Gateways should support per‑request model selection and routing so that different applications can use specialized models while all traffic flows through a common control plane. That design, she argued, preserves the benefit of decentralized model choice but exposes those choices to centralized infrastructure for policy and quality control.
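The routing layer described above can be sketched in a few lines of Python. This is a minimal illustration, not Doubleword's gateway or any project's actual API: the `Backend` type, the `ROUTES` table, the model-name prefixes and the placeholder URLs are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str      # hypothetical label for a provider or self-hosted deployment
    base_url: str  # placeholder URL, not a real endpoint


# Hypothetical routing table: model-name prefix -> backend that serves it.
ROUTES = {
    "gpt-": Backend("openai", "https://openai.example.invalid/v1"),
    "mistral-": Backend("mistral", "https://mistral.example.invalid/v1"),
    "internal/": Backend("self-hosted", "http://inference.example.invalid:8000/v1"),
}


def resolve_backend(request: dict) -> Backend:
    """Pick a backend from the `model` field of an incoming request body."""
    model = request.get("model", "")
    for prefix, backend in ROUTES.items():
        if model.startswith(prefix):
            return backend
    # Unknown models are rejected rather than silently forwarded somewhere.
    raise LookupError(f"no route configured for model {model!r}")
```

Because every application calls the same entry point, the choice of model stays with the team while the mapping from model to endpoint lives in one place.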

Arik emphasized that inference demands are heterogeneous, with no single model fitting every use case, and used an analogy to show why specialization matters across applications. Based on that heterogeneity, she recommended gateway capabilities such as per‑request routing rules and model selection by workload, rather than a one‑size‑fits‑all approach that would force teams onto a single, possibly suboptimal model.
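Model selection by workload could look something like the sketch below, where an ordered list of rules is checked and the first match wins. The rule set, the request fields (`workload`, `prompt_tokens`), the token threshold and the model names are hypothetical and were not given in the talk.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    description: str
    matches: Callable[[dict], bool]  # predicate over request metadata
    model: str                       # hypothetical model name to route to


# Hypothetical rules, evaluated in order; the first match wins.
RULES = [
    Rule("very long prompts need a long-context model",
         lambda r: r.get("prompt_tokens", 0) > 8000, "long-context-model"),
    Rule("code tasks go to a code-tuned model",
         lambda r: r.get("workload") == "code", "code-model"),
    Rule("classification goes to a small, cheap model",
         lambda r: r.get("workload") == "classification", "small-model"),
]
DEFAULT_MODEL = "general-model"


def select_model(request: dict) -> str:
    """Return the model for a request, falling back to a general default."""
    for rule in RULES:
        if rule.matches(request):
            return rule.model
    return DEFAULT_MODEL
```

Keeping the rules as data rather than scattering them through application code means they can be reviewed and changed centrally.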

On the practical side, Arik said gateways can enforce security policies, role‑based access control (RBAC), rate limits and cost‑saving routing rules, while collecting telemetry for observability and auditing. She recommended deploying gateways early, even at small scale, to prevent accumulating cross‑team technical debt and to maintain centralized governance over inference spend and compliance as usage grows.
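As a rough illustration of those controls working together, the sketch below checks a role against an allow-list, applies a sliding-window rate limit per team, and records spend and an audit entry for each decision. The roles, model names, prices and limit are invented for the example, and a real gateway would keep this state in shared storage rather than in process memory.

```python
import time
from collections import defaultdict, deque

# Hypothetical RBAC policy: which roles may call which models.
ALLOWED_MODELS = {
    "analyst": {"small-model"},
    "engineer": {"small-model", "large-model"},
}
# Hypothetical prices in USD per 1,000 tokens.
PRICE_PER_1K_TOKENS = {"small-model": 0.001, "large-model": 0.01}
MAX_REQUESTS_PER_MINUTE = 2  # hypothetical per-team limit

_recent = defaultdict(deque)        # team -> timestamps of admitted requests
spend_by_team = defaultdict(float)  # team -> accumulated spend in USD
audit_log = []                      # (team, model, decision) tuples


def admit(team, role, model, tokens, now=None):
    """Apply RBAC, then the rate limit; record spend and an audit entry."""
    now = time.monotonic() if now is None else now
    if model not in ALLOWED_MODELS.get(role, set()):
        audit_log.append((team, model, "denied:rbac"))
        return False
    window = _recent[team]
    while window and now - window[0] >= 60:  # drop entries older than 60 s
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_MINUTE:
        audit_log.append((team, model, "denied:rate_limit"))
        return False
    window.append(now)
    spend_by_team[team] += tokens / 1000 * PRICE_PER_1K_TOKENS[model]
    audit_log.append((team, model, "allowed"))
    return True
```

The `now` parameter exists only so the limiter can be exercised with fixed timestamps; in normal use it defaults to the monotonic clock.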

Arik noted that Doubleword built an open‑source AI model gateway, which she clarified is not a commercial product, and she cited other open projects such as LiteLLM and OpenRouter as operational options for teams exploring this approach. She said her talk was inspired by a blog post by Doubleword’s CTO. The recorded presentation runs 46:51.

Arik framed her recommendations as drawn from operational experience: Doubleword has focused on inference since its founding about four years ago and has frequently encountered multi‑provider setups in client environments. She is Doubleword’s co‑founder and CEO, a former physicist at Oxford, a recurring speaker at TEDx and QCon, and a 30 Under 30 honoree.

Sources

InfoQ AI/ML · 5/20/2026
