A SIGMOD paper published 2026-05-13 shows that lightweight proxy models working on precomputed embeddings can replace

News
Elara Winslow

5/15/2026, 7:13:54 AM

A SIGMOD paper published 2026-05-13 demonstrates that proxy models can replace the majority of large language model (LLM) calls inside AI-powered SQL functions, cutting query latency and token-related costs by more than 100× in many workloads. This matters because databases that accept natural-language instructions can run semantic filters and classifiers far faster and cheaper without large accuracy losses for common tasks.

Proxy models are cost-optimized, ultra-lightweight models tailored to a specific query prompt and tuned to the database’s data; they can be trained ahead of time or on the fly. The paper cites practical implementations such as a logistic regression currently used in BigQuery and AlloyDB. In execution, the proxy stands in for expensive LLM evaluations, which is why the authors label these small models as proxies.

The key technique is to feed the proxy models dense semantic embeddings rather than raw text. The paper uses Gemini embedding generators to convert text into vectors once and then reuse those embeddings across queries. By amortizing the cost of semantic encoding and running proxies on CPU without special hardware, the approach makes most semantic filters and classifiers orders of magnitude faster and cheaper.
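
A minimal sketch of this idea, not the paper's implementation: the toy 3-dimensional vectors stand in for real precomputed embeddings, the `llm_labels` values are hypothetical labels an LLM might give a small sample, and `train_proxy` is a hand-rolled logistic regression written only to keep the example dependency-free. The `review_embedded` name is borrowed from the paper's IMDB example; everything else is assumed for illustration.

```python
import math

def predict_proba(model, x):
    """Probability that embedding x satisfies the prompt under a logistic model."""
    weights, bias = model
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_proxy(embeddings, labels, epochs=500, lr=0.5):
    """Fit logistic regression with batch gradient descent; returns (weights, bias)."""
    dim, n = len(embeddings[0]), len(embeddings)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            err = predict_proba((weights, bias), x) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Embeddings computed once at ingest time (toy values, assumed for illustration).
review_embedded = {
    1: [0.90, 0.10, 0.20],
    2: [0.80, 0.20, 0.10],
    3: [0.10, 0.90, 0.30],
    4: [0.20, 0.80, 0.20],
    5: [0.95, 0.05, 0.15],
    6: [0.05, 0.95, 0.35],
}

# Hypothetical LLM labels for a small sample, for a prompt like "the review is positive".
llm_labels = {1: 1, 2: 1, 3: 0, 4: 0}

sample_ids = sorted(llm_labels)
proxy = train_proxy([review_embedded[i] for i in sample_ids],
                    [llm_labels[i] for i in sample_ids])

# Apply the proxy to every row on CPU; no LLM call is made here.
matches = [row_id for row_id, vec in review_embedded.items()
           if predict_proba(proxy, vec) >= 0.5]
print(matches)
```

Because the embeddings are stored, a second prompt in this sketch would only need new sample labels and another call to `train_proxy`; no text is re-encoded.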

The need for proxies arises from the cost and latency of full LLM evaluations: the paper notes LLM calls can add 10-100× to query latency and roughly 1,000× to token-related cost, and that a medium query scanning 10-100 million rows can consume prohibitively many tokens. The authors trace core proxy ideas to the Universal Query Engine (UQE) work presented at NeurIPS 2024 by Google DeepMind.
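
To make the scale concrete, here is a back-of-the-envelope calculation. The figure of 100 tokens per row is a hypothetical assumption for illustration, not a number from the paper; real values depend on row size and prompt length.

```python
def tokens_for_scan(rows: int, tokens_per_row: int) -> int:
    """Total LLM input tokens if every scanned row is sent to the model once."""
    return rows * tokens_per_row

ASSUMED_TOKENS_PER_ROW = 100  # hypothetical; depends on row size and prompt length

for rows in (10_000_000, 100_000_000):
    total = tokens_for_scan(rows, ASSUMED_TOKENS_PER_ROW)
    print(f"{rows:,} rows -> {total:,} tokens")
```

Under that assumption, a single scan of 10 to 100 million rows would send 1 to 10 billion tokens to the LLM.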

Benchmarks across 10 tasks show the proxy-to-LLM predictive-performance ratio (measured by F1) ranged from 90% to 116%, indicating proxies are automatically applicable in many but not all cases: sometimes matching LLM quality, sometimes slightly worse, and occasionally outperforming the LLM. The authors warn that proxies remain approximations and can fail on tasks that require reasoning that connects multiple semantic concepts.
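
One way such a ratio could be computed is sketched below. It assumes both the proxy and the LLM are scored with F1 against the same ground-truth labels, which is an interpretation rather than a detail confirmed here; the label lists are made-up toy data.

```python
def f1_score(truth, pred):
    """Binary F1 from parallel lists of 0/1 labels."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy data (hypothetical): ground truth plus each system's predictions.
truth      = [1, 1, 1, 1, 0, 0, 0, 0]
llm_pred   = [1, 1, 1, 0, 0, 0, 0, 1]  # one miss, one false alarm
proxy_pred = [1, 1, 1, 0, 0, 0, 0, 0]  # one miss, no false alarms

ratio = f1_score(truth, proxy_pred) / f1_score(truth, llm_pred)
print(f"proxy-to-LLM F1 ratio: {ratio:.0%}")
```

In this toy case the proxy scores higher than the LLM, so the ratio exceeds 100%, which is how a value above 100% can arise.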

From an operational perspective, query processors can test whether a proxy is effective and feasible for a given AI function before using it in execution. The paper notes that BigQuery and AlloyDB already expose the optimization under an optimized mode for AI.IF and AI.CLASSIFY per their documentation, and includes concrete examples (for instance, a semantic filter over IMDB reviews using a review_embedded column) to illustrate how embeddings plus a lightweight proxy can execute common semantic queries.
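
A pre-execution check of this kind might look like the sketch below. The agreement metric, the 0.9 threshold, and the `choose_executor` name are all assumptions made for illustration; the actual criteria used by the paper or by BigQuery and AlloyDB are not described here.

```python
def choose_executor(proxy_predict, holdout_embeddings, holdout_llm_labels,
                    min_agreement=0.9):
    """Return "proxy" if the proxy agrees with held-out LLM labels often enough,
    otherwise "llm". Metric and threshold are hypothetical choices."""
    predictions = [proxy_predict(x) for x in holdout_embeddings]
    agree = sum(1 for p, y in zip(predictions, holdout_llm_labels) if p == y)
    agreement = agree / len(holdout_llm_labels)
    return "proxy" if agreement >= min_agreement else "llm"

# Toy held-out sample: 2-d stand-in embeddings with hypothetical LLM labels.
holdout_embeddings = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9], [0.6, 0.4]]
holdout_llm_labels = [1, 0, 1, 0, 1]

good_proxy = lambda x: int(x[0] > x[1])  # tracks the LLM labels on this sample
weak_proxy = lambda x: 1                 # always answers "yes"

print(choose_executor(good_proxy, holdout_embeddings, holdout_llm_labels))
print(choose_executor(weak_proxy, holdout_embeddings, holdout_llm_labels))
```

The point of the design is that the fallback is decided per AI function before the full scan, so a proxy that approximates poorly never replaces the LLM for that query.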

Sources

  1. Google Cloud Blog — AI & Machine Learning · 5/13/2026