Lindorm Vector Retrieval Service Posts Record Throughput and Low P99 Latency in VectorDBBench Tests


5/11/2026, 10:07:04 AM


VectorDBBench runs in a real cloud environment show Lindorm’s upgraded Vector Retrieval Service delivering very high throughput and P99 latencies of roughly 2 to 2.5 ms on Cohere datasets while keeping recall above 99%.

VectorDBBench runs evaluated Lindorm’s upgraded Vector Retrieval Service in a real cloud environment and reported record throughput, low P99 latency, and strong recall across standard benchmarks. According to the report, the results show that the system can handle massive, high‑concurrency vector retrieval workloads without a query cache. The measurements were taken under realistic production constraints and are presented as proof points for serving latency‑sensitive applications.

On the Cohere datasets the numbers are explicit. On Cohere‑10M with a single 32‑core node, Lindorm reached a peak of 24,346 QPS with P99 latency of about 2.5 ms while maintaining recall above 99%. On Cohere‑1M the service exceeded 56,000 QPS with roughly 2 ms P99 latency and similarly high recall.

Beyond raw vector KNN, the tests emphasize hybrid search under scalar filtering. Lindorm combines a hybrid cost‑based and rule‑based optimizer (CBO/RBO) with an adaptive hybrid index to choose execution plans across different filtering ratios. The system typically selects vector‑first plans at low filter ratios and scalar‑driven plans at high filter ratios, and it applies parallelized scalar filtering during graph traversal when appropriate to avoid sacrificing recall.
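To make the plan choice concrete, here is a minimal sketch of a cost‑based decision between the two plan families. It is not Lindorm's optimizer: the cost formulas, the `overscan` factor, and all function names are hypothetical, and "filter ratio" is assumed to mean the fraction of rows the scalar predicate excludes.

```python
import math


def vector_driven_cost(k: int, filter_ratio: float, overscan: int = 10) -> float:
    """Toy estimate of graph nodes visited when filtering during traversal.

    Assumption: the search must visit about overscan * k nodes when nothing is
    filtered, and proportionally more as fewer rows pass the predicate.
    """
    pass_fraction = 1.0 - filter_ratio
    if pass_fraction <= 0.0:
        return math.inf
    return overscan * k / pass_fraction


def scalar_driven_cost(n_rows: int, filter_ratio: float) -> float:
    """Toy estimate of distance computations when scanning only surviving rows."""
    return n_rows * (1.0 - filter_ratio)


def choose_plan(n_rows: int, k: int, filter_ratio: float) -> str:
    """Pick the plan with the lower estimated cost (hypothetical cost model)."""
    if not 0.0 <= filter_ratio <= 1.0:
        raise ValueError("filter_ratio must be within [0, 1]")
    vector_cost = vector_driven_cost(k, filter_ratio)
    scalar_cost = scalar_driven_cost(n_rows, filter_ratio)
    return "scalar_driven" if scalar_cost < vector_cost else "vector_driven"


if __name__ == "__main__":
    for ratio in (0.01, 0.5, 0.999):
        print(ratio, choose_plan(n_rows=10_000_000, k=100, filter_ratio=ratio))
```

Under this toy model a 10M‑row collection stays on the vector‑driven plan until the filter excludes almost everything, which matches the qualitative behaviour described above; a production optimizer would rely on real index statistics rather than these fixed formulas.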

The benchmarking report provides concrete hybrid‑mode metrics for builders. In vector‑driven mode, cross‑pipeline parallel filtering during traversal keeps QPS above 50,000. In scalar‑driven mode the engine switches to bitmap and inverted indexes for tight filters and reportedly exceeds 260,000 QPS. These modes are presented as a way to avoid the performance collapses that can occur with traditional “filter then retrieve” or “retrieve then filter” patterns as selectivity changes.
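The failure mode of the fixed patterns is easy to reproduce on toy data. The sketch below uses exact search over one‑dimensional integer "vectors" with hypothetical helper names; it only illustrates why a fixed strategy breaks down and says nothing about Lindorm's internals.

```python
def top_k(query, rows, k):
    """Exact nearest neighbours by squared distance over 1-D toy vectors."""
    return sorted(rows, key=lambda r: (r["vec"] - query) ** 2)[:k]


def retrieve_then_filter(query, rows, k, predicate):
    """Post-filtering: take the top-k first, then drop rows failing the predicate."""
    return [r for r in top_k(query, rows, k) if predicate(r)]


def filter_then_retrieve(query, rows, k, predicate):
    """Pre-filtering: keep only matching rows, then search among the survivors."""
    return top_k(query, [r for r in rows if predicate(r)], k)


# 1,000 rows; the tag predicate below keeps only 1% of them (a tight filter).
rows = [{"id": i, "vec": i, "tag": i % 100} for i in range(1000)]


def tight(row):
    return row["tag"] == 0


post = retrieve_then_filter(500, rows, 10, tight)
pre = filter_then_retrieve(500, rows, 10, tight)
print(len(post), len(pre))  # post-filtering falls short of k; pre-filtering does not
```

With a tight filter, post‑filtering returns far fewer than k results, so recall collapses. With a loose filter the opposite pattern suffers: pre‑filtering has to scan nearly the whole collection, so throughput collapses instead.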

Test setup and reproducibility details are provided for engineering teams. The environment used 32‑core, 128 GB instances (32C128G), Lindorm vector engine version 3.10.16 or later, and VectorDBBench as the stress tester. Measurements were taken in “no Query Cache” mode; standard hardware specs and open‑source testing frameworks were used to keep comparisons fair, and an adapter was submitted to add Lindorm protocol support to VectorDBBench.

For practitioners the takeaway is practical: these vendor benchmarks position Lindorm as a candidate for production vector workloads (LLM applications, search, recommendations, and advertising) where low latency, high throughput under concurrency, index freshness during frequent updates, and mixed scalar filtering matter. As with any vendor‑published test, teams should re‑run comparable benchmarks on their own data shapes, cluster sizes, and query mixes to validate cost, SLAs, and update‑latency tradeoffs.
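When re‑running such tests, the three headline metrics can be computed from raw measurements along these lines. This is a minimal sketch with hypothetical helper names, not VectorDBBench's own implementation; it assumes per‑query latencies in milliseconds, a nearest‑rank percentile, and ground‑truth neighbour IDs from an exact search.

```python
def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # integer form of ceil(0.99 * n)
    return ordered[rank - 1]


def recall_at_k(retrieved_ids, ground_truth_ids, k):
    """Fraction of the true top-k neighbours present in the returned top-k."""
    hits = set(retrieved_ids[:k]) & set(ground_truth_ids[:k])
    return len(hits) / k


def qps(n_queries, wall_seconds):
    """Completed queries per second of wall-clock time across all clients."""
    return n_queries / wall_seconds


if __name__ == "__main__":
    samples = [1.8, 2.1, 1.9, 2.4, 2.0]
    print(p99(samples), recall_at_k([1, 2, 3, 4], [1, 2, 5, 6], 4), qps(1000, 2.0))
```

Reporting all three together matters, because throughput or latency alone can be improved by trading away recall.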

Sources

Alibaba Cloud Blog · 5/11/2026
