
Turbovec, an open-source Rust vector index with Python bindings, implements Google Research’s TurboQuant to dramatically reduce embedding storage and speed nearest — neighbor search for RAG and local/on-prem inference.
Turbovec has implemented Google Research’s TurboQuant as a production — ready vector index written in Rust with Python bindings, targeting retrieval — augmented generation (RAG) and local/on-prem inference where embedding storage and runtime cost matter. The maintainers report that a 10 million — document embedding corpus that consumes 31 GB in float32 can be quantized to about 4 GB with turbovec, a material saving for teams operating large local indices.
TurboQuant in turbovec is data-oblivious and requires no training or passes over corpus data. The pipeline normalizes each vector (storing norms separately), applies a single random orthogonal rotation to make coordinates analytically predictable, runs Lloyd — Max scalar quantization using precomputed bucket boundaries and centroids, and bit-packs coordinates. That process gives large compression: a 1,536 — dimensional vector drops from 6,144 bytes (FP32) to 384 bytes at 2 — bit encoding, a cited 16× reduction.
The runtime is optimized with SIMD kernels tuned for modern platforms: NEON on ARM, AVX‑512BW on x86 with an AVX2 fallback, plus nibble — split lookup tables to accelerate scoring. Benchmarks reported in the project use 100k vectors, 1,000 queries, k=64 and report medians over five runs to reduce variance in comparative timings.
On raw throughput, turbovec reportedly outperforms FAISS IndexPQFastScan by 12 — 20% on an Apple M3 Max across the tested configurations. On an Intel Xeon Platinum 8481C (Sapphire Rapids), turbovec wins most 4 — bit cases by 1–6% and runs within roughly 1% of FAISS on 2 — bit single — threaded workloads; however, some multithreaded 2 — bit scenarios still favor FAISS when it can leverage an AVX‑512 VBMI path.
Recall comparisons use FAISS IndexPQ (LUT256, nbits=8, float32 LUT) as a baseline. For OpenAI embeddings at d=1,536 and d=3,072, TurboQuant and FAISS deliver very similar R@1 (0–1 point difference) and both reach near‑perfect recall by k≈4–8. For lower dimensions such as GloVe d=200, TurboQuant trails FAISS by roughly 3–6 points at R@1 and only closes the gap by k≈16 — 32, showing the quantizer’s sensitivity to embedding dimensionality and family.
The Python API is intentionally simple: install with pip install turbovec and create an index via TurboQuantIndex(dim=..., bit_width=...). Typical operations include index.add(vectors), index.search(query, k) and index.write/read for persistence. Because TurboQuant is data‑oblivious, teams avoid k‑means codebook training and retraining cycles when corpora shift, simplifying CI/CD and enabling smaller memory footprints for on‑prem RAG deployments.
Sources
Replies (0)
No replies in this topic yet.