Alibaba Open-Sources Zvec: An Embedded Vector Database Bringing SQLite-like Simplicity and High-Performance On-Device RAG to Edge Applications


Alibaba’s Tongyi Lab research team has released Zvec, an open-source, in-process vector database that targets edge and on-device retrieval workloads. It is positioned as ‘the SQLite of vector databases’ because it runs as a library inside your application and does not require an external service or daemon. It is designed for retrieval-augmented generation (RAG), semantic search, and agent workloads that must run locally on laptops, mobile devices, or other constrained edge hardware.

The core idea is simple. Many applications now need vector search and metadata filtering but do not want to run a separate vector database service. Traditional server style systems are heavy for desktop tools, mobile apps, or command line utilities. An embedded engine that behaves like SQLite but for embeddings fits this gap.

https://zvec.org/en/blog/introduction/

Why embedded vector search matters for RAG

RAG and semantic search pipelines need more than a bare index. They need vectors, scalar fields, full CRUD, and safe persistence. Local knowledge bases change as files, notes, and project states change.

Index libraries such as Faiss provide approximate nearest neighbor search but do not handle scalar storage, crash recovery, or hybrid queries. You end up building your own storage and consistency layer. Embedded extensions such as DuckDB-VSS add vector search to DuckDB but expose fewer index and quantization options and weaker resource control for edge scenarios. Service based systems such as Milvus or managed vector clouds require network calls and separate deployment, which is often overkill for on-device tools.
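To make the gap concrete, here is a minimal sketch of the kind of layer a developer ends up hand-rolling around a bare index: brute-force cosine search plus a scalar filter over an in-memory dict. `TinyStore` and its methods are hypothetical names for illustration only, not part of Faiss, DuckDB-VSS, or Zvec, and the sketch deliberately has no persistence or crash recovery.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyStore:
    """Hypothetical in-memory store: vectors plus scalar fields, nothing durable."""

    def __init__(self):
        self.docs = {}  # doc_id -> (vector, scalar fields)

    def upsert(self, doc_id, vector, fields):
        self.docs[doc_id] = (vector, fields)

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)

    def query(self, vector, topk=3, where=None):
        """Linear scan: apply the scalar filter first, then rank by similarity."""
        hits = []
        for doc_id, (vec, fields) in self.docs.items():
            if where is not None and not where(fields):
                continue
            hits.append((doc_id, cosine(vector, vec)))
        hits.sort(key=lambda hit: hit[1], reverse=True)
        return hits[:topk]


store = TinyStore()
store.upsert("note_1", [1.0, 0.0], {"kind": "note"})
store.upsert("note_2", [0.9, 0.1], {"kind": "note"})
store.upsert("code_1", [1.0, 0.0], {"kind": "code"})

# Hybrid query: nearest neighbours restricted to documents of kind "note"
hits = store.query([1.0, 0.0], topk=2, where=lambda f: f["kind"] == "note")
print(hits)
```

Everything missing here (durable storage, recovery after a crash, an index that scales past a linear scan) is the part an embedded engine is meant to supply.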

Zvec targets exactly these local scenarios. It gives you a vector-native engine with persistence, resource governance, and RAG-oriented features, packaged as a lightweight library.

Core architecture: in-process and vector-native

Zvec is implemented as an embedded library. You install it with pip install zvec and open collections directly in your Python process. There is no external server or RPC layer. You define schemas, insert documents, and run queries through the Python API.

The engine is built on Proxima, Alibaba Group’s high-performance, production-grade vector search engine. Zvec wraps Proxima with a simpler API and an embedded runtime. The project is released under the Apache 2.0 license.

Current support covers Python 3.10 to 3.12 on Linux x86_64, Linux ARM64, and macOS ARM64.

The design goals are explicit:

Embedded execution in process

Vector native indexing and storage

Production ready persistence and crash safety

This makes it suitable for edge devices, desktop applications, and zero-ops deployments.

Developer workflow: from install to semantic search

The quickstart documentation shows a short path from install to query.

Example:

    import zvec

    # Define collection schema: one 4-dimensional FP32 vector field
    schema = zvec.CollectionSchema(
        name="example",
        vectors=zvec.VectorSchema("embedding", zvec.DataType.VECTOR_FP32, 4),
    )

    # Create collection
    collection = zvec.create_and_open(path="./zvec_example", schema=schema)

    # Insert documents
    collection.insert([
        zvec.Doc(id="doc_1", vectors={"embedding": [0.1, 0.2, 0.3, 0.4]}),
        zvec.Doc(id="doc_2", vectors={"embedding": [0.2, 0.3, 0.4, 0.1]}),
    ])

    # Search by vector similarity
    results = collection.query(
        zvec.VectorQuery("embedding", vector=[0.4, 0.3, 0.3, 0.1]),
        topk=10,
    )

    # Results: list of {'id': str, 'score': float, ...}, sorted by relevance
    print(results)

Results come back as dictionaries that include IDs and similarity scores. This is enough to build a local semantic search or RAG retrieval layer on top of any embedding model.
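As a rough illustration of that retrieval layer, the sketch below turns a result list into a prompt for a language model. It does not call Zvec: `results` is a made-up stand-in shaped like the output described above (`id` and `score` keys), and `chunk_text` and `build_prompt` are hypothetical application-side names.

```python
# Stand-in for the output of collection.query(...); IDs and scores are made up.
results = [
    {"id": "doc_2", "score": 0.97},
    {"id": "doc_1", "score": 0.85},
]

# Hypothetical application-side mapping from document ID to its text chunk
chunk_text = {
    "doc_1": "Zvec runs in-process as a library.",
    "doc_2": "Zvec is built on the Proxima engine.",
}


def build_prompt(question, results, chunk_text, max_chunks=3):
    """Assemble a RAG prompt from the top hits, keeping their ranked order."""
    context = []
    for hit in results[:max_chunks]:
        text = chunk_text.get(hit["id"])
        if text is not None:  # skip hits whose text is no longer available
            context.append(f"[{hit['id']}] {text}")
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"


prompt = build_prompt("What engine does Zvec use?", results, chunk_text)
print(prompt)
```

Keeping the chunk text keyed by document ID on the application side is one possible design; storing the text alongside the vector is another, depending on what the schema allows.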

Performance: VectorDBBench and 8,000+ QPS

Zvec is optimized for high throughput and low latency on CPUs. It uses multithreading, cache friendly memory layouts, SIMD instructions, and CPU prefetching.

In VectorDBBench on the Cohere 10M dataset, with comparable hardware and matched recall, Zvec reports more than 8,000 QPS.

According to the Zvec team, this is more than twice the throughput of the previous leaderboard leader, ZillizCloud, with a substantially shorter index build time. The result suggests that an embedded library can reach cloud-level performance for high-volume similarity search under similar benchmark conditions.
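Matched recall matters because QPS figures are only comparable when both systems return results of similar quality. The sketch below shows the two quantities involved, using made-up IDs and timings rather than any VectorDBBench data; `recall_at_k` and `qps` are hypothetical helper names.

```python
def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true top-k neighbours that appear in the retrieved top-k."""
    found = set(retrieved[:k]) & set(ground_truth[:k])
    return len(found) / k


def qps(num_queries, elapsed_seconds):
    """Queries per second over a timed run."""
    return num_queries / elapsed_seconds


# Made-up example: the approximate search misses one of four true neighbours
approx_ids = ["a", "b", "x", "d"]
exact_ids = ["a", "b", "c", "d"]

recall = recall_at_k(approx_ids, exact_ids, k=4)
throughput = qps(num_queries=500, elapsed_seconds=2.0)
print(recall, throughput)
```

An index can usually trade recall for speed, so a throughput number reported without its recall level says little on its own.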

Zvec’s feature set is tailored to RAG and agentic retrieval. It supports full CRUD on documents so that a local knowledge base can change over time, schema evolution for adjusting index strategies and fields, multi-vector retrieval for queries that combine several embedding channels, and a built-in reranker that supports weighted fusion and Reciprocal Rank Fusion (RRF). It also supports scalar-vector hybrid search by pushing scalar filters into the index execution path, with optional inverted indexes on scalar attributes. Together, these let an on-device assistant combine semantic retrieval, user-defined filters, and multiple embedding models within a single embedded engine.
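For readers unfamiliar with the two fusion strategies, here is a minimal sketch of how they are commonly defined. It is not Zvec’s reranker: the function names are hypothetical, the inputs are made up, and `k=60` is the constant commonly used in the RRF literature rather than a value confirmed by the source.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def weighted_fusion(score_maps, weights):
    """Fuse per-channel similarity scores as a weighted sum."""
    scores = defaultdict(float)
    for score_map, weight in zip(score_maps, weights):
        for doc_id, score in score_map.items():
            scores[doc_id] += weight * score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Two embedding channels return the same documents in different orders
text_ranking = ["a", "b", "c"]
image_ranking = ["b", "c", "a"]
fused = reciprocal_rank_fusion([text_ranking, image_ranking])
print([doc_id for doc_id, _ in fused])

# Weighted fusion works on scores instead of ranks
weighted = weighted_fusion(
    [{"a": 0.9, "b": 0.8}, {"b": 0.7, "c": 0.6}],
    weights=[0.7, 0.3],
)
print([doc_id for doc_id, _ in weighted])
```

RRF uses only rank positions, so it tolerates channels whose scores are on different scales; weighted fusion assumes the scores are comparable or have been normalized first.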

Key takeaways

Zvec is positioned as the ‘SQLite of vector databases’ for on-device and edge RAG workloads.

It is built on Alibaba’s Proxima vector search engine and released under the Apache 2.0 license, with Python support on Linux x86_64, Linux ARM64, and macOS ARM64.

It reports more than 8,000 QPS on VectorDBBench with the Cohere 10M dataset, ahead of the previous leaderboard leader, ZillizCloud, with a shorter index build time.

Resource governance is explicit: 64 MB streaming writes, mmap mode, memory_limit_mb, configurable concurrency, and optimize_threads and query_threads for CPU control.

RAG features include full CRUD, schema evolution, multi-vector retrieval, built-in reranking, and scalar-vector hybrid search with optional inverted indexes.

The ecosystem roadmap includes integrations with LangChain, LlamaIndex, DuckDB, and PostgreSQL, as well as real device deployments.


[Image credit: https://zvec.org/en/blog/introduction/]
