HNSW-RaBitQ Index
HNSW with RaBitQ Quantization
An advanced graph-based index that combines the HNSW graph structure with the RaBitQ quantization algorithm, delivering dramatically lower memory usage while maintaining state-of-the-art search quality.
Platform Requirement: HNSW-RaBitQ is currently supported only on x86_64 with AVX2 or higher instruction set support. It is not available on ARM architectures.
How It Works
HNSW-RaBitQ combines two techniques to achieve high recall with minimal memory:
- HNSW graph for navigation
- Same multi-layer graph structure as the standard HNSW index: sparse upper layers for fast long-range jumps, dense lower layers for fine-grained local search.
- RaBitQ for distance estimation
- RaBitQ applies a random rotation to the vectors before converting them into binary codes (1s and 0s). Distances can then be estimated with efficient bitwise operations, significantly reducing both memory usage and computational cost compared to working with full-precision numbers.
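The rotate-then-binarize idea can be sketched in a few lines of Python. This is an illustrative simplification, not the library's implementation: it uses independent random hyperplanes in place of RaBitQ's orthonormal rotation, and a simple bit-agreement count as the similarity proxy.

```python
import random

random.seed(0)
dim = 64

# Random hyperplanes stand in for RaBitQ's random rotation here
# (a simplification: RaBitQ uses an orthonormal rotation).
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]

def encode(v):
    # One bit per hyperplane: which side of the plane the vector falls on.
    return [sum(p[i] * v[i] for i in range(dim)) > 0 for p in planes]

def estimated_similarity(a_bits, b_bits):
    # Fraction of agreeing bits: a cheap, bitwise proxy for angular similarity.
    return sum(x == y for x, y in zip(a_bits, b_bits)) / len(a_bits)

a = [random.gauss(0, 1) for _ in range(dim)]
b = [x + 0.1 * random.gauss(0, 1) for x in a]   # near-duplicate of a
c = [random.gauss(0, 1) for _ in range(dim)]    # unrelated vector

sim_ab = estimated_similarity(encode(a), encode(b))
sim_ac = estimated_similarity(encode(a), encode(c))
print(sim_ab > sim_ac)  # the near-duplicate agrees on far more bits
```

In the real index these codes are packed into machine words, so the bit-agreement step compiles down to XOR and population-count instructions, which is what makes the AVX2/AVX-512 requirement pay off.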
When to Use HNSW-RaBitQ?
- ✅ Production systems needing fast search and high recall with controlled memory budgets
- ✅ Massive high-dimensional datasets: billion-scale vectors with 1536+ dimensions that would consume terabytes of RAM in FP32 format
- ✅ Workloads on x86_64 servers with AVX2/AVX-512 support
Best Practice: Use HNSW-RaBitQ when you want HNSW-quality search without the memory overhead.
The total_bits parameter controls the accuracy-memory trade-off. According to the paper, on specific datasets, 7 bits achieves ~99% recall, 5 bits ~95%, and 4 bits ~90%. Going as low as 1 bit maximizes compression at the cost of lower recall.
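To make the trade-off concrete, here is the back-of-the-envelope memory math for 1536-dimensional vectors at the total_bits settings above (code sizes only; graph links and metadata add overhead on top):

```python
# Per-vector storage for 1536-dim embeddings: FP32 vs. RaBitQ codes.
dim = 1536
fp32_bytes = dim * 4  # float32 = 4 bytes per dimension -> 6144 bytes

for total_bits in (7, 5, 4, 1):
    code_bytes = dim * total_bits / 8  # total_bits per dimension, packed
    ratio = fp32_bytes / code_bytes
    print(f"total_bits={total_bits}: {code_bytes:.0f} bytes/vector, {ratio:.1f}x smaller")
```

At 1 bit per dimension the codes are 32x smaller than FP32, which is where the "up to 32x" figure below comes from: a billion 1536-dim FP32 vectors (~6.1 TB) would shrink to roughly 192 GB of codes.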
Advantages
- ✨ Dramatically lower memory: quantized vectors are up to 32x smaller than FP32, reducing active index size
- ✨ Fast distance estimation: RaBitQ estimates similarity metrics efficiently using bitwise operations
- ✨ Promising recall without re-ranking: graph construction uses the original vectors, preserving graph quality, and RaBitQ provides an asymptotically optimal error bound for reliable ordering and reranking
Trade-offs
- ⚠️ x86_64 only: requires AVX2 or AVX-512; ARM is not supported
- ⚠️ Training overhead: requires a KMeans training step before index construction, adding to build time
- ⚠️ Dimension constraints: only supports vectors between 64 and 4095 dimensions
Key Parameters
Tuning Tip:
Start with the defaults (total_bits=7, num_clusters=16). Adjust ef first for recall/latency trade-offs at query time. Only reduce total_bits if you need to cut memory further and can tolerate slightly lower recall.
Index-Time Parameters
| Parameter | Description | Tuning Guidance |
|---|---|---|
| metric_type | Similarity metric used to compare vectors | Choose based on how your embeddings were trained |
| m | Max neighbors per node: the maximum number of bidirectional links created for each node during graph construction | Higher m → ✨ better recall and graph connectivity, ⚠️ more memory usage and higher latency for both indexing and search |
| ef_construction | Index-time candidate pool size: determines how many neighboring candidates the algorithm considers when inserting a new vector into the graph | Higher ef_construction → ✨ better graph quality and higher recall, ⚠️ longer index build time (does not affect query speed) |
| total_bits | RaBitQ quantization bits per dimension: controls the precision of the binary encoding | Controls the accuracy-memory trade-off. Lower values save more memory but reduce accuracy |
| num_clusters | Number of KMeans clusters: used during the RaBitQ training phase to partition the vector space | More clusters can capture finer distribution patterns; higher values increase recall slightly |
| sample_count | Training sample count: number of vectors sampled for KMeans training (0 = use all vectors) | Default is 0. Set a smaller value (e.g. 5,000,000) to speed up training and reduce memory usage on very large datasets |
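Pulled together, the index-time parameters might be assembled as follows. This document does not show the client API, so the dict shape and the "HNSW_RABITQ" identifier are hypothetical; only the parameter names and the documented defaults (total_bits=7, num_clusters=16, sample_count=0) come from the tables, and the m and ef_construction values are illustrative starting points.

```python
# Hypothetical index-time configuration for an HNSW-RaBitQ index.
# Parameter names match the table above; the dict shape and the
# "HNSW_RABITQ" identifier are assumptions, not a documented API.
index_params = {
    "index_type": "HNSW_RABITQ",  # assumed name for this index type
    "metric_type": "IP",          # match how your embeddings were trained
    "m": 32,                      # illustrative: max links per node
    "ef_construction": 200,       # illustrative: build-time candidate pool
    "total_bits": 7,              # documented default (~99% recall per paper)
    "num_clusters": 16,           # documented default for KMeans training
    "sample_count": 0,            # documented default: train on all vectors
}
print(index_params)
```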
Query-Time Parameters
| Parameter | Description | Tuning Guidance |
|---|---|---|
| ef | Query-time candidate pool size: determines how many potential neighbors are explored at each step during graph traversal at query time | Higher ef → ✨ higher recall, ⚠️ higher query latency |
| radius | Distance (similarity) threshold for range-based filtering; only documents satisfying the threshold are returned | Example: with inner product (MetricType.IP), set radius=0.6 to keep only results with score > 0.6. ✅ Use when you want to filter out low-quality matches. 🚫 Skip when you want all top-k results, regardless of quality |
| is_linear | Forces a brute-force linear search instead of using the index | Very slow for large datasets! ✅ Only use for debugging, tiny collections, or verifying index accuracy |
| is_using_refiner | Enables exact score refinement: recomputes exact FP32 distances for top candidates after quantized search | ✅ Turn on when you need maximum precision. ⚠️ Adds latency due to full-precision re-scoring |
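A matching query-time configuration might look like the sketch below. As with the index-time parameters, the field names come from the table above, while the wrapper dict itself is a hypothetical shape, since this document does not show the search call.

```python
# Hypothetical query-time parameters for an HNSW-RaBitQ search.
# Field names come from the table above; the surrounding dict shape
# is illustrative, not a documented client API.
search_params = {
    "ef": 64,                  # higher -> better recall, higher latency
    "radius": 0.6,             # with IP, keep only results scoring above 0.6
    "is_linear": False,        # True forces a brute-force scan (debugging only)
    "is_using_refiner": True,  # re-score top candidates with exact FP32 distances
}
print(search_params)
```

Start with ef tuning alone, as the Tuning Tip above suggests; is_using_refiner is the next lever if recall at a given ef is not sufficient.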