HNSW-RaBitQ Index
HNSW with RaBitQ Quantization
An advanced graph-based index that combines the HNSW graph structure with the RaBitQ quantization algorithm, delivering dramatically lower memory usage while maintaining state-of-the-art search quality.
Platform Requirement: HNSW-RaBitQ is currently supported only on x86_64 with AVX2 or higher instruction set support. It is not available on ARM architectures.
How It Works
HNSW-RaBitQ combines two techniques to achieve high recall with minimal memory:
- HNSW graph for navigation
- Same multi-layer graph structure as the standard HNSW index: sparse upper layers for fast long-range jumps, dense lower layers for fine-grained local search.
- RaBitQ for distance estimation
- RaBitQ applies a random rotation to the vectors before converting them into binary codes (1s and 0s). Distances can then be estimated with efficient bitwise operations, significantly reducing both memory usage and computational cost compared to working with full-precision numbers.
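The rotate-then-binarize idea can be sketched in a few lines of Python. This is an illustrative simplification, not the library's implementation: it uses independent random hyperplanes in place of RaBitQ's orthonormal rotation, and a simple bit-agreement count as the similarity proxy.

```python
import random

random.seed(0)
dim = 64

# Random hyperplanes stand in for RaBitQ's random rotation here
# (a simplification: RaBitQ uses an orthonormal rotation).
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]

def encode(v):
    # One bit per hyperplane: which side of the plane the vector falls on.
    return [sum(p[i] * v[i] for i in range(dim)) > 0 for p in planes]

def estimated_similarity(a_bits, b_bits):
    # Fraction of agreeing bits: a cheap, bitwise proxy for angular similarity.
    return sum(x == y for x, y in zip(a_bits, b_bits)) / len(a_bits)

a = [random.gauss(0, 1) for _ in range(dim)]
b = [x + 0.1 * random.gauss(0, 1) for x in a]   # near-duplicate of a
c = [random.gauss(0, 1) for _ in range(dim)]    # unrelated vector

sim_ab = estimated_similarity(encode(a), encode(b))
sim_ac = estimated_similarity(encode(a), encode(c))
print(sim_ab > sim_ac)  # the near-duplicate agrees on far more bits
```

In the real index these codes are packed into machine words, so the bit-agreement step compiles down to XOR and population-count instructions, which is what makes the AVX2/AVX-512 requirement pay off.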
When to Use HNSW-RaBitQ?
- ✅ Production systems needing fast search and high recall with controlled memory budgets
- ✅ Massive high-dimensional datasets: billion-scale vectors with 1536+ dimensions that would consume terabytes of RAM in FP32 format
- ✅ Workloads on x86_64 servers with AVX2/AVX-512 support
Best Practice: Use HNSW-RaBitQ when you want HNSW-quality search without the memory overhead.
The total_bits parameter controls the accuracy-memory trade-off. According to the paper, on specific datasets, 7 bits achieves ~99% recall, 5 bits ~95%, and 4 bits ~90%. Going as low as 1 bit maximizes compression at the cost of lower recall.
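To make the trade-off concrete, here is the back-of-the-envelope memory math for 1536-dimensional vectors at the total_bits settings above (code sizes only; graph links and metadata add overhead on top):

```python
# Per-vector storage for 1536-dim embeddings: FP32 vs. RaBitQ codes.
dim = 1536
fp32_bytes = dim * 4  # float32 = 4 bytes per dimension -> 6144 bytes

for total_bits in (7, 5, 4, 1):
    code_bytes = dim * total_bits / 8  # total_bits per dimension, packed
    ratio = fp32_bytes / code_bytes
    print(f"total_bits={total_bits}: {code_bytes:.0f} bytes/vector, {ratio:.1f}x smaller")
```

At 1 bit per dimension the codes are 32x smaller than FP32, which is where the "up to 32x" figure below comes from: a billion 1536-dim FP32 vectors (~6.1 TB) would shrink to roughly 192 GB of codes.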
Advantages
- ✨ Dramatically lower memory: quantized vectors are up to 32x smaller than FP32, reducing active index size
- ✨ Fast distance estimation: RaBitQ estimates similarity metrics efficiently using bitwise operations
- ✨ Promising recall without re-ranking: graph construction uses the original vectors, preserving graph quality, and RaBitQ provides an asymptotically optimal error bound for reliable ordering and reranking
Trade-offs
- ⚠️ x86_64 only: requires AVX2 or AVX-512; ARM is not supported
- ⚠️ Training overhead: requires a KMeans training step before index construction, adding to build time
- ⚠️ Dimension constraints: only supports vectors between 64 and 4095 dimensions
Key Parameters
Tuning Tip:
Start with the defaults (total_bits=7, num_clusters=16). Adjust ef first for recall/latency trade-offs at query time. Only reduce total_bits if you need to cut memory further and can tolerate slightly lower recall.
Index-Time Parameters
| Parameter | Description | Tuning Guidance |
|---|---|---|
| metric_type | Similarity metric used to compare vectors | Choose based on how your embeddings were trained |
| m | Max neighbors per node: the maximum number of bidirectional links created for each node during graph construction | Higher m → ✨ better recall and graph connectivity, ⚠️ more memory usage and higher latency for both indexing and search |
| ef_construction | Index-time candidate pool size: determines how many neighboring candidates the algorithm considers when inserting a new vector into the graph | Higher ef_construction → ✨ better graph quality and higher recall, ⚠️ longer index build time (does not affect query speed) |
| total_bits | RaBitQ quantization bits per dimension: controls the precision of the binary encoding | Controls the accuracy-memory trade-off. Lower values save more memory but reduce accuracy |
| num_clusters | Number of KMeans clusters: used during the RaBitQ training phase to partition the vector space | More clusters can capture finer distribution patterns; higher values increase recall slightly |
| sample_count | Training sample count: number of vectors sampled for KMeans training (0 = use all vectors) | Default is 0. Set a smaller value (e.g. 5,000,000) to speed up training and reduce memory usage on very large datasets |
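Pulled together, the index-time parameters might be assembled as follows. This document does not show the client API, so the dict shape and the "HNSW_RABITQ" identifier are hypothetical; only the parameter names and the documented defaults (total_bits=7, num_clusters=16, sample_count=0) come from the tables, and the m and ef_construction values are illustrative starting points.

```python
# Hypothetical index-time configuration for an HNSW-RaBitQ index.
# Parameter names match the table above; the dict shape and the
# "HNSW_RABITQ" identifier are assumptions, not a documented API.
index_params = {
    "index_type": "HNSW_RABITQ",  # assumed name for this index type
    "metric_type": "IP",          # match how your embeddings were trained
    "m": 32,                      # illustrative: max links per node
    "ef_construction": 200,       # illustrative: build-time candidate pool
    "total_bits": 7,              # documented default (~99% recall per paper)
    "num_clusters": 16,           # documented default for KMeans training
    "sample_count": 0,            # documented default: train on all vectors
}
print(index_params)
```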
Query-Time Parameters
| Parameter | Description | Tuning Guidance |
|---|---|---|
| ef | Query-time candidate pool size: determines how many potential neighbors are explored at each step during graph traversal at query time | Higher ef → ✨ higher recall, ⚠️ higher query latency |
| radius | Distance (similarity) threshold for range-based filtering; only documents satisfying the threshold are returned | Example: with inner product (MetricType.IP), set radius=0.6 to keep only results with score > 0.6. ✅ Use when you want to filter out low-quality matches. 🚫 Skip when you want all top-k results, regardless of quality |
| is_linear | Forces a brute-force linear search instead of using the index | Very slow for large datasets! ✅ Only use for debugging, tiny collections, or verifying index accuracy |
| is_using_refiner | Enables exact score refinement: recomputes exact FP32 distances for top candidates after quantized search | ✅ Turn on when you need maximum precision. ⚠️ Adds latency due to full-precision re-scoring |
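A matching query-time configuration might look like the sketch below. As with the index-time parameters, the field names come from the table above, while the wrapper dict itself is a hypothetical shape, since this document does not show the search call.

```python
# Hypothetical query-time parameters for an HNSW-RaBitQ search.
# Field names come from the table above; the surrounding dict shape
# is illustrative, not a documented client API.
search_params = {
    "ef": 64,                  # higher -> better recall, higher latency
    "radius": 0.6,             # with IP, keep only results scoring above 0.6
    "is_linear": False,        # True forces a brute-force scan (debugging only)
    "is_using_refiner": True,  # re-score top candidates with exact FP32 distances
}
print(search_params)
```

Start with ef tuning alone, as the Tuning Tip above suggests; is_using_refiner is the next lever if recall at a given ef is not sufficient.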