
HNSW-RaBitQ Index

HNSW with RaBitQ Quantization

An advanced graph-based index that combines the HNSW graph structure with the RaBitQ quantization algorithm, delivering dramatically lower memory usage while maintaining state-of-the-art search quality.

Platform Requirement: HNSW-RaBitQ is currently supported only on x86_64 with AVX2 or higher instruction set support. It is not available on ARM architectures.
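If you want to verify at runtime whether a host can use this index, one possible probe (a Linux-only sketch, not part of any zvec API) is to inspect the CPU flags in /proc/cpuinfo:

```python
# Check whether the current CPU advertises AVX2 before enabling HNSW-RaBitQ.
# Linux-only sketch: parses /proc/cpuinfo; other platforms need another probe.
import platform

def supports_avx2() -> bool:
    if platform.machine() not in ("x86_64", "AMD64"):
        return False  # ARM and other architectures are not supported
    try:
        with open("/proc/cpuinfo") as f:
            return "avx2" in f.read()
    except OSError:
        return False  # no /proc/cpuinfo: cannot probe this way

print(supports_avx2())
```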

How It Works

HNSW-RaBitQ combines two techniques to achieve high recall with minimal memory:

  • HNSW graph for navigation 🪜
    • Same multi-layer graph structure as the standard HNSW index: sparse upper layers for fast long-range jumps, dense lower layers for fine-grained local search.
  • RaBitQ for distance estimation 🔍
    • RaBitQ applies a random rotation to the vectors and then converts them into binary codes (1s and 0s). Distances can then be estimated with efficient bitwise operations, cutting both memory usage and compute cost compared to working with full-precision numbers.
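The rotate-then-binarize idea can be sketched as follows. Note this is a simplified sign-random-projection estimator for angular similarity, not the full RaBitQ algorithm, which adds per-vector correction factors for an unbiased, tighter estimate:

```python
# Sketch of rotate-then-binarize distance estimation (simplified; the real
# RaBitQ estimator is more sophisticated than this sign-based variant).
import numpy as np

rng = np.random.default_rng(0)
D = 128

# Random orthogonal rotation, obtained from the QR factorization of a
# Gaussian matrix.
R, _ = np.linalg.qr(rng.standard_normal((D, D)))

def encode(v: np.ndarray) -> np.ndarray:
    """Rotate, then keep only the sign of each coordinate (1 bit per dim)."""
    return R @ v > 0

def estimated_cosine(code_a: np.ndarray, code_b: np.ndarray) -> float:
    """Hamming distance between binary codes -> angle -> cosine estimate."""
    hamming = np.count_nonzero(code_a ^ code_b)  # XOR + popcount
    return float(np.cos(np.pi * hamming / D))

a = rng.standard_normal(D); a /= np.linalg.norm(a)
b = a + 0.3 * rng.standard_normal(D); b /= np.linalg.norm(b)

print(f"true cosine      : {a @ b:.3f}")
print(f"estimated cosine : {estimated_cosine(encode(a), encode(b)):.3f}")
```

The key point is that the distance estimate needs only XOR and popcount on compact bit strings, never the original floating-point coordinates.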

When to Use HNSW-RaBitQ?

  • ✅ Production systems needing fast search and high recall with controlled memory budgets
  • ✅ Massive high-dimensional datasets: billion-scale vectors with 1536+ dimensions that would consume terabytes of RAM in FP32 format
  • ✅ Workloads on x86_64 servers with AVX2/AVX-512 support

Best Practice: Use HNSW-RaBitQ when you want HNSW-quality search without the memory overhead.

The total_bits parameter controls the accuracy-memory trade-off. On the datasets evaluated in the RaBitQ paper, 7 bits achieves ~99% recall, 5 bits ~95%, and 4 bits ~90%. Going as low as 1 bit maximizes compression at the cost of lower recall.
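To see what these bit widths mean in memory terms, here is a back-of-envelope calculation for the quantized codes alone (graph links and metadata are not counted):

```python
# Per-vector storage for quantized codes vs. FP32, assuming total_bits is
# bits per dimension as described above.
def code_bytes(dim: int, total_bits: int) -> float:
    return dim * total_bits / 8

dim = 1536
fp32 = dim * 4  # 6144 bytes per vector at full precision
for bits in (7, 5, 4, 1):
    q = code_bytes(dim, bits)
    print(f"total_bits={bits}: {q:.0f} B/vector ({fp32 / q:.1f}x smaller than FP32)")
```

At total_bits=1 the codes are exactly 32x smaller than FP32, matching the maximum compression figure quoted under Advantages.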

Advantages

  1. ✨ Dramatically lower memory: quantized vectors are up to 32x smaller than FP32, shrinking the active index size
  2. ✨ Fast distance estimation: RaBitQ estimates similarity metrics efficiently using bitwise operations
  3. ✨ Strong recall without mandatory re-ranking: graph construction uses the original vectors, preserving graph quality, and RaBitQ's asymptotically optimal error bound keeps the quantized ordering reliable for ranking and reranking

Trade-offs

  1. ⚠️ x86_64 only: requires AVX2 or AVX-512; ARM is not supported
  2. ⚠️ Training overhead: requires a KMeans training step before index construction, adding to build time
  3. ⚠️ Dimension constraints: only supports vectors between 64 and 4095 dimensions

Key Parameters

Tuning Tip: Start with the defaults (total_bits=7, num_clusters=16). Adjust ef first for recall/latency trade-offs at query time. Only reduce total_bits if you need to cut memory further and can tolerate slightly lower recall.

Index-Time Parameters

| Parameter | Description | Tuning Guidance |
| --- | --- | --- |
| metric_type | Similarity metric used to compare vectors | Choose based on how your embeddings were trained |
| m | Max neighbors per node: the maximum number of bidirectional links created for each node during graph construction | Higher m → ✨ better recall and graph connectivity; ⚠️ more memory usage and higher latency for both indexing and search |
| ef_construction | Index-time candidate pool size: how many neighboring candidates the algorithm considers when inserting a new vector into the graph | Higher ef_construction → ✨ better graph quality and higher recall; ⚠️ longer index build time (does not affect query speed) |
| total_bits | RaBitQ quantization bits per dimension: controls the precision of the binary encoding | Controls the accuracy-memory trade-off; lower values save more memory but reduce accuracy |
| num_clusters | Number of KMeans clusters used during the RaBitQ training phase to partition the vector space | More clusters can capture finer distribution patterns and increase recall slightly |
| sample_count | Training sample count: number of vectors sampled for KMeans training (0 = use all vectors) | Default is 0. Set a smaller value (e.g. 5,000,000) to speed up training and reduce memory usage on very large datasets |
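Putting the index-time parameters together, a configuration might look like the sketch below. The index_type string and the dict-based shape are assumptions for illustration; only the parameter names and defaults come from this page, so consult the zvec API reference for the exact call:

```python
# Hypothetical index-time configuration assembled from the table above.
index_params = {
    "index_type": "HNSW_RABITQ",  # assumed identifier, not confirmed here
    "metric_type": "IP",          # match how your embeddings were trained
    "m": 16,                      # max bidirectional links per node
    "ef_construction": 200,       # index-time candidate pool size
    "total_bits": 7,              # default; ~99% recall on the paper's datasets
    "num_clusters": 16,           # default KMeans clusters for RaBitQ training
    "sample_count": 0,            # 0 = train KMeans on all vectors
}
print(index_params)
```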

Query-Time Parameters

| Parameter | Description | Tuning Guidance |
| --- | --- | --- |
| ef | Query-time candidate pool size: how many potential neighbors are explored at each step during graph traversal at query time | Higher ef → ✨ higher recall; ⚠️ higher query latency |
| radius | Distance (similarity) threshold for range-based filtering; only documents satisfying the threshold are returned | Example: with inner product (MetricType.IP), set radius=0.6 to keep only results with score > 0.6. ✅ Use when you want to filter out low-quality matches; 🚫 skip when you want all top-k results, regardless of quality |
| is_linear | Forces a brute-force linear search instead of using the index | 🐌 Very slow for large datasets! ✅ Only use for debugging, tiny collections, or verifying index accuracy |
| is_using_refiner | Enables exact score refinement: recomputes exact FP32 distances for top candidates after the quantized search | ✅ Turn on when you need maximum precision; ⚠️ adds latency due to full-precision re-scoring |
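The query-time knobs can be sketched the same way; again, the dict shape is illustrative rather than the actual zvec search signature, and only the parameter names come from the table above:

```python
# Hypothetical query-time parameters assembled from the table above.
search_params = {
    "ef": 128,                 # raise for higher recall, lower for latency
    "radius": 0.6,             # with IP: keep only results scoring > 0.6
    "is_linear": False,        # brute-force search is for debugging only
    "is_using_refiner": True,  # re-score top candidates in exact FP32
}
print(search_params)
```

A common tuning loop is to fix the index-time parameters, then sweep ef upward until recall plateaus, enabling is_using_refiner only if the remaining gap matters for your workload.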
