
HNSW Index

Hierarchical Navigable Small World

A graph-based index and the go-to choice for low-latency approximate nearest neighbor search — delivering state-of-the-art speed and recall at the cost of higher memory usage

How It Works

HNSW builds a multi-layer graph structure where each node represents a vector and edges connect nodes based on similarity.

[Figure: HNSW example]

  • Layers form a hierarchy 🪜
    • Upper layers are sparse and act as "highways" for fast long-range navigation.
    • Lower layers are dense and provide fine-grained local neighborhood connectivity.
  • How search works (coarse → fine) 🔍
    1. Start from an entry point at the top layer.
    2. At the current layer, greedily move to whichever neighbor is closer to the query vector, until no neighbor is closer.
    3. Drop down one layer at that position and repeat the same greedy search.
    4. On the lowest layer, search more carefully (with a small candidate list) to refine the result.
  • Why this is fast and accurate ⚡ 🎯
    • Fast: Upper layers let you jump quickly to the right region without visiting most points.
    • Accurate: The dense bottom layer lets you explore the local neighborhood thoroughly, so recall stays high.
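The coarse-to-fine descent described above can be sketched in a few lines of Python. This is a toy illustration, not Zvec's implementation: the points, layers, and entry point are invented for the example.

```python
import math

# Toy HNSW-style graph over 2-D points.
# Layer 1 (sparse "highway") links a few hub nodes; layer 0 links everyone locally.
points = {
    "a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.0),
    "d": (3.0, 0.0), "e": (4.0, 0.0), "f": (5.0, 0.0),
}
layers = [
    {  # layer 0: dense local links
        "a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
        "d": ["c", "e"], "e": ["d", "f"], "f": ["e"],
    },
    {  # layer 1: sparse long-range links
        "a": ["d"], "d": ["a", "f"], "f": ["d"],
    },
]

def dist(node, query):
    return math.dist(points[node], query)

def greedy_search(query, entry="a"):
    """Descend layer by layer, greedily moving to the closest neighbor."""
    node = entry
    for layer in reversed(layers):  # top (sparse) -> bottom (dense)
        improved = True
        while improved:
            improved = False
            for nb in layer.get(node, []):
                if dist(nb, query) < dist(node, query):
                    node, improved = nb, True
    return node

print(greedy_search((4.2, 0.0)))  # -> e
```

The upper layer carries the walk from "a" to the hub nearest the query in two hops; the bottom layer then makes the short local correction.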

When to Use an HNSW Index?

  • ✅ Real-time, low-latency applications (e.g., conversational AI and live recommendations)
  • ✅ Production systems requiring consistent high recall with minimal latency

Best Practice: HNSW is our recommended default for most production use cases. It strikes an excellent balance between speed, accuracy, and robustness.

Advantages

  1. Near-logarithmic query time — Typically O(log n) for large datasets
  2. Consistently high recall across diverse data distributions
  3. Faster indexing than many alternatives (e.g., IVF-based methods)

Trade-offs

  1. ⚠️ Higher memory footprint — Graph links require additional storage (scales with m)
  2. ⚠️ Indexing complexity of O(n log n) — Slower build time than Flat index (but often faster than IVF)
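To see why the memory footprint scales with m, here is a back-of-envelope estimate. It assumes the common HNSW convention of up to 2*m neighbor ids per node on the bottom layer and 4-byte ids, and ignores the small extra cost of the sparse upper layers; actual numbers depend on the implementation.

```python
def link_overhead_bytes(n, m, bytes_per_id=4):
    # Assumption: ~2*m neighbor ids per node at layer 0 (a common HNSW
    # convention); sparse upper layers add comparatively little on top.
    return n * 2 * m * bytes_per_id

# 1M vectors with m=16: roughly 128 MB of graph links,
# on top of the raw vector storage itself.
print(link_overhead_bytes(1_000_000, 16) / 1e6)  # -> 128.0 (MB)
```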

Key Parameters

Tuning Tip:
Start with defaults, then adjust ef first for recall/latency trade-offs.
Only if needed, increase ef_construction or m for better accuracy — but expect slower indexing and higher memory use.

Index-Time Parameters

| Parameter | Description | Tuning Guidance |
|---|---|---|
| `metric_type` | Similarity metric used to compare vectors | Choose based on how your embeddings were trained |
| `m` | Max neighbors per node: the maximum number of bidirectional links created for each node during graph construction | Higher `m`: ✨ better recall and graph connectivity; ⚠️ more memory usage and higher latency for both indexing and search |
| `ef_construction` | Index-time candidate pool size: how many neighboring candidates the algorithm considers when inserting a new vector into the graph | Higher `ef_construction`: ✨ better graph quality and higher recall; ⚠️ longer index build time (does not affect query speed) |
| `quantize_type` | Vector quantization method to apply; defaults to no quantization | See Quantization for more details |
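To illustrate how ef_construction and m interact at insert time: the layer search gathers up to ef_construction candidates near the new vector, and the new node is then linked to the m closest of them. The sketch below is simplified (real HNSW also applies a diversity heuristic when selecting neighbors), and all names are illustrative, not Zvec's API.

```python
import math

def select_neighbors(new_point, candidates, points, m):
    """Link a newly inserted point to the m closest of the candidates
    found by the ef_construction-bounded layer search (plain
    distance-based selection; real HNSW adds a diversity heuristic)."""
    return sorted(candidates, key=lambda i: math.dist(points[i], new_point))[:m]

points = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (5.0, 0.0)}
print(select_neighbors((1.4, 0.0), [0, 1, 2, 3], points, m=2))  # -> [1, 2]
```

A larger ef_construction widens the candidate pool this selection draws from, which is why it improves graph quality at the cost of build time.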

Query-Time Parameters

| Parameter | Description | Tuning Guidance |
|---|---|---|
| `ef` | Query-time candidate pool size: how many potential neighbors are explored at each step during graph traversal at query time | Higher `ef`: ✨ higher recall; ⚠️ higher query latency |
| `radius` | Distance (similarity) threshold for range-based filtering; only documents satisfying the threshold are returned | Example: with inner product (`MetricType.IP`), set `radius=0.6` to keep only results with score > 0.6. ✅ Use when you want to filter out low-quality matches. 🚫 Skip when you want all top-k results, regardless of quality |
| `is_linear` | Forces a brute-force linear search instead of using the index | 🐌 Very slow for large datasets! ✅ Only use for debugging, tiny collections, or verifying index accuracy |
| `is_using_refiner` | Enables exact score refinement (recomputes exact similarity scores) for top candidates; useful when vectors are quantized | ✅ Turn on when you need higher accuracy; ⚠️ adds latency due to exact re-scoring |
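The role of ef can be made concrete with the standard best-first layer search: the pool of "best results so far" is capped at ef, so a larger ef keeps more candidates alive and explores more of the graph before stopping, trading latency for recall. This is a self-contained toy sketch on a ring graph, not Zvec's implementation.

```python
import heapq
import math

points = {i: (float(i), 0.0) for i in range(10)}
# Toy bottom layer: each node links to its two ring neighbors.
graph = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}

def search_layer(query, entry, ef, k):
    """Best-first search with a result pool bounded by ef."""
    d = lambda i: math.dist(points[i], query)
    visited = {entry}
    candidates = [(d(entry), entry)]   # min-heap: frontier to expand
    best = [(-d(entry), entry)]        # max-heap: ef best nodes so far
    while candidates:
        cd, c = heapq.heappop(candidates)
        if cd > -best[0][0]:
            break                      # frontier is worse than worst kept
        for nb in graph[c]:
            if nb not in visited:
                visited.add(nb)
                if len(best) < ef or d(nb) < -best[0][0]:
                    heapq.heappush(candidates, (d(nb), nb))
                    heapq.heappush(best, (-d(nb), nb))
                    if len(best) > ef:
                        heapq.heappop(best)  # evict the worst kept node
    return sorted((-s, i) for s, i in best)[:k]

print([i for _, i in search_layer((6.4, 0.0), 0, ef=4, k=2)])  # -> [6, 7]
```

With ef equal to k the search stops as soon as the frontier stops improving; raising ef delays that cutoff, which is exactly the recall/latency knob described in the table above.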