Hierarchical Navigable Small World Index (HNSW)
The go-to graph-based index for low-latency approximate nearest neighbor search, delivering top-tier speed and recall at the cost of higher memory.
How It Works
HNSW builds a multi-layer graph structure where each node represents a vector and edges connect nodes based on similarity.
- Layers form a hierarchy πͺ
- Upper layers are sparse with fewer nodes, acting as "highways" for fast long-range navigation.
- Lower layers are dense with more nodes, providing fine-grained local neighborhood connectivity.
- How search works (coarse β fine) π
- Start from an entry point at the top layer.
- At the current layer, you greedily walk to neighbors that are closer to the query vector, until you can't get any closer.
- Then you drop down one layer at that position and repeat the same greedy search.
- On the lowest layer, this process is done more carefully (with a candidate list) to refine the result.
- Why this is fast and accurate β‘ π―
- Fast: Upper layers let you jump quickly to the right region without visiting most points.
- Accurate: The dense bottom layer lets you explore the local neighborhood thoroughly, so recall stays high.
When to Use an HNSW Index?
- β Real-time, low-latency applications (e.g., conversational AI and live recommendations)
- β Production systems requiring consistent high recall with minimal latency
Best Practice: HNSW is our recommended default for most production use cases. It strikes an excellent balance between speed, accuracy, and robustness.
Advantages
- β¨ Near-logarithmic query time β Typically O(log n) for large datasets
- β¨ Consistently high recall across diverse data distributions
- β¨ Faster indexing than many alternatives (e.g., IVF-based methods)
Trade-offs
- β οΈ Higher memory footprint β Graph links require additional storage (scales with
m) - β οΈ Indexing complexity of O(n log n) β Slower build time than Flat index (but often faster than IVF)
Key Parameters
Tuning Tip: Start with defaults, then adjust ef first for recall/latency trade-offs. Only if needed, increase ef_construction or m for better accuracy β but expect slower indexing and higher memory use.
Index-Time Parameters
| Parameter | Description | Tuning Guidance |
|---|---|---|
metric_type | Similarity metric used to compare vectors | Choose based on how your embeddings were trained |
m | Max neighbors per node β The maximum number of bidirectional links created for each node during graph construction | β’ Higher m β β¨ better recall and graph connectivity β οΈ more memory usage and higher latency for both indexing and search |
ef_construction | Index-time candidate pool size β Determines how many neighboring candidates the algorithm considers when inserting a new vector into the graph | β’ Higher ef_construction β β¨ better graph quality and higher recall β οΈ longer index build time (does not affect query speed) |
quantize_type | Vector quantization method to apply Defaults to no quantization | See Quantization for more details |
Query-Time Parameters
| Parameter | Description | Tuning Guidance |
|---|---|---|
ef | Query-time candidate pool size β Determines how many potential neighbors are explored at each step during graph traversal at query time | β’ Higher ef β β¨ higher recall β οΈ higher query latency π‘ If you're missing results at ef=300, try bumping it up to 500 or even higher to cast a wider net. |
radius | Distance (similarity) threshold for range-based filtering β only documents satisfying the threshold are returned | Example: β’ With inner product MetricType.IP, set radius=0.6 to keep only results with score > 0.6 β Use when: You want to filter out low-quality matches π« Skip when: You want all top-k results, regardless of quality |
is_linear | Forces a brute-force linear search instead of using the index | π Very slow for large datasets! β Only use for: Debugging, tiny collections, or verifying index accuracy |
is_using_refiner | Enables exact score refinement (recomputes exact similarity scores) for top candidates β helpful for quantized vectors to recover accuracy and boost recall quality | β
Recommended: When you need higher accuracy for quantized vectors β οΈ Adds latency due to exact re-scoring |