The go-to graph-based index for low-latency approximate nearest neighbor search, delivering top-tier speed and recall at the cost of higher memory.

How It Works

HNSW builds a multi-layer graph structure where each node represents a vector and edges connect nodes based on similarity.

Figure: HNSW example

Layers form a hierarchy 🪜
- Upper layers are sparse with fewer nodes, acting as "highways" for fast long-range navigation.
- Lower layers are dense with more nodes, providing fine-grained local neighborhood connectivity.
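The description above does not say how nodes end up on each layer. Here is a minimal sketch, assuming the assignment rule from the original HNSW paper: each node draws its top layer from an exponentially decaying distribution with normalization factor 1 / ln(m). Whether this index uses exactly that rule is an assumption, and `sample_top_layer` is a hypothetical helper name.

```python
import math
import random

def sample_top_layer(m: int, rng: random.Random) -> int:
    """Draw the highest layer a new node appears on (rule from the HNSW paper)."""
    m_l = 1.0 / math.log(m)        # level normalization factor; requires m > 1
    u = 1.0 - rng.random()         # uniform in (0, 1], so log(u) is always defined
    return int(-math.log(u) * m_l)

rng = random.Random(0)
num_nodes, m = 100_000, 16
top_layers = [sample_top_layer(m, rng) for _ in range(num_nodes)]

# A node with top layer t is present on every layer from 0 up to t,
# so layer 0 holds all nodes and each layer above holds roughly 1/m of the one below.
nodes_per_layer = [
    sum(1 for t in top_layers if t >= layer)
    for layer in range(max(top_layers) + 1)
]
for layer, count in enumerate(nodes_per_layer):
    print(f"layer {layer}: {count} nodes")
```

Under this rule the node count shrinks by a factor of about `m` per layer, which is why the upper layers stay sparse enough to act as highways.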
How search works (coarse → fine) 🔍
1. Start from an entry point at the top layer.
2. At the current layer, greedily move to whichever neighbor is closest to the query vector, and repeat until no neighbor is closer than the current node.
3. Drop down one layer at that node and repeat the same greedy search.
4. On the lowest layer, search more thoroughly: keep a list of the best candidates found so far instead of a single node, and use it to refine the result.
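The four steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not this index's implementation: `build_toy_layers` is a hypothetical brute-force stand-in for real HNSW construction, the function names are made up for the example, and `ef` is used here as the size of the bottom-layer candidate list.

```python
import heapq
import math
import random

def build_toy_layers(vectors, m, rng):
    """Brute-force stand-in for HNSW construction (an assumption for this demo):
    on each layer, link every member node to its m nearest peers."""
    m_l = 1.0 / math.log(m)
    top = [int(-math.log(1.0 - rng.random()) * m_l) for _ in vectors]
    layers = []
    for level in range(max(top) + 1):
        members = [i for i, t in enumerate(top) if t >= level]
        graph = {i: set() for i in members}
        for i in members:
            others = sorted((j for j in members if j != i),
                            key=lambda j: math.dist(vectors[i], vectors[j]))
            for j in others[:m]:
                graph[i].add(j)   # links are bidirectional
                graph[j].add(i)
        layers.append(graph)
    entry = max(range(len(vectors)), key=lambda i: top[i])  # a node on the top layer
    return layers, entry

def greedy_walk(graph, vectors, query, start):
    """Step 2: keep moving to the closest neighbor until none is closer."""
    current, best = start, math.dist(vectors[start], query)
    while True:
        closest = min(graph[current], default=current,
                      key=lambda n: math.dist(vectors[n], query))
        d = math.dist(vectors[closest], query)
        if d >= best:
            return current
        current, best = closest, d

def search_layer(graph, vectors, query, start, ef):
    """Step 4: best-first search that keeps the ef closest nodes seen so far."""
    d0 = math.dist(vectors[start], query)
    visited = {start}
    candidates = [(d0, start)]   # min-heap: closest unexplored node first
    results = [(-d0, start)]     # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break                # nothing left that could improve the results
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(vectors[nb], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return sorted((-neg_d, i) for neg_d, i in results)

def hnsw_search(layers, vectors, entry, query, k, ef):
    current = entry                                # step 1: start at the top layer
    for graph in reversed(layers[1:]):             # steps 2-3: greedy walk, then drop down
        current = greedy_walk(graph, vectors, query, current)
    found = search_layer(layers[0], vectors, query, current, max(ef, k))
    return [i for _, i in found[:k]]

rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(300)]
layers, entry = build_toy_layers(vectors, m=8, rng=rng)
query = [rng.random() for _ in range(8)]
top5 = hnsw_search(layers, vectors, entry, query, k=5, ef=32)
exact = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], query))[:5]
print("approximate:", top5)
print("exact:      ", exact)
```

Comparing the two printed lists gives a quick feel for recall. In this sketch a larger `ef` makes the bottom-layer search visit more nodes, which tends to raise recall and slow the query.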
Why this is fast and accurate ⚡ 🎯
- Fast: Upper layers let you jump quickly to the right region without visiting most points.
- Accurate: The dense bottom layer lets you explore the local neighborhood thoroughly, so recall stays high.

When to Use an HNSW Index?

✅ Real-time, low-latency applications (e.g., conversational AI and live recommendations)
✅ Production systems requiring consistent high recall with minimal latency

Best Practice: HNSW is our recommended default for most production use cases. It strikes an excellent balance between speed, accuracy, and robustness.

Advantages

✨ Near-logarithmic query time — Typically O(log n) for large datasets
✨ Consistently high recall across diverse data distributions
✨ Faster indexing than many alternatives (e.g., IVF-based methods)

Trade-offs

⚠️ Higher memory footprint — Graph links require additional storage (scales with m)
⚠️ Indexing complexity of O(n log n) — Slower build time than Flat index (but often faster than IVF)
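To get a feel for how memory scales with `m`, here is a back-of-envelope estimate. It is only a rough model under stated assumptions, not this index's actual storage layout: vectors are stored as float32, each link is a 4-byte node id, the bottom layer reserves up to 2 × `m` link slots per node (the convention in the HNSW paper), and the sparse upper layers and metadata are ignored. `estimate_hnsw_bytes` is a hypothetical helper.

```python
def estimate_hnsw_bytes(num_vectors: int, dim: int, m: int,
                        bytes_per_component: int = 4,
                        bytes_per_link: int = 4) -> int:
    """Rough index size: raw vectors plus bottom-layer link slots.

    Assumes up to 2 * m links per node on the bottom layer and ignores
    upper layers, metadata, and allocator overhead."""
    vector_bytes = dim * bytes_per_component
    link_bytes = 2 * m * bytes_per_link
    return num_vectors * (vector_bytes + link_bytes)

# One million 768-dimensional vectors at a few values of m.
for m in (16, 32, 64):
    total = estimate_hnsw_bytes(1_000_000, 768, m)
    print(f"m={m}: ~{total / 2**30:.2f} GiB")
```

In this model the link overhead grows linearly with `m`, and it matters most for low-dimensional or quantized vectors, where the vectors themselves take up less of the total.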

Tuning Tip: Start with the defaults, then adjust `ef` first to trade recall against latency. Increase `ef_construction` or `m` only if recall is still too low, and expect slower indexing and higher memory use when you do.

Index-Time Parameters


Parameter	Description	Tuning Guidance
`metric_type`	Similarity metric used to compare vectors	Choose based on how your embeddings were trained
`m`	Max neighbors per node: the maximum number of bidirectional links created for each node during graph construction	Higher `m` → ✨ better recall and graph connectivity; ⚠️ more memory usage and higher latency for both indexing and search
`ef_construction`	Index-time candidate pool size: how many neighboring candidates the algorithm considers when inserting a new vector into the graph	Higher `ef_construction` → ✨ better graph quality and higher recall; ⚠️ longer index build time (does not affect query speed)
`quantize_type`	Vector quantization method to apply. Defaults to no quantization.	See Quantization for more details

Query-Time Parameters