DiskANN Index (Disk-based Approximate Nearest Neighbor)
A disk-based graph index for billion-scale approximate nearest neighbor search, delivering high recall with a much smaller memory footprint at the cost of lower QPS.
A disk-based graph index designed for billion-scale vector search β keeping compressed vectors in memory and full-precision vectors on disk, enabling high-recall approximate nearest neighbor search with a dramatically smaller memory footprint.
DiskANN was introduced by Subramanya et al. in the NeurIPS 2019 paper DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node.
The trade-off: because every query path involves disk I/O, DiskANN delivers lower QPS than pure in-memory indexes (such as HNSW). It is best suited for throughput- and latency-tolerant workloads that must handle very large datasets under tight memory budgets.
How It Works
DiskANN builds a Vamana graph over the full dataset and stores it on disk along with the original vectors. At search time, only compressed PQ (Product Quantization) codes live in memory, while the graph and full-precision vectors are read from disk on demand.
- Vamana graph for navigation πͺ
- A single-layer graph where each node is connected to up to
max_degreeneighbors. - The graph is built using a greedy search-and-prune strategy with an alpha parameter that encourages long-range edges, providing fast convergence to the query's neighborhood.
- A medoid (the point closest to the dataset's centroid) serves as the fixed entry point for every search.
- A single-layer graph where each node is connected to up to
- Product Quantization (PQ) for distance estimation π
- The vector space is split into
pq_chunk_numsub-spaces, and each sub-vector is quantized into a 256-centroid codebook (8-bit PQ codes). - At query time, a PQ distance lookup table is precomputed for the query, allowing approximate distances to all candidates to be computed via fast table lookups instead of full-precision arithmetic.
- The vector space is split into
- Cached beam search π
- Search starts from the medoid and explores the graph using a beam search strategy β multiple frontier nodes are expanded concurrently via batched disk I/O.
- Frequently accessed nodes near the entry point are cached in memory (BFS-level caching), reducing disk reads for hot regions of the graph.
- For each visited node, approximate distances are computed using PQ codes, and the full-precision vector is read from disk to compute the exact distance for the top-k result candidates.
When to Use a DiskANN Index?
- β Billion-scale datasets that cannot fit entirely in memory
- β Cost-sensitive deployments where minimizing RAM is critical
- β Batch workloads and offline analytics that can tolerate slightly higher latency per query compared to in-memory indexes
Best Practice: Use DiskANN when your dataset far exceeds available RAM. It provides strong recall with memory consumption proportional only to PQ codes, not full vectors. For datasets that fit in memory, prefer HNSW or HNSW-RaBitQ for lower latency.
Advantages
- β¨ Extremely low memory footprint β Only PQ-compressed codes (1 byte per chunk per vector) reside in memory, making billion-scale search feasible on commodity hardware
- β¨ High recall β The Vamana graph preserves connectivity and diversity through its alpha-based pruning, and exact distances are recomputed for final candidates
- β¨ Scalable graph construction β Built with a simple greedy insert-and-prune algorithm that can be parallelized across threads
Trade-offs
- β οΈ Higher query latency β Each search requires disk I/O for graph traversal, making it slower than pure in-memory indexes like HNSW
- β οΈ Build-time PQ training β Requires a KMeans-based PQ training step before index construction, adding to the total build time
- β οΈ Not suited for real-time workloads β Disk access latency means DiskANN is better for latency-tolerant use cases or scenarios with relatively low QPS requirements
Key Parameters
Tuning Tip:
Start with defaults. Adjust list_size at query time first for recall/latency trade-offs. Only increase max_degree if you need better recall β but expect higher disk usage and longer build times. Reduce pq_chunk_num if you need to cut memory further and can tolerate lower recall.
Index-Time Parameters
| Parameter | Description | Tuning Guidance |
|---|---|---|
metric_type | Similarity metric used to compare vectors | Choose based on how your embeddings were trained |
max_degree | Max neighbors per node β The maximum number of edges per node in the Vamana graph | β’ Higher max_degree β β¨ better recall and graph connectivity β οΈ more disk usage and longer build time |
list_size | Build-time candidate list size β Number of candidates considered during graph construction when inserting a new vector | β’ Higher list_size β β¨ better graph quality and higher recall β οΈ longer index build time |
pq_chunk_num | Number of PQ sub-spaces β Controls how the vector dimensions are partitioned for Product Quantization | β’ More chunks β β¨ finer-grained distance approximation and better recall β οΈ more memory for PQ codes (1 byte per chunk per vector) |
Query-Time Parameters
| Parameter | Description | Tuning Guidance |
|---|---|---|
list_size | Query-time candidate list size β Determines how many candidates are maintained during beam search graph traversal | β’ Higher list_size β β¨ higher recall β οΈ more disk I/O and higher query latency |