A scalable, partition-based, memory-efficient index designed for massive, naturally clustered datasets — at the expense of slower indexing and tuning.

How It Works

IVF operates by partitioning the entire vector space into clusters. The number of clusters is controlled by the parameter n_list (short for "number of lists").

Indexing Phase ⚙️

Clustering: The algorithm first applies a clustering algorithm, creating n_list clusters. Each cluster is represented by its centroid — a central point that best represents all vectors assigned to that cluster.
Assignment: Each vector in the dataset is assigned to the cluster whose centroid is closest to it. The vector is then stored in an inverted list (also called a "bucket") associated with that centroid. The index essentially becomes a mapping from centroids to their associated vectors.

IVF indexing

Query Phase 🔍

Centroid Selection: When a query vector arrives, the system first computes distances between the query and all n_list centroids to identify the n_probe nearest centroids (n_probe is a parameter that controls how many centroids to consider).
Local Search: Instead of scanning the entire dataset of N vectors, the search is restricted to only the vectors stored in the n_probe selected buckets. A brute-force (or refined) search is then performed within those buckets to find the nearest neighbors.

IVF Query

When to Use an IVF Index

✅ Your vector dataset exhibits natural clustering or locality structure
✅ You're working with very large datasets and memory efficiency is critical
✅ You need to tune parameters to achieve optimal performance

Best Practice:

IVF works best on datasets with inherent clustering structure.
For even greater memory efficiency and scalability, combine it with Product Quantization (PQ).
The performance of IVF is highly sensitive to parameters like n_list. As such, IVF is best suited for practitioners who can systematically experiment with, validate, and optimize these settings for their specific data distribution and latency-recall requirements.

Advantages

✨ Compatibility — Often used as a base layer in composite indexes (e.g., IVF-PQ) for further optimization
✨ Scalability — Query time scales approximately as O(N / n_list × n_probe), making it highly efficient for large N when n_list is large and n_probe is small
✨ Memory Efficiency — Stores vectors in compact inverted lists with minimal overhead from centroids and list pointers, typically uses significantly less memory than graph-based methods like HNSW

Trade-offs

⚠️ Indexing Overhead — Building the index requires clustering, which can be computationally intensive and slower to build than indexes like HNSW
⚠️ Parameter Sensitivity — Accuracy and latency are highly dependent on the choice of n_list and n_probe

Key Parameters

Index-Time Parameters

Code Example Python API Reference Node.js API Reference

Parameter	Description	Tuning Guidance
`metric_type`	Similarity metric used to compare vectors	Choose based on how your embeddings were trained
`n_list`	Number of clusters (inverted lists) — the vector space is partitioned into this many clusters during indexing	• Start with `n_list` ≈ $\sqrt{N}$ , where `N` is the number of vectors • Larger `n_list` → ✨ finer partitioning, smaller buckets, faster search — but ⚠️ higher indexing cost and more centroids to manage • Smaller `n_list` → ✨ faster index construction — but ⚠️ larger buckets and slower search
`n_iters`	Centroid refinement iterations — number of passes to optimize cluster centroids during indexing	• Higher `n_iters` → ✨ better clustering quality ⚠️ longer index build time
`quantize_type`	Vector quantization method to apply Defaults to no quantization	See Quantization for more details

Query-Time Parameters