IVF Index
Inverted File Index
A partition-based indexing method that is scalable and memory-efficient, best suited for large datasets with natural cluster structure, at the expense of slower index construction and more involved parameter tuning.
How It Works
IVF operates by partitioning the entire vector space into clusters. The number of clusters is controlled by the parameter `n_list` (short for "number of lists").
Indexing Phase ⚙️
- Clustering: The algorithm first applies a clustering algorithm (typically k-means), creating `n_list` clusters. Each cluster is represented by its centroid, a central point that best represents all vectors assigned to that cluster.
- Assignment: Each vector in the dataset is assigned to the cluster whose centroid is closest to it. The vector is then stored in an inverted list (also called a "bucket") associated with that centroid. The index essentially becomes a mapping from centroids to their associated vectors.
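The two indexing steps can be sketched in plain Python. This is an illustrative toy implementation, not any library's real API: `build_ivf` and its k-means loop are names invented here for clarity.

```python
import math
import random

def build_ivf(vectors, n_list, n_iters=10, seed=0):
    """Illustrative IVF build: k-means clustering, then bucket assignment."""
    rng = random.Random(seed)
    # Initialize centroids by sampling n_list vectors from the dataset.
    centroids = [list(v) for v in rng.sample(vectors, n_list)]
    buckets = [[] for _ in range(n_list)]
    for _ in range(n_iters):
        # Assignment step: place each vector in its nearest centroid's bucket.
        buckets = [[] for _ in range(n_list)]
        for v in vectors:
            nearest = min(range(n_list), key=lambda i: math.dist(v, centroids[i]))
            buckets[nearest].append(v)
        # Update step: move each centroid to the mean of its bucket.
        for i, bucket in enumerate(buckets):
            if bucket:
                centroids[i] = [sum(dim) / len(bucket) for dim in zip(*bucket)]
    return centroids, buckets

# Toy dataset: two obvious clusters, around (0, 0) and (10, 10).
data = [[0.1, 0.2], [0.0, -0.1], [0.2, 0.0],
        [10.1, 9.9], [9.8, 10.2], [10.0, 10.1]]
centroids, buckets = build_ivf(data, n_list=2)
# Every vector lands in exactly one bucket.
assert sum(len(b) for b in buckets) == len(data)
```

Real systems use optimized clustering (sampled training, multiple restarts), but the index structure is the same: a list of centroids with one bucket of vectors per centroid.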
Query Phase 🔍
- Centroid Selection: When a query vector arrives, the system first computes distances between the query and all `n_list` centroids to identify the `n_probe` nearest centroids (`n_probe` is a parameter that controls how many centroids to consider).
- Local Search: Instead of scanning the entire dataset of N vectors, the search is restricted to only the vectors stored in the `n_probe` selected buckets. A brute-force (or refined) search is then performed within those buckets to find the final nearest neighbors.
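The two query steps can likewise be sketched in plain Python. The index layout here (a list of centroids with a parallel list of buckets) is illustrative, and `search_ivf` is a name invented for this sketch:

```python
import math

def search_ivf(query, centroids, buckets, n_probe, k):
    """Illustrative IVF query: probe the n_probe nearest buckets, then brute-force."""
    # Centroid selection: rank all centroids by distance to the query.
    order = sorted(range(len(centroids)),
                   key=lambda i: math.dist(query, centroids[i]))
    # Local search: gather candidates only from the n_probe nearest buckets...
    candidates = [v for i in order[:n_probe] for v in buckets[i]]
    # ...and brute-force rank just those candidates.
    return sorted(candidates, key=lambda v: math.dist(query, v))[:k]

# Hand-built toy index: two centroids, each with its bucket of vectors.
centroids = [[0.0, 0.0], [10.0, 10.0]]
buckets = [[[0.1, 0.2], [0.0, -0.1]], [[10.1, 9.9], [9.8, 10.2]]]
# With n_probe=1, only the bucket around (0, 0) is ever scanned.
print(search_ivf([0.3, 0.3], centroids, buckets, n_probe=1, k=1))  # → [[0.1, 0.2]]
```

Note the recall risk this structure implies: if a true nearest neighbor sits in a bucket that is not among the `n_probe` probed, it can never be returned.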
When to Use an IVF Index
- ✅ Your vector dataset exhibits natural clustering or locality structure
- ✅ You’re working with very large datasets and memory efficiency is critical
- ✅ You’re willing to invest effort in tuning parameters to achieve optimal performance
Best Practice:
- IVF works best on datasets with inherent clustering structure.
- For even greater memory efficiency and scalability, combine it with Product Quantization (PQ).
- The performance of IVF is highly sensitive to parameters like `n_list`. As such, IVF is best suited for practitioners who can systematically experiment with, validate, and optimize these settings for their specific data distribution and latency-recall requirements.
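One way to make that tuning loop concrete is to measure recall against exact (brute-force) results for each candidate setting. A minimal sketch, using hypothetical result IDs for a single query:

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Suppose for one query the exact top-5 neighbors, and two IVF runs, are:
exact = [3, 7, 12, 25, 31]
with_1_probe = [3, 7, 40, 41, 42]   # low n_probe missed three true neighbors
with_8_probes = [3, 7, 12, 25, 31]  # higher n_probe recovered them all

print(recall_at_k(with_1_probe, exact))   # → 0.4
print(recall_at_k(with_8_probes, exact))  # → 1.0
```

In practice you would average this recall over a held-out query set while also recording latency, then pick the cheapest setting that meets your recall target.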
Advantages
- ✨ Compatibility — Often used as a base layer in composite indexes (e.g., IVF-PQ) for further optimization
- ✨ Scalability — Query time scales approximately as O(N / `n_list` × `n_probe`), making it highly efficient for large N when `n_list` is large and `n_probe` is small
- ✨ Memory Efficiency — Stores vectors in compact inverted lists with minimal overhead from centroids and list pointers; typically uses significantly less memory than graph-based methods like HNSW
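To make the scaling claim concrete, here is the rough distance-computation count for an assumed dataset of one million vectors (the parameter values are illustrative):

```python
N, n_list, n_probe = 1_000_000, 1_000, 8

# Cost = distance to every centroid, plus a scan of n_probe buckets
# of roughly N / n_list vectors each (assuming balanced clusters).
centroid_comparisons = n_list
bucket_comparisons = n_probe * (N // n_list)
total = centroid_comparisons + bucket_comparisons

print(total)      # → 9000 distance computations
print(N // total)  # → 111, i.e. ~111x fewer comparisons than brute force
```

The balanced-cluster assumption matters: heavily skewed clusters produce some oversized buckets, and queries landing in them pay more than this back-of-envelope estimate suggests.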
Trade-offs
- ⚠️ Indexing Overhead — Building the index requires clustering, which can be computationally intensive and slower to build than indexes like HNSW
- ⚠️ Parameter Sensitivity — Accuracy and latency are highly dependent on the choice of `n_list` and `n_probe`
Key Parameters
Index-Time Parameters
| Parameter | Description | Tuning Guidance |
|---|---|---|
| `metric_type` | Similarity metric used to compare vectors | Choose based on how your embeddings were trained |
| `n_list` | Number of clusters (inverted lists) — the vector space is partitioned into this many clusters during indexing | • Start with n_list ≈ √N, where N is the number of vectors • Larger n_list → ✨ finer partitioning, smaller buckets, faster search — but ⚠️ higher indexing cost and more centroids to manage • Smaller n_list → ✨ faster index construction — but ⚠️ larger buckets and slower search |
| `n_iters` | Centroid refinement iterations — number of passes to optimize cluster centroids during indexing | • Higher n_iters → ✨ better clustering quality, but ⚠️ longer index build time |
| `quantize_type` | Vector quantization method to apply. Defaults to no quantization | See Quantization for more details |
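A common starting heuristic for `n_list` is roughly the square root of the dataset size, which balances the centroid scan (~`n_list` comparisons) against the bucket scan (~N/`n_list` vectors per bucket). The helper name below is invented for illustration:

```python
import math

def suggested_n_list(n_vectors):
    # Heuristic starting point: ~sqrt(N), so the centroid scan and the
    # per-bucket scan each cost on the order of sqrt(N) comparisons.
    return max(1, round(math.sqrt(n_vectors)))

print(suggested_n_list(1_000_000))  # → 1000
print(suggested_n_list(10_000))     # → 100
```

Treat this only as a starting point; the right value ultimately depends on your data's cluster structure and your latency-recall targets.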
Query-Time Parameters
| Parameter | Description | Tuning Guidance |
|---|---|---|
| `n_probe` | Number of clusters to search at query time — the system retrieves candidates only from these nearest clusters | • Higher n_probe → ✨ higher recall, but ⚠️ slower queries |
| `radius` | Distance (similarity) threshold for range-based filtering — only documents satisfying the threshold are returned | Example: with the inner product metric (MetricType.IP), set radius=0.6 to keep only results with score > 0.6 ✅ Use when: you want to filter out low-quality matches 🚫 Skip when: you want all top-k results, regardless of quality |
| `is_linear` | Forces a brute-force linear search instead of using the index | 🐌 Very slow for large datasets! ✅ Only use for: debugging, tiny collections, or verifying index accuracy |
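The `radius` behavior amounts to a score-threshold filter over the candidate results. A minimal sketch of that semantics, using hypothetical document IDs and inner-product scores (higher score = more similar, matching the table's `MetricType.IP` example):

```python
def filter_by_radius(scored_results, radius):
    """Keep only (doc_id, score) pairs whose similarity score exceeds radius."""
    return [(doc_id, score) for doc_id, score in scored_results if score > radius]

# Hypothetical inner-product scores for a query's top-k candidates.
results = [("doc_a", 0.91), ("doc_b", 0.63), ("doc_c", 0.42)]
print(filter_by_radius(results, radius=0.6))
# → [('doc_a', 0.91), ('doc_b', 0.63)]
```

Note the direction of the comparison depends on the metric: with a similarity metric like inner product you keep scores above the threshold, whereas with a distance metric you would keep distances below it.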