Zvec Logo

Inverted Index

An inverted index is a data structure used to store and organize information for efficient value-based search and retrieval.

It is widely used in database systems, search engines, and analytics platforms to accelerate filtering operations and keyword matching.

When to Use an Inverted Index

Use an inverted index when your workload involves frequent lookups based on specific field values. It excels in scenarios such as:

  • ✅ Exact-value filtering:
    • status = "active"
    • category IN ("electronics", "books")
  • ✅ Range queries:
    • age > 25
  • ✅ Text pattern matching:
    • Starts with: product_name LIKE "Wireless%"
    • Ends with: email LIKE "%@engineering.company.com"
  • ✅ Array or set membership queries:
    • Contains any: tags CONTAIN_ANY ["sport", "music"]
    • Contains all: permissions CONTAIN_ALL ["read", "write"]

How Does It Work?

Imagine you're organizing a collection of recipes. Each recipe is a document with structured fields cuisine, author, and url.

Doc IDCuisineAuthorURL
1ItalianJulia Chenhttps://cooking.com/italian-pasta-carbonara
2ThaiLiam Tranhttps://cooking.com/thai-basil-42
3MexicanElena Gomezhttps://cooking.com/mexican-pork-chicken-65
4ItalianMarco Rossihttps://cooking.com/italian-pizza-37
5ItalianMarco Rossihttps://cooking.com/italian-pasta-20
6ChineseJulia Chenhttps://cooking.com/chinese-spicy-hot-pot

A regular (or "forward") view asks:

What values does document #1 contain? → Cuisine: Italian, Author: Julia Chen

But an inverted index flips this around. Instead, it answers:

Which documents contain the value Italian? → [1, 4, 5]

To enable fast lookups, we build inverted indexes for fields that are frequently searched — like cuisine and author.

Inverted Index: cuisine

CuisineDoc IDs
Italian[1, 4, 5]
Thai[2]
Mexican[3]
Chinese[6]

Inverted Index: author

AuthorDoc IDs
Julia Chen[1, 6]
Liam Tran[2]
Elena Gomez[3]
Marco Rossi[4, 5]

With these indexes in place, queries become extremely efficient ✨:

  • "Find all Italian recipes" → look up "Italian" in the cuisine index → [1, 4, 5]
  • "Show recipes by Marco Rossi" → look up "Marco Rossi" in the author index → [4, 5]
  • "Find Italian recipes by Julia Chen" → intersect [1, 4, 5] and [1, 6][1]

We do not index url, because it's rarely used in queries. Indexing it would waste storage and slow down writes, with little benefit. Once we have a document ID, we can always fetch its url directly from the original data.

Why "inverted"?

Because it inverts the standard mapping:

DirectionMapping
ForwardDocument ID → List of Terms
InvertedTerm → List of Document IDs

This inversion is what makes keyword-based search efficient. Instead of checking every document to see if it contains your query term, you jump straight to the term and get all matching documents immediately.

Trade-offs

While powerful, inverted indexes come with costs:

  • ⚠️ Storage overhead: The index requires additional storage space.
  • ⚠️ Write amplification: Every write operation — INSERT, UPSERT, and UPDATE — require index maintenance, which adds latency to writes and increases I/O load.