Upsert

upsert() works just like insert() — it adds one or more new documents (Doc) to a collection.

The key difference is that if a document with the same id already exists, it will be overwritten.

Use upsert() if you want to overwrite an existing document (or don't mind replacing it).
Use insert() if you want to avoid accidentally overwriting a document — insert() will fail if a document with the same id already exists.
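
The difference between the two calls can be illustrated with a toy in-memory model. This is only a sketch of the semantics described above, not the zvec implementation: `ToyCollection` and its non-zero failure code of 1 are hypothetical stand-ins chosen for illustration.

```python
class ToyCollection:
    """In-memory stand-in keyed by document id (hypothetical, not the zvec API)."""

    def __init__(self):
        self.docs = {}

    def insert(self, doc_id, payload):
        # insert refuses to touch an id that already exists
        if doc_id in self.docs:
            return {"code": 1}  # non-zero means failure; the value 1 is arbitrary
        self.docs[doc_id] = payload
        return {"code": 0}

    def upsert(self, doc_id, payload):
        # upsert writes unconditionally, replacing any existing document
        self.docs[doc_id] = payload
        return {"code": 0}


toy = ToyCollection()
first = toy.insert("text_1", {"text": "original"})   # succeeds: id is new
second = toy.insert("text_1", {"text": "changed"})   # fails: id already exists
third = toy.upsert("text_1", {"text": "changed"})    # succeeds: overwrites
```

After these three calls the stored document for `text_1` holds the text from the upsert, and the failed insert left it untouched in between.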

Performance Tip:
New vectors are initially buffered for fast ingestion. For optimal search performance, call optimize() after upserting a large batch of documents.
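
One way to apply this tip is to upsert in chunks and call optimize() once at the end. The helper below is a minimal sketch: `upsert_in_batches` and its `batch_size` parameter are hypothetical names, and it assumes only what this page states, namely that upsert() accepts a list of documents and returns one status per document, and that optimize() can be called without arguments.

```python
def upsert_in_batches(collection, docs, batch_size=1000):
    """Upsert docs in fixed-size chunks, then call optimize() once.

    Hypothetical helper: `collection` is any object with upsert(list)
    and optimize() methods, as described on this page.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    statuses = []
    for start in range(0, len(docs), batch_size):
        chunk = docs[start:start + batch_size]
        # upsert() returns one status per document in the chunk
        statuses.extend(collection.upsert(chunk))
    # optimize once after the whole load, not after every chunk
    collection.optimize()
    return statuses
```

A suitable batch size depends on document size and available memory; the default of 1000 here is a placeholder, not a recommendation from the library.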

Document `Doc`

Each Doc passed to upsert() must:

Have an id (if a document with the same id already exists, it will be replaced)
Provide data that matches the collection's schema:
1. Scalar fields go in the fields dictionary (field names as keys)
2. Vector embeddings go in the vectors dictionary (vector names as keys)
You can omit nullable scalar fields if a document doesn't have a value for them
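
These requirements can be checked locally before calling upsert(). The function below is a sketch under stated assumptions: `check_doc_data` and the plain-dict `schema` layout are hypothetical and are not part of zvec, which performs its own validation and reports problems through the returned status.

```python
def check_doc_data(doc_id, fields, vectors, schema):
    """Return a list of problems; an empty list means the data looks consistent.

    `schema` is a hypothetical plain-dict description, not a zvec object:
    {"fields": {name: is_nullable}, "vectors": {name: dimension_or_None}}
    Use None as the dimension for sparse vectors.
    """
    problems = []
    if not doc_id:
        problems.append("missing id")
    # Non-nullable scalar fields must be present; nullable ones may be omitted
    for name, nullable in schema["fields"].items():
        if name not in fields and not nullable:
            problems.append(f"missing required field: {name}")
    for name in fields:
        if name not in schema["fields"]:
            problems.append(f"unknown field: {name}")
    for name, vec in vectors.items():
        if name not in schema["vectors"]:
            problems.append(f"unknown vector: {name}")
            continue
        dim = schema["vectors"][name]
        # For dense vectors the list length must equal the dimension
        if dim is not None and len(vec) != dim:
            problems.append(f"{name}: expected {dim} values, got {len(vec)}")
    return problems


example_schema = {
    "fields": {"text": False, "note": True},  # "note" is nullable
    "vectors": {"text_embedding": 4},
}
ok = check_doc_data("text_1", {"text": "hi"}, {"text_embedding": [0.1, 0.2, 0.3, 0.4]}, example_schema)
bad = check_doc_data("text_1", {}, {"text_embedding": [0.1, 0.2]}, example_schema)
```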

Upsert a Single Document

Assume you already have a collection with the following schema:

A scalar field: text (string)
A dense vector embedding: text_embedding (4-dimensional FP32 vector)
The 4-dimensional vector is for demonstration only — real-world embeddings are usually much larger.

You've also opened the collection and have a collection object ready.

Now, upsert a document like this:

Upsert a document

import zvec

# Create a document
doc = zvec.Doc(  
    id="text_1",  # ← an existing document with this id is replaced
    vectors={
        "text_embedding": [0.1, 0.2, 0.3, 0.4],  # ← must match the vector name
                          # ↑ list of floats; list length = dimension (4)
    },
    fields={
        "text": "This is a sample text.",  # ← must match the scalar field name
    },
)

# Upsert the document
result = collection.upsert(doc)  
print(result)  # {"code": 0} means success

When called with a single Doc, upsert() returns one Status object.

{"code": 0} indicates success.
Non-zero codes indicate failure.

Successfully upserted documents are immediately available for querying 🚀.

Upsert a Batch of Documents

To upsert multiple documents at once, pass a list of Doc objects to upsert().
Each Doc is processed independently, and the method returns a list of Status objects — one per document.

Upsert a batch of documents

import zvec

result = collection.upsert(  
    [
        zvec.Doc(
            id="text_1",
            vectors={"text_embedding": [0.1, 0.2, 0.3, 0.4]},
            fields={"text": "This is a sample text."},
        ),
        zvec.Doc(
            id="text_2",
            vectors={"text_embedding": [0.4, 0.3, 0.2, 0.1]},
            fields={"text": "This is another sample text."},
        ),
        zvec.Doc(
            id="text_3",
            vectors={"text_embedding": [-0.1, -0.2, -0.3, -0.4]},
            fields={"text": "One more sample text."},
        ),
    ]
)

print(result)  # [{"code":0}, {"code":0}, {"code":0}]

A failure in one document (e.g., missing a required field or invalid data type) does not stop the others from being upserted.
🔍 Always check each Status in the result list.
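
A small helper can collect the failures from a batch result. This is a sketch with two assumptions that this page does not confirm: that statuses come back in the same order as the input documents, and how a Status exposes its code (the page only shows the printed form `{"code": 0}`). The hypothetical `status_code` function therefore accepts a dict, an attribute, or a method named `code`.

```python
def status_code(status):
    """Read the numeric code from a status-like value.

    Assumption: the code is reachable as status["code"], status.code,
    or status.code(). Adjust this to the real Status interface.
    """
    if isinstance(status, dict):
        return status["code"]
    code = getattr(status, "code")
    return code() if callable(code) else code


def failed_ids(doc_ids, statuses):
    """Return (id, code) pairs for every document whose code is non-zero.

    Assumes statuses are in the same order as doc_ids.
    """
    if len(doc_ids) != len(statuses):
        raise ValueError("expected one status per document")
    failures = []
    for doc_id, status in zip(doc_ids, statuses):
        code = status_code(status)
        if code != 0:
            failures.append((doc_id, code))
    return failures
```

Typical use would be to pass the ids of the submitted documents together with the list returned by upsert(), then log or retry whatever comes back.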

Upsert Documents with Sparse Vectors

Assume your collection includes a sparse vector named sparse_embedding.

Upsert a document with a sparse vector like this:

Upsert a document with a sparse vector

import zvec

result = collection.upsert(  
    zvec.Doc(
        id="text_1",
        vectors={
            "sparse_embedding": {
                42: 1.25,  # ← dimension 42 has weight 1.25
                1337: 0.8,  # ← dimension 1337 has weight 0.8
                2999: 0.63,  # ← dimension 2999 has weight 0.63
            }
        },
    )
)

print(result)  # {"code":0}

A sparse vector is represented as a dictionary dict[int, float].
There is no fixed dimension size — only non-zero dimensions need to be included.
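
If your sparse weights start out as a mostly-zero list, they can be converted to this dictionary form by keeping only the non-zero entries. The `to_sparse` helper below is a hypothetical convenience written in plain Python; it is not a zvec function.

```python
def to_sparse(dense):
    """Map index -> weight for every non-zero entry of a dense list."""
    return {i: float(v) for i, v in enumerate(dense) if v != 0}


sparse = to_sparse([0.0, 1.25, 0.0, 0.0, 0.8])
# sparse == {1: 1.25, 4: 0.8}
```

The resulting dictionary has the `dict[int, float]` shape described above and can be used as the value for a sparse vector name in the vectors dictionary.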

Upsert Documents with Multiple Fields and Vectors

Real-world applications often require collections with multiple scalar fields and vector embeddings.
In this example, assume your collection includes the following schema:

Scalar fields:
1. book_title (string)
2. category (array of strings)
3. publish_year (32-bit integer)
Vector embeddings:
1. dense_embedding: a 768-dimensional dense vector
2. sparse_embedding: a sparse vector

Upsert a document with multiple fields and vectors like this:

Upsert a document with multiple fields and vectors

import zvec

# Create a document
doc = zvec.Doc(  
    id="book_1",
    vectors={
        "dense_embedding": [0.1 for _ in range(768)],  # ← use real embedding in practice
        "sparse_embedding": {42: 1.25, 1337: 0.8, 1999: 0.64},  # ← use real embedding in practice
    },
    fields={
        "book_title": "Gone with the Wind",  # ← string
        "category": ["Romance", "Classic Literature"],  # ← array of strings
        "publish_year": 1936,  # ← integer
    },
)

# Upsert the document
result = collection.upsert(doc)  
print(result)  # {"code": 0} means success
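
With several typed fields, a quick local type check can catch mistakes before the call. The sketch below covers only the example schema above; `check_book_fields` is a hypothetical helper, and it treats all three fields as required, which is an assumption since the schema above does not say whether any of them is nullable.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def check_book_fields(fields):
    """Return type problems for the example book schema; empty list if none."""
    problems = []
    if not isinstance(fields.get("book_title"), str):
        problems.append("book_title must be a string")
    category = fields.get("category")
    if not (isinstance(category, list) and all(isinstance(c, str) for c in category)):
        problems.append("category must be a list of strings")
    year = fields.get("publish_year")
    # bool is a subclass of int in Python, so exclude it explicitly
    is_int = isinstance(year, int) and not isinstance(year, bool)
    if not (is_int and INT32_MIN <= year <= INT32_MAX):
        problems.append("publish_year must be a 32-bit integer")
    return problems
```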
