Upsert Documents

upsert() works just like insert() — it adds one or more new documents (Doc) to a collection.

The key difference is that if a document with the same id already exists, it will be overwritten.

  • Use upsert() if you want to overwrite an existing document (or don't mind replacing it).
  • Use insert() if you want to avoid accidentally overwriting a document — insert() will fail if a document with the same id already exists.
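
The difference between the two calls can be pictured with a toy in-memory model. This is only a sketch of the semantics described above, using a plain dictionary; the names `toy_insert` and `toy_upsert` are hypothetical and are not part of Zvec.

```python
# Toy model of insert vs. upsert semantics (NOT Zvec's implementation).
store: dict[str, str] = {}


def toy_insert(doc_id: str, text: str) -> bool:
    """Add a document; refuse if the id is already taken."""
    if doc_id in store:
        return False  # insert fails on a duplicate id
    store[doc_id] = text
    return True


def toy_upsert(doc_id: str, text: str) -> bool:
    """Add a document, replacing any existing one with the same id."""
    store[doc_id] = text
    return True


first_insert = toy_insert("text_1", "original")   # succeeds: id is new
second_insert = toy_insert("text_1", "changed")   # fails: id already exists
after_insert = store["text_1"]                    # still "original"

upserted = toy_upsert("text_1", "changed")        # succeeds and overwrites
after_upsert = store["text_1"]                    # now "changed"
```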

Performance Tip:
New vectors are initially buffered for fast ingestion. For optimal search performance, call optimize() after upserting a large batch of documents.
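
One way to apply this tip is to upsert in chunks and call optimize() once at the end. The sketch below assumes only what this page states: that `collection.upsert()` accepts a list and returns one status per document, and that `collection.optimize()` can be called without arguments. The helper name `upsert_in_chunks` and the default `chunk_size` are hypothetical choices for illustration.

```python
from typing import Any, Iterator


def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upsert_in_chunks(collection: Any, docs: list, chunk_size: int = 1000) -> list:
    """Upsert `docs` chunk by chunk, then optimize once after the whole load."""
    statuses: list = []
    for chunk in chunked(docs, chunk_size):
        # Each call is assumed to return one status per document in the chunk.
        statuses.extend(collection.upsert(chunk))
    if docs:
        # Optimize a single time at the end rather than after every chunk.
        collection.optimize()
    return statuses
```

Calling optimize() once per load, instead of once per chunk, keeps ingestion fast while still leaving the collection ready for searches afterwards.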


Document Doc

Each Doc passed to upsert() must:

  • Have an id (if a document with the same id already exists, it will be replaced)
  • Provide data that matches the collection's schema:
    1. Scalar fields go in the fields dictionary (field names as keys)
    2. Vector embeddings go in the vectors dictionary (vector names as keys)
  • You can omit nullable scalar fields if a document doesn't have a value for them
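
Before building a Doc, it can help to check the payload against the schema yourself. The following is a minimal client-side sketch, not a Zvec feature: the function `find_schema_problems` and the way the schema is passed in (plain sets and a dictionary of dense dimensions) are assumptions made for illustration. It only checks required scalar fields and dense vector lengths.

```python
def find_schema_problems(
    fields: dict,
    vectors: dict,
    required_fields: set,
    nullable_fields: set,
    dense_dims: dict,
) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    known = required_fields | nullable_fields
    # Scalar field names must be ones the schema knows about.
    for name in sorted(fields):
        if name not in known:
            problems.append(f"unknown scalar field: {name}")
    # Only nullable fields may be left out.
    for name in sorted(required_fields):
        if name not in fields:
            problems.append(f"missing required field: {name}")
    # Dense vectors must have exactly the declared number of values.
    for name in sorted(dense_dims):
        if name in vectors and len(vectors[name]) != dense_dims[name]:
            problems.append(
                f"{name}: expected {dense_dims[name]} values, got {len(vectors[name])}"
            )
    return problems


good = find_schema_problems(
    fields={"text": "This is a sample text."},
    vectors={"text_embedding": [0.1, 0.2, 0.3, 0.4]},
    required_fields={"text"},
    nullable_fields=set(),
    dense_dims={"text_embedding": 4},
)
bad = find_schema_problems(
    fields={},
    vectors={"text_embedding": [0.1, 0.2]},
    required_fields={"text"},
    nullable_fields=set(),
    dense_dims={"text_embedding": 4},
)
```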

Upsert a Single Document

Assume you already have a collection with the following schema:

  • A scalar field: text (string)
  • A dense vector embedding: text_embedding (4-dimensional FP32 vector)

    The 4-dimensional vector is for demonstration only — real-world embeddings are usually much larger.

You've also opened the collection and have a collection object ready.

Now, upsert a document like this:

Upsert a document
import zvec

# Create a document
doc = zvec.Doc(  
    id="text_1",  # ← an existing document with this id is overwritten
    vectors={
        "text_embedding": [0.1, 0.2, 0.3, 0.4],  # ← must match the vector name
                          # ↑ list of floats; list length = dimension (4)
    },
    fields={
        "text": "This is a sample text.",  # ← must match the scalar field name
    },
)

# Upsert the document
result = collection.upsert(doc)  
print(result)  # {"code": 0} means success

When called with a single Doc, upsert() returns a single Status object.

  • {"code": 0} indicates success.
  • Non-zero codes indicate failure.

Successfully upserted documents are immediately available for querying 🚀.


Upsert a Batch of Documents

To upsert multiple documents at once, pass a list of Doc objects to upsert().
Each Doc is processed independently, and the method returns a list of Status objects — one per document.

Upsert a batch of documents
import zvec

result = collection.upsert(  
    [
        zvec.Doc(
            id="text_1",
            vectors={"text_embedding": [0.1, 0.2, 0.3, 0.4]},
            fields={"text": "This is a sample text."},
        ),
        zvec.Doc(
            id="text_2",
            vectors={"text_embedding": [0.4, 0.3, 0.2, 0.1]},
            fields={"text": "This is another sample text."},
        ),
        zvec.Doc(
            id="text_3",
            vectors={"text_embedding": [-0.1, -0.2, -0.3, -0.4]},
            fields={"text": "One more sample text."},
        ),
    ]
)

print(result)  # [{"code":0}, {"code":0}, {"code":0}]

A failure in one document (e.g., missing a required field or invalid data type) does not stop the others from being upserted.
🔍 Always check each Status in the result list.
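
A small helper can pair each status with the id of the document it belongs to. This sketch assumes the statuses come back in the same order as the documents were passed, which this page does not state explicitly. Because the exact accessor for the status code is not shown here, the caller supplies it as `get_code`; the helper name `failed_ids` is hypothetical, and the dictionaries below are stand-ins for real Status objects.

```python
from typing import Any, Callable


def failed_ids(
    ids: list,
    statuses: list,
    get_code: Callable[[Any], int],
) -> list:
    """Return the ids whose status code is non-zero (assumes matching order)."""
    if len(ids) != len(statuses):
        raise ValueError("expected exactly one status per document")
    return [doc_id for doc_id, status in zip(ids, statuses) if get_code(status) != 0]


# Stand-in statuses shaped like the printed output on this page.
stand_in_statuses = [{"code": 0}, {"code": 3}, {"code": 0}]
failed = failed_ids(
    ["text_1", "text_2", "text_3"],
    stand_in_statuses,
    get_code=lambda status: status["code"],  # replace with the real accessor
)
```

The failed ids can then be logged or retried without touching the documents that were upserted successfully.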


Upsert Documents with Sparse Vectors

Assume your collection includes a sparse vector named sparse_embedding.

Upsert a document with a sparse vector like this:

Upsert a document with a sparse vector
import zvec

result = collection.upsert(  
    zvec.Doc(
        id="text_1",
        vectors={
            "sparse_embedding": {
                42: 1.25,  # ← dimension 42 has weight 1.25
                1337: 0.8,  # ← dimension 1337 has weight 0.8
                2999: 0.63,  # ← dimension 2999 has weight 0.63
            }
        },
    )
)

print(result)  # {"code":0}

A sparse vector is represented as a dictionary dict[int, float].
There is no fixed dimension size — only non-zero dimensions need to be included.
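
If your weights start out as a dense list that is mostly zeros, the dictionary form can be built by keeping only the non-zero positions. This is a plain-Python sketch; the helper name `to_sparse` is hypothetical and not part of Zvec.

```python
def to_sparse(dense: list) -> dict:
    """Map each non-zero position to its weight, dropping the zeros."""
    return {index: float(value) for index, value in enumerate(dense) if value != 0}


sparse = to_sparse([0.0, 1.25, 0.0, 0.0, 0.8])
# sparse == {1: 1.25, 4: 0.8}
```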


Upsert Documents with Multiple Fields and Vectors

Real-world applications often require collections with multiple scalar fields and vector embeddings.
In this example, assume your collection includes the following schema:

  • Scalar fields:
    1. book_title (string)
    2. category (array of strings)
    3. publish_year (32-bit integer)
  • Vector embeddings:
    1. dense_embedding: a 768-dimensional dense vector
    2. sparse_embedding: a sparse vector

Upsert a document with multiple fields and vectors like this:

Upsert a document with multiple fields and vectors
import zvec

# Create a document
doc = zvec.Doc(  
    id="book_1",
    vectors={
        "dense_embedding": [0.1 for _ in range(768)],  # ← use real embedding in practice
        "sparse_embedding": {42: 1.25, 1337: 0.8, 1999: 0.64},  # ← use real embedding in practice
    },
    fields={
        "book_title": "Gone with the Wind",  # ← string
        "category": ["Romance", "Classic Literature"],  # ← array of strings
        "publish_year": 1936,  # ← integer
    },
)

# Upsert the document
result = collection.upsert(doc)  
print(result)  # {"code": 0} means success