Upsert Documents
upsert() works just like insert() — it adds one or more new documents (Doc) to a collection.
The key difference is that if a document with the same id already exists, it will be overwritten.
- Use upsert() if you want to overwrite an existing document (or don't mind replacing it).
- Use insert() if you want to avoid accidentally overwriting a document; insert() will fail if a document with the same id already exists.
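The difference between the two write modes can be illustrated with a toy in-memory model. This is a minimal sketch, not zvec code: ToyCollection, its methods, and the non-zero code 1 are hypothetical names and values chosen only to mirror the behavior described above.

```python
# Toy in-memory model of the two write modes; this is NOT zvec code.
class ToyCollection:
    def __init__(self):
        self._docs = {}  # maps document id -> payload

    def insert(self, doc_id, payload):
        """Return 0 on success, 1 if the id is already taken (codes are illustrative)."""
        if doc_id in self._docs:
            return 1  # refuse to overwrite an existing document
        self._docs[doc_id] = payload
        return 0

    def upsert(self, doc_id, payload):
        """Always store the payload, replacing any existing document."""
        self._docs[doc_id] = payload
        return 0

    def get(self, doc_id):
        return self._docs[doc_id]


toy = ToyCollection()
first = toy.insert("text_1", {"text": "original"})          # succeeds
second = toy.insert("text_1", {"text": "second attempt"})   # rejected, id exists
third = toy.upsert("text_1", {"text": "replacement"})       # overwrites
```

After these three calls the stored document is the one written by upsert(); the rejected insert() left the original untouched until then.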
Performance Tip:
New vectors are initially buffered for fast ingestion. For optimal search performance, call optimize() after upserting a large batch of documents.
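One way to follow this tip is to load a large set of documents in chunks and optimize once at the end. The helper below is a sketch under assumptions: it assumes optimize() is a method of the collection object and that a list upsert returns one status per document; upsert_then_optimize and batch_size are hypothetical names, not part of zvec.

```python
from typing import Any, List, Sequence


def upsert_then_optimize(collection: Any, docs: Sequence[Any], batch_size: int = 1000) -> List[Any]:
    """Upsert docs in chunks, then call optimize() once at the end.

    Assumes collection.upsert(list) returns one status per document and that
    optimize() is a method of the collection. batch_size is a hypothetical
    tuning knob for this helper, not a zvec parameter.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    statuses: List[Any] = []
    for start in range(0, len(docs), batch_size):
        chunk = list(docs[start:start + batch_size])
        statuses.extend(collection.upsert(chunk))
    if len(docs) > 0:
        collection.optimize()  # one optimize call after the whole load
    return statuses
```

Calling optimize() once after the full load, rather than after every chunk, keeps ingestion on the fast buffered path for as long as possible.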
Document Doc
Each Doc passed to upsert() must:
- Have an id (if a document with the same id already exists, it will be replaced)
- Provide data that matches the collection's schema:
  - Scalar fields go in the fields dictionary (field names as keys)
  - Vector embeddings go in the vectors dictionary (vector names as keys)
You can omit nullable scalar fields if a document doesn't have a value for them.
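Key mismatches between a document and the schema are easy to catch before calling upsert(). The following is a rough client-side sketch, not a zvec feature: check_doc_keys and its parameters are hypothetical, and the schema is passed in as plain Python sets rather than read from the collection.

```python
from typing import Dict, List, Set


def check_doc_keys(
    fields: Dict[str, object],
    vectors: Dict[str, object],
    schema_fields: Set[str],
    schema_vectors: Set[str],
    nullable_fields: Set[str],
) -> List[str]:
    """Return human-readable problems; an empty list means the keys line up."""
    problems: List[str] = []
    for name in sorted(set(fields) - schema_fields):
        problems.append(f"unknown scalar field: {name}")
    for name in sorted(set(vectors) - schema_vectors):
        problems.append(f"unknown vector: {name}")
    # Nullable scalar fields may be omitted; all other scalar fields are expected.
    for name in sorted(schema_fields - nullable_fields - set(fields)):
        problems.append(f"missing required scalar field: {name}")
    return problems


problems = check_doc_keys(
    fields={"txt": "typo in the key"},
    vectors={"text_embedding": [0.1, 0.2, 0.3, 0.4]},
    schema_fields={"text"},
    schema_vectors={"text_embedding"},
    nullable_fields=set(),
)
```

The check only compares names; it does not validate data types or vector dimensions.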
Upsert a Single Document
Assume you already have a collection with the following schema:
- A scalar field: text (string)
- A dense vector embedding: text_embedding (4-dimensional FP32 vector)
The 4-dimensional vector is for demonstration only; real-world embeddings are usually much larger.
You've also opened the collection and have a collection object ready.
Now, upsert a document like this:
import zvec
# Create a document
doc = zvec.Doc(
    id="text_1",  # ← document id; an existing document with this id is replaced
    vectors={
        "text_embedding": [0.1, 0.2, 0.3, 0.4],  # ← must match the vector name
        # ↑ list of floats; list length = dimension (4)
    },
    fields={
        "text": "This is a sample text.",  # ← must match the scalar field name
    },
)
# Upsert the document
result = collection.upsert(doc)
print(result)  # {"code": 0} means success
The upsert() method returns a Status object for single-document upsertion.
- {"code": 0} indicates success.
- Non-zero codes indicate failure.
Successfully upserted documents are immediately available for querying 🚀.
Upsert a Batch of Documents
To upsert multiple documents at once, pass a list of Doc objects to upsert().
Each Doc is processed independently, and the method returns a list of Status objects — one per document.
import zvec
result = collection.upsert(
    [
        zvec.Doc(
            id="text_1",
            vectors={"text_embedding": [0.1, 0.2, 0.3, 0.4]},
            fields={"text": "This is a sample text."},
        ),
        zvec.Doc(
            id="text_2",
            vectors={"text_embedding": [0.4, 0.3, 0.2, 0.1]},
            fields={"text": "This is another sample text."},
        ),
        zvec.Doc(
            id="text_3",
            vectors={"text_embedding": [-0.1, -0.2, -0.3, -0.4]},
            fields={"text": "One more sample text."},
        ),
    ]
)
print(result)  # [{"code":0}, {"code":0}, {"code":0}]
A failure in one document (e.g., missing a required field or invalid data type) does not stop the others from being upserted.
🔍 Always check each Status in the result list.
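Checking a batch result can be wrapped in a small helper that pairs each document id with its status. This is a sketch under assumptions: the exact shape of zvec's Status is not shown here, so status_code accepts a dict, an attribute, or a method named code; both function names are hypothetical, and the non-zero code 3 in the example is made up.

```python
from typing import Any, List, Sequence, Tuple


def status_code(status: Any) -> int:
    """Read the code from a status given as a dict, an attribute, or a method.

    The exact shape of zvec's Status object is an assumption here.
    """
    if isinstance(status, dict):
        return int(status["code"])
    code = getattr(status, "code")
    return int(code() if callable(code) else code)


def failed_upserts(ids: Sequence[str], statuses: Sequence[Any]) -> List[Tuple[str, int]]:
    """Pair each document id with its status and keep only non-zero codes."""
    if len(ids) != len(statuses):
        raise ValueError("expected one status per document")
    codes = [status_code(s) for s in statuses]
    return [(doc_id, code) for doc_id, code in zip(ids, codes) if code != 0]


# Illustrative statuses; the code value 3 is invented for the example.
failures = failed_upserts(
    ["text_1", "text_2", "text_3"],
    [{"code": 0}, {"code": 3}, {"code": 0}],
)
```

The returned list identifies which documents need to be fixed and upserted again, since the other documents in the batch were already written.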
Upsert Documents with Sparse Vectors
Assume your collection includes a sparse vector named sparse_embedding.
Upsert a document with a sparse vector like this:
import zvec
result = collection.upsert(
    zvec.Doc(
        id="text_1",
        vectors={
            "sparse_embedding": {
                42: 1.25,    # ← dimension 42 has weight 1.25
                1337: 0.8,   # ← dimension 1337 has weight 0.8
                1999: 0.63,  # ← dimension 1999 has weight 0.63
            }
        },
    )
)
print(result)  # {"code":0}
A sparse vector is represented as a dictionary of type dict[int, float].
There is no fixed dimension size — only non-zero dimensions need to be included.
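If your weights start out as a dense list, the dictionary form can be built by keeping only the non-zero entries. A minimal sketch in plain Python; dense_to_sparse is a hypothetical helper name, not a zvec function.

```python
from typing import Dict, Sequence


def dense_to_sparse(values: Sequence[float]) -> Dict[int, float]:
    """Keep only non-zero entries, keyed by their dimension index."""
    return {index: float(value) for index, value in enumerate(values) if value != 0}


sparse = dense_to_sparse([0.0, 1.25, 0.0, 0.0, 0.8])
# sparse is {1: 1.25, 4: 0.8}
```

The resulting dictionary has the dict[int, float] shape described above and can be used as the value for a sparse vector name in the vectors dictionary.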
Upsert Documents with Multiple Fields and Vectors
Real-world applications often require collections with multiple scalar fields and vector embeddings.
In this example, assume your collection includes the following schema:
- Scalar fields:
  - book_title (string)
  - category (array of strings)
  - publish_year (32-bit integer)
- Vector embeddings:
  - dense_embedding: a 768-dimensional dense vector
  - sparse_embedding: a sparse vector
Upsert a document with multiple fields and vectors like this:
import zvec
# Create a document
doc = zvec.Doc(
    id="book_1",
    vectors={
        "dense_embedding": [0.1 for _ in range(768)],  # ← use real embedding in practice
        "sparse_embedding": {42: 1.25, 1337: 0.8, 1999: 0.64},  # ← use real embedding in practice
    },
    fields={
        "book_title": "Gone with the Wind",  # ← string
        "category": ["Romance", "Classic Literature"],  # ← array of strings
        "publish_year": 1936,  # ← integer
    },
)
# Upsert the document
result = collection.upsert(doc)
print(result) # {"code": 0} means success