Upsert Documents
upsert() works similarly to insert(): it adds one or more new documents (Doc) to a collection.
The key difference is that if a document with the same id already exists, it will be overwritten.
- Use upsert() if you want to overwrite an existing document (or don't mind replacing it).
- Use insert() if you want to avoid accidentally overwriting a document: insert() will fail if a document with the same id already exists.
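The difference between the two calls can be illustrated with a toy in-memory model. This is only a sketch of the semantics described above: ToyCollection is a hypothetical stand-in, not the real collection class, and raising KeyError is an assumption about how a failed insert might be reported.

```python
class ToyCollection:
    """In-memory stand-in used only to illustrate the semantics (hypothetical)."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, payload):
        # insert() refuses to touch an id that is already present
        if doc_id in self._docs:
            raise KeyError(f"document {doc_id!r} already exists")
        self._docs[doc_id] = payload

    def upsert(self, doc_id, payload):
        # upsert() adds the document, or replaces whatever is stored under the same id
        self._docs[doc_id] = payload

    def get(self, doc_id):
        return self._docs[doc_id]


toy = ToyCollection()
toy.insert("text_1", {"text": "first version"})
toy.upsert("text_1", {"text": "second version"})  # replaces the first version

try:
    toy.insert("text_1", {"text": "third version"})
    insert_failed = False
except KeyError:
    insert_failed = True  # the stored document is left untouched
```

After this runs, the stored document is the second version: the upsert replaced it, and the later insert was rejected.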
Performance Tip:
New vectors are initially buffered for fast ingestion. For optimal search performance, call optimize() after upserting a large batch of documents.
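One way to follow this tip during a bulk load is to upsert in fixed-size batches and call optimize() once at the end. The helper below is a sketch under assumptions: upsert_in_batches and its batch_size default are hypothetical, and collection is assumed to be an already opened collection whose upsert() accepts a list and returns one Status per document.

```python
from itertools import islice


def upsert_in_batches(collection, docs, batch_size=1000):
    """Upsert docs in fixed-size batches, then call optimize() once.

    Returns the flat list of per-document results.
    batch_size=1000 is an arbitrary illustrative default, not a recommended value.
    """
    results = []
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        # Passing a list returns one Status per document
        results.extend(collection.upsert(batch))
    if results:
        # Optimize once after the whole load rather than after every batch
        collection.optimize()
    return results
```

Calling optimize() once after the load, instead of after each batch, is a design choice for this sketch; the right cadence for a given workload may differ.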
Document Doc
Each Doc passed to upsert() must:
- Have an id (if a document with the same id already exists, it will be replaced)
- Provide data that matches the collection's schema:
  - Scalar fields: provided as key–value pairs under fields (scalar field names as keys)
  - Vector embeddings: provided as key–value pairs under vectors (vector names as keys)
  - You can omit nullable scalar fields if a document doesn't have a value for them
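These requirements can be checked on the client side before calling upsert(). The sketch below is illustrative only: check_doc and the plain-dict schema layout are hypothetical, the real validation is done by upsert() itself, and the check covers dense vectors only.

```python
def check_doc(doc_id, vectors, fields, schema):
    """Return a list of problems found; an empty list means the document looks consistent.

    `schema` is a hypothetical plain-dict description used only for this sketch:
      {"vectors": {name: dimension}, "fields": {name: is_nullable}}
    """
    problems = []
    if not doc_id:
        problems.append("missing id")
    for name, values in vectors.items():
        if name not in schema["vectors"]:
            problems.append(f"unknown vector: {name}")
        elif len(values) != schema["vectors"][name]:
            problems.append(f"wrong dimension for {name}: {len(values)}")
    for name in fields:
        if name not in schema["fields"]:
            problems.append(f"unknown field: {name}")
    for name, nullable in schema["fields"].items():
        # Only nullable scalar fields may be left out
        if name not in fields and not nullable:
            problems.append(f"missing required field: {name}")
    return problems


schema = {"vectors": {"text_embedding": 4}, "fields": {"text": False, "note": True}}
ok = check_doc("text_1", {"text_embedding": [0.1, 0.2, 0.3, 0.4]}, {"text": "hi"}, schema)
bad = check_doc("text_2", {"text_embedding": [0.1, 0.2]}, {"title": "hi"}, schema)
```

Here ok is empty because the nullable note field may be omitted, while bad reports the wrong vector dimension, the unknown title field, and the missing text field.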
Upsert a Single Document
Assume you already have a collection with the following schema:
- A scalar field: text (string)
- A dense vector embedding: text_embedding (4-dimensional FP32 vector)
The 4-dimensional vector is for demonstration only; real-world embeddings are usually much larger.
You've also opened the collection and have a collection object ready.
Now, upsert a document like this:
import zvec

# Create a document
doc = zvec.Doc(
    id="text_1",  # ← must be unique
    vectors={
        "text_embedding": [0.1, 0.2, 0.3, 0.4],  # ← must match the vector name
        # ↑ list of floats; list length = dimension (4)
    },
    fields={
        "text": "This is a sample text.",  # ← must match the scalar field name
    },
)

# Upsert the document
result = collection.upsert(doc)
print(result)  # {"code": 0} means success

The upsert() method validates the document first:
- Incorrect usage — such as an unknown field or wrong vector dimension — raises an exception.
- If validation passes, the method proceeds with the upsert and returns a Status object:
  - {"code": 0} indicates success.
  - Non-zero codes indicate failure (e.g., insufficient disk space).
Successfully upserted documents are immediately available for querying 🚀.
In TypeScript, the equivalent is:

import { ZVecCollection, ZVecDocInput, ZVecOpen } from "@zvec/zvec";

// Create a document
const doc: ZVecDocInput = {
  id: "text_1", // ← must be unique
  vectors: {
    "text_embedding": [0.1, 0.2, 0.3, 0.4] // ← must match the vector name
    // ↑ list of floats; list length = dimension (4)
  },
  fields: {
    "text": "This is a sample text." // ← must match the scalar field name
  }
};

// Upsert the document
const result = collection.upsertSync(doc);
console.log(result); // { ok: true } means success

The upsert() method validates the document first:
- Incorrect usage — such as an unknown field or wrong vector dimension — raises an exception.
- If validation passes, the method proceeds with the upsert and returns a Status object:
  - { ok: true } indicates success.
  - { ok: false } indicates failure (e.g., insufficient disk space).
Successfully upserted documents are immediately available for querying 🚀.
Upsert a Batch of Documents
To upsert multiple documents at once, pass a list of Doc objects to upsert().
Each Doc is processed independently, and the method returns a list of Status objects — one per document.
import zvec

result = collection.upsert(
    [
        zvec.Doc(
            id="text_1",
            vectors={"text_embedding": [0.1, 0.2, 0.3, 0.4]},
            fields={"text": "This is a sample text."},
        ),
        zvec.Doc(
            id="text_2",
            vectors={"text_embedding": [0.4, 0.3, 0.2, 0.1]},
            fields={"text": "This is another sample text."},
        ),
        zvec.Doc(
            id="text_3",
            vectors={"text_embedding": [-0.1, -0.2, -0.3, -0.4]},
            fields={"text": "One more sample text."},
        ),
    ]
)
print(result)  # [{"code":0}, {"code":0}, {"code":0}]

If any document in the batch has incorrect usage (e.g., an unknown field or wrong vector dimension), the method raises an exception and no documents are upserted.
If all documents are valid, the method attempts to upsert every one. A failure in one (e.g., insufficient disk space) does not stop others from being upserted.
🔍 Always check each Status in the result list.
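A small helper can make the per-document check routine. The following is a sketch under stated assumptions: failed_documents is a hypothetical name, the statuses are assumed to come back in the same order as the documents, and the default accessor treats each Status as a mapping, which may not match how the real Status object exposes its code.

```python
def failed_documents(doc_ids, statuses, code_of=lambda status: status["code"]):
    """Pair each id with its status code and keep the ones that did not succeed.

    `code_of` extracts the numeric code from one Status. The default treats a
    status as a mapping, which is an assumption for this sketch; pass a different
    accessor if the real Status object exposes its code another way.
    """
    failures = []
    for doc_id, status in zip(doc_ids, statuses):
        code = code_of(status)
        if code != 0:  # 0 means success; any other code means failure
            failures.append((doc_id, code))
    return failures


# Stand-in results shaped like the printed output, with one simulated failure
ids = ["text_1", "text_2", "text_3"]
statuses = [{"code": 0}, {"code": 5}, {"code": 0}]
failures = failed_documents(ids, statuses)
```

With these stand-in values, failures contains only the entry for text_2, which could then be logged or retried.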
Upsert Documents with Sparse Vectors
Assume your collection includes a sparse vector named sparse_embedding.
Upsert a document with a sparse vector like this:
import zvec

result = collection.upsert(
    zvec.Doc(
        id="text_1",
        vectors={
            "sparse_embedding": {
                42: 1.25,  # ← dimension 42 has weight 1.25
                1337: 0.8,  # ← dimension 1337 has weight 0.8
                2999: 0.63,  # ← dimension 2999 has weight 0.63
            }
        },
    )
)
print(result)  # {"code":0}

A sparse vector is represented as a mapping from dimension indices (integers) to values (floats).
There is no fixed dimension size — only non-zero dimensions need to be included.
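If your weights start out as a dense list, they can be converted into this mapping form by keeping only the non-zero entries. This is a minimal sketch; to_sparse is a hypothetical helper, not part of the library.

```python
def to_sparse(dense):
    """Convert a dense list of weights to a {dimension_index: weight} mapping, dropping zeros."""
    return {index: float(value) for index, value in enumerate(dense) if value != 0}


sparse = to_sparse([0.0, 1.25, 0.0, 0.0, 0.8])
# sparse == {1: 1.25, 4: 0.8}
```

The resulting dictionary has the same shape as the sparse_embedding value in the example above.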
Upsert Documents with Multiple Fields and Vectors
Real-world applications often require collections with multiple scalar fields and vector embeddings. In this example, assume your collection includes the following schema:
- Scalar fields:
  - book_title (string)
  - category (array of strings)
  - publish_year (32-bit integer)
- Vector embeddings:
  - dense_embedding: a 768-dimensional dense vector
  - sparse_embedding: a sparse vector
Upsert a document with multiple fields and vectors like this:
import zvec

# Create a document
doc = zvec.Doc(
    id="book_1",
    vectors={
        "dense_embedding": [0.1 for _ in range(768)],  # ← use real embedding in practice
        "sparse_embedding": {42: 1.25, 1337: 0.8, 1999: 0.64},  # ← use real embedding in practice
    },
    fields={
        "book_title": "Gone with the Wind",  # ← string
        "category": ["Romance", "Classic Literature"],  # ← array of strings
        "publish_year": 1936,  # ← integer
    },
)

# Upsert the document
result = collection.upsert(doc)
print(result)  # {"code": 0} means success