Insert Documents

Use the insert() method to add one or more new documents (Doc) to a collection.

Performance Tip:
New vectors are initially buffered for fast ingestion. For optimal search performance, call optimize() after inserting a large batch of documents.
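
One way to apply this tip to a large ingest is to insert in chunks and call optimize() once at the end. The sketch below relies only on the insert() and optimize() methods named on this page. The helper name insert_in_chunks and the default chunk size are illustrative assumptions, not part of Zvec.

```python
from typing import Any, Iterable, List


def insert_in_chunks(collection: Any, docs: Iterable[Any], chunk_size: int = 1000) -> List[Any]:
    """Insert docs in chunks of chunk_size, then call optimize() once.

    `collection` is assumed to expose insert(list_of_docs) and optimize();
    this helper and its default chunk size are hypothetical.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    statuses: List[Any] = []
    chunk: List[Any] = []
    for doc in docs:
        chunk.append(doc)
        if len(chunk) == chunk_size:
            statuses.extend(collection.insert(chunk))  # one Status per document
            chunk = []
    if chunk:  # flush the final partial chunk
        statuses.extend(collection.insert(chunk))

    if statuses:  # skip optimize() when nothing was submitted
        collection.optimize()
    return statuses
```

Calling optimize() once after the last chunk, rather than after every chunk, keeps ingestion on the fast buffered path for as long as possible. A suitable chunk size depends on document size and available memory.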


Document Doc

Each Doc passed to insert() must:

  • Have a unique id (not already present in the collection)
  • Provide data that matches the collection's schema:
    1. Scalar fields: provided as key–value pairs under fields (scalar field names as keys)
    2. Vector embeddings: provided as key–value pairs under vectors (vector names as keys)

Nullable scalar fields may be omitted when a document has no value for them.

If a document with the same id already exists in the collection, the insertion fails for that document.
To overwrite existing documents, or to insert without first checking whether the id exists, use upsert() instead.


Insert a Single Document

Assume you already have a collection with the following schema:

  • A scalar field: text (string)
  • A dense vector embedding: text_embedding (4-dimensional FP32 vector)

    The 4-dimensional vector is for demonstration only — real-world embeddings are usually much larger.

You've also opened the collection and have a collection object ready.

Now, insert a document like this:

Insert a document (Python)
import zvec

# Create a document
doc = zvec.Doc(  
    id="text_1",  # ← must be unique
    vectors={
        "text_embedding": [0.1, 0.2, 0.3, 0.4],  # ← must match the vector name
                          # ↑ list of floats; list length = dimension (4)
    },
    fields={
        "text": "This is a sample text.",  # ← must match the scalar field name
    },
)

# Insert the document
result = collection.insert(doc)  
print(result)  # {"code": 0} means success

The insert() method validates the document first:

  • Incorrect usage — such as an unknown field or wrong vector dimension — raises an exception.
  • If validation passes, the method proceeds with the insertion and returns a Status object:
    • {"code": 0} indicates success.
    • Non-zero codes indicate failure (e.g., duplicate ID, insufficient disk space).

Successfully inserted documents are immediately available for querying 🚀.
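
The two failure modes described above (an exception for invalid input, a non-zero status for a failed insertion) can be handled separately. The following is a minimal sketch rather than Zvec's own API: try_insert and read_code are hypothetical helpers. The exception type raised on validation errors is not stated here, so the sketch catches Exception broadly. How a Status exposes its code (mapping key, attribute, or method) is also an assumption.

```python
from typing import Any, Tuple


def read_code(status: Any) -> int:
    """Return the numeric code of a status-like object.

    Assumption: the code is reachable as status["code"], status.code,
    or status.code(). Adjust to the real Status interface.
    """
    if isinstance(status, dict):
        return int(status["code"])
    code = getattr(status, "code")
    return int(code() if callable(code) else code)


def try_insert(collection: Any, doc: Any) -> Tuple[str, str]:
    """Insert one document and classify the outcome.

    Returns ("inserted", ""), ("rejected", reason) when validation raised,
    or ("failed", reason) when a non-zero status came back.
    """
    try:
        status = collection.insert(doc)
    except Exception as exc:  # validation error: nothing was written
        return ("rejected", f"{type(exc).__name__}: {exc}")

    code = read_code(status)
    if code == 0:
        return ("inserted", "")
    return ("failed", f"status code {code}")
```

A "rejected" outcome points at a bug in how the document was built, while a "failed" outcome (for example a duplicate id) may call for upsert() or a retry.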

Insert a document (TypeScript)
import { ZVecCollection, ZVecDocInput, ZVecOpen } from "@zvec/zvec";

// Create a document
const doc: ZVecDocInput = {
    id: "text_1",   // ← must be unique
    vectors: {
        "text_embedding": [0.1, 0.2, 0.3, 0.4]  // ← must match the vector name
                          // ↑ array of numbers; array length = dimension (4)
    },
    fields: {
        "text": "This is a sample text."    // ← must match the scalar field name
    }
};

// Insert the document
const result = collection.insertSync(doc);
console.log(result);  // { ok: true } means success

The insertSync() method validates the document first:

  • Incorrect usage (such as an unknown field or wrong vector dimension) throws an exception.
  • If validation passes, the method proceeds with the insertion and returns a Status object:
    • { ok: true } indicates success.
    • { ok: false } indicates failure (e.g., duplicate ID, insufficient disk space).

Successfully inserted documents are immediately available for querying 🚀.


Insert a Batch of Documents

To insert multiple documents at once, pass a list of Doc objects to insert().
Each Doc is processed independently, and the method returns a list of Status objects — one per document.

Insert a batch of documents
import zvec

result = collection.insert(  
    [
        zvec.Doc(
            id="text_1",
            vectors={"text_embedding": [0.1, 0.2, 0.3, 0.4]},
            fields={"text": "This is a sample text."},
        ),
        zvec.Doc(
            id="text_2",
            vectors={"text_embedding": [0.4, 0.3, 0.2, 0.1]},
            fields={"text": "This is another sample text."},
        ),
        zvec.Doc(
            id="text_3",
            vectors={"text_embedding": [-0.1, -0.2, -0.3, -0.4]},
            fields={"text": "One more sample text."},
        ),
    ]
)

print(result)  # [{"code":0}, {"code":0}, {"code":0}]

If any document in the batch is invalid (e.g., it has an unknown field or a wrong vector dimension), the method raises an exception and no documents are inserted.

If all documents are valid, the method attempts to insert every one. A failure in one (e.g., duplicate id) does not stop others from being inserted.

🔍 Always check each Status in the result list.
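
Checking each Status can be sketched as pairing the submitted ids with the returned list. This assumes statuses come back in the same order as the documents, which this page implies ("one per document") but does not state outright. Because the page does not show how a Status exposes its code, the caller supplies that as code_of. The helper name failed_inserts is hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple


def failed_inserts(
    doc_ids: Sequence[str],
    statuses: Sequence[Any],
    code_of: Callable[[Any], int],
) -> List[Tuple[str, int]]:
    """Return (doc_id, code) for every document whose status code is non-zero.

    Assumes statuses[i] belongs to doc_ids[i]; `code_of` extracts the numeric
    code from one status object.
    """
    if len(doc_ids) != len(statuses):
        raise ValueError("expected exactly one status per document")

    failures: List[Tuple[str, int]] = []
    for doc_id, status in zip(doc_ids, statuses):
        code = code_of(status)
        if code != 0:
            failures.append((doc_id, code))
    return failures
```

The returned ids can then be logged, retried, or passed to upsert() if overwriting is acceptable.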


Insert Documents with Sparse Vectors

Assume your collection includes a sparse vector named sparse_embedding.

Insert a document with a sparse vector like this:

Insert a document with a sparse vector
import zvec

result = collection.insert(  
    zvec.Doc(
        id="text_1",
        vectors={
            "sparse_embedding": {
                42: 1.25,  # ← dimension 42 has weight 1.25
                1337: 0.8,  # ← dimension 1337 has weight 0.8
                2999: 0.63,  # ← dimension 2999 has weight 0.63
            }
        },
    )
)

print(result)  # {"code":0}

A sparse vector is represented as a mapping from dimension indices (integers) to values (floats).
There is no fixed dimension size — only non-zero dimensions need to be included.
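
To illustrate the index-to-value mapping, here is a small sketch that converts a dense list into that shape by keeping only the non-zero entries. The helper to_sparse and its eps threshold are hypothetical and not part of Zvec. In practice, sparse embeddings usually come directly from a sparse encoder.

```python
from typing import Dict, Sequence


def to_sparse(dense: Sequence[float], eps: float = 0.0) -> Dict[int, float]:
    """Map each position whose absolute value exceeds eps to its value."""
    return {index: float(value) for index, value in enumerate(dense) if abs(value) > eps}


sparse = to_sparse([0.0, 1.25, 0.0, 0.8])
print(sparse)  # {1: 1.25, 3: 0.8}
```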


Insert Documents with Multiple Fields and Vectors

Real-world applications often require collections with multiple scalar fields and vector embeddings. For this example, assume your collection has the following schema:

  • Scalar fields:
    1. book_title (string)
    2. category (array of strings)
    3. publish_year (32-bit integer)
  • Vector embeddings:
    1. dense_embedding: a 768-dimensional dense vector
    2. sparse_embedding: a sparse vector

Insert a document with multiple fields and vectors like this:

Insert a document with multiple fields and vectors
import zvec

# Create a document
doc = zvec.Doc(  
    id="book_1",
    vectors={
        "dense_embedding": [0.1] * 768,  # ← use real embedding in practice
        "sparse_embedding": {42: 1.25, 1337: 0.8, 1999: 0.64},  # ← use real embedding in practice
    },
    fields={
        "book_title": "Gone with the Wind",  # ← string
        "category": ["Romance", "Classic Literature"],  # ← array of strings
        "publish_year": 1936,  # ← integer
    },
)

# Insert the document
result = collection.insert(doc)   
print(result)  # {"code": 0} means success
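
With several fields and vectors, it can help to check a payload locally before building the Doc, so that mistakes surface with a readable message. The sketch below hard-codes the example schema above. check_book_payload is a hypothetical helper, not a Zvec function, and it only approximates the validation that insert() performs.

```python
from typing import Any, List, Mapping

DENSE_DIM = 768
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
KNOWN_VECTORS = {"dense_embedding", "sparse_embedding"}
KNOWN_FIELDS = {"book_title", "category", "publish_year"}


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool)


def check_book_payload(vectors: Mapping[str, Any], fields: Mapping[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the payload looks fine."""
    problems: List[str] = []
    problems += [f"unknown vector: {name}" for name in vectors if name not in KNOWN_VECTORS]
    problems += [f"unknown field: {name}" for name in fields if name not in KNOWN_FIELDS]

    dense = vectors.get("dense_embedding")
    if dense is not None and len(dense) != DENSE_DIM:
        problems.append(f"dense_embedding has {len(dense)} values, expected {DENSE_DIM}")

    sparse = vectors.get("sparse_embedding")
    if sparse is not None and not all(
        _is_int(index) and isinstance(weight, (int, float)) for index, weight in sparse.items()
    ):
        problems.append("sparse_embedding must map integer indices to numeric weights")

    title = fields.get("book_title")
    if title is not None and not isinstance(title, str):
        problems.append("book_title must be a string")

    category = fields.get("category")
    if category is not None and not (
        isinstance(category, list) and all(isinstance(item, str) for item in category)
    ):
        problems.append("category must be a list of strings")

    year = fields.get("publish_year")
    if year is not None and not (_is_int(year) and INT32_MIN <= year <= INT32_MAX):
        problems.append("publish_year must fit in a 32-bit integer")

    return problems
```

Only keys that are present are type-checked, because this page does not say which of these fields are nullable.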