Create a New Collection

To create a new, empty collection in Zvec, you need to define the following:

Schema — the structural blueprint of your data, specifying scalar fields and vector embeddings.
Collection options (optional) — runtime settings that control how the collection behaves when opened (e.g., read-only mode).

Once these are defined, call create_and_open() to create the collection on disk and get a ready-to-use Collection handle.

If a collection already exists at the specified path, create_and_open() will raise an error to prevent accidental overwrites.
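If your code may run more than once against the same path, you can check the path yourself before calling create_and_open() and fail with a clearer message. The sketch below uses only the Python standard library. The helper name ensure_new_collection_path is hypothetical and not part of Zvec, and it only checks whether anything exists at the path, not whether that path actually holds a Zvec collection.

```python
import tempfile
from pathlib import Path


def ensure_new_collection_path(path: str) -> Path:
    """Hypothetical helper (not a Zvec API): fail early if `path` is taken.

    create_and_open() refuses to overwrite an existing collection; this
    check surfaces the same condition before any Zvec call is made.
    """
    target = Path(path)
    if target.exists():
        raise FileExistsError(f"refusing to create a collection at existing path: {target}")
    return target


# Example: a fresh path passes, an existing directory is rejected.
with tempfile.TemporaryDirectory() as tmp:
    new_path = ensure_new_collection_path(str(Path(tmp) / "my_collection"))
    try:
        ensure_new_collection_path(tmp)  # tmp itself already exists
        rejected = False
    except FileExistsError:
        rejected = True
```

The returned path can then be passed as the path argument of create_and_open().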

Step 1: Define the Schema

A collection schema (CollectionSchema) defines the structure that every document inserted into the collection must conform to.

The schema in Zvec is dynamic: you can add or remove scalar fields and vectors at any time without rebuilding the collection.

CollectionSchema has three parts:

name: An identifier for the collection.
fields: A list of scalar fields.
vectors: A list of vector fields.

1. Collection Name

A human-readable identifier for your collection. This name is used internally for reference and logging.

2. Scalar Fields

Scalar fields store non-vector (i.e., structured) data — such as strings, numbers, booleans, or arrays.

Each field is defined using FieldSchema with the following properties:

🔤 name: A unique string identifier for the field within the collection.
🧬 data_type: The type of data stored — e.g., STRING, INT64, or array types like ARRAY_STRING.
⭕ nullable (optional): Whether the field is allowed to have no value (defaults to False).
🚀 index_param (optional): Enables fast filtering by creating an inverted index via InvertIndexParam.
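To make the nullable rule concrete, here is a plain-Python sketch of what "conforming" means for scalar fields. FieldSpec and check_document are hypothetical stand-ins, not zvec.FieldSchema or Zvec's own validation, and mapping each data_type to a Python type is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical stand-in for a scalar field definition (not zvec.FieldSchema)."""
    name: str
    py_type: type           # Python type used here to model data_type
    nullable: bool = False  # mirrors the documented default


def check_document(fields: list, doc: dict) -> list:
    """Return a list of problems; an empty list means the document conforms."""
    problems = []
    for spec in fields:
        value: Any = doc.get(spec.name)
        if value is None:
            # A missing value is only acceptable when the field is nullable.
            if not spec.nullable:
                problems.append(f"{spec.name}: missing value for non-nullable field")
        elif not isinstance(value, spec.py_type):
            problems.append(
                f"{spec.name}: expected {spec.py_type.__name__}, got {type(value).__name__}"
            )
    return problems


fields = [
    FieldSpec("title", str),
    FieldSpec("price", int),
    FieldSpec("tags", list),                     # models an ARRAY_STRING field
    FieldSpec("description", str, nullable=True),
]

good_problems = check_document(fields, {"title": "Headphones", "price": 199, "tags": ["audio"]})
bad_problems = check_document(fields, {"title": "Headphones", "price": "199"})
```

The first document passes even without a description because that field is nullable; the second fails on the wrongly typed price and the missing tags.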

Tip:
Only add an index to fields you plan to filter on. Unindexed fields save storage and write overhead.

If you do enable indexing, you can optionally activate performance-enhancing (but storage-costly) features:

enable_range_optimization=True → faster range queries (e.g., price > 100)
enable_extended_wildcard=True → complex string pattern matching (e.g., name LIKE 'abc%def')
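Both options change how fast these filters run, not what they select. As a reference for the selection semantics only (not how Zvec evaluates filters internally), the sketch below matches values in plain Python. It assumes LIKE follows the SQL convention where % matches any run of characters, including an empty one; any other wildcard syntax Zvec may support is not modeled.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE pattern that uses '%' as its only wildcard into a regex."""
    parts = (re.escape(chunk) for chunk in pattern.split("%"))
    return re.compile(".*".join(parts), re.DOTALL)


def matches_like(value: str, pattern: str) -> bool:
    # fullmatch: the whole value must fit the pattern, as in SQL LIKE.
    return like_to_regex(pattern).fullmatch(value) is not None


names = ["abcXYZdef", "abcdef", "abdef", "xabcdef"]
like_hits = [n for n in names if matches_like(n, "abc%def")]

prices = [50, 100, 150, 999]
range_hits = [p for p in prices if p > 100]  # strict comparison: 100 is excluded
```

Here like_hits is ["abcXYZdef", "abcdef"] and range_hits is [150, 999].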

3. Vectors (Embeddings)

A vector is defined using VectorSchema with the following properties:

🔤 name: A unique string identifier for the vector within the collection.
🧬 data_type: The numeric format of the vector.
- Dense vectors: VECTOR_FP32, VECTOR_FP16, etc.
- Sparse vectors: SPARSE_VECTOR_FP32, SPARSE_VECTOR_FP16.
📐 dimension: Required for dense vectors — the number of dimensions.
🚀 index_param: Configures the vector index type and similarity metric.
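The sketch below contrasts the two shapes of vector data and the raw cost of each dense element type. Modeling a sparse vector as a {dimension_index: value} dict is an assumption for illustration, and the byte counts are raw element sizes, not Zvec's on-disk format. For scale, a 512-dimensional vector takes 2,048 bytes as FP32 and 1,024 bytes as FP16.

```python
import struct

# Dense vector: one value per dimension, so its length must equal `dimension`.
dimension = 4
dense = [0.1, -0.25, 0.5, 1.0]
assert len(dense) == dimension

# Sparse vector (assumed representation): only non-zero entries, keyed by index.
# No dimension is declared; the schema lists dimension as required for dense vectors only.
sparse = {17: 0.8, 4096: 0.3, 90210: 1.2}

# Raw storage per dense vector: FP32 uses 4 bytes per value, FP16 uses 2.
fp32_bytes = struct.pack(f"<{dimension}f", *dense)
fp16_bytes = struct.pack(f"<{dimension}e", *dense)

# Round-tripping through FP16 keeps values close but not always identical.
dense_from_fp16 = list(struct.unpack(f"<{dimension}e", fp16_bytes))
```

Values such as -0.25, 0.5, and 1.0 survive FP16 exactly, while 0.1 comes back as a nearby value, which is the precision trade-off behind choosing VECTOR_FP16.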

Configure the vector index through the index_param field. Besides the index type, index_param lets you specify:

metric_type:
COSINE, L2 (Euclidean distance), or IP (inner product). Make sure your metric matches how your embeddings were trained.
quantize_type (optional):
Compresses vectors to reduce index size and speed up search, at the cost of a slight drop in recall.
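Why the metric matters: the three metrics can rank the same documents differently. The plain-Python sketch below uses the textbook definitions; it does not claim how Zvec scales or reports scores (for example, whether L2 is returned squared), and the vectors are made up.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def inner_product(a, b):
    """IP: larger means more similar; sensitive to vector length."""
    return dot(a, b)


def cosine_similarity(a, b):
    """COSINE: inner product of the length-normalized vectors, in [-1, 1]."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l2_distance(a, b):
    """L2: Euclidean distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
doc_long = [3.0, 0.0]  # same direction as the query, but far away
doc_near = [0.6, 0.8]  # different direction, but close in space
docs = {"doc_long": doc_long, "doc_near": doc_near}

best_by_ip = max(docs, key=lambda k: inner_product(query, docs[k]))
best_by_cosine = max(docs, key=lambda k: cosine_similarity(query, docs[k]))
best_by_l2 = min(docs, key=lambda k: l2_distance(query, docs[k]))
```

IP and COSINE both prefer doc_long, while L2 prefers doc_near. If your embeddings are normalized to unit length, COSINE and IP produce the same ranking.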

Full Schema Example

Step 2: Configure Collection Options

The CollectionOption lets you control runtime behavior when creating the collection:

read_only: Opens the collection in read-only mode. Attempts to write will raise an error.
Note: read_only must be set to False when calling create_and_open(), since creation requires writing files to disk.
enable_mmap: Uses memory-mapped I/O for faster access (defaults to True). This trades slightly higher memory cache usage for improved performance.
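For readers unfamiliar with memory-mapped I/O, the standard-library sketch below shows the general idea: a file is mapped into the process's address space and a single record is read by offset, with the operating system paging data in on demand and keeping it in its page cache. The file layout (fixed-size FP32 rows) is a hypothetical example; this page does not describe Zvec's own file format.

```python
import mmap
import os
import struct
import tempfile

DIM = 4
ROW_BYTES = DIM * 4  # FP32: 4 bytes per value

vectors = [[float(i + j) for j in range(DIM)] for i in range(3)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.bin")

    # Write three FP32 vectors back to back.
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack(f"<{DIM}f", *vec))

    # Map the file and read only the third row by offset; no read() calls.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            third = list(struct.unpack_from(f"<{DIM}f", mm, 2 * ROW_BYTES))
```

Here third is [2.0, 3.0, 4.0, 5.0]; only the pages covering that row needed to be loaded.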

Collection option

import zvec

collection_option = zvec.CollectionOption(read_only=False, enable_mmap=True)

Step 3: Create and Open the Collection

With your schema and options ready, call create_and_open() to create the collection at the desired path:

Create and open a collection

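import zvec

# collection_schema and collection_option are the objects defined in Steps 1 and 2.
collection = zvec.create_and_open(
    path="/path/to/my/collection",  # must not already contain a collection
    schema=collection_schema,
    option=collection_option,
)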

The returned collection object is immediately ready for inserting documents, running queries, or managing data.

Real-World Example: 🛒 Product Search

This schema models a multi-modal product search system, combining visual, textual, and structured metadata for rich retrieval:

🗂️ Scalar Fields: For Filtering & Display

category (array of strings, indexed):
Enables queries like category CONTAIN_ANY ("electronics", "headphones") to find products that belong to either "electronics" or "headphones" (or both).
price (integer, indexed with range optimization):
Supports fast range queries such as price > 100.
in_stock (boolean, indexed):
Enables instant filtering by availability (e.g., "only show items in stock").
image_url and description are stored but not indexed, since they're only used for display.
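The sketch below spells out, in plain Python, what these three filters select when combined with AND. It illustrates the semantics only, not Zvec's filter syntax or execution; the product records are made-up sample data.

```python
products = [
    {"id": "p1", "category": ["electronics", "headphones"], "price": 199, "in_stock": True},
    {"id": "p2", "category": ["electronics", "cables"], "price": 15, "in_stock": True},
    {"id": "p3", "category": ["home", "kitchen"], "price": 120, "in_stock": True},
    {"id": "p4", "category": ["headphones"], "price": 349, "in_stock": False},
]


def contain_any(field_values, wanted):
    """CONTAIN_ANY semantics: the array shares at least one element with `wanted`."""
    return not set(field_values).isdisjoint(wanted)


wanted = ("electronics", "headphones")

category_hits = [p["id"] for p in products if contain_any(p["category"], wanted)]

# Category match AND price > 100 AND in stock.
hits = [
    p["id"]
    for p in products
    if contain_any(p["category"], wanted) and p["price"] > 100 and p["in_stock"]
]
```

The category filter alone keeps p1, p2, and p4; adding the price and stock conditions leaves only p1.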

📐 Vector Embeddings: For Semantic Relevance

Two dense vectors capture semantic meaning:
- image_vec: 512-dimensional embeddings from product images (e.g., via a vision model).
- description_vec: 768-dimensional embeddings from product descriptions (e.g., from a language model), stored with quantization.
One sparse vector, keywords_sparse, supports keyword matching, enabling hybrid sparse-dense search.
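This page does not say how Zvec fuses sparse and dense results, so the sketch below uses one common technique, a weighted sum of a dense cosine score and a sparse dot product (reciprocal rank fusion is another). The weight alpha, the function names, and all vectors are hypothetical. In practice the two scores live on different scales, so real systems usually normalize them first.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def sparse_dot(q, d):
    """Dot product of two sparse vectors stored as {index: weight}."""
    # Iterate over the smaller dict; only indices present in both contribute.
    if len(q) > len(d):
        q, d = d, q
    return sum(w * d[i] for i, w in q.items() if i in d)


def hybrid_score(dense_q, dense_d, sparse_q, sparse_d, alpha=0.7):
    """Weighted sum: alpha * dense similarity + (1 - alpha) * sparse keyword score."""
    return alpha * cosine(dense_q, dense_d) + (1 - alpha) * sparse_dot(sparse_q, sparse_d)


query_dense, query_sparse = [1.0, 0.0], {3: 1.0, 8: 1.0}

# Doc A: semantically close, no shared keywords. Doc B: the reverse.
score_a = hybrid_score(query_dense, [1.0, 0.0], query_sparse, {5: 1.0})
score_b = hybrid_score(query_dense, [0.0, 1.0], query_sparse, {3: 2.0, 8: 1.0})
```

With alpha=0.7, doc B (0.9) outranks doc A (0.7) because of its keyword overlap; with alpha=1.0 the ranking falls back to pure dense similarity and doc A wins.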
