A collection schema (CollectionSchema) defines the structure that every document inserted into the collection must conform to.

The schema in Zvec is dynamic: you can add or remove scalar fields and vectors at any time without rebuilding the collection.

CollectionSchema has three parts:

Collection Name

A human-readable identifier for your collection. This name is used internally for reference and logging.

Scalar Fields

Scalar fields store non-vector (i.e., structured) data — such as strings, numbers, booleans, or arrays.

Each field is defined using FieldSchema with the following properties:

name: A unique string identifier for the field within the collection.
data_type: The type of data stored — e.g., STRING, INT64, or array types like ARRAY_STRING.
nullable (optional): Whether the field is allowed to have no value (defaults to False).
index_param (optional): Enables fast filtering via InvertIndexParam (inverted index) or full-text search via FtsIndexParam (full-text index).
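To make these four properties concrete, here is a minimal sketch in plain Python. The `FieldSpec` dataclass and the `validate_doc` helper are hypothetical stand-ins written for illustration, not Zvec's FieldSchema API, and the `"fts"` / `"invert"` strings are placeholder labels for the two index kinds. The sketch only mirrors what is described above: unique names, a data type, an optional index, and a nullable flag that defaults to False.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical stand-in mirroring the FieldSchema properties above."""
    name: str
    data_type: str                      # e.g. "STRING", "INT64", "ARRAY_STRING"
    nullable: bool = False              # same default as described above
    index_param: Optional[str] = None   # placeholder label; None means no index


def validate_doc(fields: list[FieldSpec], doc: dict[str, Any]) -> list[str]:
    """Return the problems found in `doc`; an empty list means it conforms."""
    names = [f.name for f in fields]
    if len(names) != len(set(names)):
        raise ValueError("field names must be unique within a collection")
    problems = []
    for f in fields:
        # A missing or None value is only acceptable for nullable fields.
        if doc.get(f.name) is None and not f.nullable:
            problems.append(f"missing value for non-nullable field '{f.name}'")
    return problems


fields = [
    FieldSpec("title", "STRING", index_param="fts"),
    FieldSpec("year", "INT64", index_param="invert"),
    FieldSpec("tags", "ARRAY_STRING", nullable=True),
]

ok = validate_doc(fields, {"title": "Intro", "year": 2024})   # tags may be absent
bad = validate_doc(fields, {"title": "Intro"})                # year is required
```

Here `ok` is empty because only the nullable `tags` field is missing, while `bad` reports the absent `year` value.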

Vectors

Each vector is defined using VectorSchema with the following properties:

name: A unique string identifier for the vector within the collection.
data_type: The numeric format of the vector.
- Dense vectors: VECTOR_FP32, VECTOR_FP16, etc.
- Sparse vectors: SPARSE_VECTOR_FP32, SPARSE_VECTOR_FP16.
dimension: The number of dimensions. Required for dense vectors.
index_param: Configures the vector index type and similarity metric.
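The dense/sparse distinction and the dimension rule can be sketched in plain Python. `VectorSpec` and `check_dense` are hypothetical names used only for illustration, not Zvec's VectorSchema API. The type sets contain just the data types named above, and the `{index: weight}` layout for sparse values is an assumption about how such a vector might be represented.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Only the types named in the text; the real list of dense types is longer.
DENSE_TYPES = {"VECTOR_FP32", "VECTOR_FP16"}
SPARSE_TYPES = {"SPARSE_VECTOR_FP32", "SPARSE_VECTOR_FP16"}


@dataclass(frozen=True)
class VectorSpec:
    """Hypothetical stand-in mirroring the VectorSchema properties above."""
    name: str
    data_type: str
    dimension: Optional[int] = None

    def __post_init__(self) -> None:
        if self.data_type in DENSE_TYPES:
            # Dense vectors must declare a fixed, positive dimension.
            if self.dimension is None or self.dimension <= 0:
                raise ValueError(f"dense vector '{self.name}' needs a positive dimension")
        elif self.data_type not in SPARSE_TYPES:
            raise ValueError(f"unknown vector data type: {self.data_type}")


def check_dense(spec: VectorSpec, values: list[float]) -> bool:
    """A dense vector conforms only if its length equals the declared dimension."""
    return len(values) == spec.dimension


dense = VectorSpec("embedding", "VECTOR_FP32", dimension=4)
sparse = VectorSpec("keywords", "SPARSE_VECTOR_FP32")  # no dimension needed

# Assumed representation: a sparse vector keeps only its non-zero entries.
sparse_value = {3: 0.8, 1042: 0.1}
```

Declaring a dense vector without a dimension raises an error in this sketch, which matches the rule that dimension is required only for dense vectors.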

The index_param allows you to configure the appropriate indexing strategy:

metric_type: COSINE, L2, or IP (inner product). Choose the metric that matches how your embeddings were trained.
quantize_type (optional): Compresses vectors to reduce index size and speed up search, at the cost of slightly lower recall.
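The following plain-Python sketch illustrates why the metric choice matters and where the recall cost of quantization comes from. It is not Zvec code: the three metric functions are the textbook definitions, and symmetric int8 scalar quantization is an assumed example scheme, since the quantization types Zvec supports are not listed here.

```python
import math


def l2(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ip(a, b):
    """Inner product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: inner product of the length-normalized vectors."""
    return ip(a, b) / (math.sqrt(ip(a, a)) * math.sqrt(ip(b, b)))


def quantize_int8(v):
    """Assumed scheme: symmetric int8 scalar quantization. Returns codes and scale."""
    scale = max(abs(x) for x in v) / 127 or 1.0  # fall back to 1.0 for an all-zero vector
    return [round(x / scale) for x in v], scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


query = [0.9, 0.1, 0.0]
short = [1.0, 0.0, 0.0]   # nearly the same direction as the query
long_ = [3.0, 3.0, 0.0]   # larger magnitude, different direction

# IP rewards magnitude and ranks `long_` first; COSINE ignores magnitude
# and ranks `short` first. The same data gives different nearest neighbors.
ip_prefers_long = ip(query, long_) > ip(query, short)
cosine_prefers_short = cosine(query, short) > cosine(query, long_)

# Quantization perturbs scores slightly; near-ties can flip, which lowers recall.
codes, scale = quantize_int8(query)
approx = dequantize(codes, scale)
score_error = abs(ip(query, long_) - ip(approx, long_))
```

If the embedding model was trained with cosine similarity, searching with IP on unnormalized vectors can return the wrong neighbors, as the two comparisons above show.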