
Define a Collection Schema

A collection schema (CollectionSchema) defines the structure that every document inserted into the collection must conform to.

The schema in Zvec is dynamic: you can add or remove scalar fields and vectors at any time without rebuilding the collection.

CollectionSchema has three parts:

  1. name: An identifier for the collection.
  2. fields: A list of scalar fields.
  3. vectors: A list of vector fields.

The name is a human-readable identifier for your collection. It is used internally for reference and logging.

Scalar fields store non-vector (i.e., structured) data — such as strings, numbers, booleans, or arrays.

Each field is defined using FieldSchema with the following properties:

  1. name: A unique string identifier for the field within the collection.
  2. data_type: The type of data stored — e.g., STRING, INT64, or array types like ARRAY_STRING.
  3. nullable (optional): Whether the field is allowed to have no value (defaults to False).
  4. index_param (optional): Enables fast filtering by creating an inverted index via InvertIndexParam.
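To make the four properties concrete, here is a minimal sketch in plain Python. FieldSketch, TYPE_CHECKS, and validate_scalars are hypothetical stand-ins written for this example; they are not Zvec's FieldSchema or any Zvec API, and the real constructor signature should be taken from the Zvec API reference. The sketch only shows how name, data_type, and nullable could constrain a document.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical stand-in mirroring the four properties listed above.
# This is NOT Zvec's FieldSchema class.
@dataclass(frozen=True)
class FieldSketch:
    name: str
    data_type: str                      # e.g. "STRING", "INT64", "ARRAY_STRING"
    nullable: bool = False              # same default as documented above
    index_param: Optional[dict] = None  # placeholder for an inverted-index config

# Assumed mapping from type names to Python checks, for illustration only.
TYPE_CHECKS = {
    "STRING": lambda v: isinstance(v, str),
    "INT64": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "ARRAY_STRING": lambda v: isinstance(v, list) and all(isinstance(x, str) for x in v),
}

def validate_scalars(fields, doc):
    """Return a list of problems found in doc; an empty list means it conforms."""
    problems = []
    names = [f.name for f in fields]
    if len(names) != len(set(names)):
        problems.append("field names must be unique")
    for f in fields:
        value = doc.get(f.name)
        if value is None:
            # A missing value is only acceptable for nullable fields.
            if not f.nullable:
                problems.append(f"{f.name}: value required (nullable=False)")
        elif not TYPE_CHECKS[f.data_type](value):
            problems.append(f"{f.name}: expected {f.data_type}")
    return problems

fields = [
    FieldSketch("title", "STRING", index_param={"kind": "inverted"}),
    FieldSketch("year", "INT64", nullable=True),
    FieldSketch("tags", "ARRAY_STRING", nullable=True),
]

ok = validate_scalars(fields, {"title": "Intro", "year": 2024})
bad = validate_scalars(fields, {"year": "2024"})
```

In this sketch the first document conforms because the only omitted field, tags, is nullable. The second is rejected twice: title is missing but not nullable, and year holds a string where INT64 is declared.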

A vector is defined using VectorSchema with the following properties:

  1. name: A unique string identifier for the vector within the collection.
  2. data_type: The numeric format of the vector.
  3. dimension: Required for dense vectors — the number of dimensions.
  4. index_param: Configures the vector index type and similarity metric.
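The role of dimension can be sketched the same way. VectorSketch and check_vectors below are hypothetical stand-ins for this example only, not Zvec's VectorSchema, and the "FP32" label and the dictionary passed as index_param are placeholders rather than real Zvec constants. The sketch assumes a dense vector is valid only when its length equals the declared dimension.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in mirroring the four properties listed above.
# This is NOT Zvec's VectorSchema class.
@dataclass(frozen=True)
class VectorSketch:
    name: str
    data_type: str       # placeholder label such as "FP32"; not a Zvec constant
    dimension: int       # number of dimensions of a dense vector
    index_param: dict = field(default_factory=dict)  # placeholder index config

def check_vectors(vectors, doc_vectors):
    """Report every supplied vector whose length differs from its declared dimension."""
    problems = []
    for v in vectors:
        values = doc_vectors.get(v.name)
        if values is not None and len(values) != v.dimension:
            problems.append(
                f"{v.name}: expected {v.dimension} dimensions, got {len(values)}"
            )
    return problems

vectors = [VectorSketch("embedding", "FP32", 4, {"metric_type": "COSINE"})]

good = check_vectors(vectors, {"embedding": [0.1, 0.2, 0.3, 0.4]})
short = check_vectors(vectors, {"embedding": [0.1, 0.2]})
```

Declaring the dimension up front lets a mismatch, such as embeddings produced by a different model, be caught at insert time rather than surfacing later as poor search results.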

Choosing Vector Index Type

The index_param allows you to configure the appropriate indexing strategy:

  • metric_type: COSINE, L2, or IP (inner product). Use the same metric your embedding model was trained with.
  • quantize_type (optional): Compresses vectors to reduce index size and speed up search, at the cost of slightly lower recall.
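The following plain-Python sketch illustrates why both settings matter. It does not call Zvec. The three metric functions are the standard textbook definitions, and quantize_int8 is a generic scalar quantization scheme chosen for illustration, which may differ from what quantize_type does internally. All names and sample vectors are made up for this example.

```python
import math

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )

query = [1.0, 0.0]
docs = {
    "doc_a": [3.0, 3.0],  # long vector, 45 degrees away from the query
    "doc_b": [0.9, 0.1],  # short vector, nearly parallel to the query
}

# Higher is better for IP and cosine; lower is better for L2.
best_by_ip = max(docs, key=lambda k: inner_product(query, docs[k]))
best_by_cosine = max(docs, key=lambda k: cosine_similarity(query, docs[k]))
best_by_l2 = min(docs, key=lambda k: l2_distance(query, docs[k]))

def quantize_int8(vec):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # avoid a zero scale
    return [round(x / scale) for x in vec], scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

codes, scale = quantize_int8(docs["doc_b"])
restored = dequantize(codes, scale)
max_error = max(abs(x - y) for x, y in zip(docs["doc_b"], restored))
```

Here inner product prefers doc_a because of its larger magnitude, while cosine and L2 both prefer doc_b, so the same query returns a different top result depending on the metric. The quantized copy of doc_b needs only small integer codes plus one scale factor, but restored differs slightly from the original values, which is the source of the recall trade-off.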
