Extension

zvec.extension

Modules:

  • bm25_embedding_function
  • embedding_function
  • multi_vector_reranker
  • openai_embedding_function
  • openai_function
  • qwen_embedding_function
  • qwen_function
  • qwen_rerank_function
  • rerank_function
  • sentence_transformer_embedding_function
  • sentence_transformer_function
  • sentence_transformer_rerank_function

Classes:

  • BM25EmbeddingFunction: BM25-based sparse embedding function using DashText SDK.
  • DenseEmbeddingFunction: Protocol for dense vector embedding functions.
  • SparseEmbeddingFunction: Abstract base class for sparse vector embedding functions.
  • RrfReRanker: Re-ranker using Reciprocal Rank Fusion (RRF) for multi-vector search.
  • WeightedReRanker: Re-ranker that combines scores from multiple vector fields using weights.
  • OpenAIDenseEmbedding: Dense text embedding function using OpenAI API.
  • OpenAIFunctionBase: Base class for OpenAI functions.
  • QwenDenseEmbedding: Dense text embedding function using Qwen (DashScope) API.
  • QwenSparseEmbedding: Sparse text embedding function using Qwen (DashScope) API.
  • QwenFunctionBase: Base class for Qwen (DashScope) functions.
  • QwenReRanker: Re-ranker using Qwen (DashScope) cross-encoder API for semantic re-ranking.
  • ReRanker: Abstract base class for re-ranking search results.
  • DefaultLocalDenseEmbedding: Default local dense embedding using all-MiniLM-L6-v2 model.
  • DefaultLocalSparseEmbedding: Default local sparse embedding using SPLADE model.
  • SentenceTransformerFunctionBase: Base class for Sentence Transformer functions (both dense and sparse).
  • DefaultLocalReRanker: Re-ranker using Sentence Transformer cross-encoder models for semantic re-ranking.

Classes

BM25EmbeddingFunction

BM25EmbeddingFunction(
    corpus: Optional[list[str]] = None,
    encoding_type: Literal["query", "document"] = "query",
    language: Literal["zh", "en"] = "zh",
    b: float = 0.75,
    k1: float = 1.2,
    **kwargs
)

Bases: SparseEmbeddingFunction[TEXT]

BM25-based sparse embedding function using DashText SDK.

This class provides text-to-sparse-vector embedding capabilities using the DashText library with BM25 algorithm. BM25 (Best Matching 25) is a probabilistic retrieval function used for lexical search and document ranking based on term frequency and inverse document frequency.

BM25 generates sparse vectors where each dimension corresponds to a term in the vocabulary, and the value represents the BM25 score for that term. It's particularly effective for:

  • Lexical search and keyword matching
  • Document ranking and information retrieval
  • Combining with dense embeddings for hybrid search
  • Traditional IR tasks where exact term matching is important

This implementation uses DashText's SparseVectorEncoder, which provides efficient BM25 computation for Chinese and English text using either a built-in encoder or custom corpus training.

Parameters:

Name Type Description Default
corpus
Optional[list[str]]

List of documents to train the BM25 encoder. If provided, creates a custom encoder trained on this corpus for better domain-specific accuracy. If None, uses the built-in encoder. Defaults to None.

None
encoding_type
Literal['query', 'document']

Encoding mode for text processing. Use "query" for search queries (default) and "document" for document indexing. This distinction optimizes the BM25 scoring for asymmetric retrieval tasks. Defaults to "query".

'query'
language
Literal['zh', 'en']

Language for built-in encoder. Only used when corpus is None. "zh" for Chinese (trained on Chinese Wikipedia), "en" for English. Defaults to "zh".

'zh'
b
float

Document length normalization parameter for BM25. Range [0, 1]. 0 means no normalization, 1 means full normalization. Only used with custom corpus. Defaults to 0.75.

0.75
k1
float

Term frequency saturation parameter for BM25. Higher values give more weight to term frequency. Only used with custom corpus. Defaults to 1.2.

1.2
**kwargs

Additional parameters for DashText encoder customization.

{}

Attributes:

Name Type Description
corpus_size int

Number of documents in the training corpus (0 if using built-in encoder).

encoding_type str

The encoding type being used ("query" or "document").

language str

The language of the built-in encoder ("zh" or "en").

Raises:

Type Description
ValueError

If corpus is provided but empty or contains non-string elements.

TypeError

If input to embed() is not a string.

RuntimeError

If DashText encoder initialization or training fails.

Note
  • Requires Python 3.10, 3.11, or 3.12
  • Requires the dashtext package: pip install dashtext
  • Two encoder options available:

  • Built-in encoder (no corpus needed): Pre-trained models for Chinese (zh) and English (en), good generalization, works out-of-the-box

  • Custom encoder (corpus required): Better accuracy for domain-specific terminology, requires training on your full corpus with BM25 parameters

  • Encoding types:

  • encoding_type="query": Optimized for search queries (shorter text)

  • encoding_type="document": Optimized for document indexing (longer text)

  • BM25 parameters (b, k1) only apply to custom encoder training

  • Output is sorted by indices (vocabulary term IDs) for consistency
  • Results are cached (LRU cache, maxsize=10) to reduce computation
  • No API key or network connectivity required (local computation)
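The roles of the b and k1 parameters are easiest to see in the standard BM25 term-scoring formula. The sketch below is illustrative only (the helper name bm25_term_score is not part of zvec, and DashText handles tokenization and vocabulary hashing internally):

```python
import math

def bm25_term_score(tf: int, df: int, n_docs: int, doc_len: int,
                    avg_doc_len: float, k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 score of one term in one document (Robertson-style IDF)."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # b controls document-length normalization: 0 = none, 1 = full.
    norm = 1 - b + b * (doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences of a term saturate.
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)

# Higher k1 lets term frequency contribute more before saturating.
low_k1 = bm25_term_score(tf=5, df=10, n_docs=1000, doc_len=100, avg_doc_len=100, k1=0.5)
high_k1 = bm25_term_score(tf=5, df=10, n_docs=1000, doc_len=100, avg_doc_len=100, k1=2.0)
```

With b > 0, the same term frequency in a longer-than-average document scores lower, which is exactly the length normalization the b parameter describes.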

Examples:

>>> # Option 1: Using built-in encoder for Chinese (no corpus needed)
>>> from zvec.extension import BM25EmbeddingFunction
>>>
>>> # For query encoding (Chinese)
>>> bm25_query_zh = BM25EmbeddingFunction(language="zh", encoding_type="query")
>>> query_vec = bm25_query_zh.embed("什么是机器学习")
>>> isinstance(query_vec, dict)
True
>>> # query_vec: {1169440797: 0.29, 2045788977: 0.70, ...}
>>> # For document encoding (Chinese)
>>> bm25_doc_zh = BM25EmbeddingFunction(language="zh", encoding_type="document")
>>> doc_vec = bm25_doc_zh.embed("机器学习是人工智能的一个重要分支...")
>>> isinstance(doc_vec, dict)
True
>>> # Using built-in encoder for English
>>> bm25_query_en = BM25EmbeddingFunction(language="en", encoding_type="query")
>>> query_vec_en = bm25_query_en.embed("what is vector search service")
>>> isinstance(query_vec_en, dict)
True
>>> # Option 2: Using custom corpus for domain-specific accuracy
>>> corpus = [
...     "机器学习是人工智能的一个重要分支",
...     "深度学习使用多层神经网络进行特征提取",
...     "自然语言处理技术用于理解和生成人类语言"
... ]
>>> bm25_custom = BM25EmbeddingFunction(
...     corpus=corpus,
...     encoding_type="query",
...     b=0.75,
...     k1=1.2
... )
>>> custom_vec = bm25_custom.embed("机器学习算法")
>>> isinstance(custom_vec, dict)
True
>>> # Hybrid search: combining with dense embeddings
>>> from zvec.extension import DefaultLocalDenseEmbedding
>>> dense_emb = DefaultLocalDenseEmbedding()
>>> bm25_emb = BM25EmbeddingFunction(language="zh", encoding_type="query")
>>>
>>> query = "machine learning algorithms"
>>> dense_vec = dense_emb.embed(query)  # Semantic similarity
>>> sparse_vec = bm25_emb.embed(query)  # Lexical matching
>>> # Combine scores for hybrid retrieval
>>> # Callable interface
>>> sparse_vec = bm25_query_zh("information retrieval")
>>> isinstance(sparse_vec, dict)
True
>>> # Error handling
>>> try:
...     bm25_query_zh.embed("")  # Empty query
... except ValueError as e:
...     print(f"Error: {e}")
Error: Input text cannot be empty or whitespace only
See Also
  • SparseEmbeddingFunction: Base class for sparse embeddings
  • DefaultLocalSparseEmbedding: SPLADE-based sparse embedding
  • QwenSparseEmbedding: API-based sparse embedding using Qwen
  • DefaultLocalDenseEmbedding: Dense embedding for semantic search
References
  • DashText Documentation: https://help.aliyun.com/zh/document_detail/2546039.html
  • DashText PyPI: https://pypi.org/project/dashtext/
  • BM25 Algorithm: Robertson & Zaragoza (2009)
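The hybrid-search example above stops at "combine scores for hybrid retrieval". One common way to combine them is a weighted linear sum; this is a hedged sketch (the helper hybrid_scores is hypothetical, and it assumes both searches already return doc-id-to-score mappings normalized to [0, 1]):

```python
def hybrid_scores(dense: dict[str, float], sparse: dict[str, float],
                  alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend normalized dense and sparse scores: alpha*dense + (1-alpha)*sparse."""
    doc_ids = set(dense) | set(sparse)
    combined = {d: alpha * dense.get(d, 0.0) + (1 - alpha) * sparse.get(d, 0.0)
                for d in doc_ids}
    # Highest blended score first.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

ranked = hybrid_scores({"a": 0.9, "b": 0.4}, {"b": 0.8, "c": 0.6}, alpha=0.5)
# "b" ranks first: it scores moderately on both signals (blended score ~0.6),
# beating "a" (dense only, 0.45) and "c" (sparse only, 0.30).
```

alpha tunes the balance: values near 1.0 favor semantic (dense) similarity, values near 0.0 favor lexical (BM25) matching.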

Initialize the BM25 embedding function.

Parameters:

Name Type Description Default
corpus
Optional[list[str]]

Optional corpus for training custom encoder. If None, uses built-in encoder. Defaults to None.

None
encoding_type
Literal['query', 'document']

Text encoding mode. Use "query" for search queries, "document" for indexing. Defaults to "query".

'query'
language
Literal['zh', 'en']

Language for built-in encoder. "zh" for Chinese, "en" for English. Defaults to "zh".

'zh'
b
float

Document length normalization for BM25 [0, 1]. Only used with custom corpus. Defaults to 0.75.

0.75
k1
float

Term frequency saturation for BM25. Only used with custom corpus. Defaults to 1.2.

1.2
**kwargs

Additional DashText encoder parameters.

{}

Raises:

Type Description
ValueError

If corpus is provided but empty or invalid.

ImportError

If dashtext package is not installed.

RuntimeError

If encoder initialization or training fails.

Methods:

Name Description
__call__

Make the embedding function callable.

embed

Generate BM25 sparse embedding for the input text.

Attributes
corpus_size property
corpus_size: int

int: Number of documents in the training corpus (0 if using built-in encoder).

encoding_type property
encoding_type: str

str: The encoding type being used ("query" or "document").

language property
language: str

str: The language of the built-in encoder ("zh" or "en").

extra_params property
extra_params: dict

dict: Extra parameters for DashText encoder customization.

Functions
__call__
__call__(input: TEXT) -> SparseVectorType

Make the embedding function callable.

Parameters:

Name Type Description Default
input TEXT

Input text to embed.

required

Returns:

Name Type Description
SparseVectorType SparseVectorType

Sparse vector as dictionary.

embed cached
embed(input: TEXT) -> SparseVectorType

Generate BM25 sparse embedding for the input text.

This method computes BM25 scores for the input text using DashText's SparseVectorEncoder. The encoding behavior depends on the encoding_type:

  • encoding_type="query": Uses encode_queries() for search queries
  • encoding_type="document": Uses encode_documents() for documents

The result is a sparse vector where keys are term indices in the vocabulary and values are BM25 scores.

Parameters:

Name Type Description Default
input TEXT

Input text string to embed. Must be non-empty after stripping whitespace.

required

Returns:

Name Type Description
SparseVectorType SparseVectorType

A dictionary mapping vocabulary term index to BM25 score. Only non-zero scores are included. The dictionary is sorted by indices (keys) in ascending order for consistent output. Example: {1169440797: 0.29, 2045788977: 0.70, ...}

Raises:

Type Description
TypeError

If input is not a string.

ValueError

If input is empty or whitespace-only.

RuntimeError

If BM25 encoding fails.

Examples:

>>> bm25 = BM25EmbeddingFunction(language="zh", encoding_type="query")
>>> sparse_vec = bm25.embed("query text")
>>> isinstance(sparse_vec, dict)
True
>>> all(isinstance(k, int) and isinstance(v, float) for k, v in sparse_vec.items())
True
>>> # Verify sorted output
>>> keys = list(sparse_vec.keys())
>>> keys == sorted(keys)
True
>>> # Error: empty input
>>> bm25.embed("   ")
Traceback (most recent call last):
    ...
ValueError: Input text cannot be empty or whitespace only
>>> # Error: non-string input
>>> bm25.embed(123)
Traceback (most recent call last):
    ...
TypeError: Expected 'input' to be str, got int
Note
  • BM25 scores are relative to the vocabulary statistics
  • Output dictionary is always sorted by indices for consistency
  • Terms not in the vocabulary will have zero scores (not included)
  • This method is cached (maxsize=10) for performance
  • DashText automatically handles Chinese/English text segmentation

DenseEmbeddingFunction

Bases: Protocol[MD]

Protocol for dense vector embedding functions.

Dense embedding functions map multimodal input (text, image, or audio) to fixed-length real-valued vectors. This is a Protocol class that defines the interface - implementations should provide their own initialization and properties.

Class Type Parameters:

Name Bound or Constraints Description Default
MD

The type of input data (bound to Embeddable: TEXT, IMAGE, or AUDIO).

required
Note
  • This is a Protocol class - it only defines the embed() interface.
  • Implementations are free to define their own __init__, properties, and additional methods as needed.
  • The embed() method is the only required interface.

Examples:

>>> # Custom text embedding implementation
>>> class MyTextEmbedding:
...     def __init__(self, dimension: int, model_name: str):
...         self.dimension = dimension
...         self.model = load_model(model_name)
...
...     def embed(self, input: str) -> list[float]:
...         return self.model.encode(input).tolist()
>>> # Custom image embedding implementation
>>> class MyImageEmbedding:
...     def __init__(self, dimension: int = 512):
...         self.dimension = dimension
...         self.model = load_image_model()
...
...     def embed(self, input: Union[str, bytes, np.ndarray]) -> list[float]:
...         if isinstance(input, str):
...             image = load_image_from_path(input)
...         else:
...             image = input
...         return self.model.extract_features(image).tolist()
>>> # Using built-in implementations
>>> from zvec.extension import QwenDenseEmbedding
>>> text_emb = QwenDenseEmbedding(dimension=768, api_key="sk-xxx")
>>> vector = text_emb.embed("Hello world")
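Because DenseEmbeddingFunction is a typing Protocol, conformance is structural: any object with a compatible embed() method can be passed where the protocol is expected, with no inheritance needed. A self-contained sketch of that idea, using a toy protocol and a toy embedding rather than the zvec classes themselves:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Embedder(Protocol):
    """Toy stand-in for DenseEmbeddingFunction: only embed() is required."""
    def embed(self, input: str) -> list[float]: ...

class HashEmbedding:  # note: does NOT inherit from Embedder
    def embed(self, input: str) -> list[float]:
        # Toy 4-dim "embedding" derived from character codes; illustration only.
        return [float(ord(c) % 7) for c in input[:4]]

def encode_all(fn: Embedder, texts: list[str]) -> list[list[float]]:
    return [fn.embed(t) for t in texts]

vecs = encode_all(HashEmbedding(), ["hello", "hi"])
assert isinstance(HashEmbedding(), Embedder)  # structural match, no inheritance
```

The runtime isinstance check only verifies that an embed attribute exists; static type checkers additionally verify the signature.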

Methods:

Name Description
embed

Generate a dense embedding vector for the input data.

Functions
embed abstractmethod
embed(input: MD) -> DenseVectorType

Generate a dense embedding vector for the input data.

Parameters:

Name Type Description Default
input MD

Multimodal input data to embed. Can be:

  • TEXT (str): Text string
  • IMAGE (str | bytes | np.ndarray): Image file path, raw bytes, or array
  • AUDIO (str | bytes | np.ndarray): Audio file path, raw bytes, or array

required

Returns:

Name Type Description
DenseVectorType DenseVectorType

A dense vector representing the embedding. Can be list[float], list[int], or np.ndarray. Length should match the implementation's dimension.

SparseEmbeddingFunction

Bases: Protocol[MD]

Abstract base class for sparse vector embedding functions.

Sparse embedding functions map multimodal input (text, image, or audio) to a dictionary of {index: weight}, where only non-zero dimensions are stored. You can inherit this class to create custom sparse embedding functions.

Class Type Parameters:

Name Bound or Constraints Description Default
MD

The type of input data (bound to Embeddable: TEXT, IMAGE, or AUDIO).

required
Note

Subclasses must implement the embed() method.

Examples:

>>> # Using built-in text sparse embedding (e.g., BM25, TF-IDF)
>>> sparse_emb = SomeSparseEmbedding()
>>> vector = sparse_emb.embed("Hello world")
>>> # Returns: {0: 0.5, 42: 1.2, 100: 0.8}
>>> # Custom BM25 sparse embedding function
>>> class MyBM25Embedding(SparseEmbeddingFunction):
...     def __init__(self, vocab_size: int = 10000):
...         self.vocab_size = vocab_size
...         self.tokenizer = MyTokenizer()
...
...     def embed(self, input: str) -> dict[int, float]:
...         tokens = self.tokenizer.tokenize(input)
...         sparse_vector = {}
...         for token_id, weight in self._calculate_bm25(tokens):
...             if weight > 0:
...                 sparse_vector[token_id] = weight
...         return sparse_vector
...
...     def _calculate_bm25(self, tokens):
...         # BM25 calculation logic
...         pass
>>> # Custom sparse image feature extractor
>>> class MySparseImageEmbedding(SparseEmbeddingFunction):
...     def embed(self, input: Union[str, bytes, np.ndarray]) -> dict[int, float]:
...         image = self._load_image(input)
...         features = self._extract_sparse_features(image)
...         return {idx: val for idx, val in enumerate(features) if val != 0}

Methods:

Name Description
embed

Generate a sparse embedding for the input data.

Functions
embed abstractmethod
embed(input: MD) -> SparseVectorType

Generate a sparse embedding for the input data.

Parameters:

Name Type Description Default
input MD

Multimodal input data to embed. Can be:

  • TEXT (str): Text string
  • IMAGE (str | bytes | np.ndarray): Image file path, raw bytes, or array
  • AUDIO (str | bytes | np.ndarray): Audio file path, raw bytes, or array

required

Returns:

Name Type Description
SparseVectorType SparseVectorType

Mapping from dimension index to non-zero weight. Only dimensions with non-zero values are included.
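Since a sparse vector here is just an {index: weight} dictionary, relevance between two of them reduces to a dot product over their shared indices. A small illustrative helper (sparse_dot is not part of zvec):

```python
def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product of two sparse vectors; iterate the smaller dict for efficiency."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b[i] for i, w in a.items() if i in b)

query = {0: 0.5, 42: 1.2, 100: 0.8}
doc = {42: 0.6, 100: 0.5, 7: 0.3}
score = sparse_dot(query, doc)  # 1.2*0.6 + 0.8*0.5, approximately 1.12
```

Indices missing from either vector contribute nothing, which is the whole point of the sparse representation: storage and scoring cost scale with the number of non-zero terms, not the vocabulary size.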

RrfReRanker

RrfReRanker(topn: int = 10, rerank_field: Optional[str] = None, rank_constant: int = 60)

Bases: RerankFunction

Re-ranker using Reciprocal Rank Fusion (RRF) for multi-vector search.

RRF combines results from multiple vector queries without requiring relevance scores. It assigns higher weight to documents that appear early in multiple result lists.

The RRF score contributed by one result list to a document at 0-based rank r is 1 / (k + r + 1), where k is the rank constant; a document's final score is the sum of its contributions across all lists.

Note

This re-ranker is specifically designed for multi-vector scenarios where query results from multiple vector fields need to be combined.

Parameters:

Name Type Description Default
topn
int

Number of top documents to return. Defaults to 10.

10
rerank_field
Optional[str]

Ignored by RRF. Defaults to None.

None
rank_constant
int

Smoothing constant k in RRF formula. Larger values reduce the impact of early ranks. Defaults to 60.

60
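The fusion step can be made concrete in a few lines. This is a hedged sketch of the algorithm, not the zvec implementation: the helper rrf_fuse is hypothetical, and plain doc-id lists stand in for Doc objects:

```python
def rrf_fuse(result_lists: dict[str, list[str]],
             k: int = 60, topn: int = 10) -> list[tuple[str, float]]:
    """Fuse ranked doc-id lists: a hit at 0-based rank r contributes 1/(k + r + 1)."""
    scores: dict[str, float] = {}
    for hits in result_lists.values():
        for r, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + r + 1)
    # Highest fused score first, truncated to topn.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:topn]

fused = rrf_fuse({"text_vec": ["d1", "d2", "d3"], "title_vec": ["d2", "d3"]})
# d2 ranks first: it appears near the top of both lists, while d1 appears in only one.
```

Note that only ranks matter, never raw scores, which is why RRF needs no per-field score normalization.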

Methods:

Name Description
rerank

Apply Reciprocal Rank Fusion to combine multiple query results.

Attributes:

Name Type Description
topn int

int: Number of top documents to return after re-ranking.

rerank_field Optional[str]

Optional[str]: Field name used as re-ranking input.

Attributes
topn property
topn: int

int: Number of top documents to return after re-ranking.

rerank_field property
rerank_field: Optional[str]

Optional[str]: Field name used as re-ranking input.

Functions
rerank
rerank(query_results: dict[str, list[Doc]]) -> list[Doc]

Apply Reciprocal Rank Fusion to combine multiple query results.

Parameters:

Name Type Description Default
query_results dict[str, list[Doc]]

Results from one or more vector queries.

required

Returns:

Type Description
list[Doc]

list[Doc]: Re-ranked documents with RRF scores in the score field.

WeightedReRanker

WeightedReRanker(
    topn: int = 10,
    rerank_field: Optional[str] = None,
    metric: MetricType = L2,
    weights: Optional[dict[str, float]] = None,
)

Bases: RerankFunction

Re-ranker that combines scores from multiple vector fields using weights.

Each vector field's relevance score is normalized based on its metric type, then scaled by a user-provided weight. Final scores are summed across fields.

Note

This re-ranker is specifically designed for multi-vector scenarios where query results from multiple vector fields need to be combined with configurable weights.

Parameters:

Name Type Description Default
topn
int

Number of top documents to return. Defaults to 10.

10
rerank_field
Optional[str]

Ignored. Defaults to None.

None
metric
MetricType

Distance metric used for score normalization. Defaults to MetricType.L2.

L2
weights
Optional[dict[str, float]]

Weight per vector field. Fields not listed use weight 1.0. Defaults to None.

None
Note

Supported metrics: L2, IP, COSINE. Scores are normalized to [0, 1].
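The weighted combination can be sketched as follows. This is illustrative only (the helper weighted_fuse is hypothetical, plain dicts stand in for Doc objects, and it assumes per-field scores have already been normalized to [0, 1]; the real class derives that normalization from the metric type):

```python
def weighted_fuse(results: dict[str, dict[str, float]],
                  weights: dict[str, float], topn: int = 10) -> list[tuple[str, float]]:
    """Sum per-field normalized scores scaled by weight; unlisted fields use 1.0."""
    combined: dict[str, float] = {}
    for field, hits in results.items():
        w = weights.get(field, 1.0)
        for doc_id, norm_score in hits.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + w * norm_score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)[:topn]

ranked = weighted_fuse(
    {"text_vec": {"d1": 0.9, "d2": 0.5}, "image_vec": {"d2": 0.8}},
    weights={"text_vec": 1.0, "image_vec": 2.0},
)
# d2 wins: 1.0*0.5 + 2.0*0.8 = 2.1, versus 0.9 for d1.
```

Unlike RRF, this fusion uses the scores themselves, so the per-metric normalization step is essential to keep fields on a comparable scale before weighting.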

Methods:

Name Description
rerank

Combine scores from multiple vector fields using weighted sum.

Attributes:

Name Type Description
topn int

int: Number of top documents to return after re-ranking.

rerank_field Optional[str]

Optional[str]: Field name used as re-ranking input.

weights dict[str, float]

dict[str, float]: Weight mapping for vector fields.

metric MetricType

MetricType: Distance metric used for score normalization.

Attributes
topn property
topn: int

int: Number of top documents to return after re-ranking.

rerank_field property
rerank_field: Optional[str]

Optional[str]: Field name used as re-ranking input.

weights property
weights: dict[str, float]

dict[str, float]: Weight mapping for vector fields.

metric property
metric: MetricType

MetricType: Distance metric used for score normalization.

Functions
rerank
rerank(query_results: dict[str, list[Doc]]) -> list[Doc]

Combine scores from multiple vector fields using weighted sum.

Parameters:

Name Type Description Default
query_results dict[str, list[Doc]]

Results per vector field.

required

Returns:

Type Description
list[Doc]

list[Doc]: Re-ranked documents with combined scores in score field.

OpenAIDenseEmbedding

OpenAIDenseEmbedding(
    model: str = "text-embedding-3-small",
    dimension: Optional[int] = None,
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    **kwargs
)

Bases: OpenAIFunctionBase, DenseEmbeddingFunction[TEXT]

Dense text embedding function using OpenAI API.

This class provides text-to-vector embedding capabilities using OpenAI's embedding models. It inherits from DenseEmbeddingFunction and implements dense text embedding via the OpenAI API.

The implementation supports various OpenAI embedding models with different dimensions and includes automatic result caching for improved performance.

Parameters:

Name Type Description Default
model
str

OpenAI embedding model identifier. Defaults to "text-embedding-3-small". Common options:

  • "text-embedding-3-small": 1536 dims, cost-efficient, good performance
  • "text-embedding-3-large": 3072 dims, highest quality
  • "text-embedding-ada-002": 1536 dims, legacy model

'text-embedding-3-small'
dimension
Optional[int]

Desired output embedding dimension. If None, uses model's default dimension. For text-embedding-3 models, you can specify custom dimensions (e.g., 256, 512, 1024, 1536). Defaults to None.

None
api_key
Optional[str]

OpenAI API authentication key. If None, reads from OPENAI_API_KEY environment variable. Obtain your key from: https://platform.openai.com/api-keys

None
base_url
Optional[str]

Custom API base URL for OpenAI-compatible services. Defaults to None (uses official OpenAI endpoint).

None

Attributes:

Name Type Description
dimension int

The embedding vector dimension.

data_type DataType

Always DataType.VECTOR_FP32 for this implementation.

model str

The OpenAI model name being used.

Raises:

Type Description
ValueError

If API key is not provided and not found in environment, or if API returns an error response.

TypeError

If input to embed() is not a string.

RuntimeError

If network error or OpenAI service error occurs.

Note
  • Requires Python 3.10, 3.11, or 3.12
  • Requires the openai package: pip install openai
  • Embedding results are cached (LRU cache, maxsize=10) to reduce API calls
  • Network connectivity to OpenAI API endpoints is required
  • API usage incurs costs based on your OpenAI subscription plan
  • Rate limits apply based on your OpenAI account tier

Examples:

>>> # Basic usage with default model
>>> from zvec.extension import OpenAIDenseEmbedding
>>> import os
>>> os.environ["OPENAI_API_KEY"] = "sk-..."
>>>
>>> emb_func = OpenAIDenseEmbedding()
>>> vector = emb_func.embed("Hello, world!")
>>> len(vector)
1536
>>> # Using specific model with custom dimension
>>> emb_func = OpenAIDenseEmbedding(
...     model="text-embedding-3-large",
...     dimension=1024,
...     api_key="sk-..."
... )
>>> vector = emb_func.embed("Machine learning is fascinating")
>>> len(vector)
1024
>>> # Using with custom base URL (e.g., Azure OpenAI)
>>> emb_func = OpenAIDenseEmbedding(
...     model="text-embedding-ada-002",
...     api_key="your-azure-key",
...     base_url="https://your-resource.openai.azure.com/"
... )
>>> vector = emb_func("Natural language processing")
>>> isinstance(vector, list)
True
>>> # Batch processing with caching benefit
>>> texts = ["First text", "Second text", "First text"]
>>> vectors = [emb_func.embed(text) for text in texts]
>>> # Third call uses cached result for "First text"
>>> # Error handling
>>> try:
...     emb_func.embed("")  # Empty string
... except ValueError as e:
...     print(f"Error: {e}")
Error: Input text cannot be empty or whitespace only
See Also
  • DenseEmbeddingFunction: Base class for dense embeddings
  • QwenDenseEmbedding: Alternative using Qwen/DashScope API
  • DefaultLocalDenseEmbedding: Local model without API calls
  • SparseEmbeddingFunction: Base class for sparse embeddings

Initialize the OpenAI dense embedding function.

Parameters:

Name Type Description Default
model
str

OpenAI model name. Defaults to "text-embedding-3-small".

'text-embedding-3-small'
dimension
Optional[int]

Target embedding dimension or None for default.

None
api_key
Optional[str]

API key or None to use environment variable.

None
base_url
Optional[str]

Custom API base URL or None for default.

None
**kwargs

Additional parameters for API calls. Examples:

  • encoding_format (str): Format of embeddings, "float" or "base64".
  • user (str): User identifier for tracking.

{}

Raises:

Type Description
ValueError

If API key is not provided and not in environment.

Methods:

Name Description
__call__

Make the embedding function callable.

embed

Generate dense embedding vector for the input text.

Attributes
model property
model: str

str: The OpenAI model name currently in use.

dimension property
dimension: int

int: The expected dimensionality of the embedding vector.

extra_params property
extra_params: dict

dict: Extra parameters for model-specific customization.

Functions
__call__
__call__(input: TEXT) -> DenseVectorType

Make the embedding function callable.

embed cached
embed(input: TEXT) -> DenseVectorType

Generate dense embedding vector for the input text.

This method calls the OpenAI Embeddings API to convert input text into a dense vector representation. Results are cached to improve performance for repeated inputs.

Parameters:

Name Type Description Default
input TEXT

Input text string to embed. Must be non-empty after stripping whitespace. Maximum length is 8191 tokens for most models.

required

Returns:

Name Type Description
DenseVectorType DenseVectorType

A list of floats representing the embedding vector. Length equals self.dimension. Example: [0.123, -0.456, 0.789, ...]

Raises:

Type Description
TypeError

If input is not a string.

ValueError

If input is empty/whitespace-only, or if the API returns an error or malformed response.

RuntimeError

If network connectivity issues or OpenAI service errors occur.

Examples:

>>> emb = OpenAIDenseEmbedding()
>>> vector = emb.embed("Natural language processing")
>>> len(vector)
1536
>>> isinstance(vector[0], float)
True
>>> # Error: empty input
>>> emb.embed("   ")
Traceback (most recent call last):
    ...
ValueError: Input text cannot be empty or whitespace only
>>> # Error: non-string input
>>> emb.embed(123)
Traceback (most recent call last):
    ...
TypeError: Expected 'input' to be str, got int
Note
  • This method is cached (maxsize=10). Identical inputs return cached results.
  • The cache is based on exact string match (case-sensitive).
  • Consider pre-processing text (lowercasing, normalization) for better caching.

OpenAIFunctionBase

OpenAIFunctionBase(model: str, api_key: Optional[str] = None, base_url: Optional[str] = None)

Base class for OpenAI functions.

This base class provides common functionality for calling OpenAI APIs and handling responses. It currently supports dense embedding operations.

This class is not meant to be used directly. Use a concrete implementation:

  • OpenAIDenseEmbedding for dense embeddings

Parameters:

Name Type Description Default
model
str

OpenAI model identifier.

required
api_key
Optional[str]

OpenAI API authentication key.

None
base_url
Optional[str]

Custom API base URL.

None
Note
  • This is an internal base class for code reuse across OpenAI features
  • Subclasses should inherit from appropriate Protocol
  • Provides unified API connection and response handling

Initialize the base OpenAI functionality.

Parameters:

Name Type Description Default
model
str

OpenAI model name.

required
api_key
Optional[str]

API key or None to use environment variable.

None
base_url
Optional[str]

Custom API base URL or None for default.

None

Raises:

Type Description
ValueError

If API key is not provided and not in environment.

Attributes:

Name Type Description
model str

str: The OpenAI model name currently in use.

Attributes
model property
model: str

str: The OpenAI model name currently in use.

QwenDenseEmbedding

QwenDenseEmbedding(
    dimension: int, model: str = "text-embedding-v4", api_key: Optional[str] = None, **kwargs
)

Bases: QwenFunctionBase, DenseEmbeddingFunction[TEXT]

Dense text embedding function using Qwen (DashScope) API.

This class provides text-to-vector embedding capabilities using Alibaba Cloud's DashScope service and Qwen embedding models. It inherits from DenseEmbeddingFunction and implements dense text embedding.

The implementation supports various Qwen embedding models with configurable dimensions and includes automatic result caching for improved performance.

Parameters:

Name Type Description Default
dimension
int

Desired output embedding dimension. Common values:

  • 512: Balanced performance and accuracy
  • 1024: Higher accuracy, larger storage
  • 1536: Maximum accuracy for supported models

required
model
str

DashScope embedding model identifier. Defaults to "text-embedding-v4". Other options include:

  • "text-embedding-v3"
  • "text-embedding-v2"
  • "text-embedding-v1"

'text-embedding-v4'
api_key
Optional[str]

DashScope API authentication key. If None, reads from DASHSCOPE_API_KEY environment variable. Obtain your key from: https://dashscope.console.aliyun.com/

None
**kwargs

Additional DashScope API parameters. Supported options:

  • text_type (str): Specifies the text role in retrieval tasks. Options: "query" (search query) or "document" (indexed content). This parameter optimizes embeddings for asymmetric search scenarios.

Reference: https://help.aliyun.com/zh/model-studio/text-embedding-synchronous-api

{}

Attributes:

Name Type Description
dimension int

The embedding vector dimension.

data_type DataType

Always DataType.VECTOR_FP32 for this implementation.

model str

The DashScope model name being used.

Raises:

Type Description
ValueError

If API key is not provided and not found in environment, or if API returns an error response.

TypeError

If input to embed() is not a string.

RuntimeError

If network error or DashScope service error occurs.

Note
  • Requires Python 3.10, 3.11, or 3.12
  • Requires the dashscope package: pip install dashscope
  • Embedding results are cached (LRU cache, maxsize=10) to reduce API calls
  • Network connectivity to DashScope API endpoints is required
  • API usage may incur costs based on your DashScope subscription plan

Parameter Guidelines:

  • Use text_type="query" for search queries and text_type="document" for indexed content to optimize asymmetric retrieval tasks.
  • For detailed API specifications and parameter usage, refer to: https://help.aliyun.com/zh/model-studio/text-embedding-synchronous-api

Examples:

>>> # Basic usage with default model
>>> from zvec.extension import QwenDenseEmbedding
>>> import os
>>> os.environ["DASHSCOPE_API_KEY"] = "your-api-key"
>>>
>>> emb_func = QwenDenseEmbedding(dimension=1024)
>>> vector = emb_func.embed("Hello, world!")
>>> len(vector)
1024
>>> # Using specific model with explicit API key
>>> emb_func = QwenDenseEmbedding(
...     dimension=512,
...     model="text-embedding-v3",
...     api_key="sk-xxxxx"
... )
>>> vector = emb_func("Machine learning is fascinating")
>>> isinstance(vector, list)
True
>>> # Using with custom parameters (text_type)
>>> # For search queries - optimize for query-document matching
>>> emb_func = QwenDenseEmbedding(
...     dimension=1024,
...     text_type="query"
... )
>>> query_vector = emb_func.embed("What is machine learning?")
>>>
>>> # For document embeddings - optimize for being matched by queries
>>> doc_emb_func = QwenDenseEmbedding(
...     dimension=1024,
...     text_type="document"
... )
>>> doc_vector = doc_emb_func.embed(
...     "Machine learning is a subset of artificial intelligence..."
... )
>>> # Batch processing with caching benefit
>>> texts = ["First text", "Second text", "First text"]
>>> vectors = [emb_func.embed(text) for text in texts]
>>> # Third call uses cached result for "First text"
>>> # Error handling
>>> try:
...     emb_func.embed("")  # Empty string
... except ValueError as e:
...     print(f"Error: {e}")
Error: Input text cannot be empty or whitespace only
See Also
  • DenseEmbeddingFunction: Base class for dense embeddings
  • SparseEmbeddingFunction: Base class for sparse embeddings

Initialize the Qwen dense embedding function.

Parameters:

Name Type Description Default
dimension
int

Target embedding dimension.

required
model
str

DashScope model name. Defaults to "text-embedding-v4".

'text-embedding-v4'
api_key
Optional[str]

API key or None to use environment variable.

None
**kwargs

Additional DashScope API parameters. Supported options:
  • text_type (str): Text role in asymmetric retrieval.
    - "query": Optimize for search queries (short, question-like).
    - "document": Optimize for indexed documents (longer content).
    Using the appropriate text_type improves retrieval accuracy by optimizing the embedding space for query-document matching.

For detailed API documentation, see: https://help.aliyun.com/zh/model-studio/text-embedding-synchronous-api

{}

Raises:

Type Description
ValueError

If API key is not provided and not in environment.

Methods:

Name Description
__call__

Make the embedding function callable.

embed

Generate dense embedding vector for the input text.

Attributes
model property
model: str

str: The DashScope embedding model name currently in use.

dimension property
dimension: int

int: The expected dimensionality of the embedding vector.

extra_params property
extra_params: dict

dict: Extra parameters for model-specific customization.

Functions
__call__
__call__(input: TEXT) -> DenseVectorType

Make the embedding function callable.

embed cached
embed(input: TEXT) -> DenseVectorType

Generate dense embedding vector for the input text.

This method calls the DashScope TextEmbedding API to convert input text into a dense vector representation. Results are cached to improve performance for repeated inputs.

Parameters:

Name Type Description Default
input TEXT

Input text string to embed. Must be non-empty after stripping whitespace. Maximum length depends on the model used (typically 2048-8192 tokens).

required

Returns:

Name Type Description
DenseVectorType DenseVectorType

A list of floats representing the embedding vector. Length equals self.dimension. Example: [0.123, -0.456, 0.789, ...]

Raises:

Type Description
TypeError

If input is not a string.

ValueError

If input is empty/whitespace-only, or if the API returns an error or malformed response.

RuntimeError

If network connectivity issues or DashScope service errors occur.

Examples:

>>> emb = QwenDenseEmbedding(dimension=1024)
>>> vector = emb.embed("Natural language processing")
>>> len(vector)
1024
>>> isinstance(vector[0], float)
True
>>> # Error: empty input
>>> emb.embed("   ")
ValueError: Input text cannot be empty or whitespace only
>>> # Error: non-string input
>>> emb.embed(123)
TypeError: Expected 'input' to be str, got int
Note
  • This method is cached (maxsize=10). Identical inputs return cached results.
  • The cache is based on exact string match (case-sensitive).
  • Consider pre-processing text (lowercasing, normalization) for better caching.
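The caching notes above (LRU cache, maxsize=10, exact case-sensitive string match) can be demonstrated with functools.lru_cache. The embed function below is a hypothetical stand-in that counts invocations instead of calling the DashScope API:

```python
from functools import lru_cache

calls = {"count": 0}

@lru_cache(maxsize=10)
def embed(text: str) -> tuple:
    """Hypothetical stand-in for the API call; counts real invocations."""
    calls["count"] += 1
    return tuple(float(ord(c)) for c in text[:4])  # fake vector

embed("First text")
embed("Second text")
embed("First text")    # cache hit: exact string match
embed("first text")    # cache miss: lookup is case-sensitive
print(calls["count"])  # 3 real calls for 4 embed() invocations
```

Pre-normalizing text (e.g. lowercasing) before calling embed() would turn the last call into a cache hit.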

QwenSparseEmbedding

QwenSparseEmbedding(
    dimension: int, model: str = "text-embedding-v4", api_key: Optional[str] = None, **kwargs
)

Bases: QwenFunctionBase, SparseEmbeddingFunction[TEXT]

Sparse text embedding function using Qwen (DashScope) API.

This class provides text-to-sparse-vector embedding capabilities using Alibaba Cloud's DashScope service and Qwen embedding models. It generates sparse keyword-weighted vectors suitable for lexical matching and BM25-style retrieval scenarios.

Sparse embeddings are particularly useful for:
  • Keyword-based search and exact matching
  • Hybrid retrieval (combining with dense embeddings)
  • Interpretable search results (weights show term importance)

Parameters:

Name Type Description Default
dimension
int

Desired output embedding dimension. Common values:
  • 512: Balanced performance and accuracy
  • 1024: Higher accuracy, larger storage
  • 1536: Maximum accuracy for supported models

required
model
str

DashScope embedding model identifier. Defaults to "text-embedding-v4". Other options include:
  • "text-embedding-v3"
  • "text-embedding-v2"

'text-embedding-v4'
api_key
Optional[str]

DashScope API authentication key. If None, reads from DASHSCOPE_API_KEY environment variable. Obtain your key from: https://dashscope.console.aliyun.com/

None
**kwargs

Additional DashScope API parameters. Supported options:
  • encoding_type (Literal["query", "document"]): Encoding type.
    - "query": Optimize for search queries (default).
    - "document": Optimize for indexed documents.
    This distinction is important for asymmetric retrieval tasks.

{}

Attributes:

Name Type Description
model str

The DashScope model name being used.

encoding_type str

The encoding type ("query" or "document").

Raises:

Type Description
ValueError

If API key is not provided and not found in environment, or if API returns an error response.

TypeError

If input to embed() is not a string.

RuntimeError

If network error or DashScope service error occurs.

Note
  • Requires Python 3.10, 3.11, or 3.12
  • Requires the dashscope package: pip install dashscope
  • Embedding results are cached (LRU cache, maxsize=10) to reduce API calls
  • Network connectivity to DashScope API endpoints is required
  • API usage may incur costs based on your DashScope subscription plan
  • Sparse vectors store only their non-zero dimensions, as a dict mapping index to weight
  • Output is sorted by indices (keys) in ascending order

Parameter Guidelines:

  • Use encoding_type="query" for search queries and encoding_type="document" for indexed content to optimize asymmetric retrieval tasks.
  • For detailed API specifications, refer to: https://help.aliyun.com/zh/model-studio/text-embedding-synchronous-api

Examples:

>>> # Basic usage for query embedding
>>> from zvec.extension import QwenSparseEmbedding
>>> import os
>>> os.environ["DASHSCOPE_API_KEY"] = "your-api-key"
>>>
>>> query_emb = QwenSparseEmbedding(dimension=1024, encoding_type="query")
>>> query_vec = query_emb.embed("machine learning")
>>> type(query_vec)
<class 'dict'>
>>> len(query_vec)  # Only non-zero dimensions
156
>>> # Document embedding
>>> doc_emb = QwenSparseEmbedding(dimension=1024, encoding_type="document")
>>> doc_vec = doc_emb.embed("Machine learning is a subset of AI")
>>> isinstance(doc_vec, dict)
True
>>> # Asymmetric retrieval example
>>> query_vec = query_emb.embed("what causes aging fast")
>>> doc_vec = doc_emb.embed(
...     "UV-A light causes tanning, skin aging, and cataracts..."
... )
>>>
>>> # Calculate similarity (dot product for sparse vectors)
>>> similarity = sum(
...     query_vec.get(k, 0) * doc_vec.get(k, 0)
...     for k in set(query_vec) | set(doc_vec)
... )
>>> # Output is sorted by indices
>>> list(query_vec.items())[:5]  # First 5 dimensions (by index)
[(10, 0.45), (23, 0.87), (56, 0.32), (89, 1.12), (120, 0.65)]
>>> # Hybrid retrieval (combining dense + sparse)
>>> from zvec.extension import QwenDenseEmbedding
>>> dense_emb = QwenDenseEmbedding(dimension=1024)
>>> sparse_emb = QwenSparseEmbedding(dimension=1024)
>>>
>>> query = "deep learning neural networks"
>>> dense_vec = dense_emb.embed(query)   # [0.1, -0.3, 0.5, ...]
>>> sparse_vec = sparse_emb.embed(query)  # {12: 0.8, 45: 1.2, ...}
>>> # Error handling
>>> try:
...     sparse_emb.embed("")  # Empty string
... except ValueError as e:
...     print(f"Error: {e}")
Error: Input text cannot be empty or whitespace only
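The hybrid-retrieval example above stops after producing both vectors. One common way to combine them is weighted score fusion, shown here as a minimal, library-agnostic sketch; the toy vectors and the alpha weight are illustrative, not part of the zvec API:

```python
# Weighted fusion of dense and sparse relevance scores.
# All vectors below are toy values; alpha is a tuning knob you choose.

def dense_score(q, d):
    return sum(a * b for a, b in zip(q, d))  # dot product (cosine if unit-norm)

def sparse_score(q, d):
    return sum(w * d.get(i, 0.0) for i, w in q.items())  # sparse dot product

q_dense, q_sparse = [0.6, 0.8], {12: 0.8, 45: 1.2}
d_dense, d_sparse = [0.8, 0.6], {12: 0.5, 99: 0.3}

alpha = 0.7  # weight on the dense (semantic) signal
hybrid = (alpha * dense_score(q_dense, d_dense)
          + (1 - alpha) * sparse_score(q_sparse, d_sparse))
print(round(hybrid, 3))  # 0.792
```

Raising alpha favors semantic similarity; lowering it favors exact keyword overlap.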
See Also
  • SparseEmbeddingFunction: Base class for sparse embeddings
  • QwenDenseEmbedding: Dense embedding using Qwen API
  • DefaultLocalSparseEmbedding: Sparse embedding with SPLADE model

Initialize the Qwen sparse embedding function.

Parameters:

Name Type Description Default
dimension
int

Target embedding dimension.

required
model
str

DashScope model name. Defaults to "text-embedding-v4".

'text-embedding-v4'
api_key
Optional[str]

API key or None to use environment variable.

None
**kwargs

Additional DashScope API parameters. Supported options:
  • encoding_type (Literal["query", "document"]): Encoding type.
    - "query": Optimize for search queries (default).
    - "document": Optimize for indexed documents.
    This distinction is important for asymmetric retrieval tasks.

{}

Raises:

Type Description
ValueError

If API key is not provided and not in environment.

Methods:

Name Description
__call__

Make the embedding function callable.

embed

Generate sparse embedding vector for the input text.

Attributes
model property
model: str

str: The DashScope embedding model name currently in use.

extra_params property
extra_params: dict

dict: Extra parameters for model-specific customization.

Functions
__call__
__call__(input: TEXT) -> SparseVectorType

Make the embedding function callable.

embed cached
embed(input: TEXT) -> SparseVectorType

Generate sparse embedding vector for the input text.

This method calls the DashScope TextEmbedding API with sparse output type to convert input text into a sparse vector representation. The result is a dictionary where keys are dimension indices and values are importance weights (only non-zero values included).

The embedding is optimized based on the encoding_type specified during initialization: "query" for search queries or "document" for indexed content.

Parameters:

Name Type Description Default
input TEXT

Input text string to embed. Must be non-empty after stripping whitespace. Maximum length depends on the model used (typically 2048-8192 tokens).

required

Returns:

Name Type Description
SparseVectorType SparseVectorType

A dictionary mapping dimension index to weight. Only non-zero dimensions are included. The dictionary is sorted by indices (keys) in ascending order for consistent output. Example: {10: 0.5, 245: 0.8, 1023: 1.2, 5678: 0.5}

Raises:

Type Description
TypeError

If input is not a string.

ValueError

If input is empty/whitespace-only, or if the API returns an error or malformed response.

RuntimeError

If network connectivity issues or DashScope service errors occur.

Examples:

>>> emb = QwenSparseEmbedding(dimension=1024, encoding_type="query")
>>> sparse_vec = emb.embed("machine learning")
>>> isinstance(sparse_vec, dict)
True
>>>
>>> # Verify sorted output
>>> keys = list(sparse_vec.keys())
>>> keys == sorted(keys)
True
>>> # Error: empty input
>>> emb.embed("   ")
ValueError: Input text cannot be empty or whitespace only
>>> # Error: non-string input
>>> emb.embed(123)
TypeError: Expected 'input' to be str, got int
Note
  • This method is cached (maxsize=10). Identical inputs return cached results.
  • The cache is based on exact string match (case-sensitive).
  • Output dictionary is always sorted by indices for consistency.

QwenFunctionBase

QwenFunctionBase(model: str, api_key: Optional[str] = None)

Base class for Qwen (DashScope) functions.

This base class provides common functionality for calling DashScope APIs and handling responses. It supports embeddings (dense and sparse) and re-ranking operations.

This class is not meant to be used directly. Use concrete implementations:
  • QwenDenseEmbedding for dense embeddings
  • QwenSparseEmbedding for sparse embeddings
  • QwenReRanker for semantic re-ranking

Parameters:

Name Type Description Default
model
str

DashScope model identifier.

required
api_key
Optional[str]

DashScope API authentication key.

None
Note
  • This is an internal base class for code reuse across Qwen features
  • Subclasses should inherit from appropriate Protocol/ABC
  • Provides unified API connection and response handling

Initialize the base Qwen embedding functionality.

Parameters:

Name Type Description Default
model
str

DashScope model name.

required
api_key
Optional[str]

API key or None to use environment variable.

None

Raises:

Type Description
ValueError

If API key is not provided and not in environment.

Attributes:

Name Type Description
model str

str: The DashScope embedding model name currently in use.

Attributes
model property
model: str

str: The DashScope embedding model name currently in use.

Functions

QwenReRanker

QwenReRanker(
    query: Optional[str] = None,
    topn: int = 10,
    rerank_field: Optional[str] = None,
    model: str = "gte-rerank-v2",
    api_key: Optional[str] = None,
)

Bases: QwenFunctionBase, RerankFunction

Re-ranker using Qwen (DashScope) cross-encoder API for semantic re-ranking.

This re-ranker leverages DashScope's TextReRank service to perform cross-encoder style re-ranking. It sends query and document pairs to the API and receives relevance scores based on deep semantic understanding.

The re-ranker is suitable for single-vector or multi-vector search scenarios where semantic relevance to a specific query is required.

Parameters:

Name Type Description Default
query
str

Query text for semantic re-ranking. Required.

None
topn
int

Maximum number of documents to return after re-ranking. Defaults to 10.

10
rerank_field
str

Document field name to use as re-ranking input text. Required (e.g., "content", "title", "body").

None
model
str

DashScope re-ranking model identifier. Defaults to "gte-rerank-v2".

'gte-rerank-v2'
api_key
Optional[str]

DashScope API authentication key. If not provided, reads from DASHSCOPE_API_KEY environment variable.

None

Raises:

Type Description
ValueError

If query is empty/None, rerank_field is None, or API key is not available.

Note
  • Requires dashscope Python package installed
  • Documents without valid content in rerank_field are skipped
  • API rate limits and quotas apply per DashScope subscription
Example

>>> reranker = QwenReRanker(
...     query="machine learning algorithms",
...     topn=5,
...     rerank_field="content",
...     model="gte-rerank-v2",
...     api_key="your-api-key"
... )
>>> # Use in collection.query(reranker=reranker)

Initialize QwenReRanker with query and configuration.

Parameters:

Name Type Description Default
query
Optional[str]

Query text for semantic matching. Required.

None
topn
int

Number of top results to return.

10
rerank_field
Optional[str]

Document field for re-ranking input.

None
model
str

DashScope model name.

'gte-rerank-v2'
api_key
Optional[str]

API key or None to use environment variable.

None

Raises:

Type Description
ValueError

If query is empty or API key is unavailable.

Methods:

Name Description
rerank

Re-rank documents using Qwen's TextReRank API.

Attributes:

Name Type Description
topn int

int: Number of top documents to return after re-ranking.

rerank_field Optional[str]

Optional[str]: Field name used as re-ranking input.

model str

str: The DashScope embedding model name currently in use.

query str

str: Query text used for semantic re-ranking.

Attributes
topn property
topn: int

int: Number of top documents to return after re-ranking.

rerank_field property
rerank_field: Optional[str]

Optional[str]: Field name used as re-ranking input.

model property
model: str

str: The DashScope embedding model name currently in use.

query property
query: str

str: Query text used for semantic re-ranking.

Functions
rerank
rerank(query_results: dict[str, list[Doc]]) -> list[Doc]

Re-rank documents using Qwen's TextReRank API.

Sends document texts to DashScope TextReRank service along with the query. Returns documents sorted by relevance scores from the cross-encoder model.

Parameters:

Name Type Description Default
query_results dict[str, list[Doc]]

Mapping from vector field names to lists of retrieved documents. Documents from all fields are deduplicated and re-ranked together.

required

Returns:

Type Description
list[Doc]

list[Doc]: Re-ranked documents (up to topn) with updated score fields containing relevance scores from the API.

Raises:

Type Description
ValueError

If no valid documents are found or API call fails.

Note
  • Duplicate documents (same ID) across fields are processed once
  • Documents with empty/missing rerank_field content are skipped
  • Returned scores are relevance scores from the cross-encoder model
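The deduplication and skipping behavior described in the notes can be sketched with plain dictionaries; the field names and document shape below are illustrative stand-ins, not the actual Doc type:

```python
# Sketch of the pre-processing described above: flatten query_results,
# deduplicate by document id, and skip docs with empty rerank text.
rerank_field = "content"

query_results = {
    "dense_vec": [{"id": "a", "content": "machine learning"},
                  {"id": "b", "content": ""}],                  # empty -> skipped
    "sparse_vec": [{"id": "a", "content": "machine learning"},  # duplicate of "a"
                   {"id": "c", "content": "neural networks"}],
}

candidates = {}
for docs in query_results.values():
    for doc in docs:
        text = (doc.get(rerank_field) or "").strip()
        if text and doc["id"] not in candidates:  # dedupe + skip empties
            candidates[doc["id"]] = text

print(sorted(candidates))  # ['a', 'c']
```

Only the surviving candidates would then be sent to the TextReRank API for scoring.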

ReRanker

ReRanker(topn: int = 10, rerank_field: Optional[str] = None)

Bases: ABC

Abstract base class for re-ranking search results.

Re-rankers refine the output of one or more vector queries by applying a secondary scoring strategy. They are used in the query() method of Collection via the reranker parameter.

Parameters:

Name Type Description Default
topn
int

Number of top documents to return after re-ranking. Defaults to 10.

10
rerank_field
Optional[str]

Field name used as input for re-ranking (e.g., document title or body). Defaults to None.

None
Note

Subclasses must implement the rerank() method.

Methods:

Name Description
rerank

Re-rank documents from one or more vector queries.

Attributes:

Name Type Description
topn int

int: Number of top documents to return after re-ranking.

rerank_field Optional[str]

Optional[str]: Field name used as re-ranking input.

Attributes
topn property
topn: int

int: Number of top documents to return after re-ranking.

rerank_field property
rerank_field: Optional[str]

Optional[str]: Field name used as re-ranking input.

Functions
rerank abstractmethod
rerank(query_results: dict[str, list[Doc]]) -> list[Doc]

Re-rank documents from one or more vector queries.

Parameters:

Name Type Description Default
query_results dict[str, list[Doc]]

Mapping from vector field name to list of retrieved documents (sorted by relevance).

required

Returns:

Type Description
list[Doc]

list[Doc]: Re-ranked list of documents (length ≤ topn), with updated score fields.
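A custom re-ranking strategy is implemented by subclassing ReRanker and overriding rerank(). The sketch below mirrors the interface documented above with a local stand-in class so it is self-contained; in real code you would subclass zvec.extension.ReRanker, and the length-based scoring here is purely illustrative:

```python
from abc import ABC, abstractmethod

# Stand-in mirroring the zvec.extension.ReRanker interface described above.
class ReRanker(ABC):
    def __init__(self, topn: int = 10, rerank_field=None):
        self._topn, self._rerank_field = topn, rerank_field

    @property
    def topn(self) -> int:
        return self._topn

    @abstractmethod
    def rerank(self, query_results: dict) -> list: ...

class LengthReRanker(ReRanker):
    """Toy strategy: prefer longer rerank_field text (illustrative only)."""
    def rerank(self, query_results: dict) -> list:
        seen, merged = set(), []
        for docs in query_results.values():  # merge docs from all vector fields
            for doc in docs:
                if doc["id"] not in seen:    # deduplicate by id
                    seen.add(doc["id"])
                    merged.append(doc)
        merged.sort(key=lambda d: len(d.get(self._rerank_field or "content", "")),
                    reverse=True)
        return merged[: self.topn]

ranked = LengthReRanker(topn=2, rerank_field="content").rerank({
    "vec": [{"id": "a", "content": "short"},
            {"id": "b", "content": "a much longer document body"}],
})
print([d["id"] for d in ranked])  # ['b', 'a']
```

The same shape applies to any scoring strategy: merge, score, sort, truncate to topn.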

DefaultLocalDenseEmbedding

DefaultLocalDenseEmbedding(
    model_source: Literal["huggingface", "modelscope"] = "huggingface",
    device: Optional[str] = None,
    normalize_embeddings: bool = True,
    batch_size: int = 32,
    **kwargs
)

Bases: SentenceTransformerFunctionBase, DenseEmbeddingFunction[TEXT]

Default local dense embedding using all-MiniLM-L6-v2 model.

This is the default dense text embedding implementation, backed by the all-MiniLM-L6-v2 model from Hugging Face. The model provides a good balance between speed and quality for general-purpose text embedding.

The class provides text-to-vector dense embedding capabilities using the sentence-transformers library. It supports models from Hugging Face Hub and ModelScope, runs locally without API calls, and supports CPU/GPU acceleration.

The model produces 384-dimensional embeddings and is optimized for semantic similarity tasks. It runs locally without requiring API keys.

Parameters:

Name Type Description Default
model_source
Literal['huggingface', 'modelscope']

Model source.
  • "huggingface": Use Hugging Face Hub (default, for international users)
  • "modelscope": Use ModelScope (recommended for users in China)
Defaults to "huggingface".

'huggingface'
device
Optional[str]

Device to run the model on. Options: "cpu", "cuda", "mps" (for Apple Silicon), or None for automatic detection. Defaults to None.

None
normalize_embeddings
bool

Whether to normalize embeddings to unit length (L2 normalization). Useful for cosine similarity. Defaults to True.

True
batch_size
int

Batch size for encoding. Defaults to 32.

32
**kwargs

Additional parameters for future extension.

{}

Attributes:

Name Type Description
dimension int

Always 384 for both models.

model_name str

"all-MiniLM-L6-v2" (HF) or "iic/nlp_gte_sentence-embedding_chinese-small" (MS).

model_source str

The model source being used.

device str

The device the model is running on.

Raises:

Type Description
ValueError

If the model cannot be loaded or input is invalid.

TypeError

If input to embed() is not a string.

RuntimeError

If model inference fails.

Note
  • Requires Python 3.10, 3.11, or 3.12
  • Requires the sentence-transformers package: pip install sentence-transformers
  • For ModelScope, also requires: pip install modelscope
  • First run downloads the model (~50-80MB) from chosen source
  • Hugging Face cache: ~/.cache/torch/sentence_transformers/
  • ModelScope cache: ~/.cache/modelscope/hub/
  • No API keys or network required after initial download
  • Inference speed: ~1000 sentences/sec on CPU, ~10000 on GPU

For users in China:

If you encounter Hugging Face access issues, use ModelScope instead:

.. code-block:: python

# Recommended for users in China
emb = DefaultLocalDenseEmbedding(model_source="modelscope")

Alternatively, use Hugging Face mirror:

.. code-block:: bash

export HF_ENDPOINT=https://hf-mirror.com
# Then use default Hugging Face mode

Examples:

>>> # Basic usage with Hugging Face (default)
>>> from zvec.extension import DefaultLocalDenseEmbedding
>>>
>>> emb_func = DefaultLocalDenseEmbedding()
>>> vector = emb_func.embed("Hello, world!")
>>> len(vector)
384
>>> isinstance(vector, list)
True
>>> # Recommended for users in China (uses ModelScope)
>>> emb_func = DefaultLocalDenseEmbedding(model_source="modelscope")
>>> vector = emb_func.embed("你好,世界!")  # Works well with Chinese text
>>> len(vector)
384
>>> # Alternative for China users: Use Hugging Face mirror
>>> import os
>>> os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
>>> emb_func = DefaultLocalDenseEmbedding()  # Uses HF mirror
>>> vector = emb_func.embed("Hello, world!")
>>> # Using GPU for faster inference
>>> emb_func = DefaultLocalDenseEmbedding(device="cuda")
>>> vector = emb_func("Machine learning is fascinating")
>>> # Normalized vector has unit length
>>> import numpy as np
>>> np.linalg.norm(vector)
1.0
>>> # Batch processing
>>> texts = ["First text", "Second text", "Third text"]
>>> vectors = [emb_func.embed(text) for text in texts]
>>> len(vectors)
3
>>> all(len(v) == 384 for v in vectors)
True
>>> # Semantic similarity
>>> v1 = emb_func.embed("The cat sits on the mat")
>>> v2 = emb_func.embed("A feline rests on a rug")
>>> v3 = emb_func.embed("Python programming")
>>> similarity_high = np.dot(v1, v2)  # Similar sentences
>>> similarity_low = np.dot(v1, v3)   # Different topics
>>> similarity_high > similarity_low
True
>>> # Error handling
>>> try:
...     emb_func.embed("")  # Empty string
... except ValueError as e:
...     print(f"Error: {e}")
Error: Input text cannot be empty or whitespace only
See Also
  • DenseEmbeddingFunction: Base class for dense embeddings
  • DefaultLocalSparseEmbedding: Sparse embedding with SPLADE
  • QwenDenseEmbedding: Alternative using Qwen API

Initialize with all-MiniLM-L6-v2 model.

Parameters:

Name Type Description Default
model_source
Literal['huggingface', 'modelscope']

Model source. Defaults to "huggingface".

'huggingface'
device
Optional[str]

Target device ("cpu", "cuda", "mps", or None). Defaults to None (automatic detection).

None
normalize_embeddings
bool

Whether to L2-normalize output vectors. Defaults to True.

True
batch_size
int

Batch size for encoding. Defaults to 32.

32
**kwargs

Additional parameters for future extension.

{}

Raises:

Type Description
ImportError

If sentence-transformers or modelscope is not installed.

ValueError

If model cannot be loaded.

Methods:

Name Description
__call__

Make the embedding function callable.

embed

Generate dense embedding vector for the input text.

Attributes
model_name property
model_name: str

str: The Sentence Transformer model name currently in use.

model_source property
model_source: str

str: The model source being used ("huggingface" or "modelscope").

device property
device: str

str: The device the model is running on.

dimension property
dimension: int

int: The expected dimensionality of the embedding vector.

extra_params property
extra_params: dict

dict: Extra parameters for model-specific customization.

Functions
__call__
__call__(input: str) -> DenseVectorType

Make the embedding function callable.

embed
embed(input: str) -> DenseVectorType

Generate dense embedding vector for the input text.

This method uses the Sentence Transformer model to convert input text into a dense vector representation. The model runs locally without requiring API calls.

Parameters:

Name Type Description Default
input str

Input text string to embed. Must be non-empty after stripping whitespace. Maximum length depends on the model used (typically 128-512 tokens for most models).

required

Returns:

Name Type Description
DenseVectorType DenseVectorType

A list of floats representing the embedding vector. Length equals self.dimension. If normalize_embeddings=True, the vector has unit length. Example: [0.123, -0.456, 0.789, ...]

Raises:

Type Description
TypeError

If input is not a string.

ValueError

If input is empty or whitespace-only.

RuntimeError

If model inference fails.

Examples:

>>> emb = DefaultLocalDenseEmbedding()
>>> vector = emb.embed("Natural language processing")
>>> len(vector)
384
>>> isinstance(vector[0], float)
True
>>> # Normalized vectors have unit length
>>> import numpy as np
>>> emb = DefaultLocalDenseEmbedding(normalize_embeddings=True)
>>> vector = emb.embed("Test sentence")
>>> np.linalg.norm(vector)
1.0
>>> # Error: empty input
>>> emb.embed("   ")
ValueError: Input text cannot be empty or whitespace only
>>> # Error: non-string input
>>> emb.embed(123)
TypeError: Expected 'input' to be str, got int
>>> # Semantic similarity example
>>> v1 = emb.embed("The cat sits on the mat")
>>> v2 = emb.embed("A feline rests on a rug")
>>> similarity = np.dot(v1, v2)  # High similarity due to semantic meaning
>>> similarity > 0.7
True
Note
  • First call may be slower due to model loading
  • Subsequent calls are much faster as the model stays in memory
  • This method embeds one text at a time; for large batches, encoding multiple texts together with the underlying model is more efficient
  • GPU acceleration provides 5-10x speedup over CPU

DefaultLocalSparseEmbedding

DefaultLocalSparseEmbedding(
    model_source: Literal["huggingface", "modelscope"] = "huggingface",
    device: Optional[str] = None,
    encoding_type: Literal["query", "document"] = "query",
    **kwargs
)

Bases: SentenceTransformerFunctionBase, SparseEmbeddingFunction[TEXT]

Default local sparse embedding using SPLADE model.

This class provides sparse vector embedding using the SPLADE (SParse Lexical AnD Expansion) model. SPLADE generates sparse, interpretable representations where each dimension corresponds to a vocabulary term with learned importance weights. It's ideal for lexical matching, BM25-style retrieval, and hybrid search scenarios.

The default model is naver/splade-cocondenser-ensembledistil, which is publicly available without authentication. It produces sparse vectors with thousands of dimensions but only hundreds of non-zero values, making them efficient for storage and retrieval while maintaining strong lexical matching.

Model Caching:

This class uses class-level caching to share the SPLADE model across all instances with the same configuration (model_source, device). This significantly reduces memory usage when creating multiple instances for different encoding types (query vs document).

Cache Management:

The class provides methods to manage the model cache:

  • clear_cache(): Clear all cached models to free memory
  • get_cache_info(): Get information about cached models
  • remove_from_cache(model_source, device): Remove a specific model from cache

.. note:: Why not use splade-v3?

The newer ``naver/splade-v3`` model is gated (requires access approval).
We use ``naver/splade-cocondenser-ensembledistil`` instead.

**To use splade-v3 (if you have access):**

1. Request access at https://huggingface.co/naver/splade-v3
2. Get your Hugging Face token from https://huggingface.co/settings/tokens
3. Set environment variable:

   .. code-block:: bash

       export HF_TOKEN="your_huggingface_token"

4. Or login programmatically:

   .. code-block:: python

       from huggingface_hub import login
       login(token="your_huggingface_token")

5. To use a custom SPLADE model, you can subclass this class and override
   the model_name in ``__init__``, or create your own implementation
   inheriting from ``SentenceTransformerFunctionBase`` and
   ``SparseEmbeddingFunction``.

Parameters:

Name Type Description Default
model_source
Literal['huggingface', 'modelscope']

Model source. Defaults to "huggingface". ModelScope support may vary for SPLADE models.

'huggingface'
device
Optional[str]

Device to run the model on. Options: "cpu", "cuda", "mps" (for Apple Silicon), or None for automatic detection. Defaults to None.

None
encoding_type
Literal['query', 'document']

Encoding type.
  • "query": Optimize for search queries (default)
  • "document": Optimize for indexed documents

'query'
**kwargs

Additional parameters (currently unused, for future extension).

{}

Attributes:

Name Type Description
model_name str

Model identifier.

model_source str

The model source being used.

device str

The device the model is running on.

Raises:

Type Description
ValueError

If the model cannot be loaded or input is invalid.

TypeError

If input to embed() is not a string.

RuntimeError

If model inference fails.

Note
  • Requires Python 3.10, 3.11, or 3.12
  • Requires the sentence-transformers package: pip install sentence-transformers
  • First run downloads the model (~100MB) from Hugging Face
  • Cache location: ~/.cache/torch/sentence_transformers/
  • No API keys or authentication required
  • Sparse vectors have ~30k dimensions but only ~100-200 non-zero values
  • Best combined with dense embeddings for hybrid retrieval

SPLADE vs Dense Embeddings:

  • Dense: Continuous semantic vectors, good for semantic similarity
  • Sparse: Lexical keyword-based, interpretable, good for exact matching
  • Hybrid: Combine both for best retrieval performance

Examples:

>>> # Memory-efficient: both instances share the same model (~200MB)
>>> from zvec.extension import DefaultLocalSparseEmbedding
>>>
>>> # Query embedding
>>> query_emb = DefaultLocalSparseEmbedding(encoding_type="query")
>>> query_vec = query_emb.embed("machine learning algorithms")
>>> type(query_vec)
<class 'dict'>
>>> len(query_vec)  # Only non-zero dimensions
156
>>> # Document embedding (shares model with query_emb)
>>> doc_emb = DefaultLocalSparseEmbedding(encoding_type="document")
>>> doc_vec = doc_emb.embed("Machine learning is a subset of AI")
>>> # Total memory: ~200MB (not 400MB) thanks to model caching
>>> # Asymmetric retrieval example
>>> query_vec = query_emb.embed("what causes aging fast")
>>> doc_vec = doc_emb.embed(
...     "UV-A light causes tanning, skin aging, and cataracts..."
... )
>>>
>>> # Calculate similarity (dot product for sparse vectors)
>>> similarity = sum(
...     query_vec.get(k, 0) * doc_vec.get(k, 0)
...     for k in set(query_vec) | set(doc_vec)
... )
>>> # Batch processing
>>> queries = ["query 1", "query 2", "query 3"]
>>> query_vecs = [query_emb.embed(q) for q in queries]
>>>
>>> documents = ["doc 1", "doc 2", "doc 3"]
>>> doc_vecs = [doc_emb.embed(d) for d in documents]
>>> # Inspecting sparse dimensions (output is sorted by indices)
>>> query_vec = query_emb.embed("machine learning")
>>> list(query_vec.items())[:5]  # First 5 dimensions (by index)
[(10, 0.45), (23, 0.87), (56, 0.32), (89, 1.12), (120, 0.65)]
>>>
>>> # Sort by weight to find most important terms
>>> sorted_by_weight = sorted(query_vec.items(), key=lambda x: x[1], reverse=True)
>>> top_5 = sorted_by_weight[:5]  # Top 5 most important terms
>>> top_5
[(1023, 1.45), (245, 1.23), (8901, 0.98), (5678, 0.87), (12034, 0.76)]
>>> # Using GPU for faster inference
>>> sparse_emb = DefaultLocalSparseEmbedding(device="cuda")
>>> vector = sparse_emb.embed("natural language processing")
>>> # Hybrid retrieval example (combining dense + sparse)
>>> from zvec.extension import DefaultLocalDenseEmbedding
>>> dense_emb = DefaultLocalDenseEmbedding()
>>> sparse_emb = DefaultLocalSparseEmbedding()
>>>
>>> query = "deep learning neural networks"
>>> dense_vec = dense_emb.embed(query)   # [0.1, -0.3, 0.5, ...]
>>> sparse_vec = sparse_emb.embed(query)  # {12: 0.8, 45: 1.2, ...}
>>> # Error handling
>>> try:
...     sparse_emb.embed("")  # Empty string
... except ValueError as e:
...     print(f"Error: {e}")
Error: Input text cannot be empty or whitespace only
>>> # Cache management
>>> # Check cache status
>>> info = DefaultLocalSparseEmbedding.get_cache_info()
>>> print(f"Cached models: {info['cached_models']}")
Cached models: 1
>>>
>>> # Clear cache to free memory
>>> DefaultLocalSparseEmbedding.clear_cache()
>>> info = DefaultLocalSparseEmbedding.get_cache_info()
>>> print(f"Cached models: {info['cached_models']}")
Cached models: 0
>>>
>>> # Remove specific model from cache
>>> query_emb = DefaultLocalSparseEmbedding()  # Creates CPU model
>>> cuda_emb = DefaultLocalSparseEmbedding(device="cuda")  # Creates CUDA model
>>> info = DefaultLocalSparseEmbedding.get_cache_info()
>>> print(f"Cached models: {info['cached_models']}")
Cached models: 2
>>>
>>> # Remove only CPU model
>>> removed = DefaultLocalSparseEmbedding.remove_from_cache(device=None)
>>> print(f"Removed: {removed}")
True
>>> info = DefaultLocalSparseEmbedding.get_cache_info()
>>> print(f"Cached models: {info['cached_models']}")
Cached models: 1
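In the hybrid retrieval example above, fusing the dense and sparse signals is normally left to a re-ranker (see WeightedReRanker). As a purely illustrative sketch of what such a fusion computes, not the library's implementation, the two similarities can be combined with a weighted sum:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def sparse_dot(q: dict[int, float], d: dict[int, float]) -> float:
    """Dot product over the overlapping non-zero dimensions."""
    # Iterate the smaller dict for efficiency.
    small, large = (q, d) if len(q) <= len(d) else (d, q)
    return sum(w * large.get(i, 0.0) for i, w in small.items())

def hybrid_score(dense_sim: float, sparse_sim: float, alpha: float = 0.5) -> float:
    """Weighted linear fusion of the two similarity signals."""
    return alpha * dense_sim + (1 - alpha) * sparse_sim
```

Note that in practice the two scores live on different scales (cosine in [-1, 1], sparse dot product unbounded), so a real system would normalize them before fusing.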
See Also
  • SparseEmbeddingFunction: Base class for sparse embeddings
  • DefaultLocalDenseEmbedding: Dense embedding with all-MiniLM-L6-v2
  • QwenDenseEmbedding: Alternative using Qwen API
References
  • SPLADE Paper: https://arxiv.org/abs/2109.10086
  • Model: https://huggingface.co/naver/splade-cocondenser-ensembledistil

Initialize with SPLADE model.

Parameters:

Name Type Description Default
model_source
Literal['huggingface', 'modelscope']

Model source. Defaults to "huggingface".

'huggingface'
device
Optional[str]

Target device ("cpu", "cuda", "mps", or None). Defaults to None (automatic detection).

None
encoding_type
Literal['query', 'document']

Encoding type for embeddings:
  • "query": Optimize for search queries (default)
  • "document": Optimize for indexed documents
This distinction is important for asymmetric retrieval tasks.

'query'
**kwargs

Additional parameters (reserved for future use).

{}

Raises:

Type Description
ImportError

If sentence-transformers is not installed.

ValueError

If model cannot be loaded.

Note

Multiple instances with the same (model_source, device) configuration will share the same underlying model to save memory. Different instances can use different encoding_type settings while sharing the model.

Model Selection:

Uses naver/splade-cocondenser-ensembledistil instead of the newer naver/splade-v3 because splade-v3 is a gated model requiring Hugging Face authentication. The cocondenser-ensembledistil variant:

  • Does not require authentication or API tokens
  • Is immediately available for all users
  • Provides comparable retrieval performance (~2% difference)
  • Avoids "Access to model is restricted" errors

If you need splade-v3 and have obtained access, you can subclass this class and override the model_name parameter.

Examples:

>>> # Both instances share the same model (saves memory)
>>> query_emb = DefaultLocalSparseEmbedding(encoding_type="query")
>>> doc_emb = DefaultLocalSparseEmbedding(encoding_type="document")
>>> # Only one model is loaded in memory

Methods:

Name Description
clear_cache

Clear all cached SPLADE models from memory.

get_cache_info

Get information about currently cached models.

remove_from_cache

Remove a specific model from cache.

__call__

Make the embedding function callable.

embed

Generate sparse embedding vector for the input text.

Attributes
model_name property
model_name: str

str: The Sentence Transformer model name currently in use.

model_source property
model_source: str

str: The model source being used ("huggingface" or "modelscope").

device property
device: str

str: The device the model is running on.

extra_params property
extra_params: dict

dict: Extra parameters for model-specific customization.

Functions
clear_cache classmethod
clear_cache() -> None

Clear all cached SPLADE models from memory.

This is useful for:
  • Freeing memory when models are no longer needed
  • Forcing a fresh model reload
  • Testing and debugging

Examples:

>>> # Clear cache to free memory
>>> DefaultLocalSparseEmbedding.clear_cache()

>>> # Or in tests to ensure fresh model loading
>>> def test_something():
...     DefaultLocalSparseEmbedding.clear_cache()
...     emb = DefaultLocalSparseEmbedding()
...     # Test with fresh model
get_cache_info classmethod
get_cache_info() -> dict

Get information about currently cached models.

Returns:

Name Type Description
dict dict

Dictionary with cache statistics:
  • cached_models (int): Number of cached model instances
  • cache_keys (list): List of cache keys (model_name, model_source, device)

Examples:

>>> info = DefaultLocalSparseEmbedding.get_cache_info()
>>> print(f"Cached models: {info['cached_models']}")
Cached models: 2
>>> print(f"Cache keys: {info['cache_keys']}")
Cache keys: [('naver/splade-cocondenser-ensembledistil', 'huggingface', None),
            ('naver/splade-cocondenser-ensembledistil', 'huggingface', 'cuda')]
remove_from_cache classmethod
remove_from_cache(model_source: str = 'huggingface', device: Optional[str] = None) -> bool

Remove a specific model from cache.

Parameters:

Name Type Description Default
model_source str

Model source ("huggingface" or "modelscope"). Defaults to "huggingface".

'huggingface'
device Optional[str]

Device identifier. Defaults to None.

None

Returns:

Name Type Description
bool bool

True if model was found and removed, False otherwise.

Examples:

>>> # Remove CPU model from cache
>>> removed = DefaultLocalSparseEmbedding.remove_from_cache()
>>> print(f"Removed: {removed}")
True
>>> # Remove CUDA model from cache
>>> removed = DefaultLocalSparseEmbedding.remove_from_cache(device="cuda")
>>> print(f"Removed: {removed}")
True
__call__
__call__(input: str) -> SparseVectorType

Make the embedding function callable.

embed
embed(input: str) -> SparseVectorType

Generate sparse embedding vector for the input text.

This method uses the SPLADE model to convert input text into a sparse vector representation. The result is a dictionary where keys are dimension indices and values are importance weights (only non-zero values included).

The embedding is optimized based on the encoding_type specified during initialization: "query" for search queries or "document" for indexed content.

Parameters:

Name Type Description Default
input str

Input text string to embed. Must be non-empty after stripping whitespace.

required

Returns:

Name Type Description
SparseVectorType SparseVectorType

A dictionary mapping dimension index to weight. Only non-zero dimensions are included. The dictionary is sorted by indices (keys) in ascending order for consistent output. Example: {10: 0.5, 245: 0.8, 1023: 1.2, 5678: 0.5}

Raises:

Type Description
TypeError

If input is not a string.

ValueError

If input is empty or whitespace-only.

RuntimeError

If model inference fails.

Examples:

>>> # Query embedding
>>> query_emb = DefaultLocalSparseEmbedding(encoding_type="query")
>>> query_vec = query_emb.embed("machine learning")
>>> isinstance(query_vec, dict)
True
Note
  • First call may be slower due to model loading
  • Subsequent calls are much faster as the model stays in memory
  • GPU acceleration provides significant speedup
  • Sparse vectors are memory-efficient (only store non-zero values)
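The validation and output contract documented above (type check, emptiness check, index-sorted dict of non-zero weights) can be sketched as follows. The raw_weights argument is a hypothetical stand-in for the SPLADE forward pass; this is not the library's implementation:

```python
def normalize_sparse_output(input: str,
                            raw_weights: dict[int, float]) -> dict[int, float]:
    """Apply the documented embed() contract to raw model weights (sketch)."""
    if not isinstance(input, str):
        raise TypeError("input must be a string")
    if not input.strip():
        raise ValueError("Input text cannot be empty or whitespace only")
    # Keep only non-zero dimensions, sorted by index (ascending),
    # relying on dict insertion order for the sorted output.
    return {i: w for i, w in sorted(raw_weights.items()) if w != 0.0}
```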

SentenceTransformerFunctionBase

SentenceTransformerFunctionBase(
    model_name: str,
    model_source: Literal["huggingface", "modelscope"] = "huggingface",
    device: Optional[str] = None,
)

Base class for Sentence Transformer functions (both dense and sparse).

This base class provides common functionality for loading and managing sentence-transformers models from Hugging Face or ModelScope. It supports both dense models (e.g., all-MiniLM-L6-v2) and sparse models (e.g., SPLADE).

This class is not meant to be used directly. Use the concrete implementations:
  • SentenceTransformerEmbeddingFunction for dense embeddings
  • SentenceTransformerSparseEmbeddingFunction for sparse embeddings
  • DefaultLocalDenseEmbedding for default dense embeddings
  • DefaultLocalSparseEmbedding for default sparse embeddings

Parameters:

Name Type Description Default
model_name
str

Model identifier or local path.

required
model_source
Literal['huggingface', 'modelscope']

Model source.

'huggingface'
device
Optional[str]

Device to run the model on.

None
Note
  • This is an internal base class for code reuse
  • Subclasses should inherit from appropriate Protocol (Dense/Sparse)
  • Provides model loading and management functionality

Initialize the base Sentence Transformer functionality.

Parameters:

Name Type Description Default
model_name
str

Model identifier or local path.

required
model_source
Literal['huggingface', 'modelscope']

Model source.

'huggingface'
device
Optional[str]

Device to run the model on.

None

Raises:

Type Description
ValueError

If model_source is invalid.

Attributes:

Name Type Description
model_name str

str: The Sentence Transformer model name currently in use.

model_source str

str: The model source being used ("huggingface" or "modelscope").

device str

str: The device the model is running on.

Attributes
model_name property
model_name: str

str: The Sentence Transformer model name currently in use.

model_source property
model_source: str

str: The model source being used ("huggingface" or "modelscope").

device property
device: str

str: The device the model is running on.

Functions

DefaultLocalReRanker

DefaultLocalReRanker(
    query: Optional[str] = None,
    topn: int = 10,
    rerank_field: Optional[str] = None,
    model_name: str = "cross-encoder/ms-marco-MiniLM-L6-v2",
    model_source: Literal["huggingface", "modelscope"] = "huggingface",
    device: Optional[str] = None,
    batch_size: int = 32,
)

Bases: SentenceTransformerFunctionBase, RerankFunction

Re-ranker using Sentence Transformer cross-encoder models for semantic re-ranking.

This re-ranker leverages pre-trained cross-encoder models to perform deep semantic re-ranking of search results. It runs locally without API calls, supports GPU acceleration, and works with models from Hugging Face or ModelScope.

Cross-encoder models evaluate query-document pairs jointly, providing more accurate relevance scores than bi-encoder (embedding-based) similarity.

Parameters:

Name Type Description Default
query
str

Query text for semantic re-ranking. Required.

None
topn
int

Maximum number of documents to return after re-ranking. Defaults to 10.

10
rerank_field
Optional[str]

Document field name to use as re-ranking input text. Required (e.g., "content", "title", "body").

None
model_name
str

Cross-encoder model identifier or local path. Defaults to "cross-encoder/ms-marco-MiniLM-L6-v2" (MS MARCO MiniLM). Common options:
  • "cross-encoder/ms-marco-MiniLM-L6-v2": Lightweight, fast (~80MB, recommended)
  • "cross-encoder/ms-marco-MiniLM-L12-v2": Better accuracy (~120MB)
  • "BAAI/bge-reranker-base": BGE Reranker Base (~280MB)
  • "BAAI/bge-reranker-large": BGE Reranker Large (highest quality, ~560MB)

'cross-encoder/ms-marco-MiniLM-L6-v2'
model_source
Literal['huggingface', 'modelscope']

Model source. Defaults to "huggingface".
  • "huggingface": Load from Hugging Face Hub
  • "modelscope": Load from ModelScope (recommended for users in China)

'huggingface'
device
Optional[str]

Device to run the model on. Options: "cpu", "cuda", "mps" (for Apple Silicon), or None for automatic detection. Defaults to None.

None
batch_size
int

Batch size for processing query-document pairs. Larger values speed up processing but use more memory. Defaults to 32.

32

Attributes:

Name Type Description
query str

The query text used for re-ranking.

topn int

Maximum number of documents to return.

rerank_field Optional[str]

Field name used for re-ranking input.

model_name str

The cross-encoder model being used.

model_source str

The model source ("huggingface" or "modelscope").

device str

The device the model is running on.

Raises:

Type Description
ValueError

If query is empty/None, rerank_field is None, or model cannot be loaded.

TypeError

If input types are invalid.

RuntimeError

If model inference fails.

Note
  • Requires Python 3.10, 3.11, or 3.12
  • Requires sentence-transformers package: pip install sentence-transformers
  • For ModelScope support, also requires: pip install modelscope
  • First run downloads the model (~80-560MB depending on model) from chosen source
  • No API keys or network required after initial download
  • Cross-encoders are slower than bi-encoders but more accurate
  • GPU acceleration provides significant speedup (5-10x)

MS MARCO MiniLM-L6-v2 Model (Default):

The default model cross-encoder/ms-marco-MiniLM-L6-v2 is a lightweight and efficient cross-encoder trained on the MS MARCO dataset. It provides:

  • Fast inference speed (suitable for real-time applications)
  • Small model size (~80MB, quick to download)
  • Good balance between speed and accuracy
  • Trained on 500K+ query-document pairs
  • Public availability without authentication

For users in China:

If you encounter Hugging Face access issues, use ModelScope instead:

>>> # Recommended for users in China
>>> reranker = DefaultLocalReRanker(
...     query="机器学习算法",
...     rerank_field="content",
...     model_source="modelscope"
... )

Alternatively, use Hugging Face mirror:

export HF_ENDPOINT=https://hf-mirror.com

Examples:

>>> # Basic usage with default MS MARCO MiniLM model
>>> from zvec.extension import DefaultLocalReRanker
>>>
>>> reranker = DefaultLocalReRanker(
...     query="machine learning algorithms",
...     topn=5,
...     rerank_field="content"
... )
>>>
>>> # Use in collection.query()
>>> results = collection.query(
...     data={"vector_field": query_vector},
...     reranker=reranker,
...     topk=20
... )
>>> # Using ModelScope for users in China
>>> reranker = DefaultLocalReRanker(
...     query="深度学习",
...     topn=10,
...     rerank_field="content",
...     model_source="modelscope"
... )
>>> # Using larger model for better quality
>>> reranker = DefaultLocalReRanker(
...     query="neural networks",
...     topn=5,
...     rerank_field="content",
...     model_name="BAAI/bge-reranker-large",
...     device="cuda",
...     batch_size=64
... )
>>> # Direct rerank call (for testing)
>>> query_results = {
...     "vector1": [
...         Doc(id="1", score=0.9, fields={"content": "Machine learning is..."}),
...         Doc(id="2", score=0.8, fields={"content": "Deep learning is..."}),
...     ]
... }
>>> reranked = reranker.rerank(query_results)
>>> for doc in reranked:
...     print(f"ID: {doc.id}, Score: {doc.score:.4f}")
ID: 2, Score: 0.9234
ID: 1, Score: 0.8567
See Also
  • RerankFunction: Abstract base class for re-rankers
  • QwenReRanker: Re-ranker using Qwen API
  • RrfReRanker: Multi-vector re-ranker using RRF
  • WeightedReRanker: Multi-vector re-ranker using weighted scores
References
  • MS MARCO Cross-Encoder: https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2
  • BGE Reranker: https://huggingface.co/BAAI/bge-reranker-base
  • Cross-Encoder vs Bi-Encoder: https://www.sbert.net/examples/applications/cross-encoder/README.html

Initialize DefaultLocalReRanker with query and configuration.

Parameters:

Name Type Description Default
query
Optional[str]

Query text for semantic matching. Required.

None
topn
int

Number of top results to return.

10
rerank_field
Optional[str]

Document field for re-ranking input.

None
model_name
str

Cross-encoder model identifier.

'cross-encoder/ms-marco-MiniLM-L6-v2'
model_source
Literal['huggingface', 'modelscope']

Model source.

'huggingface'
device
Optional[str]

Target device ("cpu", "cuda", "mps", or None).

None
batch_size
int

Batch size for processing query-document pairs.

32

Raises:

Type Description
ValueError

If query is empty or model cannot be loaded.

Methods:

Name Description
rerank

Re-rank documents using Sentence Transformer cross-encoder model.

Attributes
topn property
topn: int

int: Number of top documents to return after re-ranking.

rerank_field property
rerank_field: Optional[str]

Optional[str]: Field name used as re-ranking input.

model_name property
model_name: str

str: The Sentence Transformer model name currently in use.

model_source property
model_source: str

str: The model source being used ("huggingface" or "modelscope").

device property
device: str

str: The device the model is running on.

query property
query: str

str: Query text used for semantic re-ranking.

batch_size property
batch_size: int

int: Batch size for processing query-document pairs.

Functions
rerank
rerank(query_results: dict[str, list[Doc]]) -> list[Doc]

Re-rank documents using Sentence Transformer cross-encoder model.

Evaluates each query-document pair using the cross-encoder model to compute relevance scores. Documents are then sorted by these scores and the top-k results are returned.

Parameters:

Name Type Description Default
query_results dict[str, list[Doc]]

Mapping from vector field names to lists of retrieved documents. Documents from all fields are deduplicated and re-ranked together.

required

Returns:

Type Description
list[Doc]

list[Doc]: Re-ranked documents (up to topn) with updated score fields containing relevance scores from the cross-encoder model.

Raises:

Type Description
ValueError

If no valid documents are found or model inference fails.

Note
  • Duplicate documents (same ID) across fields are processed once
  • Documents with empty/missing rerank_field content are skipped
  • Returned scores are logits from the cross-encoder model
  • Higher scores indicate higher relevance
  • Processing time is O(n) where n is the number of documents
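The dedup, score, sort, and truncate flow described above can be sketched in plain Python. The score_fn argument is a hypothetical stand-in for the cross-encoder; this sketch mirrors the documented behavior rather than the library's actual code:

```python
from dataclasses import dataclass, field, replace
from typing import Callable

@dataclass
class Doc:
    """Minimal stand-in for zvec's Doc type (sketch only)."""
    id: str
    score: float
    fields: dict = field(default_factory=dict)

def rerank_sketch(query_results: dict[str, list[Doc]], query: str,
                  rerank_field: str, topn: int,
                  score_fn: Callable[[str, str], float]) -> list[Doc]:
    """Dedup by id, score (query, text) pairs, sort descending, keep topn."""
    seen: dict[str, Doc] = {}
    for docs in query_results.values():
        for doc in docs:
            if doc.id in seen:
                continue  # duplicates across fields are processed once
            text = doc.fields.get(rerank_field)
            if not text:
                continue  # skip docs with empty/missing rerank_field content
            seen[doc.id] = replace(doc, score=score_fn(query, text))
    ranked = sorted(seen.values(), key=lambda d: d.score, reverse=True)
    return ranked[:topn]
```

In the real re-ranker, score_fn corresponds to batched cross-encoder inference, which is why batch_size matters for throughput.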

Examples:

>>> reranker = DefaultLocalReRanker(
...     query="machine learning",
...     topn=3,
...     rerank_field="content"
... )
>>> query_results = {
...     "vector1": [
...         Doc(id="1", score=0.9, fields={"content": "ML basics"}),
...         Doc(id="2", score=0.8, fields={"content": "DL tutorial"}),
...     ]
... }
>>> reranked = reranker.rerank(query_results)
>>> len(reranked) <= 3
True