Extension
zvec.extension
Modules:
| Name | Description |
|---|---|
| bm25_embedding_function | |
| embedding_function | |
| multi_vector_reranker | |
| openai_embedding_function | |
| openai_function | |
| qwen_embedding_function | |
| qwen_function | |
| qwen_rerank_function | |
| rerank_function | |
| sentence_transformer_embedding_function | |
| sentence_transformer_function | |
| sentence_transformer_rerank_function | |
Classes:
| Name | Description |
|---|---|
| BM25EmbeddingFunction | BM25-based sparse embedding function using DashText SDK. |
| DenseEmbeddingFunction | Protocol for dense vector embedding functions. |
| SparseEmbeddingFunction | Abstract base class for sparse vector embedding functions. |
| RrfReRanker | Re-ranker using Reciprocal Rank Fusion (RRF) for multi-vector search. |
| WeightedReRanker | Re-ranker that combines scores from multiple vector fields using weights. |
| OpenAIDenseEmbedding | Dense text embedding function using OpenAI API. |
| OpenAIFunctionBase | Base class for OpenAI functions. |
| QwenDenseEmbedding | Dense text embedding function using Qwen (DashScope) API. |
| QwenSparseEmbedding | Sparse text embedding function using Qwen (DashScope) API. |
| QwenFunctionBase | Base class for Qwen (DashScope) functions. |
| QwenReRanker | Re-ranker using Qwen (DashScope) cross-encoder API for semantic re-ranking. |
| ReRanker | Abstract base class for re-ranking search results. |
| DefaultLocalDenseEmbedding | Default local dense embedding using all-MiniLM-L6-v2 model. |
| DefaultLocalSparseEmbedding | Default local sparse embedding using SPLADE model. |
| SentenceTransformerFunctionBase | Base class for Sentence Transformer functions (both dense and sparse). |
| DefaultLocalReRanker | Re-ranker using Sentence Transformer cross-encoder models for semantic re-ranking. |
Classes
BM25EmbeddingFunction
BM25EmbeddingFunction(
corpus: Optional[list[str]] = None,
encoding_type: Literal["query", "document"] = "query",
language: Literal["zh", "en"] = "zh",
b: float = 0.75,
k1: float = 1.2,
**kwargs
)
Bases: SparseEmbeddingFunction[TEXT]
BM25-based sparse embedding function using DashText SDK.
This class provides text-to-sparse-vector embedding capabilities using the DashText library with BM25 algorithm. BM25 (Best Matching 25) is a probabilistic retrieval function used for lexical search and document ranking based on term frequency and inverse document frequency.
BM25 generates sparse vectors where each dimension corresponds to a term in the vocabulary, and the value represents the BM25 score for that term. It's particularly effective for:
- Lexical search and keyword matching
- Document ranking and information retrieval
- Combining with dense embeddings for hybrid search
- Traditional IR tasks where exact term matching is important
This implementation uses DashText's SparseVectorEncoder, which provides efficient BM25 computation for Chinese and English text using either a built-in encoder or custom corpus training.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| corpus | Optional[list[str]] | List of documents to train the BM25 encoder. If provided, creates a custom encoder trained on this corpus for better domain-specific accuracy. If None, uses the built-in encoder. | None |
| encoding_type | Literal['query', 'document'] | Encoding mode for text processing. Use "query" for search queries, "document" for document indexing. | 'query' |
| language | Literal['zh', 'en'] | Language for built-in encoder. Only used when corpus is None. | 'zh' |
| b | float | Document length normalization parameter for BM25. Range [0, 1]. 0 means no normalization, 1 means full normalization. Only used with custom corpus. Defaults to 0.75. | 0.75 |
| k1 | float | Term frequency saturation parameter for BM25. Higher values give more weight to term frequency. Only used with custom corpus. Defaults to 1.2. | 1.2 |
| **kwargs | | Additional parameters for DashText encoder customization. | {} |
Attributes:
| Name | Type | Description |
|---|---|---|
| corpus_size | int | Number of documents in the training corpus (0 if using built-in encoder). |
| encoding_type | str | The encoding type being used ("query" or "document"). |
| language | str | The language of the built-in encoder ("zh" or "en"). |
Raises:
| Type | Description |
|---|---|
| ValueError | If corpus is provided but empty or contains non-string elements. |
| TypeError | If input to embed() is not a string. |
| RuntimeError | If DashText encoder initialization or training fails. |
Note
- Requires Python 3.10, 3.11, or 3.12
- Requires the dashtext package: pip install dashtext
- Two encoder options available:
  - Built-in encoder (no corpus needed): pre-trained models for Chinese (zh) and English (en), good generalization, works out-of-the-box
  - Custom encoder (corpus required): better accuracy for domain-specific terminology, requires training on your full corpus with BM25 parameters
- Encoding types:
  - encoding_type="query": optimized for search queries (shorter text)
  - encoding_type="document": optimized for document indexing (longer text)
- BM25 parameters (b, k1) only apply to custom encoder training
- Output is sorted by indices (vocabulary term IDs) for consistency
- Results are cached (LRU cache, maxsize=10) to reduce computation
- No API key or network connectivity required (local computation)
Examples:
>>> # Option 1: Using built-in encoder for Chinese (no corpus needed)
>>> from zvec.extension import BM25EmbeddingFunction
>>>
>>> # For query encoding (Chinese)
>>> bm25_query_zh = BM25EmbeddingFunction(language="zh", encoding_type="query")
>>> query_vec = bm25_query_zh.embed("什么是机器学习")
>>> isinstance(query_vec, dict)
True
>>> # query_vec: {1169440797: 0.29, 2045788977: 0.70, ...}
>>> # For document encoding (Chinese)
>>> bm25_doc_zh = BM25EmbeddingFunction(language="zh", encoding_type="document")
>>> doc_vec = bm25_doc_zh.embed("机器学习是人工智能的一个重要分支...")
>>> isinstance(doc_vec, dict)
True
>>> # Using built-in encoder for English
>>> bm25_query_en = BM25EmbeddingFunction(language="en", encoding_type="query")
>>> query_vec_en = bm25_query_en.embed("what is vector search service")
>>> isinstance(query_vec_en, dict)
True
>>> # Option 2: Using custom corpus for domain-specific accuracy
>>> corpus = [
... "机器学习是人工智能的一个重要分支",
... "深度学习使用多层神经网络进行特征提取",
... "自然语言处理技术用于理解和生成人类语言"
... ]
>>> bm25_custom = BM25EmbeddingFunction(
... corpus=corpus,
... encoding_type="query",
... b=0.75,
... k1=1.2
... )
>>> custom_vec = bm25_custom.embed("机器学习算法")
>>> isinstance(custom_vec, dict)
True
>>> # Hybrid search: combining with dense embeddings
>>> from zvec.extension import DefaultLocalDenseEmbedding
>>> dense_emb = DefaultLocalDenseEmbedding()
>>> bm25_emb = BM25EmbeddingFunction(language="zh", encoding_type="query")
>>>
>>> query = "machine learning algorithms"
>>> dense_vec = dense_emb.embed(query) # Semantic similarity
>>> sparse_vec = bm25_emb.embed(query) # Lexical matching
>>> # Combine scores for hybrid retrieval
>>> # Callable interface
>>> sparse_vec = bm25_query_zh("information retrieval")
>>> isinstance(sparse_vec, dict)
True
>>> # Error handling
>>> try:
... bm25_query_zh.embed("") # Empty query
... except ValueError as e:
... print(f"Error: {e}")
Error: Input text cannot be empty or whitespace only
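The hybrid search example above stops at "combine scores for hybrid retrieval". One simple fusion is a weighted sum of the separately computed similarities; the sketch below is illustrative (the hybrid_score helper and the alpha mixing weight are not part of this library):

```python
import numpy as np

def hybrid_score(dense_q, dense_d, sparse_q, sparse_d, alpha=0.5):
    """Blend dense similarity with sparse lexical overlap (illustrative helper)."""
    # Dense: dot product (equals cosine similarity for normalized vectors).
    dense_sim = float(np.dot(dense_q, dense_d))
    # Sparse: dot product over the non-zero dimensions both vectors share.
    sparse_sim = sum(w * sparse_d.get(i, 0.0) for i, w in sparse_q.items())
    # alpha is an illustrative mixing weight, not a library parameter.
    return alpha * dense_sim + (1.0 - alpha) * sparse_sim
```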
See Also
- SparseEmbeddingFunction: Base class for sparse embeddings
- DefaultLocalSparseEmbedding: SPLADE-based sparse embedding
- QwenSparseEmbedding: API-based sparse embedding using Qwen
- DefaultLocalDenseEmbedding: Dense embedding for semantic search
References
- DashText Documentation: https://help.aliyun.com/zh/document_detail/2546039.html
- DashText PyPI: https://pypi.org/project/dashtext/
- BM25 Algorithm: Robertson & Zaragoza (2009)
Initialize the BM25 embedding function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| corpus | Optional[list[str]] | Optional corpus for training custom encoder. If None, uses built-in encoder. Defaults to None. | None |
| encoding_type | Literal['query', 'document'] | Text encoding mode. Use "query" for search queries, "document" for indexing. Defaults to "query". | 'query' |
| language | Literal['zh', 'en'] | Language for built-in encoder. "zh" for Chinese, "en" for English. Defaults to "zh". | 'zh' |
| b | float | Document length normalization for BM25 [0, 1]. Only used with custom corpus. Defaults to 0.75. | 0.75 |
| k1 | float | Term frequency saturation for BM25. Only used with custom corpus. Defaults to 1.2. | 1.2 |
| **kwargs | | Additional DashText encoder parameters. | {} |
Raises:
| Type | Description |
|---|---|
| ValueError | If corpus is provided but empty or invalid. |
| ImportError | If dashtext package is not installed. |
| RuntimeError | If encoder initialization or training fails. |
Methods:
| Name | Description |
|---|---|
| __call__ | Make the embedding function callable. |
| embed | Generate BM25 sparse embedding for the input text. |
Attributes
corpus_size
property
corpus_size: int
int: Number of documents in the training corpus (0 if using built-in encoder).
encoding_type
property
encoding_type: str
str: The encoding type being used ("query" or "document").
language
property
language: str
str: The language of the built-in encoder ("zh" or "en").
extra_params
property
extra_params: dict
dict: Extra parameters for DashText encoder customization.
Functions
__call__
__call__(input: TEXT) -> SparseVectorType
Make the embedding function callable.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input | TEXT | Input text to embed. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| SparseVectorType | SparseVectorType | Sparse vector as dictionary. |
embed
cached
embed(input: TEXT) -> SparseVectorType
Generate BM25 sparse embedding for the input text.
This method computes BM25 scores for the input text using DashText's SparseVectorEncoder. The encoding behavior depends on the encoding_type:
encoding_type="query": Usesencode_queries()for search queriesencoding_type="document": Usesencode_documents()for documents
The result is a sparse vector where keys are term indices in the vocabulary and values are BM25 scores.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input | TEXT | Input text string to embed. Must be non-empty after stripping whitespace. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| SparseVectorType | SparseVectorType | A dictionary mapping vocabulary term index to BM25 score. Only non-zero scores are included. The dictionary is sorted by indices (keys) in ascending order for consistent output. Example: {1169440797: 0.29, 2045788977: 0.70, ...} |
Raises:
| Type | Description |
|---|---|
| TypeError | If input is not a string. |
| ValueError | If input is empty or whitespace-only. |
| RuntimeError | If BM25 encoding fails. |
Examples:
>>> bm25 = BM25EmbeddingFunction(language="zh", encoding_type="query")
>>> sparse_vec = bm25.embed("query text")
>>> isinstance(sparse_vec, dict)
True
>>> all(isinstance(k, int) and isinstance(v, float) for k, v in sparse_vec.items())
True
>>> # Verify sorted output
>>> keys = list(sparse_vec.keys())
>>> keys == sorted(keys)
True
>>> # Error: empty input
>>> bm25.embed(" ")
ValueError: Input text cannot be empty or whitespace only
>>> # Error: non-string input
>>> bm25.embed(123)
TypeError: Expected 'input' to be str, got int
Note
- BM25 scores are relative to the vocabulary statistics
- Output dictionary is always sorted by indices for consistency
- Terms not in the vocabulary will have zero scores (not included)
- This method is cached (maxsize=10) for performance
- DashText automatically handles Chinese/English text segmentation
DenseEmbeddingFunction
Bases: Protocol[MD]
Protocol for dense vector embedding functions.
Dense embedding functions map multimodal input (text, image, or audio) to fixed-length real-valued vectors. This is a Protocol class that defines the interface - implementations should provide their own initialization and properties.
Class Type Parameters:
| Name | Bound or Constraints | Description | Default |
|---|---|---|---|
| MD | Embeddable | The type of input data (bound to Embeddable: TEXT, IMAGE, or AUDIO). | required |
Note
- This is a Protocol class - it only defines the embed() interface.
- Implementations are free to define their own __init__, properties, and additional methods as needed.
- The embed() method is the only required interface.
Examples:
>>> # Custom text embedding implementation
>>> class MyTextEmbedding:
... def __init__(self, dimension: int, model_name: str):
... self.dimension = dimension
... self.model = load_model(model_name)
...
... def embed(self, input: str) -> list[float]:
... return self.model.encode(input).tolist()
>>> # Custom image embedding implementation
>>> class MyImageEmbedding:
... def __init__(self, dimension: int = 512):
... self.dimension = dimension
... self.model = load_image_model()
...
... def embed(self, input: Union[str, bytes, np.ndarray]) -> list[float]:
... if isinstance(input, str):
... image = load_image_from_path(input)
... else:
... image = input
... return self.model.extract_features(image).tolist()
>>> # Using built-in implementations
>>> from zvec.extension import QwenDenseEmbedding
>>> text_emb = QwenDenseEmbedding(dimension=768, api_key="sk-xxx")
>>> vector = text_emb.embed("Hello world")
Methods:
| Name | Description |
|---|---|
| embed | Generate a dense embedding vector for the input data. |
Functions
embed
abstractmethod
embed(input: MD) -> DenseVectorType
Generate a dense embedding vector for the input data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input | MD | Multimodal input data to embed. Can be: TEXT (str): text string; IMAGE (str, bytes, or np.ndarray): image file path, raw bytes, or array; AUDIO (str, bytes, or np.ndarray): audio file path, raw bytes, or array. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| DenseVectorType | DenseVectorType | A dense vector representing the embedding. Can be list[float], list[int], or np.ndarray. Length should match the implementation's dimension. |
SparseEmbeddingFunction
Bases: Protocol[MD]
Abstract base class for sparse vector embedding functions.
Sparse embedding functions map multimodal input (text, image, or audio) to a dictionary of {index: weight}, where only non-zero dimensions are stored. You can inherit this class to create custom sparse embedding functions.
Class Type Parameters:
| Name | Bound or Constraints | Description | Default |
|---|---|---|---|
| MD | Embeddable | The type of input data (bound to Embeddable: TEXT, IMAGE, or AUDIO). | required |
Note
Subclasses must implement the embed() method.
Examples:
>>> # Using built-in text sparse embedding (e.g., BM25, TF-IDF)
>>> sparse_emb = SomeSparseEmbedding()
>>> vector = sparse_emb.embed("Hello world")
>>> # Returns: {0: 0.5, 42: 1.2, 100: 0.8}
>>> # Custom BM25 sparse embedding function
>>> class MyBM25Embedding(SparseEmbeddingFunction):
... def __init__(self, vocab_size: int = 10000):
... self.vocab_size = vocab_size
... self.tokenizer = MyTokenizer()
...
... def embed(self, input: str) -> dict[int, float]:
... tokens = self.tokenizer.tokenize(input)
... sparse_vector = {}
... for token_id, weight in self._calculate_bm25(tokens):
... if weight > 0:
... sparse_vector[token_id] = weight
... return sparse_vector
...
... def _calculate_bm25(self, tokens):
... # BM25 calculation logic
... pass
>>> # Custom sparse image feature extractor
>>> class MySparseImageEmbedding(SparseEmbeddingFunction):
... def embed(self, input: Union[str, bytes, np.ndarray]) -> dict[int, float]:
... image = self._load_image(input)
... features = self._extract_sparse_features(image)
... return {idx: val for idx, val in enumerate(features) if val != 0}
Methods:
| Name | Description |
|---|---|
| embed | Generate a sparse embedding for the input data. |
Functions
embed
abstractmethod
embed(input: MD) -> SparseVectorType
Generate a sparse embedding for the input data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input | MD | Multimodal input data to embed. Can be: TEXT (str): text string; IMAGE (str, bytes, or np.ndarray): image file path, raw bytes, or array; AUDIO (str, bytes, or np.ndarray): audio file path, raw bytes, or array. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| SparseVectorType | SparseVectorType | Mapping from dimension index to non-zero weight. Only dimensions with non-zero values are included. |
RrfReRanker
RrfReRanker(topn: int = 10, rerank_field: Optional[str] = None, rank_constant: int = 60)
Bases: RerankFunction
Re-ranker using Reciprocal Rank Fusion (RRF) for multi-vector search.
RRF combines results from multiple vector queries without requiring relevance scores. It assigns higher weight to documents that appear early in multiple result lists.
The RRF score for a document at rank r is: 1 / (k + r + 1),
where k is the rank constant.
Note
This re-ranker is specifically designed for multi-vector scenarios where query results from multiple vector fields need to be combined.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| topn | int | Number of top documents to return. Defaults to 10. | 10 |
| rerank_field | Optional[str] | Ignored by RRF. Defaults to None. | None |
| rank_constant | int | Smoothing constant k in the RRF formula. Defaults to 60. | 60 |
Methods:
| Name | Description |
|---|---|
| rerank | Apply Reciprocal Rank Fusion to combine multiple query results. |
Attributes:
| Name | Type | Description |
|---|---|---|
| topn | int | Number of top documents to return after re-ranking. |
| rerank_field | Optional[str] | Field name used as re-ranking input. |
Attributes
topn
property
topn: int
int: Number of top documents to return after re-ranking.
rerank_field
property
rerank_field: Optional[str]
Optional[str]: Field name used as re-ranking input.
Functions
rerank
rerank(query_results: dict[str, list[Doc]]) -> list[Doc]
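Apply Reciprocal Rank Fusion to combine multiple query results. To make the formula concrete, the sketch below fuses two hypothetical ranked ID lists by hand; the results data and the fusion loop are illustrative, not the library's internals:

```python
# Illustrative RRF fusion over two hypothetical ranked result lists.
results = {
    "dense_vector": ["doc_a", "doc_b", "doc_c"],
    "sparse_vector": ["doc_b", "doc_a", "doc_d"],
}
k = 60  # rank_constant
scores: dict[str, float] = {}
for field, ranked_ids in results.items():
    for rank, doc_id in enumerate(ranked_ids):
        # Each appearance at rank r contributes 1 / (k + r + 1).
        scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)

fused = sorted(scores, key=scores.get, reverse=True)
# doc_a and doc_b, ranked highly in both lists, outscore doc_c and doc_d.
```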
WeightedReRanker
WeightedReRanker(
topn: int = 10,
rerank_field: Optional[str] = None,
metric: MetricType = L2,
weights: Optional[dict[str, float]] = None,
)
Bases: RerankFunction
Re-ranker that combines scores from multiple vector fields using weights.
Each vector field's relevance score is normalized based on its metric type, then scaled by a user-provided weight. Final scores are summed across fields.
Note
This re-ranker is specifically designed for multi-vector scenarios where query results from multiple vector fields need to be combined with configurable weights.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| topn | int | Number of top documents to return. Defaults to 10. | 10 |
| rerank_field | Optional[str] | Ignored. Defaults to None. | None |
| metric | MetricType | Distance metric used for score normalization. Defaults to L2. | L2 |
| weights | Optional[dict[str, float]] | Weight per vector field. Fields not listed use weight 1.0. Defaults to None. | None |
Note
Supported metrics: L2, IP, COSINE. Scores are normalized to [0, 1].
Methods:
| Name | Description |
|---|---|
| rerank | Combine scores from multiple vector fields using weighted sum. |
Attributes:
| Name | Type | Description |
|---|---|---|
| topn | int | Number of top documents to return after re-ranking. |
| rerank_field | Optional[str] | Field name used as re-ranking input. |
| weights | dict[str, float] | Weight mapping for vector fields. |
| metric | MetricType | Distance metric used for score normalization. |
Attributes
topn
property
topn: int
int: Number of top documents to return after re-ranking.
rerank_field
property
rerank_field: Optional[str]
Optional[str]: Field name used as re-ranking input.
weights
property
weights: dict[str, float]
dict[str, float]: Weight mapping for vector fields.
Functions
rerank
rerank(query_results: dict[str, list[Doc]]) -> list[Doc]
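Combine scores from multiple vector fields using weighted sum. The arithmetic below is a minimal sketch assuming per-field scores have already been normalized to [0, 1]; the field names and weights are illustrative:

```python
# Illustrative weighted fusion for one document (not the library internals).
normalized = {"dense_vector": 0.82, "sparse_vector": 0.40}  # normalized scores
weights = {"dense_vector": 0.7, "sparse_vector": 0.3}       # user-provided weights

# Fields missing from `weights` would fall back to weight 1.0.
final = sum(score * weights.get(field, 1.0) for field, score in normalized.items())
assert abs(final - 0.694) < 1e-9  # 0.82 * 0.7 + 0.40 * 0.3
```

A constructor call would look like WeightedReRanker(topn=5, weights={"dense_vector": 0.7, "sparse_vector": 0.3}).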
OpenAIDenseEmbedding
OpenAIDenseEmbedding(
model: str = "text-embedding-3-small",
dimension: Optional[int] = None,
api_key: Optional[str] = None,
base_url: Optional[str] = None,
**kwargs
)
Bases: OpenAIFunctionBase, DenseEmbeddingFunction[TEXT]
Dense text embedding function using OpenAI API.
This class provides text-to-vector embedding capabilities using OpenAI's
embedding models. It inherits from DenseEmbeddingFunction and implements
dense text embedding via the OpenAI API.
The implementation supports various OpenAI embedding models with different dimensions and includes automatic result caching for improved performance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | str | OpenAI embedding model identifier. Defaults to "text-embedding-3-small". | 'text-embedding-3-small' |
| dimension | Optional[int] | Desired output embedding dimension. If None, uses the model's default dimension. | None |
| api_key | Optional[str] | OpenAI API authentication key. If None, reads from the OPENAI_API_KEY environment variable. | None |
| base_url | Optional[str] | Custom API base URL for OpenAI-compatible services. Defaults to None. | None |
Attributes:
| Name | Type | Description |
|---|---|---|
| dimension | int | The embedding vector dimension. |
| data_type | DataType | The fixed data type of the output embedding vector. |
| model | str | The OpenAI model name being used. |
Raises:
| Type | Description |
|---|---|
| ValueError | If API key is not provided and not found in environment, or if API returns an error response. |
| TypeError | If input to embed() is not a string. |
| RuntimeError | If network error or OpenAI service error occurs. |
Note
- Requires Python 3.10, 3.11, or 3.12
- Requires the openai package: pip install openai
- Embedding results are cached (LRU cache, maxsize=10) to reduce API calls
- Network connectivity to OpenAI API endpoints is required
- API usage incurs costs based on your OpenAI subscription plan
- Rate limits apply based on your OpenAI account tier
Examples:
>>> # Basic usage with default model
>>> from zvec.extension import OpenAIDenseEmbedding
>>> import os
>>> os.environ["OPENAI_API_KEY"] = "sk-..."
>>>
>>> emb_func = OpenAIDenseEmbedding()
>>> vector = emb_func.embed("Hello, world!")
>>> len(vector)
1536
>>> # Using specific model with custom dimension
>>> emb_func = OpenAIDenseEmbedding(
... model="text-embedding-3-large",
... dimension=1024,
... api_key="sk-..."
... )
>>> vector = emb_func.embed("Machine learning is fascinating")
>>> len(vector)
1024
>>> # Using with custom base URL (e.g., Azure OpenAI)
>>> emb_func = OpenAIDenseEmbedding(
... model="text-embedding-ada-002",
... api_key="your-azure-key",
... base_url="https://your-resource.openai.azure.com/"
... )
>>> vector = emb_func("Natural language processing")
>>> isinstance(vector, list)
True
>>> # Batch processing with caching benefit
>>> texts = ["First text", "Second text", "First text"]
>>> vectors = [emb_func.embed(text) for text in texts]
>>> # Third call uses cached result for "First text"
>>> # Error handling
>>> try:
... emb_func.embed("") # Empty string
... except ValueError as e:
... print(f"Error: {e}")
Error: Input text cannot be empty or whitespace only
See Also
- DenseEmbeddingFunction: Base class for dense embeddings
- QwenDenseEmbedding: Alternative using Qwen/DashScope API
- DefaultLocalDenseEmbedding: Local model without API calls
- SparseEmbeddingFunction: Base class for sparse embeddings
Initialize the OpenAI dense embedding function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | str | OpenAI model name. Defaults to "text-embedding-3-small". | 'text-embedding-3-small' |
| dimension | Optional[int] | Target embedding dimension or None for default. | None |
| api_key | Optional[str] | API key or None to use environment variable. | None |
| base_url | Optional[str] | Custom API base URL or None for default. | None |
| **kwargs | | Additional parameters for API calls. | {} |
Raises:
| Type | Description |
|---|---|
| ValueError | If API key is not provided and not in environment. |
Methods:
| Name | Description |
|---|---|
| __call__ | Make the embedding function callable. |
| embed | Generate dense embedding vector for the input text. |
Attributes
model
property
model: str
str: The OpenAI model name currently in use.
dimension
property
dimension: int
int: The expected dimensionality of the embedding vector.
extra_params
property
extra_params: dict
dict: Extra parameters for model-specific customization.
Functions
__call__
__call__(input: TEXT) -> DenseVectorType
Make the embedding function callable.
embed
cached
embed(input: TEXT) -> DenseVectorType
Generate dense embedding vector for the input text.
This method calls the OpenAI Embeddings API to convert input text into a dense vector representation. Results are cached to improve performance for repeated inputs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input | TEXT | Input text string to embed. Must be non-empty after stripping whitespace. Maximum length is 8191 tokens for most models. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| DenseVectorType | DenseVectorType | A list of floats representing the embedding vector. Length equals dimension. |
Raises:
| Type | Description |
|---|---|
| TypeError | If input is not a string. |
| ValueError | If input is empty/whitespace-only, or if the API returns an error or malformed response. |
| RuntimeError | If network connectivity issues or OpenAI service errors occur. |
Examples:
>>> emb = OpenAIDenseEmbedding()
>>> vector = emb.embed("Natural language processing")
>>> len(vector)
1536
>>> isinstance(vector[0], float)
True
>>> # Error: empty input
>>> emb.embed(" ")
ValueError: Input text cannot be empty or whitespace only
>>> # Error: non-string input
>>> emb.embed(123)
TypeError: Expected 'input' to be str, got int
Note
- This method is cached (maxsize=10). Identical inputs return cached results.
- The cache is based on exact string match (case-sensitive).
- Consider pre-processing text (lowercasing, normalization) for better caching.
OpenAIFunctionBase
Base class for OpenAI functions.
This base class provides common functionality for calling OpenAI APIs and handling responses. It supports embeddings (dense) operations.
This class is not meant to be used directly. Use concrete implementations:
- OpenAIDenseEmbedding for dense embeddings
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | str | OpenAI model identifier. | required |
| api_key | Optional[str] | OpenAI API authentication key. | None |
| base_url | Optional[str] | Custom API base URL. | None |
Note
- This is an internal base class for code reuse across OpenAI features
- Subclasses should inherit from appropriate Protocol
- Provides unified API connection and response handling
Initialize the base OpenAI functionality.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | str | OpenAI model name. | required |
| api_key | Optional[str] | API key or None to use environment variable. | None |
| base_url | Optional[str] | Custom API base URL or None for default. | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If API key is not provided and not in environment. |
Attributes:
| Name | Type | Description |
|---|---|---|
| model | str | The OpenAI model name currently in use. |
Attributes
model
property
model: str
str: The OpenAI model name currently in use.
Functions
QwenDenseEmbedding
QwenDenseEmbedding(
dimension: int, model: str = "text-embedding-v4", api_key: Optional[str] = None, **kwargs
)
Bases: QwenFunctionBase, DenseEmbeddingFunction[TEXT]
Dense text embedding function using Qwen (DashScope) API.
This class provides text-to-vector embedding capabilities using Alibaba Cloud's
DashScope service and Qwen embedding models. It inherits from
DenseEmbeddingFunction and implements dense text embedding.
The implementation supports various Qwen embedding models with configurable dimensions and includes automatic result caching for improved performance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dimension | int | Desired output embedding dimension. Common values: 512 (balanced performance and accuracy), 1024 (higher accuracy, larger storage), 1536 (maximum accuracy for supported models). | required |
| model | str | DashScope embedding model identifier. Defaults to "text-embedding-v4". | 'text-embedding-v4' |
| api_key | Optional[str] | DashScope API authentication key. If None, reads from the DASHSCOPE_API_KEY environment variable. | None |
| **kwargs | | Additional DashScope API parameters. Reference: https://help.aliyun.com/zh/model-studio/text-embedding-synchronous-api | {} |
|
Attributes:
| Name | Type | Description |
|---|---|---|
| dimension | int | The embedding vector dimension. |
| data_type | DataType | The fixed data type of the output embedding vector. |
| model | str | The DashScope model name being used. |
Raises:
| Type | Description |
|---|---|
| ValueError | If API key is not provided and not found in environment, or if API returns an error response. |
| TypeError | If input to embed() is not a string. |
| RuntimeError | If network error or DashScope service error occurs. |
Note
- Requires Python 3.10, 3.11, or 3.12
- Requires the dashscope package: pip install dashscope
- Embedding results are cached (LRU cache, maxsize=10) to reduce API calls
- Network connectivity to DashScope API endpoints is required
- API usage may incur costs based on your DashScope subscription plan
Parameter Guidelines:
- Use text_type="query" for search queries and text_type="document" for indexed content to optimize asymmetric retrieval tasks.
- For detailed API specifications and parameter usage, refer to: https://help.aliyun.com/zh/model-studio/text-embedding-synchronous-api
Examples:
>>> # Basic usage with default model
>>> from zvec.extension import QwenDenseEmbedding
>>> import os
>>> os.environ["DASHSCOPE_API_KEY"] = "your-api-key"
>>>
>>> emb_func = QwenDenseEmbedding(dimension=1024)
>>> vector = emb_func.embed("Hello, world!")
>>> len(vector)
1024
>>> # Using specific model with explicit API key
>>> emb_func = QwenDenseEmbedding(
... dimension=512,
... model="text-embedding-v3",
... api_key="sk-xxxxx"
... )
>>> vector = emb_func("Machine learning is fascinating")
>>> isinstance(vector, list)
True
>>> # Using with custom parameters (text_type)
>>> # For search queries - optimize for query-document matching
>>> emb_func = QwenDenseEmbedding(
... dimension=1024,
... text_type="query"
... )
>>> query_vector = emb_func.embed("What is machine learning?")
>>>
>>> # For document embeddings - optimize for being matched by queries
>>> doc_emb_func = QwenDenseEmbedding(
... dimension=1024,
... text_type="document"
... )
>>> doc_vector = doc_emb_func.embed(
... "Machine learning is a subset of artificial intelligence..."
... )
>>> # Batch processing with caching benefit
>>> texts = ["First text", "Second text", "First text"]
>>> vectors = [emb_func.embed(text) for text in texts]
>>> # Third call uses cached result for "First text"
>>> # Error handling
>>> try:
... emb_func.embed("") # Empty string
... except ValueError as e:
... print(f"Error: {e}")
Error: Input text cannot be empty or whitespace only
See Also
- DenseEmbeddingFunction: Base class for dense embeddings
- SparseEmbeddingFunction: Base class for sparse embeddings
Initialize the Qwen dense embedding function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
|
int
|
Target embedding dimension. |
required |
|
str
|
DashScope model name. Defaults to "text-embedding-v4". |
'text-embedding-v4'
|
|
Optional[str]
|
API key or None to use environment variable. |
None
|
|
Additional DashScope API parameters. Supported options:
- For detailed API documentation, see: https://help.aliyun.com/zh/model-studio/text-embedding-synchronous-api |
{}
|
Raises:
| Type | Description |
|---|---|
| ValueError | If API key is not provided and not in environment. |
Methods:
| Name | Description |
|---|---|
| __call__ | Make the embedding function callable. |
| embed | Generate dense embedding vector for the input text. |
Attributes
model
property
model: str
str: The DashScope embedding model name currently in use.
dimension
property
dimension: int
int: The expected dimensionality of the embedding vector.
extra_params
property
extra_params: dict
dict: Extra parameters for model-specific customization.
Functions
__call__
__call__(input: TEXT) -> DenseVectorType
Make the embedding function callable.
embed
cached
embed(input: TEXT) -> DenseVectorType
Generate dense embedding vector for the input text.
This method calls the DashScope TextEmbedding API to convert input text into a dense vector representation. Results are cached to improve performance for repeated inputs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input | TEXT | Input text string to embed. Must be non-empty after stripping whitespace. Maximum length depends on the model used (typically 2048-8192 tokens). | required |
Returns:
| Name | Type | Description |
|---|---|---|
| DenseVectorType | DenseVectorType | A list of floats representing the embedding vector. Length equals dimension. |
Raises:
| Type | Description |
|---|---|
| TypeError | If input is not a string. |
| ValueError | If input is empty/whitespace-only, or if the API returns an error or malformed response. |
| RuntimeError | If network connectivity issues or DashScope service errors occur. |
Examples:
>>> emb = QwenDenseEmbedding(dimension=1024)
>>> vector = emb.embed("Natural language processing")
>>> len(vector)
1024
>>> isinstance(vector[0], float)
True
>>> # Error: empty input
>>> emb.embed(" ")
ValueError: Input text cannot be empty or whitespace only
>>> # Error: non-string input
>>> emb.embed(123)
TypeError: Expected 'input' to be str, got int
Note
- This method is cached (maxsize=10). Identical inputs return cached results.
- The cache is based on exact string match (case-sensitive).
- Consider pre-processing text (lowercasing, normalization) for better caching.
QwenSparseEmbedding
QwenSparseEmbedding(
dimension: int, model: str = "text-embedding-v4", api_key: Optional[str] = None, **kwargs
)
Bases: QwenFunctionBase, SparseEmbeddingFunction[TEXT]
Sparse text embedding function using Qwen (DashScope) API.
This class provides text-to-sparse-vector embedding capabilities using Alibaba Cloud's DashScope service and Qwen embedding models. It generates sparse keyword-weighted vectors suitable for lexical matching and BM25-style retrieval scenarios.
Sparse embeddings are particularly useful for:
- Keyword-based search and exact matching
- Hybrid retrieval (combining with dense embeddings)
- Interpretable search results (weights show term importance)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dimension | int | Desired output embedding dimension. Common values: 512 (balanced performance and accuracy), 1024 (higher accuracy, larger storage), 1536 (maximum accuracy for supported models). | required |
| model | str | DashScope embedding model identifier. Defaults to "text-embedding-v4". | 'text-embedding-v4' |
| api_key | Optional[str] | DashScope API authentication key. If None, reads from the DASHSCOPE_API_KEY environment variable. | None |
| **kwargs | | Additional DashScope API parameters. Supported options include encoding_type ("query" or "document"). | {} |
Attributes:
| Name | Type | Description |
|---|---|---|
| model | str | The DashScope model name being used. |
| encoding_type | str | The encoding type ("query" or "document"). |
Raises:
| Type | Description |
|---|---|
| ValueError | If API key is not provided and not found in environment, or if API returns an error response. |
| TypeError | If input to embed() is not a string. |
| RuntimeError | If network error or DashScope service error occurs. |
Note
- Requires Python 3.10, 3.11, or 3.12
- Requires the dashscope package: pip install dashscope
- Embedding results are cached (LRU cache, maxsize=10) to reduce API calls
- Network connectivity to DashScope API endpoints is required
- API usage may incur costs based on your DashScope subscription plan
- Sparse vectors have only non-zero dimensions stored as dict
- Output is sorted by indices (keys) in ascending order
Parameter Guidelines:
- Use encoding_type="query" for search queries and encoding_type="document" for indexed content to optimize asymmetric retrieval tasks.
- For detailed API specifications, refer to: https://help.aliyun.com/zh/model-studio/text-embedding-synchronous-api
Examples:
>>> # Basic usage for query embedding
>>> from zvec.extension import QwenSparseEmbedding
>>> import os
>>> os.environ["DASHSCOPE_API_KEY"] = "your-api-key"
>>>
>>> query_emb = QwenSparseEmbedding(dimension=1024, encoding_type="query")
>>> query_vec = query_emb.embed("machine learning")
>>> type(query_vec)
<class 'dict'>
>>> len(query_vec) # Only non-zero dimensions
156
>>> # Document embedding
>>> doc_emb = QwenSparseEmbedding(dimension=1024, encoding_type="document")
>>> doc_vec = doc_emb.embed("Machine learning is a subset of AI")
>>> isinstance(doc_vec, dict)
True
>>> # Asymmetric retrieval example
>>> query_vec = query_emb.embed("what causes aging fast")
>>> doc_vec = doc_emb.embed(
... "UV-A light causes tanning, skin aging, and cataracts..."
... )
>>>
>>> # Calculate similarity (dot product for sparse vectors)
>>> similarity = sum(
... query_vec.get(k, 0) * doc_vec.get(k, 0)
... for k in set(query_vec) | set(doc_vec)
... )
>>> # Output is sorted by indices
>>> list(query_vec.items())[:5] # First 5 dimensions (by index)
[(10, 0.45), (23, 0.87), (56, 0.32), (89, 1.12), (120, 0.65)]
>>> # Hybrid retrieval (combining dense + sparse)
>>> from zvec.extension import QwenDenseEmbedding
>>> dense_emb = QwenDenseEmbedding(dimension=1024)
>>> sparse_emb = QwenSparseEmbedding(dimension=1024)
>>>
>>> query = "deep learning neural networks"
>>> dense_vec = dense_emb.embed(query) # [0.1, -0.3, 0.5, ...]
>>> sparse_vec = sparse_emb.embed(query) # {12: 0.8, 45: 1.2, ...}
>>> # Error handling
>>> try:
... sparse_emb.embed("") # Empty string
... except ValueError as e:
... print(f"Error: {e}")
Error: Input text cannot be empty or whitespace only
See Also
- SparseEmbeddingFunction: Base class for sparse embeddings
- QwenDenseEmbedding: Dense embedding using Qwen API
- DefaultLocalSparseEmbedding: Sparse embedding with SPLADE model
Initialize the Qwen sparse embedding function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
|
int
|
Target embedding dimension. |
required |
|
str
|
DashScope model name. Defaults to "text-embedding-v4". |
'text-embedding-v4'
|
|
Optional[str]
|
API key or None to use environment variable. |
None
|
|
Additional DashScope API parameters. Supported options:
- |
{}
|
Raises:
| Type | Description |
|---|---|
| ValueError | If API key is not provided and not in environment. |
Methods:
| Name | Description |
|---|---|
| __call__ | Make the embedding function callable. |
| embed | Generate sparse embedding vector for the input text. |
Attributes
model
property
model: str
str: The DashScope embedding model name currently in use.
extra_params
property
extra_params: dict
dict: Extra parameters for model-specific customization.
Functions
__call__
__call__(input: TEXT) -> SparseVectorType
Make the embedding function callable.
embed
cached
embed(input: TEXT) -> SparseVectorType
Generate sparse embedding vector for the input text.
This method calls the DashScope TextEmbedding API with sparse output type to convert input text into a sparse vector representation. The result is a dictionary where keys are dimension indices and values are importance weights (only non-zero values included).
The embedding is optimized based on the encoding_type specified during
initialization: "query" for search queries or "document" for indexed content.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input | TEXT | Input text string to embed. Must be non-empty after stripping whitespace. Maximum length depends on the model used (typically 2048-8192 tokens). | required |
Returns:
| Name | Type | Description |
|---|---|---|
| SparseVectorType | SparseVectorType | A dictionary mapping dimension index to weight. Only non-zero dimensions are included. The dictionary is sorted by indices (keys) in ascending order for consistent output. Example: {10: 0.45, 23: 0.87, ...} |
Raises:
| Type | Description |
|---|---|
| TypeError | If input is not a string. |
| ValueError | If input is empty/whitespace-only, or if the API returns an error or malformed response. |
| RuntimeError | If network connectivity issues or DashScope service errors occur. |
Examples:
>>> emb = QwenSparseEmbedding(dimension=1024, encoding_type="query")
>>> sparse_vec = emb.embed("machine learning")
>>> isinstance(sparse_vec, dict)
True
>>>
>>> # Verify sorted output
>>> keys = list(sparse_vec.keys())
>>> keys == sorted(keys)
True
>>> # Error: empty input
>>> emb.embed(" ")
ValueError: Input text cannot be empty or whitespace only
>>> # Error: non-string input
>>> emb.embed(123)
TypeError: Expected 'input' to be str, got int
Note
- This method is cached (maxsize=10). Identical inputs return cached results.
- The cache is based on exact string match (case-sensitive).
- Output dictionary is always sorted by indices for consistency.
QwenFunctionBase
Base class for Qwen (DashScope) functions.
This base class provides common functionality for calling DashScope APIs and handling responses. It supports embeddings (dense and sparse) and re-ranking operations.
This class is not meant to be used directly. Use concrete implementations:
- QwenDenseEmbedding for dense embeddings
- QwenSparseEmbedding for sparse embeddings
- QwenReRanker for semantic re-ranking
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | str | DashScope model identifier. | required |
| api_key | Optional[str] | DashScope API authentication key. | None |
Note
- This is an internal base class for code reuse across Qwen features
- Subclasses should inherit from appropriate Protocol/ABC
- Provides unified API connection and response handling
Initialize the base Qwen embedding functionality.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | str | DashScope model name. | required |
| api_key | Optional[str] | API key or None to use environment variable. | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If API key is not provided and not in environment. |
Attributes:
| Name | Type | Description |
|---|---|---|
| model | str | The DashScope embedding model name currently in use. |
Attributes
model
property
model: str
str: The DashScope embedding model name currently in use.
Functions
QwenReRanker
QwenReRanker(
query: Optional[str] = None,
topn: int = 10,
rerank_field: Optional[str] = None,
model: str = "gte-rerank-v2",
api_key: Optional[str] = None,
)
Bases: QwenFunctionBase, RerankFunction
Re-ranker using Qwen (DashScope) cross-encoder API for semantic re-ranking.
This re-ranker leverages DashScope's TextReRank service to perform cross-encoder style re-ranking. It sends query and document pairs to the API and receives relevance scores based on deep semantic understanding.
The re-ranker is suitable for single-vector or multi-vector search scenarios where semantic relevance to a specific query is required.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Query text for semantic re-ranking. Required. | None |
| topn | int | Maximum number of documents to return after re-ranking. Defaults to 10. | 10 |
| rerank_field | str | Document field name to use as re-ranking input text. Required (e.g., "content", "title", "body"). | None |
| model | str | DashScope re-ranking model identifier. Defaults to "gte-rerank-v2". | 'gte-rerank-v2' |
| api_key | Optional[str] | DashScope API authentication key. If not provided, reads from the DASHSCOPE_API_KEY environment variable. | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If query is empty or API key is unavailable. |
Note
- Requires the dashscope Python package installed
- Documents without valid content in rerank_field are skipped
- API rate limits and quotas apply per DashScope subscription
Example
>>> reranker = QwenReRanker(
...     query="machine learning algorithms",
...     topn=5,
...     rerank_field="content",
...     model="gte-rerank-v2",
...     api_key="your-api-key"
... )
>>> # Use in collection.query(reranker=reranker)
Initialize QwenReRanker with query and configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | Optional[str] | Query text for semantic matching. Required. | None |
| topn | int | Number of top results to return. | 10 |
| rerank_field | Optional[str] | Document field for re-ranking input. | None |
| model | str | DashScope model name. | 'gte-rerank-v2' |
| api_key | Optional[str] | API key or None to use environment variable. | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If query is empty or API key is unavailable. |
Methods:
| Name | Description |
|---|---|
| rerank | Re-rank documents using Qwen's TextReRank API. |
Attributes:
| Name | Type | Description |
|---|---|---|
| topn | int | Number of top documents to return after re-ranking. |
| rerank_field | Optional[str] | Field name used as re-ranking input. |
| model | str | The DashScope model name currently in use. |
| query | str | Query text used for semantic re-ranking. |
Attributes
topn
property
topn: int
int: Number of top documents to return after re-ranking.
rerank_field
property
rerank_field: Optional[str]
Optional[str]: Field name used as re-ranking input.
model
property
model: str
str: The DashScope embedding model name currently in use.
query
property
query: str
str: Query text used for semantic re-ranking.
Functions
rerank
rerank(query_results: dict[str, list[Doc]]) -> list[Doc]
Re-rank documents using Qwen's TextReRank API.
Sends document texts to DashScope TextReRank service along with the query. Returns documents sorted by relevance scores from the cross-encoder model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query_results | dict[str, list[Doc]] | Mapping from vector field names to lists of retrieved documents. Documents from all fields are deduplicated and re-ranked together. | required |
Returns:
| Type | Description |
|---|---|
| list[Doc] | Re-ranked documents (up to topn), sorted by relevance score. |
Raises:
| Type | Description |
|---|---|
| ValueError | If no valid documents are found or API call fails. |
Note
- Duplicate documents (same ID) across fields are processed once
- Documents with empty/missing rerank_field content are skipped
- Returned scores are relevance scores from the cross-encoder model
ReRanker
ReRanker(topn: int = 10, rerank_field: Optional[str] = None)
Bases: ABC
Abstract base class for re-ranking search results.
Re-rankers refine the output of one or more vector queries by applying
a secondary scoring strategy. They are used in the query() method of
Collection via the reranker parameter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| topn | int | Number of top documents to return after re-ranking. Defaults to 10. | 10 |
| rerank_field | Optional[str] | Field name used as input for re-ranking (e.g., document title or body). Defaults to None. | None |
Note
Subclasses must implement the rerank() method.
Methods:
| Name | Description |
|---|---|
| rerank | Re-rank documents from one or more vector queries. |
Attributes:
| Name | Type | Description |
|---|---|---|
| topn | int | Number of top documents to return after re-ranking. |
| rerank_field | Optional[str] | Field name used as re-ranking input. |
Attributes
topn
property
topn: int
int: Number of top documents to return after re-ranking.
rerank_field
property
rerank_field: Optional[str]
Optional[str]: Field name used as re-ranking input.
Functions
rerank
abstractmethod
rerank(query_results: dict[str, list[Doc]]) -> list[Doc]
Re-rank documents from one or more vector queries.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query_results | dict[str, list[Doc]] | Mapping from vector field name to list of retrieved documents (sorted by relevance). | required |
Returns:
| Type | Description |
|---|---|
| list[Doc] | Re-ranked list of documents (length ≤ topn). |
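Since only rerank() is abstract, a custom re-ranker reduces to one method. Below is a minimal sketch, assuming Doc objects expose id and score attributes as implied by the query-result shape above; it is illustrative, not part of the library:

```python
from zvec.extension import ReRanker  # abstract base documented in this module

class MaxScoreReRanker(ReRanker):
    """Illustrative re-ranker: keep each document's best score across fields."""

    def rerank(self, query_results):
        best = {}
        for docs in query_results.values():
            for doc in docs:
                # Assumes Doc exposes `id` and `score`, and that a higher
                # score means more relevant (true for IP/COSINE, not L2).
                if doc.id not in best or doc.score > best[doc.id].score:
                    best[doc.id] = doc
        ranked = sorted(best.values(), key=lambda d: d.score, reverse=True)
        return ranked[: self.topn]
```

An instance would then be passed via collection.query(reranker=MaxScoreReRanker(topn=5)), mirroring the usage shown for QwenReRanker.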
DefaultLocalDenseEmbedding
DefaultLocalDenseEmbedding(
model_source: Literal["huggingface", "modelscope"] = "huggingface",
device: Optional[str] = None,
normalize_embeddings: bool = True,
batch_size: int = 32,
**kwargs
)
Bases: SentenceTransformerFunctionBase, DenseEmbeddingFunction[TEXT]
Default local dense embedding using all-MiniLM-L6-v2 model.
This is the default implementation for dense text embedding that uses the
all-MiniLM-L6-v2 model from Hugging Face by default. This model provides
a good balance between speed and quality for general-purpose text embedding.
The class provides text-to-vector dense embedding capabilities using the sentence-transformers library. It supports models from Hugging Face Hub and ModelScope, runs locally without API calls, and supports CPU/GPU acceleration.
The model produces 384-dimensional embeddings and is optimized for semantic similarity tasks. It runs locally without requiring API keys.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_source | Literal['huggingface', 'modelscope'] | Model source. "huggingface" loads all-MiniLM-L6-v2 from Hugging Face Hub; "modelscope" loads iic/nlp_gte_sentence-embedding_chinese-small from ModelScope. Defaults to "huggingface". | 'huggingface' |
| device | Optional[str] | Device to run the model on. Options: "cpu", "cuda", "mps", or None for automatic detection. | None |
| normalize_embeddings | bool | Whether to normalize embeddings to unit length (L2 normalization). Useful for cosine similarity. Defaults to True. | True |
| batch_size | int | Batch size for encoding. Defaults to 32. | 32 |
| **kwargs | | Additional parameters for future extension. | {} |
Attributes:
| Name | Type | Description |
|---|---|---|
| dimension | int | Always 384 for both models. |
| model_name | str | "all-MiniLM-L6-v2" (HF) or "iic/nlp_gte_sentence-embedding_chinese-small" (MS). |
| model_source | str | The model source being used. |
| device | str | The device the model is running on. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the model cannot be loaded or input is invalid. |
| TypeError | If input to embed() is not a string. |
| RuntimeError | If model inference fails. |
Note
- Requires Python 3.10, 3.11, or 3.12
- Requires the sentence-transformers package: pip install sentence-transformers
- For ModelScope, also requires: pip install modelscope
- First run downloads the model (~50-80MB) from the chosen source
- Hugging Face cache: ~/.cache/torch/sentence_transformers/
- ModelScope cache: ~/.cache/modelscope/hub/
- No API keys or network required after initial download
- Inference speed: ~1000 sentences/sec on CPU, ~10000 on GPU
For users in China:
If you encounter Hugging Face access issues, use ModelScope instead:
```python
# Recommended for users in China
emb = DefaultLocalDenseEmbedding(model_source="modelscope")
```
Alternatively, use the Hugging Face mirror:
```bash
export HF_ENDPOINT=https://hf-mirror.com
# Then use default Hugging Face mode
```
Examples:
>>> # Basic usage with Hugging Face (default)
>>> from zvec.extension import DefaultLocalDenseEmbedding
>>>
>>> emb_func = DefaultLocalDenseEmbedding()
>>> vector = emb_func.embed("Hello, world!")
>>> len(vector)
384
>>> isinstance(vector, list)
True
>>> # Recommended for users in China (uses ModelScope)
>>> emb_func = DefaultLocalDenseEmbedding(model_source="modelscope")
>>> vector = emb_func.embed("你好,世界!") # Works well with Chinese text
>>> len(vector)
384
>>> # Alternative for China users: Use Hugging Face mirror
>>> import os
>>> os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
>>> emb_func = DefaultLocalDenseEmbedding() # Uses HF mirror
>>> vector = emb_func.embed("Hello, world!")
>>> # Using GPU for faster inference
>>> emb_func = DefaultLocalDenseEmbedding(device="cuda")
>>> vector = emb_func("Machine learning is fascinating")
>>> # Normalized vector has unit length
>>> import numpy as np
>>> np.linalg.norm(vector)
1.0
>>> # Batch processing
>>> texts = ["First text", "Second text", "Third text"]
>>> vectors = [emb_func.embed(text) for text in texts]
>>> len(vectors)
3
>>> all(len(v) == 384 for v in vectors)
True
>>> # Semantic similarity
>>> v1 = emb_func.embed("The cat sits on the mat")
>>> v2 = emb_func.embed("A feline rests on a rug")
>>> v3 = emb_func.embed("Python programming")
>>> similarity_high = np.dot(v1, v2) # Similar sentences
>>> similarity_low = np.dot(v1, v3) # Different topics
>>> similarity_high > similarity_low
True
>>> # Error handling
>>> try:
... emb_func.embed("") # Empty string
... except ValueError as e:
... print(f"Error: {e}")
Error: Input text cannot be empty or whitespace only
See Also
- DenseEmbeddingFunction: Base class for dense embeddings
- DefaultLocalSparseEmbedding: Sparse embedding with SPLADE
- QwenDenseEmbedding: Alternative using Qwen API
Initialize with all-MiniLM-L6-v2 model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_source | Literal['huggingface', 'modelscope'] | Model source. Defaults to "huggingface". | 'huggingface' |
| device | Optional[str] | Target device ("cpu", "cuda", "mps", or None). Defaults to None (automatic detection). | None |
| normalize_embeddings | bool | Whether to L2-normalize output vectors. Defaults to True. | True |
| batch_size | int | Batch size for encoding. Defaults to 32. | 32 |
| **kwargs | | Additional parameters for future extension. | {} |
Raises:
| Type | Description |
|---|---|
| ImportError | If sentence-transformers or modelscope is not installed. |
| ValueError | If model cannot be loaded. |
Methods:
| Name | Description |
|---|---|
| __call__ | Make the embedding function callable. |
| embed | Generate dense embedding vector for the input text. |
Attributes
model_name
property
model_name: str
str: The Sentence Transformer model name currently in use.
model_source
property
model_source: str
str: The model source being used ("huggingface" or "modelscope").
device
property
device: str
str: The device the model is running on.
dimension
property
dimension: int
int: The expected dimensionality of the embedding vector.
extra_params
property
extra_params: dict
dict: Extra parameters for model-specific customization.
Functions
__call__
__call__(input: str) -> DenseVectorType
Make the embedding function callable.
embed
embed(input: str) -> DenseVectorType
Generate dense embedding vector for the input text.
This method uses the Sentence Transformer model to convert input text into a dense vector representation. The model runs locally without requiring API calls.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input | str | Input text string to embed. Must be non-empty after stripping whitespace. Maximum length depends on the model used (typically 128-512 tokens for most models). | required |
Returns:
| Name | Type | Description |
|---|---|---|
| DenseVectorType | DenseVectorType | A list of floats representing the embedding vector. Length equals dimension (384). |
Raises:
| Type | Description |
|---|---|
| TypeError | If input is not a string. |
| ValueError | If input is empty or whitespace-only. |
| RuntimeError | If model inference fails. |
Examples:
>>> emb = DefaultLocalDenseEmbedding()
>>> vector = emb.embed("Natural language processing")
>>> len(vector)
384
>>> isinstance(vector[0], float)
True
>>> # Normalized vectors have unit length
>>> import numpy as np
>>> emb = DefaultLocalDenseEmbedding(normalize_embeddings=True)
>>> vector = emb.embed("Test sentence")
>>> np.linalg.norm(vector)
1.0
>>> # Error: empty input
>>> emb.embed(" ")
ValueError: Input text cannot be empty or whitespace only
>>> # Error: non-string input
>>> emb.embed(123)
TypeError: Expected 'input' to be str, got int
>>> # Semantic similarity example
>>> v1 = emb.embed("The cat sits on the mat")
>>> v2 = emb.embed("A feline rests on a rug")
>>> similarity = np.dot(v1, v2) # High similarity due to semantic meaning
>>> similarity > 0.7
True
Note
- First call may be slower due to model loading
- Subsequent calls are much faster as the model stays in memory
- For batch processing, consider encoding multiple texts together (though this method handles single texts only)
- GPU acceleration provides 5-10x speedup over CPU
DefaultLocalSparseEmbedding
DefaultLocalSparseEmbedding(
model_source: Literal["huggingface", "modelscope"] = "huggingface",
device: Optional[str] = None,
encoding_type: Literal["query", "document"] = "query",
**kwargs
)
Bases: SentenceTransformerFunctionBase, SparseEmbeddingFunction[TEXT]
Default local sparse embedding using SPLADE model.
This class provides sparse vector embedding using the SPLADE (SParse Lexical AnD Expansion) model. SPLADE generates sparse, interpretable representations where each dimension corresponds to a vocabulary term with learned importance weights. It's ideal for lexical matching, BM25-style retrieval, and hybrid search scenarios.
The default model is naver/splade-cocondenser-ensembledistil, which is
publicly available without authentication. It produces sparse vectors with
thousands of dimensions but only hundreds of non-zero values, making them
efficient for storage and retrieval while maintaining strong lexical matching.
Model Caching:
This class uses class-level caching to share the SPLADE model across all instances with the same configuration (model_source, device). This significantly reduces memory usage when creating multiple instances for different encoding types (query vs document).
Cache Management:
The class provides methods to manage the model cache, as sketched below:
- clear_cache(): Clear all cached models to free memory
- get_cache_info(): Get information about cached models
- remove_from_cache(model_source, device): Remove a specific model from cache
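A short sketch of the cache helpers named above; the exact return shape of get_cache_info() is not specified here, so treat it as opaque:

```python
from zvec.extension import DefaultLocalSparseEmbedding

# Two instances with the same (model_source, device) share one cached model.
query_emb = DefaultLocalSparseEmbedding(encoding_type="query")
doc_emb = DefaultLocalSparseEmbedding(encoding_type="document")

info = DefaultLocalSparseEmbedding.get_cache_info()  # inspect cached models
DefaultLocalSparseEmbedding.remove_from_cache("huggingface", "cpu")  # drop one entry
DefaultLocalSparseEmbedding.clear_cache()  # free all cached models
```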
Note: Why not use splade-v3?
The newer naver/splade-v3 model is gated (requires access approval), so naver/splade-cocondenser-ensembledistil is used instead.
To use splade-v3 (if you have access):
1. Request access at https://huggingface.co/naver/splade-v3
2. Get your Hugging Face token from https://huggingface.co/settings/tokens
3. Set the environment variable:
   ```bash
   export HF_TOKEN="your_huggingface_token"
   ```
4. Or login programmatically:
   ```python
   from huggingface_hub import login
   login(token="your_huggingface_token")
   ```
5. To use a custom SPLADE model, subclass this class and override the model_name in __init__, or create your own implementation inheriting from SentenceTransformerFunctionBase and SparseEmbeddingFunction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_source | Literal['huggingface', 'modelscope'] | Model source. Defaults to "huggingface". | 'huggingface' |
| device | Optional[str] | Device to run the model on. Options: "cpu", "cuda", "mps", or None for automatic detection. | None |
| encoding_type | Literal['query', 'document'] | Encoding type. "query" for search queries, "document" for indexed documents. Defaults to "query". | 'query' |
| **kwargs | | Additional parameters (currently unused, for future extension). | {} |
Attributes:
| Name | Type | Description |
|---|---|---|
| `model_name` | `str` | Model identifier. |
| `model_source` | `str` | The model source being used. |
| `device` | `str` | The device the model is running on. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the model cannot be loaded or input is invalid. |
| `TypeError` | If input to `embed` is not a string. |
| `RuntimeError` | If model inference fails. |
Note
- Requires Python 3.10, 3.11, or 3.12
- Requires the `sentence-transformers` package: `pip install sentence-transformers`
- First run downloads the model (~100MB) from Hugging Face
- Cache location: `~/.cache/torch/sentence_transformers/`
- No API keys or authentication required
- Sparse vectors have ~30k dimensions but only ~100-200 non-zero values
- Best combined with dense embeddings for hybrid retrieval
SPLADE vs Dense Embeddings:
- Dense: continuous semantic vectors, good for semantic similarity
- Sparse: lexical, keyword-based, interpretable, good for exact matching
- Hybrid: combine both for the best retrieval performance (a minimal fusion sketch follows below)
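Where "combine both" concretely means fusing the two scores, a minimal sketch follows, assuming a dense score from the vector dot product and a sparse score from the sparse dot product shown in the examples below; the alpha weight and the hybrid_score name are illustrative conventions, not a zvec API.
.. code-block:: python

import numpy as np

def hybrid_score(
    dense_q: list[float], dense_d: list[float],
    sparse_q: dict[int, float], sparse_d: dict[int, float],
    alpha: float = 0.5,
) -> float:
    """Blend dense and sparse relevance with a single weight alpha."""
    # Dense side: dot product of the two embedding vectors.
    dense = float(np.dot(dense_q, dense_d))
    # Sparse side: dot product over the shared non-zero dimensions.
    sparse = sum(w * sparse_d.get(i, 0.0) for i, w in sparse_q.items())
    # alpha = 1.0 -> purely dense; alpha = 0.0 -> purely sparse.
    return alpha * dense + (1.0 - alpha) * sparse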
Examples:
>>> # Memory-efficient: both instances share the same model (~200MB)
>>> from zvec.extension import DefaultLocalSparseEmbedding
>>>
>>> # Query embedding
>>> query_emb = DefaultLocalSparseEmbedding(encoding_type="query")
>>> query_vec = query_emb.embed("machine learning algorithms")
>>> type(query_vec)
<class 'dict'>
>>> len(query_vec) # Only non-zero dimensions
156
>>> # Document embedding (shares model with query_emb)
>>> doc_emb = DefaultLocalSparseEmbedding(encoding_type="document")
>>> doc_vec = doc_emb.embed("Machine learning is a subset of AI")
>>> # Total memory: ~200MB (not 400MB) thanks to model caching
>>> # Asymmetric retrieval example
>>> query_vec = query_emb.embed("what causes aging fast")
>>> doc_vec = doc_emb.embed(
... "UV-A light causes tanning, skin aging, and cataracts..."
... )
>>>
>>> # Calculate similarity (dot product for sparse vectors)
>>> similarity = sum(
... query_vec.get(k, 0) * doc_vec.get(k, 0)
... for k in set(query_vec) | set(doc_vec)
... )
>>> # Batch processing
>>> queries = ["query 1", "query 2", "query 3"]
>>> query_vecs = [query_emb.embed(q) for q in queries]
>>>
>>> documents = ["doc 1", "doc 2", "doc 3"]
>>> doc_vecs = [doc_emb.embed(d) for d in documents]
>>> # Inspecting sparse dimensions (output is sorted by indices)
>>> query_vec = query_emb.embed("machine learning")
>>> list(query_vec.items())[:5] # First 5 dimensions (by index)
[(10, 0.45), (23, 0.87), (56, 0.32), (89, 1.12), (120, 0.65)]
>>>
>>> # Sort by weight to find most important terms
>>> sorted_by_weight = sorted(query_vec.items(), key=lambda x: x[1], reverse=True)
>>> top_5 = sorted_by_weight[:5] # Top 5 most important terms
>>> top_5
[(1023, 1.45), (245, 1.23), (8901, 0.98), (5678, 0.87), (12034, 0.76)]
>>> # Using GPU for faster inference
>>> sparse_emb = DefaultLocalSparseEmbedding(device="cuda")
>>> vector = sparse_emb.embed("natural language processing")
>>> # Hybrid retrieval example (combining dense + sparse)
>>> from zvec.extension import DefaultLocalDenseEmbedding
>>> dense_emb = DefaultLocalDenseEmbedding()
>>> sparse_emb = DefaultLocalSparseEmbedding()
>>>
>>> query = "deep learning neural networks"
>>> dense_vec = dense_emb.embed(query) # [0.1, -0.3, 0.5, ...]
>>> sparse_vec = sparse_emb.embed(query) # {12: 0.8, 45: 1.2, ...}
>>> # Error handling
>>> try:
... sparse_emb.embed("") # Empty string
... except ValueError as e:
... print(f"Error: {e}")
Error: Input text cannot be empty or whitespace only
>>> # Cache management
>>> # Check cache status
>>> info = DefaultLocalSparseEmbedding.get_cache_info()
>>> print(f"Cached models: {info['cached_models']}")
Cached models: 1
>>>
>>> # Clear cache to free memory
>>> DefaultLocalSparseEmbedding.clear_cache()
>>> info = DefaultLocalSparseEmbedding.get_cache_info()
>>> print(f"Cached models: {info['cached_models']}")
Cached models: 0
>>>
>>> # Remove specific model from cache
>>> query_emb = DefaultLocalSparseEmbedding() # Creates CPU model
>>> cuda_emb = DefaultLocalSparseEmbedding(device="cuda") # Creates CUDA model
>>> info = DefaultLocalSparseEmbedding.get_cache_info()
>>> print(f"Cached models: {info['cached_models']}")
Cached models: 2
>>>
>>> # Remove only CPU model
>>> removed = DefaultLocalSparseEmbedding.remove_from_cache(device=None)
>>> print(f"Removed: {removed}")
True
>>> info = DefaultLocalSparseEmbedding.get_cache_info()
>>> print(f"Cached models: {info['cached_models']}")
Cached models: 1
See Also
- `SparseEmbeddingFunction`: Base class for sparse embeddings
- `DefaultLocalDenseEmbedding`: Dense embedding with all-MiniLM-L6-v2
- `QwenDenseEmbedding`: Alternative using the Qwen API
References
- SPLADE Paper: https://arxiv.org/abs/2109.10086
- Model: https://huggingface.co/naver/splade-cocondenser-ensembledistil
Initialize with SPLADE model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_source` | `Literal['huggingface', 'modelscope']` | Model source. Defaults to `"huggingface"`. | `'huggingface'` |
| `device` | `Optional[str]` | Target device (`"cpu"`, `"cuda"`, `"mps"`, or `None`). Defaults to `None` (automatic detection). | `None` |
| `encoding_type` | `Literal['query', 'document']` | Encoding type for embeddings. `"query"`: optimize for search queries (default); `"document"`: optimize for indexed documents. This distinction is important for asymmetric retrieval tasks. | `'query'` |
| `**kwargs` |  | Additional parameters (reserved for future use). | `{}` |
Raises:
| Type | Description |
|---|---|
| `ImportError` | If sentence-transformers is not installed. |
| `ValueError` | If the model cannot be loaded. |
Note
Multiple instances with the same (model_source, device) configuration will share the same underlying model to save memory. Different instances can use different encoding_type settings while sharing the model.
Model Selection:
Uses naver/splade-cocondenser-ensembledistil instead of the newer
naver/splade-v3 because splade-v3 is a gated model requiring
Hugging Face authentication. The cocondenser-ensembledistil variant:
- Does not require authentication or API tokens
- Is immediately available for all users
- Provides comparable retrieval performance (~2% difference)
- Avoids "Access to model is restricted" errors
If you need splade-v3 and have obtained access, you can subclass this class and override the model_name parameter.
Examples:
>>> # Both instances share the same model (saves memory)
>>> query_emb = DefaultLocalSparseEmbedding(encoding_type="query")
>>> doc_emb = DefaultLocalSparseEmbedding(encoding_type="document")
>>> # Only one model is loaded in memory
Methods:
| Name | Description |
|---|---|
| `clear_cache` | Clear all cached SPLADE models from memory. |
| `get_cache_info` | Get information about currently cached models. |
| `remove_from_cache` | Remove a specific model from cache. |
| `__call__` | Make the embedding function callable. |
| `embed` | Generate sparse embedding vector for the input text. |
Attributes
model_name
property
model_name: str
str: The Sentence Transformer model name currently in use.
model_source
property
model_source: str
str: The model source being used ("huggingface" or "modelscope").
device
property
device: str
str: The device the model is running on.
extra_params
property
extra_params: dict
dict: Extra parameters for model-specific customization.
Functions
clear_cache
classmethod
clear_cache() -> None
Clear all cached SPLADE models from memory.
This is useful for:
- Freeing memory when models are no longer needed
- Forcing a fresh model reload
- Testing and debugging
Examples:
>>> # Clear cache to free memory
>>> DefaultLocalSparseEmbedding.clear_cache()
>>> # Or in tests to ensure fresh model loading
>>> def test_something():
... DefaultLocalSparseEmbedding.clear_cache()
... emb = DefaultLocalSparseEmbedding()
... # Test with fresh model
get_cache_info
classmethod
get_cache_info() -> dict
Get information about currently cached models.
Returns:
| Name | Type | Description |
|---|---|---|
| `dict` | `dict` | Dictionary with cache statistics: `cached_models` (int), the number of cached model instances; `cache_keys` (list), the list of cache keys `(model_name, model_source, device)`. |
Examples:
>>> info = DefaultLocalSparseEmbedding.get_cache_info()
>>> print(f"Cached models: {info['cached_models']}")
Cached models: 2
>>> print(f"Cache keys: {info['cache_keys']}")
Cache keys: [('naver/splade-cocondenser-ensembledistil', 'huggingface', None),
('naver/splade-cocondenser-ensembledistil', 'huggingface', 'cuda')]
remove_from_cache
classmethod
remove_from_cache(model_source: str = 'huggingface', device: Optional[str] = None) -> bool
Remove a specific model from cache.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_source` | `str` | Model source (`"huggingface"` or `"modelscope"`). Defaults to `"huggingface"`. | `'huggingface'` |
| `device` | `Optional[str]` | Device identifier. Defaults to `None`. | `None` |
Returns:
| Name | Type | Description |
|---|---|---|
| `bool` | `bool` | True if the model was found and removed, False otherwise. |
Examples:
>>> # Remove CPU model from cache
>>> removed = DefaultLocalSparseEmbedding.remove_from_cache()
>>> print(f"Removed: {removed}")
True
>>> # Remove CUDA model from cache
>>> removed = DefaultLocalSparseEmbedding.remove_from_cache(device="cuda")
>>> print(f"Removed: {removed}")
True
__call__
__call__(input: str) -> SparseVectorType
Make the embedding function callable.
embed
embed(input: str) -> SparseVectorType
Generate sparse embedding vector for the input text.
This method uses the SPLADE model to convert input text into a sparse vector representation. The result is a dictionary where keys are dimension indices and values are importance weights (only non-zero values included).
The embedding is optimized based on the encoding_type specified during
initialization: "query" for search queries or "document" for indexed content.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `input` | `str` | Input text string to embed. Must be non-empty after stripping whitespace. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `SparseVectorType` | `SparseVectorType` | A dictionary mapping dimension index to weight. Only non-zero dimensions are included; the dictionary is sorted by index in ascending order for consistent output. Example: `{12: 0.8, 45: 1.2, ...}` |
Raises:
| Type | Description |
|---|---|
| `TypeError` | If `input` is not a string. |
| `ValueError` | If input is empty or whitespace-only. |
| `RuntimeError` | If model inference fails. |
Examples:
>>> # Query embedding
>>> query_emb = DefaultLocalSparseEmbedding(encoding_type="query")
>>> query_vec = query_emb.embed("machine learning")
>>> isinstance(query_vec, dict)
True
Note
- First call may be slower due to model loading
- Subsequent calls are much faster as the model stays in memory
- GPU acceleration provides significant speedup
- Sparse vectors are memory-efficient (only store non-zero values)
SentenceTransformerFunctionBase
SentenceTransformerFunctionBase(
model_name: str,
model_source: Literal["huggingface", "modelscope"] = "huggingface",
device: Optional[str] = None,
)
Base class for Sentence Transformer functions (both dense and sparse).
This base class provides common functionality for loading and managing sentence-transformers models from Hugging Face or ModelScope. It supports both dense models (e.g., all-MiniLM-L6-v2) and sparse models (e.g., SPLADE).
This class is not meant to be used directly. Use concrete implementations:
- SentenceTransformerEmbeddingFunction for dense embeddings
- SentenceTransformerSparseEmbeddingFunction for sparse embeddings
- DefaultLocalDenseEmbedding for default dense embeddings
- DefaultLocalSparseEmbedding for default sparse embeddings
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_name` | `str` | Model identifier or local path. | required |
| `model_source` | `Literal['huggingface', 'modelscope']` | Model source. | `'huggingface'` |
| `device` | `Optional[str]` | Device to run the model on. | `None` |
Note
- This is an internal base class for code reuse
- Subclasses should inherit from appropriate Protocol (Dense/Sparse)
- Provides model loading and management functionality
Initialize the base Sentence Transformer functionality.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_name` | `str` | Model identifier or local path. | required |
| `model_source` | `Literal['huggingface', 'modelscope']` | Model source. | `'huggingface'` |
| `device` | `Optional[str]` | Device to run the model on. | `None` |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If model_source is invalid. |
Attributes:
| Name | Type | Description |
|---|---|---|
| `model_name` | `str` | The Sentence Transformer model name currently in use. |
| `model_source` | `str` | The model source being used (`"huggingface"` or `"modelscope"`). |
| `device` | `str` | The device the model is running on. |
Attributes
model_name
property
model_name: str
str: The Sentence Transformer model name currently in use.
model_source
property
model_source: str
str: The model source being used ("huggingface" or "modelscope").
device
property
device: str
str: The device the model is running on.
DefaultLocalReRanker
DefaultLocalReRanker(
query: Optional[str] = None,
topn: int = 10,
rerank_field: Optional[str] = None,
model_name: str = "cross-encoder/ms-marco-MiniLM-L6-v2",
model_source: Literal["huggingface", "modelscope"] = "huggingface",
device: Optional[str] = None,
batch_size: int = 32,
)
Bases: SentenceTransformerFunctionBase, RerankFunction
Re-ranker using Sentence Transformer cross-encoder models for semantic re-ranking.
This re-ranker leverages pre-trained cross-encoder models to perform deep semantic re-ranking of search results. It runs locally without API calls, supports GPU acceleration, and works with models from Hugging Face or ModelScope.
Cross-encoder models evaluate query-document pairs jointly, providing more accurate relevance scores than bi-encoder (embedding-based) similarity.
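For intuition, here is a minimal sketch of that joint scoring using the sentence_transformers CrossEncoder API directly with the same default model; this illustrates what the re-ranker wraps, not this class's internal code.
.. code-block:: python

from sentence_transformers import CrossEncoder

# A cross-encoder reads query and document together; a bi-encoder embeds
# each side independently and compares the vectors afterwards.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

query = "machine learning algorithms"
docs = [
    "Machine learning is a subset of AI",
    "The weather today is sunny",
]

# predict() scores each (query, document) pair jointly; higher = more relevant.
scores = model.predict([(query, doc) for doc in docs])
for doc, score in zip(docs, scores):
    print(f"{score:.4f}  {doc}")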
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `Optional[str]` | Query text for semantic re-ranking. Required. | `None` |
| `topn` | `int` | Maximum number of documents to return after re-ranking. Defaults to `10`. | `10` |
| `rerank_field` | `Optional[str]` | Document field name to use as re-ranking input text. Required (e.g., `"content"`, `"title"`, `"body"`). | `None` |
| `model_name` | `str` | Cross-encoder model identifier or local path. Defaults to `"cross-encoder/ms-marco-MiniLM-L6-v2"`. | `'cross-encoder/ms-marco-MiniLM-L6-v2'` |
| `model_source` | `Literal['huggingface', 'modelscope']` | Model source. Defaults to `"huggingface"`. | `'huggingface'` |
| `device` | `Optional[str]` | Device to run the model on. Options: `"cpu"`, `"cuda"`, `"mps"`, or `None` for automatic detection. | `None` |
| `batch_size` | `int` | Batch size for processing query-document pairs. Larger values speed up processing but use more memory. Defaults to `32`. | `32` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `query` | `str` | The query text used for re-ranking. |
| `topn` | `int` | Maximum number of documents to return. |
| `rerank_field` | `Optional[str]` | Field name used for re-ranking input. |
| `model_name` | `str` | The cross-encoder model being used. |
| `model_source` | `str` | The model source (`"huggingface"` or `"modelscope"`). |
| `device` | `str` | The device the model is running on. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If `query` is empty or the model cannot be loaded. |
| `TypeError` | If input types are invalid. |
| `RuntimeError` | If model inference fails. |
Note
- Requires Python 3.10, 3.11, or 3.12
- Requires the `sentence-transformers` package: `pip install sentence-transformers`
- For ModelScope support, also requires: `pip install modelscope`
- First run downloads the model (~80-560MB depending on the model) from the chosen source
- No API keys or network access required after the initial download
- Cross-encoders are slower than bi-encoders but more accurate
- GPU acceleration provides significant speedup (5-10x)
MS MARCO MiniLM-L6-v2 Model (Default):
The default model cross-encoder/ms-marco-MiniLM-L6-v2 is a lightweight, efficient cross-encoder trained on the MS MARCO dataset. It provides:
- Fast inference speed (suitable for real-time applications)
- Small model size (~80MB, quick to download)
- Good balance between speed and accuracy
- Trained on 500K+ query-document pairs
- Public availability without authentication
For users in China:
If you encounter Hugging Face access issues, use ModelScope instead:
.. code-block:: python
# Recommended for users in China
reranker = DefaultLocalReRanker(
query="机器学习算法",
rerank_field="content",
model_source="modelscope"
)
Alternatively, use Hugging Face mirror:
.. code-block:: bash
export HF_ENDPOINT=https://hf-mirror.com
Examples:
>>> # Basic usage with default MS MARCO MiniLM model
>>> from zvec.extension import DefaultLocalReRanker
>>>
>>> reranker = DefaultLocalReRanker(
... query="machine learning algorithms",
... topn=5,
... rerank_field="content"
... )
>>>
>>> # Use in collection.query()
>>> results = collection.query(
... data={"vector_field": query_vector},
... reranker=reranker,
... topk=20
... )
>>> # Using ModelScope for users in China
>>> reranker = DefaultLocalReRanker(
... query="深度学习",
... topn=10,
... rerank_field="content",
... model_source="modelscope"
... )
>>> # Using larger model for better quality
>>> reranker = DefaultLocalReRanker(
... query="neural networks",
... topn=5,
... rerank_field="content",
... model_name="BAAI/bge-reranker-large",
... device="cuda",
... batch_size=64
... )
>>> # Direct rerank call (for testing)
>>> query_results = {
... "vector1": [
... Doc(id="1", score=0.9, fields={"content": "Machine learning is..."}),
... Doc(id="2", score=0.8, fields={"content": "Deep learning is..."}),
... ]
... }
>>> reranked = reranker.rerank(query_results)
>>> for doc in reranked:
... print(f"ID: {doc.id}, Score: {doc.score:.4f}")
ID: 2, Score: 0.9234
ID: 1, Score: 0.8567
See Also
- `RerankFunction`: Abstract base class for re-rankers
- `QwenReRanker`: Re-ranker using the Qwen API
- `RrfReRanker`: Multi-vector re-ranker using RRF
- `WeightedReRanker`: Multi-vector re-ranker using weighted scores
References
- MS MARCO Cross-Encoder: https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2
- BGE Reranker: https://huggingface.co/BAAI/bge-reranker-base
- Cross-Encoder vs Bi-Encoder: https://www.sbert.net/examples/applications/cross-encoder/README.html
Initialize DefaultLocalReRanker with query and configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `Optional[str]` | Query text for semantic matching. Required. | `None` |
| `topn` | `int` | Number of top results to return. | `10` |
| `rerank_field` | `Optional[str]` | Document field for re-ranking input. | `None` |
| `model_name` | `str` | Cross-encoder model identifier. | `'cross-encoder/ms-marco-MiniLM-L6-v2'` |
| `model_source` | `Literal['huggingface', 'modelscope']` | Model source. | `'huggingface'` |
| `device` | `Optional[str]` | Target device (`"cpu"`, `"cuda"`, `"mps"`, or `None`). | `None` |
| `batch_size` | `int` | Batch size for processing query-document pairs. | `32` |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If query is empty or the model cannot be loaded. |
Methods:
| Name | Description |
|---|---|
| `rerank` | Re-rank documents using a Sentence Transformer cross-encoder model. |
Attributes
topn
property
topn: int
int: Number of top documents to return after re-ranking.
rerank_field
property
rerank_field: Optional[str]
Optional[str]: Field name used as re-ranking input.
model_name
property
model_name: str
str: The Sentence Transformer model name currently in use.
model_source
property
model_source: str
str: The model source being used ("huggingface" or "modelscope").
device
property
device: str
str: The device the model is running on.
query
property
query: str
str: Query text used for semantic re-ranking.
batch_size
property
batch_size: int
int: Batch size for processing query-document pairs.
Functions
rerank
rerank(query_results: dict[str, list[Doc]]) -> list[Doc]
Re-rank documents using Sentence Transformer cross-encoder model.
Evaluates each query-document pair using the cross-encoder model to compute relevance scores. Documents are then sorted by these scores and the top-k results are returned.
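A minimal sketch of that flow (dedup by ID, joint scoring, sort, truncate), assuming Doc exposes id, score, and fields as in the examples below; rerank_sketch and the score_pairs callable are hypothetical stand-ins for this method and the cross-encoder, respectively.
.. code-block:: python

def rerank_sketch(query, query_results, rerank_field, topn, score_pairs):
    """Illustrative re-rank flow; score_pairs stands in for the
    cross-encoder (e.g. CrossEncoder.predict)."""
    # 1. Deduplicate documents by ID across all vector fields.
    unique = {}
    for docs in query_results.values():
        for doc in docs:
            unique.setdefault(doc.id, doc)
    # 2. Skip documents with empty or missing rerank_field content.
    candidates = [d for d in unique.values() if d.fields.get(rerank_field)]
    # 3. Jointly score each (query, text) pair in one batched call.
    scores = score_pairs([(query, d.fields[rerank_field]) for d in candidates])
    for doc, score in zip(candidates, scores):
        doc.score = float(score)
    # 4. Sort by score descending and keep the top-n.
    return sorted(candidates, key=lambda d: d.score, reverse=True)[:topn]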
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query_results` | `dict[str, list[Doc]]` | Mapping from vector field names to lists of retrieved documents. Documents from all fields are deduplicated and re-ranked together. | required |
Returns:
| Type | Description |
|---|---|
| `list[Doc]` | Re-ranked documents (up to `topn`), sorted by relevance score in descending order. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If no valid documents are found or model inference fails. |
Note
- Duplicate documents (same ID) across fields are processed once
- Documents with empty/missing `rerank_field` content are skipped
- Returned scores are logits from the cross-encoder model
- Higher scores indicate higher relevance
- Processing time is O(n), where n is the number of documents
Examples:
>>> reranker = DefaultLocalReRanker(
... query="machine learning",
... topn=3,
... rerank_field="content"
... )
>>> query_results = {
... "vector1": [
... Doc(id="1", score=0.9, fields={"content": "ML basics"}),
... Doc(id="2", score=0.8, fields={"content": "DL tutorial"}),
... ]
... }
>>> reranked = reranker.rerank(query_results)
>>> len(reranked) <= 3
True