Extension

zvec.extension

Modules:

  • bm25_embedding_function
  • embedding_function
  • multi_vector_reranker
  • openai_embedding_function
  • openai_function
  • qwen_embedding_function
  • qwen_function
  • qwen_rerank_function
  • rerank_function
  • sentence_transformer_embedding_function
  • sentence_transformer_function
  • sentence_transformer_rerank_function

Classes:

  • BM25EmbeddingFunction: BM25-based sparse embedding function using DashText SDK.
  • DenseEmbeddingFunction: Protocol for dense vector embedding functions.
  • SparseEmbeddingFunction: Abstract base class for sparse vector embedding functions.
  • RrfReRanker: Re-ranker using Reciprocal Rank Fusion (RRF) for multi-vector search.
  • WeightedReRanker: Re-ranker that combines scores from multiple vector fields using weights.
  • OpenAIDenseEmbedding: Dense text embedding function using OpenAI API.
  • OpenAIFunctionBase: Base class for OpenAI functions.
  • QwenDenseEmbedding: Dense text embedding function using Qwen (DashScope) API.
  • QwenSparseEmbedding: Sparse text embedding function using Qwen (DashScope) API.
  • QwenFunctionBase: Base class for Qwen (DashScope) functions.
  • QwenReRanker: Re-ranker using Qwen (DashScope) cross-encoder API for semantic re-ranking.
  • ReRanker: Abstract base class for re-ranking search results.
  • DefaultLocalDenseEmbedding: Default local dense embedding using all-MiniLM-L6-v2 model.
  • DefaultLocalSparseEmbedding: Default local sparse embedding using SPLADE model.
  • SentenceTransformerFunctionBase: Base class for Sentence Transformer functions (both dense and sparse).
  • DefaultLocalReRanker: Re-ranker using Sentence Transformer cross-encoder models for semantic re-ranking.

Classes

BM25EmbeddingFunction

BM25EmbeddingFunction(
    corpus: Optional[list[str]] = None,
    encoding_type: Literal["query", "document"] = "query",
    language: Literal["zh", "en"] = "zh",
    b: float = 0.75,
    k1: float = 1.2,
    **kwargs
)

Bases: SparseEmbeddingFunction[TEXT]

BM25-based sparse embedding function using DashText SDK.

This class provides text-to-sparse-vector embedding capabilities using the DashText library with BM25 algorithm. BM25 (Best Matching 25) is a probabilistic retrieval function used for lexical search and document ranking based on term frequency and inverse document frequency.

BM25 generates sparse vectors where each dimension corresponds to a term in the vocabulary, and the value represents the BM25 score for that term. It's particularly effective for:

  • Lexical search and keyword matching
  • Document ranking and information retrieval
  • Combining with dense embeddings for hybrid search
  • Traditional IR tasks where exact term matching is important

This implementation uses DashText's SparseVectorEncoder, which provides efficient BM25 computation for Chinese and English text using either a built-in encoder or custom corpus training.

Parameters:

Name Type Description Default
corpus
Optional[list[str]]

List of documents to train the BM25 encoder. If provided, creates a custom encoder trained on this corpus for better domain-specific accuracy. If None, uses the built-in encoder. Defaults to None.

None
encoding_type
Literal['query', 'document']

Encoding mode for text processing. Use "query" for search queries (default) and "document" for document indexing. This distinction optimizes the BM25 scoring for asymmetric retrieval tasks. Defaults to "query".

'query'
language
Literal['zh', 'en']

Language for built-in encoder. Only used when corpus is None. "zh" for Chinese (trained on Chinese Wikipedia), "en" for English. Defaults to "zh".

'zh'
b
float

Document length normalization parameter for BM25. Range [0, 1]. 0 means no normalization, 1 means full normalization. Only used with custom corpus. Defaults to 0.75.

0.75
k1
float

Term frequency saturation parameter for BM25. Higher values give more weight to term frequency. Only used with custom corpus. Defaults to 1.2.

1.2
**kwargs

Additional parameters for DashText encoder customization.

{}

Attributes:

Name Type Description
corpus_size int

Number of documents in the training corpus (0 if using built-in encoder).

encoding_type str

The encoding type being used ("query" or "document").

language str

The language of the built-in encoder ("zh" or "en").

Raises:

Type Description
ValueError

If corpus is provided but empty or contains non-string elements.

TypeError

If input to embed() is not a string.

RuntimeError

If DashText encoder initialization or training fails.

Note
  • Requires Python 3.10, 3.11, or 3.12
  • Requires the dashtext package: pip install dashtext
  • Two encoder options available:

  • Built-in encoder (no corpus needed): Pre-trained models for Chinese (zh) and English (en), good generalization, works out-of-the-box

  • Custom encoder (corpus required): Better accuracy for domain-specific terminology, requires training on your full corpus with BM25 parameters

  • Encoding types:

  • encoding_type="query": Optimized for search queries (shorter text)

  • encoding_type="document": Optimized for document indexing (longer text)

  • BM25 parameters (b, k1) only apply to custom encoder training

  • Output is sorted by indices (vocabulary term IDs) for consistency
  • Results are cached (LRU cache, maxsize=10) to reduce computation
  • No API key or network connectivity required (local computation)
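The roles of the b and k1 parameters are easiest to see in the standard BM25 term-scoring formula. The sketch below is illustrative only (the helper name bm25_term_score is not part of zvec, and DashText handles tokenization and vocabulary hashing internally):

```python
import math

def bm25_term_score(tf: int, df: int, n_docs: int, doc_len: int,
                    avg_doc_len: float, k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 score of one term in one document (Robertson-style IDF)."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # b controls document-length normalization: 0 = none, 1 = full.
    norm = 1 - b + b * (doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences of a term saturate.
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)

# Higher k1 lets term frequency contribute more before saturating.
low_k1 = bm25_term_score(tf=5, df=10, n_docs=1000, doc_len=100, avg_doc_len=100, k1=0.5)
high_k1 = bm25_term_score(tf=5, df=10, n_docs=1000, doc_len=100, avg_doc_len=100, k1=2.0)
```

With b > 0, the same term frequency in a longer-than-average document scores lower, which is exactly the length normalization the b parameter describes.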

Examples:

>>> # Option 1: Using built-in encoder for Chinese (no corpus needed)
>>> from zvec.extension import BM25EmbeddingFunction
>>>
>>> # For query encoding (Chinese)
>>> bm25_query_zh = BM25EmbeddingFunction(language="zh", encoding_type="query")
>>> query_vec = bm25_query_zh.embed("什么是机器学习")
>>> isinstance(query_vec, dict)
True
>>> # query_vec: {1169440797: 0.29, 2045788977: 0.70, ...}
>>> # For document encoding (Chinese)
>>> bm25_doc_zh = BM25EmbeddingFunction(language="zh", encoding_type="document")
>>> doc_vec = bm25_doc_zh.embed("机器学习是人工智能的一个重要分支...")
>>> isinstance(doc_vec, dict)
True
>>> # Using built-in encoder for English
>>> bm25_query_en = BM25EmbeddingFunction(language="en", encoding_type="query")
>>> query_vec_en = bm25_query_en.embed("what is vector search service")
>>> isinstance(query_vec_en, dict)
True
>>> # Option 2: Using custom corpus for domain-specific accuracy
>>> corpus = [
...     "机器学习是人工智能的一个重要分支",
...     "深度学习使用多层神经网络进行特征提取",
...     "自然语言处理技术用于理解和生成人类语言"
... ]
>>> bm25_custom = BM25EmbeddingFunction(
...     corpus=corpus,
...     encoding_type="query",
...     b=0.75,
...     k1=1.2
... )
>>> custom_vec = bm25_custom.embed("机器学习算法")
>>> isinstance(custom_vec, dict)
True
>>> # Hybrid search: combining with dense embeddings
>>> from zvec.extension import DefaultLocalDenseEmbedding
>>> dense_emb = DefaultLocalDenseEmbedding()
>>> bm25_emb = BM25EmbeddingFunction(language="zh", encoding_type="query")
>>>
>>> query = "machine learning algorithms"
>>> dense_vec = dense_emb.embed(query)  # Semantic similarity
>>> sparse_vec = bm25_emb.embed(query)  # Lexical matching
>>> # Combine scores for hybrid retrieval
>>> # Callable interface
>>> sparse_vec = bm25_query_zh("information retrieval")
>>> isinstance(sparse_vec, dict)
True
>>> # Error handling
>>> try:
...     bm25_query_zh.embed("")  # Empty query
... except ValueError as e:
...     print(f"Error: {e}")
Error: Input text cannot be empty or whitespace only
See Also
  • SparseEmbeddingFunction: Base class for sparse embeddings
  • DefaultLocalSparseEmbedding: SPLADE-based sparse embedding
  • QwenSparseEmbedding: API-based sparse embedding using Qwen
  • DefaultLocalDenseEmbedding: Dense embedding for semantic search
References
  • DashText Documentation: https://help.aliyun.com/zh/document_detail/2546039.html
  • DashText PyPI: https://pypi.org/project/dashtext/
  • BM25 Algorithm: Robertson & Zaragoza (2009)
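The hybrid-search example above stops at "combine scores for hybrid retrieval". One common way to combine them is a weighted linear sum; this is a hedged sketch (the helper hybrid_scores is hypothetical, and it assumes both searches already return doc-id-to-score mappings normalized to [0, 1]):

```python
def hybrid_scores(dense: dict[str, float], sparse: dict[str, float],
                  alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend normalized dense and sparse scores: alpha*dense + (1-alpha)*sparse."""
    doc_ids = set(dense) | set(sparse)
    combined = {d: alpha * dense.get(d, 0.0) + (1 - alpha) * sparse.get(d, 0.0)
                for d in doc_ids}
    # Highest blended score first.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

ranked = hybrid_scores({"a": 0.9, "b": 0.4}, {"b": 0.8, "c": 0.6}, alpha=0.5)
# "b" ranks first: it scores moderately on both signals (blended score ~0.6),
# beating "a" (dense only, 0.45) and "c" (sparse only, 0.30).
```

alpha tunes the balance: values near 1.0 favor semantic (dense) similarity, values near 0.0 favor lexical (BM25) matching.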

Initialize the BM25 embedding function.

Parameters:

Name Type Description Default
corpus
Optional[list[str]]

Optional corpus for training custom encoder. If None, uses built-in encoder. Defaults to None.

None
encoding_type
Literal['query', 'document']

Text encoding mode. Use "query" for search queries, "document" for indexing. Defaults to "query".

'query'
language
Literal['zh', 'en']

Language for built-in encoder. "zh" for Chinese, "en" for English. Defaults to "zh".

'zh'
b
float

Document length normalization for BM25 [0, 1]. Only used with custom corpus. Defaults to 0.75.

0.75
k1
float

Term frequency saturation for BM25. Only used with custom corpus. Defaults to 1.2.

1.2
**kwargs

Additional DashText encoder parameters.

{}

Raises:

Type Description
ValueError

If corpus is provided but empty or invalid.

ImportError

If dashtext package is not installed.

RuntimeError

If encoder initialization or training fails.

Methods:

Name Description
__call__

Make the embedding function callable.

embed

Generate BM25 sparse embedding for the input text.

Attributes
corpus_size property
corpus_size: int

int: Number of documents in the training corpus (0 if using built-in encoder).

encoding_type property
encoding_type: str

str: The encoding type being used ("query" or "document").

language property
language: str

str: The language of the built-in encoder ("zh" or "en").

extra_params property
extra_params: dict

dict: Extra parameters for DashText encoder customization.

Functions
__call__
__call__(input: TEXT) -> SparseVectorType

Make the embedding function callable.

Parameters:

Name Type Description Default
input TEXT

Input text to embed.

required

Returns:

Name Type Description
SparseVectorType SparseVectorType

Sparse vector as dictionary.

embed cached
embed(input: TEXT) -> SparseVectorType

Generate BM25 sparse embedding for the input text.

This method computes BM25 scores for the input text using DashText's SparseVectorEncoder. The encoding behavior depends on the encoding_type:

  • encoding_type="query": Uses encode_queries() for search queries
  • encoding_type="document": Uses encode_documents() for documents

The result is a sparse vector where keys are term indices in the vocabulary and values are BM25 scores.

Parameters:

Name Type Description Default
input TEXT

Input text string to embed. Must be non-empty after stripping whitespace.

required

Returns:

Name Type Description
SparseVectorType SparseVectorType

A dictionary mapping vocabulary term index to BM25 score. Only non-zero scores are included. The dictionary is sorted by indices (keys) in ascending order for consistent output. Example: {1169440797: 0.29, 2045788977: 0.70, ...}

Raises:

Type Description
TypeError

If input is not a string.

ValueError

If input is empty or whitespace-only.

RuntimeError

If BM25 encoding fails.

Examples:

>>> bm25 = BM25EmbeddingFunction(language="zh", encoding_type="query")
>>> sparse_vec = bm25.embed("query text")
>>> isinstance(sparse_vec, dict)
True
>>> all(isinstance(k, int) and isinstance(v, float) for k, v in sparse_vec.items())
True
>>> # Verify sorted output
>>> keys = list(sparse_vec.keys())
>>> keys == sorted(keys)
True
>>> # Error: empty input
>>> bm25.embed("   ")
Traceback (most recent call last):
    ...
ValueError: Input text cannot be empty or whitespace only
>>> # Error: non-string input
>>> bm25.embed(123)
Traceback (most recent call last):
    ...
TypeError: Expected 'input' to be str, got int
Note
  • BM25 scores are relative to the vocabulary statistics
  • Output dictionary is always sorted by indices for consistency
  • Terms not in the vocabulary will have zero scores (not included)
  • This method is cached (maxsize=10) for performance
  • DashText automatically handles Chinese/English text segmentation

DenseEmbeddingFunction

Bases: Protocol[MD]

Protocol for dense vector embedding functions.

Dense embedding functions map multimodal input (text, image, or audio) to fixed-length real-valued vectors. This is a Protocol class that defines the interface - implementations should provide their own initialization and properties.

Class Type Parameters:

Name Bound or Constraints Description Default
MD

The type of input data (bound to Embeddable: TEXT, IMAGE, or AUDIO).

required
Note
  • This is a Protocol class - it only defines the embed() interface.
  • Implementations are free to define their own __init__, properties, and additional methods as needed.
  • The embed() method is the only required interface.

Examples:

>>> # Custom text embedding implementation
>>> class MyTextEmbedding:
...     def __init__(self, dimension: int, model_name: str):
...         self.dimension = dimension
...         self.model = load_model(model_name)
...
...     def embed(self, input: str) -> list[float]:
...         return self.model.encode(input).tolist()
>>> # Custom image embedding implementation
>>> class MyImageEmbedding:
...     def __init__(self, dimension: int = 512):
...         self.dimension = dimension
...         self.model = load_image_model()
...
...     def embed(self, input: Union[str, bytes, np.ndarray]) -> list[float]:
...         if isinstance(input, str):
...             image = load_image_from_path(input)
...         else:
...             image = input
...         return self.model.extract_features(image).tolist()
>>> # Using built-in implementations
>>> from zvec.extension import QwenDenseEmbedding
>>> text_emb = QwenDenseEmbedding(dimension=768, api_key="sk-xxx")
>>> vector = text_emb.embed("Hello world")
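Because DenseEmbeddingFunction is a typing Protocol, conformance is structural: any object with a compatible embed() method can be passed where the protocol is expected, with no inheritance needed. A self-contained sketch of that idea, using a toy protocol and a toy embedding rather than the zvec classes themselves:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Embedder(Protocol):
    """Toy stand-in for DenseEmbeddingFunction: only embed() is required."""
    def embed(self, input: str) -> list[float]: ...

class HashEmbedding:  # note: does NOT inherit from Embedder
    def embed(self, input: str) -> list[float]:
        # Toy 4-dim "embedding" derived from character codes; illustration only.
        return [float(ord(c) % 7) for c in input[:4]]

def encode_all(fn: Embedder, texts: list[str]) -> list[list[float]]:
    return [fn.embed(t) for t in texts]

vecs = encode_all(HashEmbedding(), ["hello", "hi"])
assert isinstance(HashEmbedding(), Embedder)  # structural match, no inheritance
```

The runtime isinstance check only verifies that an embed attribute exists; static type checkers additionally verify the signature.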

Methods:

Name Description
embed

Generate a dense embedding vector for the input data.

Functions
embed abstractmethod
embed(input: MD) -> DenseVectorType

Generate a dense embedding vector for the input data.

Parameters:

Name Type Description Default
input MD

Multimodal input data to embed. Can be:

  • TEXT (str): Text string
  • IMAGE (str | bytes | np.ndarray): Image file path, raw bytes, or array
  • AUDIO (str | bytes | np.ndarray): Audio file path, raw bytes, or array

required

Returns:

Name Type Description
DenseVectorType DenseVectorType

A dense vector representing the embedding. Can be list[float], list[int], or np.ndarray. Length should match the implementation's dimension.

SparseEmbeddingFunction

Bases: Protocol[MD]

Abstract base class for sparse vector embedding functions.

Sparse embedding functions map multimodal input (text, image, or audio) to a dictionary of {index: weight}, where only non-zero dimensions are stored. You can inherit this class to create custom sparse embedding functions.

Class Type Parameters:

Name Bound or Constraints Description Default
MD

The type of input data (bound to Embeddable: TEXT, IMAGE, or AUDIO).

required
Note

Subclasses must implement the embed() method.

Examples:

>>> # Using built-in text sparse embedding (e.g., BM25, TF-IDF)
>>> sparse_emb = SomeSparseEmbedding()
>>> vector = sparse_emb.embed("Hello world")
>>> # Returns: {0: 0.5, 42: 1.2, 100: 0.8}
>>> # Custom BM25 sparse embedding function
>>> class MyBM25Embedding(SparseEmbeddingFunction):
...     def __init__(self, vocab_size: int = 10000):
...         self.vocab_size = vocab_size
...         self.tokenizer = MyTokenizer()
...
...     def embed(self, input: str) -> dict[int, float]:
...         tokens = self.tokenizer.tokenize(input)
...         sparse_vector = {}
...         for token_id, weight in self._calculate_bm25(tokens):
...             if weight > 0:
...                 sparse_vector[token_id] = weight
...         return sparse_vector
...
...     def _calculate_bm25(self, tokens):
...         # BM25 calculation logic
...         pass
>>> # Custom sparse image feature extractor
>>> class MySparseImageEmbedding(SparseEmbeddingFunction):
...     def embed(self, input: Union[str, bytes, np.ndarray]) -> dict[int, float]:
...         image = self._load_image(input)
...         features = self._extract_sparse_features(image)
...         return {idx: val for idx, val in enumerate(features) if val != 0}

Methods:

Name Description
embed

Generate a sparse embedding for the input data.

Functions
embed abstractmethod
embed(input: MD) -> SparseVectorType

Generate a sparse embedding for the input data.

Parameters:

Name Type Description Default
input MD

Multimodal input data to embed. Can be:

  • TEXT (str): Text string
  • IMAGE (str | bytes | np.ndarray): Image file path, raw bytes, or array
  • AUDIO (str | bytes | np.ndarray): Audio file path, raw bytes, or array

required

Returns:

Name Type Description
SparseVectorType SparseVectorType

Mapping from dimension index to non-zero weight. Only dimensions with non-zero values are included.
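Since a sparse vector here is just an {index: weight} dictionary, relevance between two of them reduces to a dot product over their shared indices. A small illustrative helper (sparse_dot is not part of zvec):

```python
def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product of two sparse vectors; iterate the smaller dict for efficiency."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b[i] for i, w in a.items() if i in b)

query = {0: 0.5, 42: 1.2, 100: 0.8}
doc = {42: 0.6, 100: 0.5, 7: 0.3}
score = sparse_dot(query, doc)  # 1.2*0.6 + 0.8*0.5, approximately 1.12
```

Indices missing from either vector contribute nothing, which is the whole point of the sparse representation: storage and scoring cost scale with the number of non-zero terms, not the vocabulary size.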

RrfReRanker

RrfReRanker(topn: int = 10, rerank_field: Optional[str] = None, rank_constant: int = 60)

Bases: RerankFunction

Re-ranker using Reciprocal Rank Fusion (RRF) for multi-vector search.

RRF combines results from multiple vector queries without requiring relevance scores. It assigns higher weight to documents that appear early in multiple result lists.

The RRF score contributed by one result list to a document at 0-based rank r is 1 / (k + r + 1), where k is the rank constant; a document's final score is the sum of its contributions across all lists.

Note

This re-ranker is specifically designed for multi-vector scenarios where query results from multiple vector fields need to be combined.

Parameters:

Name Type Description Default
topn
int

Number of top documents to return. Defaults to 10.

10
rerank_field
Optional[str]

Ignored by RRF. Defaults to None.

None
rank_constant
int

Smoothing constant k in RRF formula. Larger values reduce the impact of early ranks. Defaults to 60.

60
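The fusion step can be made concrete in a few lines. This is a hedged sketch of the algorithm, not the zvec implementation: the helper rrf_fuse is hypothetical, and plain doc-id lists stand in for Doc objects:

```python
def rrf_fuse(result_lists: dict[str, list[str]],
             k: int = 60, topn: int = 10) -> list[tuple[str, float]]:
    """Fuse ranked doc-id lists: a hit at 0-based rank r contributes 1/(k + r + 1)."""
    scores: dict[str, float] = {}
    for hits in result_lists.values():
        for r, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + r + 1)
    # Highest fused score first, truncated to topn.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:topn]

fused = rrf_fuse({"text_vec": ["d1", "d2", "d3"], "title_vec": ["d2", "d3"]})
# d2 ranks first: it appears near the top of both lists, while d1 appears in only one.
```

Note that only ranks matter, never raw scores, which is why RRF needs no per-field score normalization.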

Methods:

Name Description
rerank

Apply Reciprocal Rank Fusion to combine multiple query results.

Attributes:

Name Type Description
topn int

int: Number of top documents to return after re-ranking.

rerank_field Optional[str]

Optional[str]: Field name used as re-ranking input.

Attributes
topn property
topn: int

int: Number of top documents to return after re-ranking.

rerank_field property
rerank_field: Optional[str]

Optional[str]: Field name used as re-ranking input.

Functions
rerank
rerank(query_results: dict[str, list[Doc]]) -> list[Doc]

Apply Reciprocal Rank Fusion to combine multiple query results.

Parameters:

Name Type Description Default
query_results dict[str, list[Doc]]

Results from one or more vector queries.

required

Returns:

Type Description
list[Doc]

list[Doc]: Re-ranked documents with RRF scores in the score field.

WeightedReRanker

WeightedReRanker(
    topn: int = 10,
    rerank_field: Optional[str] = None,
    metric: MetricType = L2,
    weights: Optional[dict[str, float]] = None,
)

Bases: RerankFunction

Re-ranker that combines scores from multiple vector fields using weights.

Each vector field's relevance score is normalized based on its metric type, then scaled by a user-provided weight. Final scores are summed across fields.

Note

This re-ranker is specifically designed for multi-vector scenarios where query results from multiple vector fields need to be combined with configurable weights.

Parameters:

Name Type Description Default
topn
int

Number of top documents to return. Defaults to 10.

10
rerank_field
Optional[str]

Ignored. Defaults to None.

None
metric
MetricType

Distance metric used for score normalization. Defaults to MetricType.L2.

L2
weights
Optional[dict[str, float]]

Weight per vector field. Fields not listed use weight 1.0. Defaults to None.

None
Note

Supported metrics: L2, IP, COSINE. Scores are normalized to [0, 1].
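The weighted combination can be sketched as follows. This is illustrative only (the helper weighted_fuse is hypothetical, plain dicts stand in for Doc objects, and it assumes per-field scores have already been normalized to [0, 1]; the real class derives that normalization from the metric type):

```python
def weighted_fuse(results: dict[str, dict[str, float]],
                  weights: dict[str, float], topn: int = 10) -> list[tuple[str, float]]:
    """Sum per-field normalized scores scaled by weight; unlisted fields use 1.0."""
    combined: dict[str, float] = {}
    for field, hits in results.items():
        w = weights.get(field, 1.0)
        for doc_id, norm_score in hits.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + w * norm_score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)[:topn]

ranked = weighted_fuse(
    {"text_vec": {"d1": 0.9, "d2": 0.5}, "image_vec": {"d2": 0.8}},
    weights={"text_vec": 1.0, "image_vec": 2.0},
)
# d2 wins: 1.0*0.5 + 2.0*0.8 = 2.1, versus 0.9 for d1.
```

Unlike RRF, this fusion uses the scores themselves, so the per-metric normalization step is essential to keep fields on a comparable scale before weighting.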

Methods:

Name Description
rerank

Combine scores from multiple vector fields using weighted sum.

Attributes:

Name Type Description
topn int

int: Number of top documents to return after re-ranking.

rerank_field Optional[str]

Optional[str]: Field name used as re-ranking input.

weights dict[str, float]

dict[str, float]: Weight mapping for vector fields.

metric MetricType

MetricType: Distance metric used for score normalization.

Attributes
topn property
topn: int

int: Number of top documents to return after re-ranking.

rerank_field property
rerank_field: Optional[str]

Optional[str]: Field name used as re-ranking input.

weights property
weights: dict[str, float]

dict[str, float]: Weight mapping for vector fields.

metric property
metric: MetricType

MetricType: Distance metric used for score normalization.

Functions
rerank
rerank(query_results: dict[str, list[Doc]]) -> list[Doc]

Combine scores from multiple vector fields using weighted sum.

Parameters:

Name Type Description Default
query_results dict[str, list[Doc]]

Results per vector field.

required

Returns:

Type Description
list[Doc]

list[Doc]: Re-ranked documents with combined scores in score field.

OpenAIDenseEmbedding

OpenAIDenseEmbedding(
    model: str = "text-embedding-3-small",
    dimension: Optional[int] = None,
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    **kwargs
)

Bases: OpenAIFunctionBase, DenseEmbeddingFunction[TEXT]

Dense text embedding function using OpenAI API.

This class provides text-to-vector embedding capabilities using OpenAI's embedding models. It inherits from DenseEmbeddingFunction and implements dense text embedding via the OpenAI API.

The implementation supports various OpenAI embedding models with different dimensions and includes automatic result caching for improved performance.

Parameters:

Name Type Description Default
model
str

OpenAI embedding model identifier. Defaults to "text-embedding-3-small". Common options:

  • "text-embedding-3-small": 1536 dims, cost-efficient, good performance
  • "text-embedding-3-large": 3072 dims, highest quality
  • "text-embedding-ada-002": 1536 dims, legacy model

'text-embedding-3-small'
dimension
Optional[int]

Desired output embedding dimension. If None, uses model's default dimension. For text-embedding-3 models, you can specify custom dimensions (e.g., 256, 512, 1024, 1536). Defaults to None.

None
api_key
Optional[str]

OpenAI API authentication key. If None, reads from OPENAI_API_KEY environment variable. Obtain your key from: https://platform.openai.com/api-keys

None
base_url
Optional[str]

Custom API base URL for OpenAI-compatible services. Defaults to None (uses official OpenAI endpoint).

None

Attributes:

Name Type Description
dimension int

The embedding vector dimension.

data_type DataType

Always DataType.VECTOR_FP32 for this implementation.

model str

The OpenAI model name being used.

Raises:

Type Description
ValueError

If API key is not provided and not found in environment, or if API returns an error response.

TypeError

If input to embed() is not a string.

RuntimeError

If network error or OpenAI service error occurs.

Note
  • Requires Python 3.10, 3.11, or 3.12
  • Requires the openai package: pip install openai
  • Embedding results are cached (LRU cache, maxsize=10) to reduce API calls
  • Network connectivity to OpenAI API endpoints is required
  • API usage incurs costs based on your OpenAI subscription plan
  • Rate limits apply based on your OpenAI account tier

Examples:

>>> # Basic usage with default model
>>> from zvec.extension import OpenAIDenseEmbedding
>>> import os
>>> os.environ["OPENAI_API_KEY"] = "sk-..."
>>>
>>> emb_func = OpenAIDenseEmbedding()
>>> vector = emb_func.embed("Hello, world!")
>>> len(vector)
1536
>>> # Using specific model with custom dimension
>>> emb_func = OpenAIDenseEmbedding(
...     model="text-embedding-3-large",
...     dimension=1024,
...     api_key="sk-..."
... )
>>> vector = emb_func.embed("Machine learning is fascinating")
>>> len(vector)
1024
>>> # Using with custom base URL (e.g., Azure OpenAI)
>>> emb_func = OpenAIDenseEmbedding(
...     model="text-embedding-ada-002",
...     api_key="your-azure-key",
...     base_url="https://your-resource.openai.azure.com/"
... )
>>> vector = emb_func("Natural language processing")
>>> isinstance(vector, list)
True
>>> # Batch processing with caching benefit
>>> texts = ["First text", "Second text", "First text"]
>>> vectors = [emb_func.embed(text) for text in texts]
>>> # Third call uses cached result for "First text"
>>> # Error handling
>>> try:
...     emb_func.embed("")  # Empty string
... except ValueError as e:
...     print(f"Error: {e}")
Error: Input text cannot be empty or whitespace only
See Also
  • DenseEmbeddingFunction: Base class for dense embeddings
  • QwenDenseEmbedding: Alternative using Qwen/DashScope API
  • DefaultLocalDenseEmbedding: Local model without API calls
  • SparseEmbeddingFunction: Base class for sparse embeddings

Initialize the OpenAI dense embedding function.

Parameters:

Name Type Description Default
model
str

OpenAI model name. Defaults to "text-embedding-3-small".

'text-embedding-3-small'
dimension
Optional[int]

Target embedding dimension or None for default.

None
api_key
Optional[str]

API key or None to use environment variable.

None
base_url
Optional[str]

Custom API base URL or None for default.

None
**kwargs

Additional parameters for API calls. Examples:

  • encoding_format (str): Format of embeddings, "float" or "base64".
  • user (str): User identifier for tracking.

{}

Raises:

Type Description
ValueError

If API key is not provided and not in environment.

Methods:

Name Description
__call__

Make the embedding function callable.

embed

Generate dense embedding vector for the input text.

Attributes
model property
model: str

str: The OpenAI model name currently in use.

dimension property
dimension: int

int: The expected dimensionality of the embedding vector.

extra_params property
extra_params: dict

dict: Extra parameters for model-specific customization.

Functions
__call__
__call__(input: TEXT) -> DenseVectorType

Make the embedding function callable.

embed cached
embed(input: TEXT) -> DenseVectorType

Generate dense embedding vector for the input text.

This method calls the OpenAI Embeddings API to convert input text into a dense vector representation. Results are cached to improve performance for repeated inputs.

Parameters:

Name Type Description Default
input TEXT

Input text string to embed. Must be non-empty after stripping whitespace. Maximum length is 8191 tokens for most models.

required

Returns:

Name Type Description
DenseVectorType DenseVectorType

A list of floats representing the embedding vector. Length equals self.dimension. Example: [0.123, -0.456, 0.789, ...]

Raises:

Type Description
TypeError

If input is not a string.

ValueError

If input is empty/whitespace-only, or if the API returns an error or malformed response.

RuntimeError

If network connectivity issues or OpenAI service errors occur.

Examples:

>>> emb = OpenAIDenseEmbedding()
>>> vector = emb.embed("Natural language processing")
>>> len(vector)
1536
>>> isinstance(vector[0], float)
True
>>> # Error: empty input
>>> emb.embed("   ")
Traceback (most recent call last):
    ...
ValueError: Input text cannot be empty or whitespace only
>>> # Error: non-string input
>>> emb.embed(123)
Traceback (most recent call last):
    ...
TypeError: Expected 'input' to be str, got int
Note
  • This method is cached (maxsize=10). Identical inputs return cached results.
  • The cache is based on exact string match (case-sensitive).
  • Consider pre-processing text (lowercasing, normalization) for better caching.

OpenAIFunctionBase

OpenAIFunctionBase(model: str, api_key: Optional[str] = None, base_url: Optional[str] = None)

Base class for OpenAI functions.

This base class provides common functionality for calling OpenAI APIs and handling responses. It currently supports dense embedding operations.

This class is not meant to be used directly. Use a concrete implementation:

  • OpenAIDenseEmbedding for dense embeddings

Parameters:

Name Type Description Default
model
str

OpenAI model identifier.

required
api_key
Optional[str]

OpenAI API authentication key.

None
base_url
Optional[str]

Custom API base URL.

None
Note
  • This is an internal base class for code reuse across OpenAI features
  • Subclasses should inherit from appropriate Protocol
  • Provides unified API connection and response handling

Initialize the base OpenAI functionality.

Parameters:

Name Type Description Default
model
str

OpenAI model name.

required
api_key
Optional[str]

API key or None to use environment variable.

None
base_url
Optional[str]

Custom API base URL or None for default.

None

Raises:

Type Description
ValueError

If API key is not provided and not in environment.

Attributes:

Name Type Description
model str

str: The OpenAI model name currently in use.

Attributes
model property
model: str

str: The OpenAI model name currently in use.

QwenDenseEmbedding

QwenDenseEmbedding(
    dimension: int, model: str = "text-embedding-v4", api_key: Optional[str] = None, **kwargs
)

Bases: QwenFunctionBase, DenseEmbeddingFunction[TEXT]

Dense text embedding function using Qwen (DashScope) API.

This class provides text-to-vector embedding capabilities using Alibaba Cloud's DashScope service and Qwen embedding models. It inherits from DenseEmbeddingFunction and implements dense text embedding.

The implementation supports various Qwen embedding models with configurable dimensions and includes automatic result caching for improved performance.

Parameters:

Name Type Description Default
dimension
int

Desired output embedding dimension. Common values:

  • 512: Balanced performance and accuracy
  • 1024: Higher accuracy, larger storage
  • 1536: Maximum accuracy for supported models

required
model
str

DashScope embedding model identifier. Defaults to "text-embedding-v4". Other options include:

  • "text-embedding-v3"
  • "text-embedding-v2"
  • "text-embedding-v1"

'text-embedding-v4'
api_key
Optional[str]

DashScope API authentication key. If None, reads from DASHSCOPE_API_KEY environment variable. Obtain your key from: https://dashscope.console.aliyun.com/

None
**kwargs

Additional DashScope API parameters. Supported options:

  • text_type (str): Specifies the text role in retrieval tasks. Options: "query" (search query) or "document" (indexed content). This parameter optimizes embeddings for asymmetric search scenarios.

Reference: https://help.aliyun.com/zh/model-studio/text-embedding-synchronous-api

{}

Attributes:

Name Type Description
dimension int

The embedding vector dimension.

data_type DataType

Always DataType.VECTOR_FP32 for this implementation.

model str

The DashScope model name being used.

Raises:

Type Description
ValueError

If API key is not provided and not found in environment, or if API returns an error response.

TypeError

If input to embed() is not a string.

RuntimeError

If network error or DashScope service error occurs.

Note
  • Requires Python 3.10, 3.11, or 3.12
  • Requires the dashscope package: pip install dashscope
  • Embedding results are cached (LRU cache, maxsize=10) to reduce API calls
  • Network connectivity to DashScope API endpoints is required
  • API usage may incur costs based on your DashScope subscription plan

Parameter Guidelines:

  • Use text_type="query" for search queries and text_type="document" for indexed content to optimize asymmetric retrieval tasks.
  • For detailed API specifications and parameter usage, refer to: https://help.aliyun.com/zh/model-studio/text-embedding-synchronous-api

Examples:

>>> # Basic usage with default model
>>> from zvec.extension import QwenDenseEmbedding
>>> import os
>>> os.environ["DASHSCOPE_API_KEY"] = "your-api-key"
>>>
>>> emb_func = QwenDenseEmbedding(dimension=1024)
>>> vector = emb_func.embed("Hello, world!")
>>> len(vector)
1024
>>> # Using specific model with explicit API key
>>> emb_func = QwenDenseEmbedding(
...     dimension=512,
...     model="text-embedding-v3",
...     api_key="sk-xxxxx"
... )
>>> vector = emb_func("Machine learning is fascinating")
>>> isinstance(vector, list)
True
>>> # Using with custom parameters (text_type)
>>> # For search queries - optimize for query-document matching
>>> emb_func = QwenDenseEmbedding(
...     dimension=1024,
...     text_type="query"
... )
>>> query_vector = emb_func.embed("What is machine learning?")
>>>
>>> # For document embeddings - optimize for being matched by queries
>>> doc_emb_func = QwenDenseEmbedding(
...     dimension=1024,
...     text_type="document"
... )
>>> doc_vector = doc_emb_func.embed(
...     "Machine learning is a subset of artificial intelligence..."
... )
>>> # Batch processing with caching benefit
>>> texts = ["First text", "Second text", "First text"]
>>> vectors = [emb_func.embed(text) for text in texts]
>>> # Third call uses cached result for "First text"
>>> # Error handling
>>> try:
...     emb_func.embed("")  # Empty string
... except ValueError as e:
...     print(f"Error: {e}")
Error: Input text cannot be empty or whitespace only
See Also
  • DenseEmbeddingFunction: Base class for dense embeddings
  • SparseEmbeddingFunction: Base class for sparse embeddings

Initialize the Qwen dense embedding function.

Parameters:

Name Type Description Default
dimension
int

Target embedding dimension.

required
model
str

DashScope model name. Defaults to "text-embedding-v4".

'text-embedding-v4'
api_key
Optional[str]

API key or None to use environment variable.

None
**kwargs

Additional DashScope API parameters. Supported options:
  • text_type (str): Text role in asymmetric retrieval.
    - "query": Optimize for search queries (short, question-like).
    - "document": Optimize for indexed documents (longer content).
    Using the appropriate text_type improves retrieval accuracy by optimizing the embedding space for query-document matching.

For detailed API documentation, see: https://help.aliyun.com/zh/model-studio/text-embedding-synchronous-api

{}

Raises:

Type Description
ValueError

If API key is not provided and not in environment.

Methods:

Name Description
__call__

Make the embedding function callable.

embed

Generate dense embedding vector for the input text.

Attributes
model property
model: str

str: The DashScope embedding model name currently in use.

dimension property
dimension: int

int: The expected dimensionality of the embedding vector.

extra_params property
extra_params: dict

dict: Extra parameters for model-specific customization.

Functions
__call__
__call__(input: TEXT) -> DenseVectorType

Make the embedding function callable.

embed cached
embed(input: TEXT) -> DenseVectorType

Generate dense embedding vector for the input text.

This method calls the DashScope TextEmbedding API to convert input text into a dense vector representation. Results are cached to improve performance for repeated inputs.

Parameters:

Name Type Description Default
input TEXT

Input text string to embed. Must be non-empty after stripping whitespace. Maximum length depends on the model used (typically 2048-8192 tokens).

required

Returns:

Name Type Description
DenseVectorType DenseVectorType

A list of floats representing the embedding vector. Length equals self.dimension. Example: [0.123, -0.456, 0.789, ...]

Raises:

Type Description
TypeError

If input is not a string.

ValueError

If input is empty/whitespace-only, or if the API returns an error or malformed response.

RuntimeError

If network connectivity issues or DashScope service errors occur.

Examples:

>>> emb = QwenDenseEmbedding(dimension=1024)
>>> vector = emb.embed("Natural language processing")
>>> len(vector)
1024
>>> isinstance(vector[0], float)
True
>>> # Error: empty input
>>> emb.embed("   ")
ValueError: Input text cannot be empty or whitespace only
>>> # Error: non-string input
>>> emb.embed(123)
TypeError: Expected 'input' to be str, got int
Note
  • This method is cached (maxsize=10). Identical inputs return cached results.
  • The cache is based on exact string match (case-sensitive).
  • Consider pre-processing text (lowercasing, normalization) for better caching.
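The caching notes above (LRU cache, maxsize=10, exact case-sensitive string match) can be demonstrated with functools.lru_cache. The embed function below is a hypothetical stand-in that counts invocations instead of calling the DashScope API:

```python
from functools import lru_cache

calls = {"count": 0}

@lru_cache(maxsize=10)
def embed(text: str) -> tuple:
    """Hypothetical stand-in for the API call; counts real invocations."""
    calls["count"] += 1
    return tuple(float(ord(c)) for c in text[:4])  # fake vector

embed("First text")
embed("Second text")
embed("First text")    # cache hit: exact string match
embed("first text")    # cache miss: lookup is case-sensitive
print(calls["count"])  # 3 real calls for 4 embed() invocations
```

Pre-normalizing text (e.g. lowercasing) before calling embed() would turn the last call into a cache hit.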

QwenSparseEmbedding

QwenSparseEmbedding(
    dimension: int, model: str = "text-embedding-v4", api_key: Optional[str] = None, **kwargs
)

Bases: QwenFunctionBase, SparseEmbeddingFunction[TEXT]

Sparse text embedding function using Qwen (DashScope) API.

This class provides text-to-sparse-vector embedding capabilities using Alibaba Cloud's DashScope service and Qwen embedding models. It generates sparse keyword-weighted vectors suitable for lexical matching and BM25-style retrieval scenarios.

Sparse embeddings are particularly useful for:
  • Keyword-based search and exact matching
  • Hybrid retrieval (combining with dense embeddings)
  • Interpretable search results (weights show term importance)

Parameters:

Name Type Description Default
dimension
int

Desired output embedding dimension. Common values:
  • 512: Balanced performance and accuracy
  • 1024: Higher accuracy, larger storage
  • 1536: Maximum accuracy for supported models

required
model
str

DashScope embedding model identifier. Defaults to "text-embedding-v4". Other options include:
  • "text-embedding-v3"
  • "text-embedding-v2"

'text-embedding-v4'
api_key
Optional[str]

DashScope API authentication key. If None, reads from DASHSCOPE_API_KEY environment variable. Obtain your key from: https://dashscope.console.aliyun.com/

None
**kwargs

Additional DashScope API parameters. Supported options:
  • encoding_type (Literal["query", "document"]): Encoding type.
    - "query": Optimize for search queries (default).
    - "document": Optimize for indexed documents.
    This distinction is important for asymmetric retrieval tasks.

{}

Attributes:

Name Type Description
model str

The DashScope model name being used.

encoding_type str

The encoding type ("query" or "document").

Raises:

Type Description
ValueError

If API key is not provided and not found in environment, or if API returns an error response.

TypeError

If input to embed() is not a string.

RuntimeError

If network error or DashScope service error occurs.

Note
  • Requires Python 3.10, 3.11, or 3.12
  • Requires the dashscope package: pip install dashscope
  • Embedding results are cached (LRU cache, maxsize=10) to reduce API calls
  • Network connectivity to DashScope API endpoints is required
  • API usage may incur costs based on your DashScope subscription plan
  • Sparse vectors store only their non-zero dimensions, as a dict mapping index to weight
  • Output is sorted by indices (keys) in ascending order

Parameter Guidelines:

  • Use encoding_type="query" for search queries and encoding_type="document" for indexed content to optimize asymmetric retrieval tasks.
  • For detailed API specifications, refer to: https://help.aliyun.com/zh/model-studio/text-embedding-synchronous-api

Examples:

>>> # Basic usage for query embedding
>>> from zvec.extension import QwenSparseEmbedding
>>> import os
>>> os.environ["DASHSCOPE_API_KEY"] = "your-api-key"
>>>
>>> query_emb = QwenSparseEmbedding(dimension=1024, encoding_type="query")
>>> query_vec = query_emb.embed("machine learning")
>>> type(query_vec)
<class 'dict'>
>>> len(query_vec)  # Only non-zero dimensions
156
>>> # Document embedding
>>> doc_emb = QwenSparseEmbedding(dimension=1024, encoding_type="document")
>>> doc_vec = doc_emb.embed("Machine learning is a subset of AI")
>>> isinstance(doc_vec, dict)
True
>>> # Asymmetric retrieval example
>>> query_vec = query_emb.embed("what causes aging fast")
>>> doc_vec = doc_emb.embed(
...     "UV-A light causes tanning, skin aging, and cataracts..."
... )
>>>
>>> # Calculate similarity (dot product for sparse vectors)
>>> similarity = sum(
...     query_vec.get(k, 0) * doc_vec.get(k, 0)
...     for k in set(query_vec) | set(doc_vec)
... )
>>> # Output is sorted by indices
>>> list(query_vec.items())[:5]  # First 5 dimensions (by index)
[(10, 0.45), (23, 0.87), (56, 0.32), (89, 1.12), (120, 0.65)]
>>> # Hybrid retrieval (combining dense + sparse)
>>> from zvec.extension import QwenDenseEmbedding
>>> dense_emb = QwenDenseEmbedding(dimension=1024)
>>> sparse_emb = QwenSparseEmbedding(dimension=1024)
>>>
>>> query = "deep learning neural networks"
>>> dense_vec = dense_emb.embed(query)   # [0.1, -0.3, 0.5, ...]
>>> sparse_vec = sparse_emb.embed(query)  # {12: 0.8, 45: 1.2, ...}
>>> # Error handling
>>> try:
...     sparse_emb.embed("")  # Empty string
... except ValueError as e:
...     print(f"Error: {e}")
Error: Input text cannot be empty or whitespace only
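The hybrid-retrieval example above stops after producing both vectors. One common way to combine them is weighted score fusion, shown here as a minimal, library-agnostic sketch; the toy vectors and the alpha weight are illustrative, not part of the zvec API:

```python
# Weighted fusion of dense and sparse relevance scores.
# All vectors below are toy values; alpha is a tuning knob you choose.

def dense_score(q, d):
    return sum(a * b for a, b in zip(q, d))  # dot product (cosine if unit-norm)

def sparse_score(q, d):
    return sum(w * d.get(i, 0.0) for i, w in q.items())  # sparse dot product

q_dense, q_sparse = [0.6, 0.8], {12: 0.8, 45: 1.2}
d_dense, d_sparse = [0.8, 0.6], {12: 0.5, 99: 0.3}

alpha = 0.7  # weight on the dense (semantic) signal
hybrid = (alpha * dense_score(q_dense, d_dense)
          + (1 - alpha) * sparse_score(q_sparse, d_sparse))
print(round(hybrid, 3))  # 0.792
```

Raising alpha favors semantic similarity; lowering it favors exact keyword overlap.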
See Also
  • SparseEmbeddingFunction: Base class for sparse embeddings
  • QwenDenseEmbedding: Dense embedding using Qwen API
  • DefaultLocalSparseEmbedding: Sparse embedding with SPLADE model

Initialize the Qwen sparse embedding function.

Parameters:

Name Type Description Default
dimension
int

Target embedding dimension.

required
model
str

DashScope model name. Defaults to "text-embedding-v4".

'text-embedding-v4'
api_key
Optional[str]

API key or None to use environment variable.

None
**kwargs

Additional DashScope API parameters. Supported options:
  • encoding_type (Literal["query", "document"]): Encoding type.
    - "query": Optimize for search queries (default).
    - "document": Optimize for indexed documents.
    This distinction is important for asymmetric retrieval tasks.

{}

Raises:

Type Description
ValueError

If API key is not provided and not in environment.

Methods:

Name Description
__call__

Make the embedding function callable.

embed

Generate sparse embedding vector for the input text.

Attributes
model property
model: str

str: The DashScope embedding model name currently in use.

extra_params property
extra_params: dict

dict: Extra parameters for model-specific customization.

Functions
__call__
__call__(input: TEXT) -> SparseVectorType

Make the embedding function callable.

embed cached
embed(input: TEXT) -> SparseVectorType

Generate sparse embedding vector for the input text.

This method calls the DashScope TextEmbedding API with sparse output type to convert input text into a sparse vector representation. The result is a dictionary where keys are dimension indices and values are importance weights (only non-zero values included).

The embedding is optimized based on the encoding_type specified during initialization: "query" for search queries or "document" for indexed content.

Parameters:

Name Type Description Default
input TEXT

Input text string to embed. Must be non-empty after stripping whitespace. Maximum length depends on the model used (typically 2048-8192 tokens).

required

Returns:

Name Type Description
SparseVectorType SparseVectorType

A dictionary mapping dimension index to weight. Only non-zero dimensions are included. The dictionary is sorted by indices (keys) in ascending order for consistent output. Example: {10: 0.5, 245: 0.8, 1023: 1.2, 5678: 0.5}

Raises:

Type Description
TypeError

If input is not a string.

ValueError

If input is empty/whitespace-only, or if the API returns an error or malformed response.

RuntimeError

If network connectivity issues or DashScope service errors occur.

Examples:

>>> emb = QwenSparseEmbedding(dimension=1024, encoding_type="query")
>>> sparse_vec = emb.embed("machine learning")
>>> isinstance(sparse_vec, dict)
True
>>>
>>> # Verify sorted output
>>> keys = list(sparse_vec.keys())
>>> keys == sorted(keys)
True
>>> # Error: empty input
>>> emb.embed("   ")
ValueError: Input text cannot be empty or whitespace only
>>> # Error: non-string input
>>> emb.embed(123)
TypeError: Expected 'input' to be str, got int
Note
  • This method is cached (maxsize=10). Identical inputs return cached results.
  • The cache is based on exact string match (case-sensitive).
  • Output dictionary is always sorted by indices for consistency.

QwenFunctionBase

QwenFunctionBase(model: str, api_key: Optional[str] = None)

Base class for Qwen (DashScope) functions.

This base class provides common functionality for calling DashScope APIs and handling responses. It supports embeddings (dense and sparse) and re-ranking operations.

This class is not meant to be used directly. Use concrete implementations:
  • QwenDenseEmbedding for dense embeddings
  • QwenSparseEmbedding for sparse embeddings
  • QwenReRanker for semantic re-ranking

Parameters:

Name Type Description Default
model
str

DashScope model identifier.

required
api_key
Optional[str]

DashScope API authentication key.

None
Note
  • This is an internal base class for code reuse across Qwen features
  • Subclasses should inherit from appropriate Protocol/ABC
  • Provides unified API connection and response handling

Initialize the base Qwen embedding functionality.

Parameters:

Name Type Description Default
model
str

DashScope model name.

required
api_key
Optional[str]

API key or None to use environment variable.

None

Raises:

Type Description
ValueError

If API key is not provided and not in environment.

Attributes:

Name Type Description
model str

str: The DashScope embedding model name currently in use.

Attributes
model property
model: str

str: The DashScope embedding model name currently in use.

Functions

QwenReRanker

QwenReRanker(
    query: Optional[str] = None,
    topn: int = 10,
    rerank_field: Optional[str] = None,
    model: str = "gte-rerank-v2",
    api_key: Optional[str] = None,
)

Bases: QwenFunctionBase, RerankFunction

Re-ranker using Qwen (DashScope) cross-encoder API for semantic re-ranking.

This re-ranker leverages DashScope's TextReRank service to perform cross-encoder style re-ranking. It sends query and document pairs to the API and receives relevance scores based on deep semantic understanding.

The re-ranker is suitable for single-vector or multi-vector search scenarios where semantic relevance to a specific query is required.

Parameters:

Name Type Description Default
query
str

Query text for semantic re-ranking. Required.

None
topn
int

Maximum number of documents to return after re-ranking. Defaults to 10.

10
rerank_field
str

Document field name to use as re-ranking input text. Required (e.g., "content", "title", "body").

None
model
str

DashScope re-ranking model identifier. Defaults to "gte-rerank-v2".

'gte-rerank-v2'
api_key
Optional[str]

DashScope API authentication key. If not provided, reads from DASHSCOPE_API_KEY environment variable.

None

Raises:

Type Description
ValueError

If query is empty/None, rerank_field is None, or API key is not available.

Note
  • Requires dashscope Python package installed
  • Documents without valid content in rerank_field are skipped
  • API rate limits and quotas apply per DashScope subscription
Example

>>> reranker = QwenReRanker(
...     query="machine learning algorithms",
...     topn=5,
...     rerank_field="content",
...     model="gte-rerank-v2",
...     api_key="your-api-key"
... )
>>> # Use in collection.query(reranker=reranker)

Initialize QwenReRanker with query and configuration.

Parameters:

Name Type Description Default
query
Optional[str]

Query text for semantic matching. Required.

None
topn
int

Number of top results to return.

10
rerank_field
Optional[str]

Document field for re-ranking input.

None
model
str

DashScope model name.

'gte-rerank-v2'
api_key
Optional[str]

API key or None to use environment variable.

None

Raises:

Type Description
ValueError

If query is empty or API key is unavailable.

Methods:

Name Description
rerank

Re-rank documents using Qwen's TextReRank API.

Attributes:

Name Type Description
topn int

int: Number of top documents to return after re-ranking.

rerank_field Optional[str]

Optional[str]: Field name used as re-ranking input.

model str

str: The DashScope embedding model name currently in use.

query str

str: Query text used for semantic re-ranking.

Attributes
topn property
topn: int

int: Number of top documents to return after re-ranking.

rerank_field property
rerank_field: Optional[str]

Optional[str]: Field name used as re-ranking input.

model property
model: str

str: The DashScope embedding model name currently in use.

query property
query: str

str: Query text used for semantic re-ranking.

Functions
rerank
rerank(query_results: dict[str, list[Doc]]) -> list[Doc]

Re-rank documents using Qwen's TextReRank API.

Sends document texts to DashScope TextReRank service along with the query. Returns documents sorted by relevance scores from the cross-encoder model.

Parameters:

Name Type Description Default
query_results dict[str, list[Doc]]

Mapping from vector field names to lists of retrieved documents. Documents from all fields are deduplicated and re-ranked together.

required

Returns:

Type Description
list[Doc]

list[Doc]: Re-ranked documents (up to topn) with updated score fields containing relevance scores from the API.

Raises:

Type Description
ValueError

If no valid documents are found or API call fails.

Note
  • Duplicate documents (same ID) across fields are processed once
  • Documents with empty/missing rerank_field content are skipped
  • Returned scores are relevance scores from the cross-encoder model
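The deduplication and skipping behavior described in the notes can be sketched with plain dictionaries; the field names and document shape below are illustrative stand-ins, not the actual Doc type:

```python
# Sketch of the pre-processing described above: flatten query_results,
# deduplicate by document id, and skip docs with empty rerank text.
rerank_field = "content"

query_results = {
    "dense_vec": [{"id": "a", "content": "machine learning"},
                  {"id": "b", "content": ""}],                  # empty -> skipped
    "sparse_vec": [{"id": "a", "content": "machine learning"},  # duplicate of "a"
                   {"id": "c", "content": "neural networks"}],
}

candidates = {}
for docs in query_results.values():
    for doc in docs:
        text = (doc.get(rerank_field) or "").strip()
        if text and doc["id"] not in candidates:  # dedupe + skip empties
            candidates[doc["id"]] = text

print(sorted(candidates))  # ['a', 'c']
```

Only the surviving candidates would then be sent to the TextReRank API for scoring.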

ReRanker

ReRanker(topn: int = 10, rerank_field: Optional[str] = None)

Bases: ABC

Abstract base class for re-ranking search results.

Re-rankers refine the output of one or more vector queries by applying a secondary scoring strategy. They are used in the query() method of Collection via the reranker parameter.

Parameters:

Name Type Description Default
topn
int

Number of top documents to return after re-ranking. Defaults to 10.

10
rerank_field
Optional[str]

Field name used as input for re-ranking (e.g., document title or body). Defaults to None.

None
Note

Subclasses must implement the rerank() method.

Methods:

Name Description
rerank

Re-rank documents from one or more vector queries.

Attributes:

Name Type Description
topn int

int: Number of top documents to return after re-ranking.

rerank_field Optional[str]

Optional[str]: Field name used as re-ranking input.

Attributes
topn property
topn: int

int: Number of top documents to return after re-ranking.

rerank_field property
rerank_field: Optional[str]

Optional[str]: Field name used as re-ranking input.

Functions
rerank abstractmethod
rerank(query_results: dict[str, list[Doc]]) -> list[Doc]

Re-rank documents from one or more vector queries.

Parameters:

Name Type Description Default
query_results dict[str, list[Doc]]

Mapping from vector field name to list of retrieved documents (sorted by relevance).

required

Returns:

Type Description
list[Doc]

list[Doc]: Re-ranked list of documents (length ≤ topn), with updated score fields.
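A custom re-ranking strategy is implemented by subclassing ReRanker and overriding rerank(). The sketch below mirrors the interface documented above with a local stand-in class so it is self-contained; in real code you would subclass zvec.extension.ReRanker, and the length-based scoring here is purely illustrative:

```python
from abc import ABC, abstractmethod

# Stand-in mirroring the zvec.extension.ReRanker interface described above.
class ReRanker(ABC):
    def __init__(self, topn: int = 10, rerank_field=None):
        self._topn, self._rerank_field = topn, rerank_field

    @property
    def topn(self) -> int:
        return self._topn

    @abstractmethod
    def rerank(self, query_results: dict) -> list: ...

class LengthReRanker(ReRanker):
    """Toy strategy: prefer longer rerank_field text (illustrative only)."""
    def rerank(self, query_results: dict) -> list:
        seen, merged = set(), []
        for docs in query_results.values():  # merge docs from all vector fields
            for doc in docs:
                if doc["id"] not in seen:    # deduplicate by id
                    seen.add(doc["id"])
                    merged.append(doc)
        merged.sort(key=lambda d: len(d.get(self._rerank_field or "content", "")),
                    reverse=True)
        return merged[: self.topn]

ranked = LengthReRanker(topn=2, rerank_field="content").rerank({
    "vec": [{"id": "a", "content": "short"},
            {"id": "b", "content": "a much longer document body"}],
})
print([d["id"] for d in ranked])  # ['b', 'a']
```

The same shape applies to any scoring strategy: merge, score, sort, truncate to topn.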

DefaultLocalDenseEmbedding

DefaultLocalDenseEmbedding(
    model_source: Literal["huggingface", "modelscope"] = "huggingface",
    device: Optional[str] = None,
    normalize_embeddings: bool = True,
    batch_size: int = 32,
    **kwargs
)

Bases: SentenceTransformerFunctionBase, DenseEmbeddingFunction[TEXT]

Default local dense embedding using all-MiniLM-L6-v2 model.

This is the default dense text embedding implementation, backed by the all-MiniLM-L6-v2 model from Hugging Face. The model provides a good balance between speed and quality for general-purpose text embedding.

The class provides text-to-vector dense embedding capabilities using the sentence-transformers library. It supports models from Hugging Face Hub and ModelScope, runs locally without API calls, and supports CPU/GPU acceleration.

The model produces 384-dimensional embeddings and is optimized for semantic similarity tasks. It runs locally without requiring API keys.

Parameters:

Name Type Description Default
model_source
Literal['huggingface', 'modelscope']

Model source.
  • "huggingface": Use Hugging Face Hub (default, for international users)
  • "modelscope": Use ModelScope (recommended for users in China)
Defaults to "huggingface".

'huggingface'
device
Optional[str]

Device to run the model on. Options: "cpu", "cuda", "mps" (for Apple Silicon), or None for automatic detection. Defaults to None.

None
normalize_embeddings
bool

Whether to normalize embeddings to unit length (L2 normalization). Useful for cosine similarity. Defaults to True.

True
batch_size
int

Batch size for encoding. Defaults to 32.

32
**kwargs

Additional parameters for future extension.

{}

Attributes:

Name Type Description
dimension int

Always 384 for both models.

model_name str

"all-MiniLM-L6-v2" (HF) or "iic/nlp_gte_sentence-embedding_chinese-small" (MS).

model_source str

The model source being used.

device str

The device the model is running on.

Raises:

Type Description
ValueError

If the model cannot be loaded or input is invalid.

TypeError

If input to embed() is not a string.

RuntimeError

If model inference fails.

Note
  • Requires Python 3.10, 3.11, or 3.12
  • Requires the sentence-transformers package: pip install sentence-transformers
  • For ModelScope, also requires: pip install modelscope
  • First run downloads the model (~50-80MB) from chosen source
  • Hugging Face cache: ~/.cache/torch/sentence_transformers/
  • ModelScope cache: ~/.cache/modelscope/hub/
  • No API keys or network required after initial download
  • Inference speed: ~1000 sentences/sec on CPU, ~10000 on GPU

For users in China:

If you encounter Hugging Face access issues, use ModelScope instead:

.. code-block:: python

# Recommended for users in China
emb = DefaultLocalDenseEmbedding(model_source="modelscope")

Alternatively, use Hugging Face mirror:

.. code-block:: bash

export HF_ENDPOINT=https://hf-mirror.com
# Then use default Hugging Face mode

Examples:

>>> # Basic usage with Hugging Face (default)
>>> from zvec.extension import DefaultLocalDenseEmbedding
>>>
>>> emb_func = DefaultLocalDenseEmbedding()
>>> vector = emb_func.embed("Hello, world!")
>>> len(vector)
384
>>> isinstance(vector, list)
True
>>> # Recommended for users in China (uses ModelScope)
>>> emb_func = DefaultLocalDenseEmbedding(model_source="modelscope")
>>> vector = emb_func.embed("你好,世界!")  # Works well with Chinese text
>>> len(vector)
384
>>> # Alternative for China users: Use Hugging Face mirror
>>> import os
>>> os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
>>> emb_func = DefaultLocalDenseEmbedding()  # Uses HF mirror
>>> vector = emb_func.embed("Hello, world!")
>>> # Using GPU for faster inference
>>> emb_func = DefaultLocalDenseEmbedding(device="cuda")
>>> vector = emb_func("Machine learning is fascinating")
>>> # Normalized vector has unit length
>>> import numpy as np
>>> np.linalg.norm(vector)
1.0
>>> # Batch processing
>>> texts = ["First text", "Second text", "Third text"]
>>> vectors = [emb_func.embed(text) for text in texts]
>>> len(vectors)
3
>>> all(len(v) == 384 for v in vectors)
True
>>> # Semantic similarity
>>> v1 = emb_func.embed("The cat sits on the mat")
>>> v2 = emb_func.embed("A feline rests on a rug")
>>> v3 = emb_func.embed("Python programming")
>>> similarity_high = np.dot(v1, v2)  # Similar sentences
>>> similarity_low = np.dot(v1, v3)   # Different topics
>>> similarity_high > similarity_low
True
>>> # Error handling
>>> try:
...     emb_func.embed("")  # Empty string
... except ValueError as e:
...     print(f"Error: {e}")
Error: Input text cannot be empty or whitespace only
See Also
  • DenseEmbeddingFunction: Base class for dense embeddings
  • DefaultLocalSparseEmbedding: Sparse embedding with SPLADE
  • QwenDenseEmbedding: Alternative using Qwen API

Initialize with all-MiniLM-L6-v2 model.

Parameters:

Name Type Description Default
model_source
Literal['huggingface', 'modelscope']

Model source. Defaults to "huggingface".

'huggingface'
device
Optional[str]

Target device ("cpu", "cuda", "mps", or None). Defaults to None (automatic detection).

None
normalize_embeddings
bool

Whether to L2-normalize output vectors. Defaults to True.

True
batch_size
int

Batch size for encoding. Defaults to 32.

32
**kwargs

Additional parameters for future extension.

{}

Raises:

Type Description
ImportError

If sentence-transformers or modelscope is not installed.

ValueError

If model cannot be loaded.

Methods:

Name Description
__call__

Make the embedding function callable.

embed

Generate dense embedding vector for the input text.

Attributes
model_name property
model_name: str

str: The Sentence Transformer model name currently in use.

model_source property
model_source: str

str: The model source being used ("huggingface" or "modelscope").

device property
device: str

str: The device the model is running on.

dimension property
dimension: int

int: The expected dimensionality of the embedding vector.

extra_params property
extra_params: dict

dict: Extra parameters for model-specific customization.

Functions
__call__
__call__(input: str) -> DenseVectorType

Make the embedding function callable.

embed
embed(input: str) -> DenseVectorType

Generate dense embedding vector for the input text.

This method uses the Sentence Transformer model to convert input text into a dense vector representation. The model runs locally without requiring API calls.

Parameters:

Name Type Description Default
input str

Input text string to embed. Must be non-empty after stripping whitespace. Maximum length depends on the model used (typically 128-512 tokens for most models).

required

Returns:

Name Type Description
DenseVectorType DenseVectorType

A list of floats representing the embedding vector. Length equals self.dimension. If normalize_embeddings=True, the vector has unit length. Example: [0.123, -0.456, 0.789, ...]

Raises:

Type Description
TypeError

If input is not a string.

ValueError

If input is empty or whitespace-only.

RuntimeError

If model inference fails.

Examples:

>>> emb = DefaultLocalDenseEmbedding()
>>> vector = emb.embed("Natural language processing")
>>> len(vector)
384
>>> isinstance(vector[0], float)
True
>>> # Normalized vectors have unit length
>>> import numpy as np
>>> emb = DefaultLocalDenseEmbedding(normalize_embeddings=True)
>>> vector = emb.embed("Test sentence")
>>> np.linalg.norm(vector)
1.0
>>> # Error: empty input
>>> emb.embed("   ")
ValueError: Input text cannot be empty or whitespace only
>>> # Error: non-string input
>>> emb.embed(123)
TypeError: Expected 'input' to be str, got int
>>> # Semantic similarity example
>>> v1 = emb.embed("The cat sits on the mat")
>>> v2 = emb.embed("A feline rests on a rug")
>>> similarity = np.dot(v1, v2)  # High similarity due to semantic meaning
>>> similarity > 0.7
True
Note
  • First call may be slower due to model loading
  • Subsequent calls are much faster as the model stays in memory
  • This method embeds one text at a time; for large batches, encoding multiple texts together with the underlying model is more efficient
  • GPU acceleration provides 5-10x speedup over CPU

DefaultLocalSparseEmbedding

DefaultLocalSparseEmbedding(
    model_source: Literal["huggingface", "modelscope"] = "huggingface",
    device: Optional[str] = None,
    encoding_type: Literal["query", "document"] = "query",
    **kwargs
)

Bases: SentenceTransformerFunctionBase, SparseEmbeddingFunction[TEXT]

Default local sparse embedding using SPLADE model.

This class provides sparse vector embedding using the SPLADE (SParse Lexical AnD Expansion) model. SPLADE generates sparse, interpretable representations where each dimension corresponds to a vocabulary term with learned importance weights. It's ideal for lexical matching, BM25-style retrieval, and hybrid search scenarios.

The default model is naver/splade-cocondenser-ensembledistil, which is publicly available without authentication. It produces sparse vectors with thousands of dimensions but only hundreds of non-zero values, making them efficient for storage and retrieval while maintaining strong lexical matching.

Model Caching:

This class uses class-level caching to share the SPLADE model across all instances with the same configuration (model_source, device). This significantly reduces memory usage when creating multiple instances for different encoding types (query vs document).

Cache Management:

The class provides methods to manage the model cache:

  • clear_cache(): Clear all cached models to free memory
  • get_cache_info(): Get information about cached models
  • remove_from_cache(model_source, device): Remove a specific model from cache

.. note:: Why not use splade-v3?

The newer ``naver/splade-v3`` model is gated (requires access approval).
We use ``naver/splade-cocondenser-ensembledistil`` instead.

**To use splade-v3 (if you have access):**

1. Request access at https://huggingface.co/naver/splade-v3
2. Get your Hugging Face token from https://huggingface.co/settings/tokens
3. Set environment variable:

   .. code-block:: bash

       export HF_TOKEN="your_huggingface_token"

4. Or login programmatically:

   .. code-block:: python

       from huggingface_hub import login
       login(token="your_huggingface_token")

5. To use a custom SPLADE model, you can subclass this class and override
   the model_name in ``__init__``, or create your own implementation
   inheriting from ``SentenceTransformerFunctionBase`` and
   ``SparseEmbeddingFunction``.

Parameters:

Name Type Description Default
model_source
Literal['huggingface', 'modelscope']

Model source. Defaults to "huggingface". ModelScope support may vary for SPLADE models.

'huggingface'
device
Optional[str]

Device to run the model on. Options: "cpu", "cuda", "mps" (for Apple Silicon), or None for automatic detection. Defaults to None.

None
encoding_type
Literal['query', 'document']

Encoding type.
  • "query": Optimize for search queries (default)
  • "document": Optimize for indexed documents

'query'
**kwargs

Additional parameters (currently unused, for future extension).

{}

Attributes:

Name Type Description
model_name str

Model identifier.

model_source str

The model source being used.

device str

The device the model is running on.

Raises:

Type Description
ValueError

If the model cannot be loaded or input is invalid.

TypeError

If input to embed() is not a string.

RuntimeError

If model inference fails.

Note
  • Requires Python 3.10, 3.11, or 3.12
  • Requires the sentence-transformers package: pip install sentence-transformers
  • First run downloads the model (~100MB) from Hugging Face
  • Cache location: ~/.cache/torch/sentence_transformers/
  • No API keys or authentication required
  • Sparse vectors have ~30k dimensions but only ~100-200 non-zero values
  • Best combined with dense embeddings for hybrid retrieval

SPLADE vs Dense Embeddings:

  • Dense: Continuous semantic vectors, good for semantic similarity
  • Sparse: Lexical keyword-based, interpretable, good for exact matching
  • Hybrid: Combine both for best retrieval performance

Examples:

>>> # Memory-efficient: both instances share the same model (~200MB)
>>> from zvec.extension import DefaultLocalSparseEmbedding
>>>
>>> # Query embedding
>>> query_emb = DefaultLocalSparseEmbedding(encoding_type="query")
>>> query_vec = query_emb.embed("machine learning algorithms")
>>> type(query_vec)
<class 'dict'>
>>> len(query_vec)  # Only non-zero dimensions
156
>>> # Document embedding (shares model with query_emb)
>>> doc_emb = DefaultLocalSparseEmbedding(encoding_type="document")
>>> doc_vec = doc_emb.embed("Machine learning is a subset of AI")
>>> # Total memory: ~200MB (not 400MB) thanks to model caching
>>> # Asymmetric retrieval example
>>> query_vec = query_emb.embed("what causes aging fast")
>>> doc_vec = doc_emb.embed(
...     "UV-A light causes tanning, skin aging, and cataracts..."
... )
>>>
>>> # Calculate similarity (dot product for sparse vectors)
>>> similarity = sum(
...     query_vec.get(k, 0) * doc_vec.get(k, 0)
...     for k in set(query_vec) | set(doc_vec)
... )
>>> # Batch processing
>>> queries = ["query 1", "query 2", "query 3"]
>>> query_vecs = [query_emb.embed(q) for q in queries]
>>>
>>> documents = ["doc 1", "doc 2", "doc 3"]
>>> doc_vecs = [doc_emb.embed(d) for d in documents]
>>> # Inspecting sparse dimensions (output is sorted by indices)
>>> query_vec = query_emb.embed("machine learning")
>>> list(query_vec.items())[:5]  # First 5 dimensions (by index)
[(10, 0.45), (23, 0.87), (56, 0.32), (89, 1.12), (120, 0.65)]
>>>
>>> # Sort by weight to find most important terms
>>> sorted_by_weight = sorted(query_vec.items(), key=lambda x: x[1], reverse=True)
>>> top_5 = sorted_by_weight[:5]  # Top 5 most important terms
>>> top_5
[(1023, 1.45), (245, 1.23), (8901, 0.98), (5678, 0.87), (12034, 0.76)]
>>> # Using GPU for faster inference
>>> sparse_emb = DefaultLocalSparseEmbedding(device="cuda")
>>> vector = sparse_emb.embed("natural language processing")
>>> # Hybrid retrieval example (combining dense + sparse)
>>> from zvec.extension import DefaultLocalDenseEmbedding
>>> dense_emb = DefaultLocalDenseEmbedding()
>>> sparse_emb = DefaultLocalSparseEmbedding()
>>>
>>> query = "deep learning neural networks"
>>> dense_vec = dense_emb.embed(query)   # [0.1, -0.3, 0.5, ...]
>>> sparse_vec = sparse_emb.embed(query)  # {12: 0.8, 45: 1.2, ...}
>>> # Error handling
>>> try:
...     sparse_emb.embed("")  # Empty string
... except ValueError as e:
...     print(f"Error: {e}")
Error: Input text cannot be empty or whitespace only
>>> # Cache management
>>> # Check cache status
>>> info = DefaultLocalSparseEmbedding.get_cache_info()
>>> print(f"Cached models: {info['cached_models']}")
Cached models: 1
>>>
>>> # Clear cache to free memory
>>> DefaultLocalSparseEmbedding.clear_cache()
>>> info = DefaultLocalSparseEmbedding.get_cache_info()
>>> print(f"Cached models: {info['cached_models']}")
Cached models: 0
>>>
>>> # Remove specific model from cache
>>> query_emb = DefaultLocalSparseEmbedding()  # Creates CPU model
>>> cuda_emb = DefaultLocalSparseEmbedding(device="cuda")  # Creates CUDA model
>>> info = DefaultLocalSparseEmbedding.get_cache_info()
>>> print(f"Cached models: {info['cached_models']}")
Cached models: 2
>>>
>>> # Remove only CPU model
>>> removed = DefaultLocalSparseEmbedding.remove_from_cache(device=None)
>>> print(f"Removed: {removed}")
True
>>> info = DefaultLocalSparseEmbedding.get_cache_info()
>>> print(f"Cached models: {info['cached_models']}")
Cached models: 1
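In the hybrid retrieval example above, fusing the dense and sparse signals is normally left to a re-ranker (see WeightedReRanker). As a purely illustrative sketch of what such a fusion computes, not the library's implementation, the two similarities can be combined with a weighted sum:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def sparse_dot(q: dict[int, float], d: dict[int, float]) -> float:
    """Dot product over the overlapping non-zero dimensions."""
    # Iterate the smaller dict for efficiency.
    small, large = (q, d) if len(q) <= len(d) else (d, q)
    return sum(w * large.get(i, 0.0) for i, w in small.items())

def hybrid_score(dense_sim: float, sparse_sim: float, alpha: float = 0.5) -> float:
    """Weighted linear fusion of the two similarity signals."""
    return alpha * dense_sim + (1 - alpha) * sparse_sim
```

Note that in practice the two scores live on different scales (cosine in [-1, 1], sparse dot product unbounded), so a real system would normalize them before fusing.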
See Also
  • SparseEmbeddingFunction: Base class for sparse embeddings
  • DefaultLocalDenseEmbedding: Dense embedding with all-MiniLM-L6-v2
  • QwenDenseEmbedding: Alternative using Qwen API
References
  • SPLADE Paper: https://arxiv.org/abs/2109.10086
  • Model: https://huggingface.co/naver/splade-cocondenser-ensembledistil

Initialize with SPLADE model.

Parameters:

Name Type Description Default
model_source
Literal['huggingface', 'modelscope']

Model source. Defaults to "huggingface".

'huggingface'
device
Optional[str]

Target device ("cpu", "cuda", "mps", or None). Defaults to None (automatic detection).

None
encoding_type
Literal['query', 'document']

Encoding type for embeddings:
  • "query": Optimize for search queries (default)
  • "document": Optimize for indexed documents
This distinction is important for asymmetric retrieval tasks.

'query'
**kwargs

Additional parameters (reserved for future use).

{}

Raises:

Type Description
ImportError

If sentence-transformers is not installed.

ValueError

If model cannot be loaded.

Note

Multiple instances with the same (model_source, device) configuration will share the same underlying model to save memory. Different instances can use different encoding_type settings while sharing the model.

Model Selection:

Uses naver/splade-cocondenser-ensembledistil instead of the newer naver/splade-v3 because splade-v3 is a gated model requiring Hugging Face authentication. The cocondenser-ensembledistil variant:

  • Does not require authentication or API tokens
  • Is immediately available for all users
  • Provides comparable retrieval performance (~2% difference)
  • Avoids "Access to model is restricted" errors

If you need splade-v3 and have obtained access, you can subclass this class and override the model_name parameter.

Examples:

>>> # Both instances share the same model (saves memory)
>>> query_emb = DefaultLocalSparseEmbedding(encoding_type="query")
>>> doc_emb = DefaultLocalSparseEmbedding(encoding_type="document")
>>> # Only one model is loaded in memory

Methods:

Name Description
clear_cache

Clear all cached SPLADE models from memory.

get_cache_info

Get information about currently cached models.

remove_from_cache

Remove a specific model from cache.

__call__

Make the embedding function callable.

embed

Generate sparse embedding vector for the input text.

Attributes
model_name property
model_name: str

str: The Sentence Transformer model name currently in use.

model_source property
model_source: str

str: The model source being used ("huggingface" or "modelscope").

device property
device: str

str: The device the model is running on.

extra_params property
extra_params: dict

dict: Extra parameters for model-specific customization.

Functions
clear_cache classmethod
clear_cache() -> None

Clear all cached SPLADE models from memory.

This is useful for:
  • Freeing memory when models are no longer needed
  • Forcing a fresh model reload
  • Testing and debugging

Examples:

>>> # Clear cache to free memory
>>> DefaultLocalSparseEmbedding.clear_cache()

>>> # Or in tests to ensure fresh model loading
>>> def test_something():
...     DefaultLocalSparseEmbedding.clear_cache()
...     emb = DefaultLocalSparseEmbedding()
...     # Test with fresh model
get_cache_info classmethod
get_cache_info() -> dict

Get information about currently cached models.

Returns:

Name Type Description
dict dict

Dictionary with cache statistics:
  • cached_models (int): Number of cached model instances
  • cache_keys (list): List of cache keys (model_name, model_source, device)

Examples:

>>> info = DefaultLocalSparseEmbedding.get_cache_info()
>>> print(f"Cached models: {info['cached_models']}")
Cached models: 2
>>> print(f"Cache keys: {info['cache_keys']}")
Cache keys: [('naver/splade-cocondenser-ensembledistil', 'huggingface', None),
            ('naver/splade-cocondenser-ensembledistil', 'huggingface', 'cuda')]
remove_from_cache classmethod
remove_from_cache(model_source: str = 'huggingface', device: Optional[str] = None) -> bool

Remove a specific model from cache.

Parameters:

Name Type Description Default
model_source str

Model source ("huggingface" or "modelscope"). Defaults to "huggingface".

'huggingface'
device Optional[str]

Device identifier. Defaults to None.

None

Returns:

Name Type Description
bool bool

True if model was found and removed, False otherwise.

Examples:

>>> # Remove CPU model from cache
>>> removed = DefaultLocalSparseEmbedding.remove_from_cache()
>>> print(f"Removed: {removed}")
True
>>> # Remove CUDA model from cache
>>> removed = DefaultLocalSparseEmbedding.remove_from_cache(device="cuda")
>>> print(f"Removed: {removed}")
True
__call__
__call__(input: str) -> SparseVectorType

Make the embedding function callable.

embed
embed(input: str) -> SparseVectorType

Generate sparse embedding vector for the input text.

This method uses the SPLADE model to convert input text into a sparse vector representation. The result is a dictionary where keys are dimension indices and values are importance weights (only non-zero values included).

The embedding is optimized based on the encoding_type specified during initialization: "query" for search queries or "document" for indexed content.

Parameters:

Name Type Description Default
input str

Input text string to embed. Must be non-empty after stripping whitespace.

required

Returns:

Name Type Description
SparseVectorType SparseVectorType

A dictionary mapping dimension index to weight. Only non-zero dimensions are included. The dictionary is sorted by indices (keys) in ascending order for consistent output. Example: {10: 0.5, 245: 0.8, 1023: 1.2, 5678: 0.5}

Raises:

Type Description
TypeError

If input is not a string.

ValueError

If input is empty or whitespace-only.

RuntimeError

If model inference fails.

Examples:

>>> # Query embedding
>>> query_emb = DefaultLocalSparseEmbedding(encoding_type="query")
>>> query_vec = query_emb.embed("machine learning")
>>> isinstance(query_vec, dict)
True
Note
  • First call may be slower due to model loading
  • Subsequent calls are much faster as the model stays in memory
  • GPU acceleration provides significant speedup
  • Sparse vectors are memory-efficient (only store non-zero values)
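The validation and output contract documented above (type check, emptiness check, index-sorted dict of non-zero weights) can be sketched as follows. The raw_weights argument is a hypothetical stand-in for the SPLADE forward pass; this is not the library's implementation:

```python
def normalize_sparse_output(input: str,
                            raw_weights: dict[int, float]) -> dict[int, float]:
    """Apply the documented embed() contract to raw model weights (sketch)."""
    if not isinstance(input, str):
        raise TypeError("input must be a string")
    if not input.strip():
        raise ValueError("Input text cannot be empty or whitespace only")
    # Keep only non-zero dimensions, sorted by index (ascending),
    # relying on dict insertion order for the sorted output.
    return {i: w for i, w in sorted(raw_weights.items()) if w != 0.0}
```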

SentenceTransformerFunctionBase

SentenceTransformerFunctionBase(
    model_name: str,
    model_source: Literal["huggingface", "modelscope"] = "huggingface",
    device: Optional[str] = None,
)

Base class for Sentence Transformer functions (both dense and sparse).

This base class provides common functionality for loading and managing sentence-transformers models from Hugging Face or ModelScope. It supports both dense models (e.g., all-MiniLM-L6-v2) and sparse models (e.g., SPLADE).

This class is not meant to be used directly. Use the concrete implementations:
  • SentenceTransformerEmbeddingFunction for dense embeddings
  • SentenceTransformerSparseEmbeddingFunction for sparse embeddings
  • DefaultLocalDenseEmbedding for default dense embeddings
  • DefaultLocalSparseEmbedding for default sparse embeddings

Parameters:

Name Type Description Default
model_name
str

Model identifier or local path.

required
model_source
Literal['huggingface', 'modelscope']

Model source.

'huggingface'
device
Optional[str]

Device to run the model on.

None
Note
  • This is an internal base class for code reuse
  • Subclasses should inherit from appropriate Protocol (Dense/Sparse)
  • Provides model loading and management functionality

Initialize the base Sentence Transformer functionality.

Parameters:

Name Type Description Default
model_name
str

Model identifier or local path.

required
model_source
Literal['huggingface', 'modelscope']

Model source.

'huggingface'
device
Optional[str]

Device to run the model on.

None

Raises:

Type Description
ValueError

If model_source is invalid.

Attributes:

Name Type Description
model_name str

str: The Sentence Transformer model name currently in use.

model_source str

str: The model source being used ("huggingface" or "modelscope").

device str

str: The device the model is running on.

Attributes
model_name property
model_name: str

str: The Sentence Transformer model name currently in use.

model_source property
model_source: str

str: The model source being used ("huggingface" or "modelscope").

device property
device: str

str: The device the model is running on.

Functions

DefaultLocalReRanker

DefaultLocalReRanker(
    query: Optional[str] = None,
    topn: int = 10,
    rerank_field: Optional[str] = None,
    model_name: str = "cross-encoder/ms-marco-MiniLM-L6-v2",
    model_source: Literal["huggingface", "modelscope"] = "huggingface",
    device: Optional[str] = None,
    batch_size: int = 32,
)

Bases: SentenceTransformerFunctionBase, RerankFunction

Re-ranker using Sentence Transformer cross-encoder models for semantic re-ranking.

This re-ranker leverages pre-trained cross-encoder models to perform deep semantic re-ranking of search results. It runs locally without API calls, supports GPU acceleration, and works with models from Hugging Face or ModelScope.

Cross-encoder models evaluate query-document pairs jointly, providing more accurate relevance scores than bi-encoder (embedding-based) similarity.

Parameters:

Name Type Description Default
query
str

Query text for semantic re-ranking. Required.

None
topn
int

Maximum number of documents to return after re-ranking. Defaults to 10.

10
rerank_field
Optional[str]

Document field name to use as re-ranking input text. Required (e.g., "content", "title", "body").

None
model_name
str

Cross-encoder model identifier or local path. Defaults to "cross-encoder/ms-marco-MiniLM-L6-v2" (MS MARCO MiniLM). Common options:
  • "cross-encoder/ms-marco-MiniLM-L6-v2": Lightweight, fast (~80MB, recommended)
  • "cross-encoder/ms-marco-MiniLM-L12-v2": Better accuracy (~120MB)
  • "BAAI/bge-reranker-base": BGE Reranker Base (~280MB)
  • "BAAI/bge-reranker-large": BGE Reranker Large (highest quality, ~560MB)

'cross-encoder/ms-marco-MiniLM-L6-v2'
model_source
Literal['huggingface', 'modelscope']

Model source. Defaults to "huggingface".
  • "huggingface": Load from Hugging Face Hub
  • "modelscope": Load from ModelScope (recommended for users in China)

'huggingface'
device
Optional[str]

Device to run the model on. Options: "cpu", "cuda", "mps" (for Apple Silicon), or None for automatic detection. Defaults to None.

None
batch_size
int

Batch size for processing query-document pairs. Larger values speed up processing but use more memory. Defaults to 32.

32

Attributes:

Name Type Description
query str

The query text used for re-ranking.

topn int

Maximum number of documents to return.

rerank_field Optional[str]

Field name used for re-ranking input.

model_name str

The cross-encoder model being used.

model_source str

The model source ("huggingface" or "modelscope").

device str

The device the model is running on.

Raises:

Type Description
ValueError

If query is empty/None, rerank_field is None, or model cannot be loaded.

TypeError

If input types are invalid.

RuntimeError

If model inference fails.

Note
  • Requires Python 3.10, 3.11, or 3.12
  • Requires sentence-transformers package: pip install sentence-transformers
  • For ModelScope support, also requires: pip install modelscope
  • First run downloads the model (~80-560MB depending on model) from chosen source
  • No API keys or network required after initial download
  • Cross-encoders are slower than bi-encoders but more accurate
  • GPU acceleration provides significant speedup (5-10x)

MS MARCO MiniLM-L6-v2 Model (Default):

The default model cross-encoder/ms-marco-MiniLM-L6-v2 is a lightweight and efficient cross-encoder trained on the MS MARCO dataset. It provides:

  • Fast inference speed (suitable for real-time applications)
  • Small model size (~80MB, quick to download)
  • Good balance between speed and accuracy
  • Trained on 500K+ query-document pairs
  • Public availability without authentication

For users in China:

If you encounter Hugging Face access issues, use ModelScope instead:

>>> # Recommended for users in China
>>> reranker = DefaultLocalReRanker(
...     query="机器学习算法",
...     rerank_field="content",
...     model_source="modelscope"
... )

Alternatively, use Hugging Face mirror:

export HF_ENDPOINT=https://hf-mirror.com

Examples:

>>> # Basic usage with default MS MARCO MiniLM model
>>> from zvec.extension import DefaultLocalReRanker
>>>
>>> reranker = DefaultLocalReRanker(
...     query="machine learning algorithms",
...     topn=5,
...     rerank_field="content"
... )
>>>
>>> # Use in collection.query()
>>> results = collection.query(
...     data={"vector_field": query_vector},
...     reranker=reranker,
...     topk=20
... )
>>> # Using ModelScope for users in China
>>> reranker = DefaultLocalReRanker(
...     query="深度学习",
...     topn=10,
...     rerank_field="content",
...     model_source="modelscope"
... )
>>> # Using larger model for better quality
>>> reranker = DefaultLocalReRanker(
...     query="neural networks",
...     topn=5,
...     rerank_field="content",
...     model_name="BAAI/bge-reranker-large",
...     device="cuda",
...     batch_size=64
... )
>>> # Direct rerank call (for testing)
>>> query_results = {
...     "vector1": [
...         Doc(id="1", score=0.9, fields={"content": "Machine learning is..."}),
...         Doc(id="2", score=0.8, fields={"content": "Deep learning is..."}),
...     ]
... }
>>> reranked = reranker.rerank(query_results)
>>> for doc in reranked:
...     print(f"ID: {doc.id}, Score: {doc.score:.4f}")
ID: 2, Score: 0.9234
ID: 1, Score: 0.8567
See Also
  • RerankFunction: Abstract base class for re-rankers
  • QwenReRanker: Re-ranker using Qwen API
  • RrfReRanker: Multi-vector re-ranker using RRF
  • WeightedReRanker: Multi-vector re-ranker using weighted scores
References
  • MS MARCO Cross-Encoder: https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2
  • BGE Reranker: https://huggingface.co/BAAI/bge-reranker-base
  • Cross-Encoder vs Bi-Encoder: https://www.sbert.net/examples/applications/cross-encoder/README.html

Initialize DefaultLocalReRanker with query and configuration.

Parameters:

Name Type Description Default
query
Optional[str]

Query text for semantic matching. Required.

None
topn
int

Number of top results to return.

10
rerank_field
Optional[str]

Document field for re-ranking input.

None
model_name
str

Cross-encoder model identifier.

'cross-encoder/ms-marco-MiniLM-L6-v2'
model_source
Literal['huggingface', 'modelscope']

Model source.

'huggingface'
device
Optional[str]

Target device ("cpu", "cuda", "mps", or None).

None
batch_size
int

Batch size for processing query-document pairs.

32

Raises:

Type Description
ValueError

If query is empty or model cannot be loaded.

Methods:

Name Description
rerank

Re-rank documents using Sentence Transformer cross-encoder model.

Attributes
topn property
topn: int

int: Number of top documents to return after re-ranking.

rerank_field property
rerank_field: Optional[str]

Optional[str]: Field name used as re-ranking input.

model_name property
model_name: str

str: The Sentence Transformer model name currently in use.

model_source property
model_source: str

str: The model source being used ("huggingface" or "modelscope").

device property
device: str

str: The device the model is running on.

query property
query: str

str: Query text used for semantic re-ranking.

batch_size property
batch_size: int

int: Batch size for processing query-document pairs.

Functions
rerank
rerank(query_results: dict[str, list[Doc]]) -> list[Doc]

Re-rank documents using Sentence Transformer cross-encoder model.

Evaluates each query-document pair using the cross-encoder model to compute relevance scores. Documents are then sorted by these scores and the top-k results are returned.

Parameters:

Name Type Description Default
query_results dict[str, list[Doc]]

Mapping from vector field names to lists of retrieved documents. Documents from all fields are deduplicated and re-ranked together.

required

Returns:

Type Description
list[Doc]

list[Doc]: Re-ranked documents (up to topn) with updated score fields containing relevance scores from the cross-encoder model.

Raises:

Type Description
ValueError

If no valid documents are found or model inference fails.

Note
  • Duplicate documents (same ID) across fields are processed once
  • Documents with empty/missing rerank_field content are skipped
  • Returned scores are logits from the cross-encoder model
  • Higher scores indicate higher relevance
  • Processing time is O(n) where n is the number of documents
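The dedup, score, sort, and truncate flow described above can be sketched in plain Python. The score_fn argument is a hypothetical stand-in for the cross-encoder; this sketch mirrors the documented behavior rather than the library's actual code:

```python
from dataclasses import dataclass, field, replace
from typing import Callable

@dataclass
class Doc:
    """Minimal stand-in for zvec's Doc type (sketch only)."""
    id: str
    score: float
    fields: dict = field(default_factory=dict)

def rerank_sketch(query_results: dict[str, list[Doc]], query: str,
                  rerank_field: str, topn: int,
                  score_fn: Callable[[str, str], float]) -> list[Doc]:
    """Dedup by id, score (query, text) pairs, sort descending, keep topn."""
    seen: dict[str, Doc] = {}
    for docs in query_results.values():
        for doc in docs:
            if doc.id in seen:
                continue  # duplicates across fields are processed once
            text = doc.fields.get(rerank_field)
            if not text:
                continue  # skip docs with empty/missing rerank_field content
            seen[doc.id] = replace(doc, score=score_fn(query, text))
    ranked = sorted(seen.values(), key=lambda d: d.score, reverse=True)
    return ranked[:topn]
```

In the real re-ranker, score_fn corresponds to batched cross-encoder inference, which is why batch_size matters for throughput.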

Examples:

>>> reranker = DefaultLocalReRanker(
...     query="machine learning",
...     topn=3,
...     rerank_field="content"
... )
>>> query_results = {
...     "vector1": [
...         Doc(id="1", score=0.9, fields={"content": "ML basics"}),
...         Doc(id="2", score=0.8, fields={"content": "DL tutorial"}),
...     ]
... }
>>> reranked = reranker.rerank(query_results)
>>> len(reranked) <= 3
True